/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108346672 & >>108341869

►News
>(03/11) Nemotron 3 Super released: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Today is the last day of the week for Google to release anything and it's probably going to be Gemini 3.1 Flash; nothing local.
>>108353262
That Miku's breasts are far too large.
>>108353250
>Karl Voss
>Dr. Elena Voss
>Zinnia Voss
>Dr. Eleanor Voss
>Seraphine Voss
hello sloppa
Can I start doomposting about Deepseek now that Hunter Alpha is out in the wild and thoroughly mediocre?
>>108353291
no, it can be any chinese
>>108353282
Oh yeah, I almost forgot about Seraphina
►Recent Highlights from the Previous Thread: >>108346672

--NVIDIA Nemotron-3-Super-120B-A12B-BF16 release and benchmark analysis:
>108346846 >108346876 >108346885 >108347098
--Qwen3.5 397B only 15% better than 4B on benchmarks:
>108347895 >108347950 >108347919 >108347934 >108347984 >108347997 >108348009 >108348025 >108348029
--Nvidia's $26B open-weight AI investment and market dominance:
>108351880 >108351911 >108351918 >108351943 >108351916 >108351923 >108351938 >108351942
--runescape-bench: AI Agent Benchmark for RuneScape:
>108348559 >108348568 >108348578 >108348645 >108349022 >108349071
--Lightweight local models for grammar/spelling correction:
>108350949 >108350957 >108350968 >108350966 >108351087 >108351094
--Speculation about OpenRouter's Hunter/Healer Alpha models being DeepSeek V4:
>108349438 >108349555 >108349668 >108350072 >108350416 >108350453 >108350469 >108349488 >108349636 >108349674 >108350692
--Qwen3-VL video captioning tool with VRAM requirements discussion:
>108350529 >108351412
--Nemotron-3-Super issues and cockbench hosting solutions:
>108348570 >108348592 >108348641 >108348841 >108348854 >108349297 >108349305 >108348911
--Qwen3.5-generated retro terminal video with glitch effects:
>108349360 >108349365 >108349425 >108349434 >108349586
--llama.cpp whitespace cleanup PR:
>108348889
--Miku (free space):
>108348792 >108350529 >108351926 >108352986

►Recent Highlight Posts from the Previous Thread: >>108347000

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108353304
this is a grave act of tettorism
>>108353304
>>108353309
>>108353330
Not sure if it's final V4, but it has too many deepseek-isms to be unrelated, like the unprompted in-character thinking
Another low quality mikutroon thread incoming.
>>108353262
BLACKED duo
of course, you're here to ensure it's shit after all.
>>108353330
Qwen3.5 has unprompted in-character thinking
Why is ngxson such a doomer?
>>108353359
not doom, only lazy :)
>>108353366
There's no excuse for being lazy in an age of vibecoding
>>108353370
fuck you
it's better for a model to be unsupported than wilkin slopped
>>108353359
I always read his name as nexon and start to become irrationally angry.
>>108353347
Unprompted in-character thinking that's randomly (enclosed in parentheses) and has just about the same length and safetycuckery as DS3.2?
>>108353381
You don't want support based on a mock generated model? Are you a bigot?
why are there no decent rp tuners on hf anymore except gay and furry? sao10k had some good shit back then but he's no longer active. I just want my model to understand my fetishes in the most erotic way possible and the defaults just don't hit it in the right places. do we have a recommended tuner in 2026 who knows their shit?
>>108353392
just write what you want the model to say in the system prompt, that's the 2026 meta
>>108353323
Lol
>>108353291
No, because it's all contrived. No one knows anything yet, and it's always tmw. Forever.
>>108353337
same goes for the models
>>108353323
What did Ani do to deserve this?
I don't think Hunter Alpha is DS V4. If it is there's no need to mention OpenClaw at all in the description
>>108353396
what's the fun in telling the model what I wanna hear? i remember when I used to be fascinated by the slightest "woah the model said exactly what i wanted to hear!!!!!!!!" moments, but now it's all just "what the fuck is this even saying?"
>>108353419
>>108353419
>rugged shorts
>>108353429
if hunter really is the anticipated d'pussy 4 then my hopes for chink shit would be shattered
one of them is dsv4lite
>>108353454
There's literally no need for a multimodal lite model
>>108353429
Horizon Alpha was GPT-5, so it might be another OAI model. Or maybe it's Avocado, heavily distilled from DeepSeek.
>>108353466
Pony Alpha was GLM-5 tho
>>108353429
The thinking traces are VERY similar to DS, but maybe all chinese labs adopted them, idk. Very underwhelming model tho.
>since people say underwhelm we need cook for more months now
good job ensuring quality
>>108353431
Relying on the model (especially small models that can be finetuned by the community) to surprise you unprompted is a short-lived game. Community finetunes were never good. At this level and scale they just can't completely modify the underlying model's behavior, and nowadays most people shortcut the process by finetuning the official instruct versions anyway. Any improvement besides slight stylistic changes is just partially undoing the built-in RLHF training.
>>108353466
>>108353470
https://openrouter.ai/openrouter
All cloaked models are something-or-other alpha, the name doesn't mean anything
>>108353507
of course it means something and you should speculate and post about it, please.
>this post was NOT sponsored by OpenRouter
>>108353478
DS (at least on the web interface) doesn't have a singular thinking style. If you ask mundane questions you get short thinking traces that may as well be in the response. If you ask coding or logic questions you get in-depth thinking.
Explain, without sounding insane, what is the overlap between vocaloids and Local Language Models.
>>108353470
>>108353507
So the only upcoming models we know of are V4, Gemma 4, and Avocado. It could be V3.3 or whatever, different from the model DS is testing in the web chat
Is it possible to get in contact with a VRM artist on VroidHub to commission artwork? The guy I'm after doesn't have any contact links. He has a Ko-fi page. Will that let me contact this dude?
>>108353359
>their recent papers
so engram is already confirmed a complete DoA meme
>I'm totally burned out on aniblog™ guys
>proceeds to aniblog™ all over again
>>108353554
only because llama.cpp devs will refuse to implement any new architecture that is used by less than 3 models
Why is this general obsessed with anime girls that are confirmed to be blacked?
>>108353359
He's right, if someone wants to push a new architecture they should do what Qwen did and release a full spectrum of models from 0.7B to 1.5T
>>108353573
>>108353391
>the most powerful opensource model uses DSA so I'm simply not gonna implement it
Cloud AI infiltrator
>>108353533
Back in the early language model days someone erped with miku and made it part of some readme.
>>108353564
they prolly would implement something used by 1 model if it was a highly popular/liked/used model
the thing about DSA is that it's used by models that, while they may be liked, would be run by almost nobody on llama.cpp anyway lmao
the few copequanter cpu maxxers of /lmg/ waiting 2 hours for the model to think to read their 2t/s slop are not a real target audience
there's no purpose in implementing X when the few who will really use that X for tasks other than cooming are going to run vLLM, SGLang or something else of that sort on cloud hardware, because it's much more suited for the batched, shared loads than llama.cpp. I actually am glad and approve of their attitude here in not wasting development time (which is limited, they don't have a huge amount of contributors on the lower-level sides of lcpp) just to cater to the two most vocal lmgtards, and focus on things people do really run locally.
if I was niggerganov I'd even advocate for removing all the useless novelty shit like the diffusion models that only have half-baked support
Vocaloids have nothing to do with local AI models
>>108353555
Checked. If you want a blog I can give you a blog.
I tried doing a quick WebXR demo and discovered that it's extremely janky and not immersive, so I abandoned that idea. Then I had the realization that I've been approaching the lack-of-immersion problem wrong. Making a VRM model *expressive* doesn't matter as much as making it *reactive*. What I've been missing is the VLM, CV, and STT sensory input pipeline.
Also I think you're assuming more of the posts in this thread are me than reality.
>>108353594
>responding to the schizo
>>108353669
people are bored and entertain themselves with lolcows, news at 11
>NVIDIA is significantly expanding its footprint in open-source AI, with reports indicating a massive $26 billion investment over the next five years to develop and build open-weight AI models
There's something... off about this. How do we feel about this?
>>108353603
Hatsune Miku is the quintessential virtual waifu, and the thread is mostly about her, in the end.
>>108353681
They will spend $1B to finetune Qwen 3.0 and call it a day.
>>108353681
the only way to feel about such news, when you see the sort of garbage nvidia produces, like nemotron 3, is to hope they'll fail very hard and that no one pays attention to them and just treats them like air.
>>108353681
Hopeful for a swift and painless death :)
>>108353681
They're going to poison open models with their own LLM-generated slop instead of OpenAI's, Anthropic's, etc.
>>108353688
>not even 3.5
Actually, yeah, that sounds about right.
>>108353681
With conditions to use NVIDIA's base models (like Anima and Cosmos) and/or nvfp4.
>>108353564
yeah but we've had a working architecture since 2023, i don't get why all these companies keep doing their ultra complex own stuff that gets them like 10% better performance but works with absolutely nothing but vllm
they should know better and just stick with what's been around if they want people to try their models
>>108353683
Yeah, but /lmg/ really should take a step away from all of that in general if we want to be a serious technology general
is that uncensored qwen that released earlier good for cooming, or should i stick with what i have? it's some mistral version that fits in 16GB vram. Or to rephrase the question: what's the best uncensored model that fits in 16GB vram? I do have 32gb ram too
>>108353775
they don't care about anything but DCs using vllm though, and they want to be able to market "200 gorillion contexts for real this time" for coooding
>>108353784
>if we want to be a serious technology general
>we
look what good it did localtards
https://www.reddit.com/r/LocalLLaMA/comments/1rqcsrj/1_million_localllamas/
>>108353793
if they don't care about us, why should we want to run their models? it's not like any of the dsa models are something worth running either
>>108353784
Yeah, we need to act super srs biznis like the pseuds on reddit
Can you stop feeding the singular schizo with replies you fucking mongrels
>>108353683
That is the blandest, most uninteresting design for a waifu there could ever be. She is the elara of anime waifus. I guess when you consider her to be the averaged slop waifu she kinda fits.
>>108353804
>t. schizo
>>108353804
Your waifu is trash and she loves BBC
>>108353435
such drawings always have shiny, reflective boobs like they were oiled up.
real boobs have a rougher texture that doesn't reflect light anywhere near as much.
>>108353775
>innovation bad
llama.cpp hoping that this fast-moving field will never deviate too far from the GPT-2 architecture, because re-implementing everything from scratch in their brittle vibecoded C++ mess is the problem
>>108353775
>if they want people to try their models
no one is "trying" the real deepseek at home, not even the one supported by llama.cpp currently, apart from a handful of batshit schizo coomers, all of them united here in this 4cucks general, and a couple of other internet schizos (AesSedai, ikawrawkarakwra)
absorb this text:
https://github.com/ggml-org/llama.cpp/discussions/205
it's pretty much like ggerganov's manifesto on the purpose of llama.cpp
>Based on the positive responses to whisper.cpp, and more recently, llama.cpp, it looks like there is a strong and growing interest for doing efficient transformer model inference on-device (i.e. at the edge).
>I would be really happy to see developers join in and help advance further the idea of "inference at the edge"
>The strongest points of the current codebase are it's simplicity and efficiency. Performance is essential
>It's early to build a full-fledged edge inference framework. The code has to remain simple and compact in order to allow for quick and easy modifications. This helps to explore ideas at a much higher rate. Bloating the software with the ideas of today will make it useless tomorrow
>The AI models are improving at a very high rate and it is important to stay on top of it. The transformer architecture in it's core is very simple. There is no need to "slap" complex things on top of it
does that scream "run a model that takes a room full of GPUs to run at acceptable performance without copequant" to you
does "edge" mean something we don't know here?
what part of
>There is no need to "slap" complex things on top of it
is misunderstood too
if you don't like it, you don't have to use it
the schizo ikawrakwrak is trying to cater to the "run absolutely retarded copequant with 8k context to coom" crowd
If miku is THE waifu then why did silly tavern use seraphina? It is because nobody cares about your special interest.
>https://github.com/ggml-org/llama.cpp/commit/acb7c790698fa28a0fbfc0468804926815b94de3
>literally cuts off thinking after a predetermined amount of tokens
Is this a legitimate technique? Are models trained to handle this?
gptoss had a "reasoning budget" but it was controlled using the system prompt.
>>108353833
tavern used to come with konosuba cards included
think about what that means
>>108353804
They don't teach kids these days not to feed the trolls
>>108353791
well?
>>108353841
That even konosuba is more relevant than the baker's obsession.
>>108353835
this works fine, but the implementation has issues (it will insert the message without newlines, directly onto the interrupted last char of thinking, and will interpret "</think>" verbatim if you use router mode and configure reasoning-budget-message in your presets.ini, so your message will appear as "</think>message" in the thinking closure)
patch the code to strip "</think>" away and always add \n\n before your message. The model will behave better.
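For anyone patching this themselves, the fix described above amounts to roughly the following string handling. This is a sketch of the idea, not the actual llama.cpp code; the function name and exact tag handling are mine:

```python
def close_thinking(partial_thinking: str, budget_message: str) -> str:
    # Drop any literal close tag so it isn't spliced in verbatim, then
    # separate the injected budget message from the interrupted text
    # with a blank line instead of gluing it onto the last character.
    cleaned = partial_thinking.replace("</think>", "").rstrip()
    return f"{cleaned}\n\n{budget_message}\n</think>\n"

out = close_thinking("Let me consider the edge ca", "Okay, time to answer.")
```

The result contains exactly one properly closed think block, with the budget message set off by a blank line, which is what the post argues the model handles better.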
>>108353833
copyright
I'm very certain that Healer Alpha is Gemma 4. It's definitely considerably smaller than K2.5 going by its capabilities. My guess is something like a 130b/10a model.
>>108353864
Konosuba has no copyright?
>>108353896
Then it should be good for translating Japanese. Is it?
I didn't feel the typical Gemini/Gemma personality from it.
>>108353262
Using real art for an ai general, bold of you op
>>108353429
rumor is
>this time their largest size might be around 120B total with 15B active
so that possibly checks out
>>108353904
that's why it was removed
>>108353896
"What is a mesugaki?"
A gemma will make itself obvious.
>>108351560
qrd
>>108353956
we don't take kindly to your kind around these parts
While waiting for the new NVIDIA model to download I decided to give their earlier nano release a try.
Attached is the first page produced by my news summary script. The left is Qwen 3.5 35B (the same model I used to help code the script) and the right is Nemotron 3 30B. Each model was fed the same raw news data and given the same prompts and instructions.
I don't know about you anon, but I think when it comes to analysis and summarization of text, Qwen 3.5 trounced Nemotron 3.
I really didn't expect that big of a difference between models, and this makes me want to try more models to see the variance.
>>108353974
It's interesting to see such a clear difference in performance between the models. Trying out more models could definitely provide valuable insights into their strengths and weaknesses.
>>108353985
Unfortunately I have to be in bed by 10:00 as I work all night tonight, but that just became my plans for the weekend.
Well, that and testing out the super model.
>>108353974
I think I said so before in these threads, but Qwen 35B is kind of insane when it comes to dealing with information. Extraction, summarization, etc.
>>108353896
Yeah, I don't think this is DS V4.
Way too cucked.
>>108354011
are you a hot girl with teal hair? please be in london
>>108354014
>Way too cucked.
That's the way all models are going though, could be a new deepseek base with modern safety in mind, i.e. pre-train cucking
>>108354012
>Qwen 35B is kind of insane when it comes to dealing with information. Extraction, summarization, etc.
yeah i really lucked into it and i have been pleased. i hope that guy leaving does not fuck up their work too much, because the team has been on fire.
>>108354017
>are you hot
no
>a girl
no
>in london
thankfully no
>>108354047
you will never be a Londoner?
Hunter Alpha's system prompt
>>108354053
This is a duplicate thread. Please use https://old.reddit.com/r/LocalLLaMA/comments/1rr5zfo/what_is_hunter_alpha/ instead.
In the future, please search before starting a new thread.
>>108354062
>Never speculate
>>108354066
>>108354062
ye
https://www.reddit.com/r/LocalLLaMA/comments/1rr9fgq/comment/o9y00ro/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
>Healer Alpha system prompt
>>108353956
Healer Alpha is not working at the moment, but Nemotron Super 120B is a real piece of shit (see picrel).
Can't believe people would use a model called Nemo TROON
>>108354073
:rocket: this is perfect!
>>108354073
Isn't it just a gptoss 120b finetune?
>>108354081
no? they open-sourced the training data, and it uses a hybrid arch like qwen
>>108354047
>you will never be Londoner?
it is my understanding that there are no British people left in London
>>108354081
No, it's a completely different model.
>The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Distinct from the Nano model, the Super model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is trained using NVFP4 quantization to maximize compute efficiency. The model has 12B active parameters and 120B parameters in total.
>>108354089
>>108354093
I guess openai just managed to poison the well enough to make nvidia's model spit out the same sort of shit their 120b does.
>>108354113
Oh, they probably used lots of traces from it, but it's not it at the base for sure.
>>108354121
The recent HF article about synthetic data said OSS-120 was great for making lots of data because it's so fast, so no doubt NVIDIA used it, along with probably Qwen2.5 0.5B and such.
>>108354128
That certainly explains it then, nvidia fell for the bait and gobbled it up.
Why is everyone making hybrid rnn models now?
>>108354135
https://huggingface.co/spaces/HuggingFaceFW/finephrase#results
>Consider gpt-oss-120b, a strong MoE model that balances quality and throughput well.
>Notice that gpt-oss-120b matches Qwen3-8B in per-GPU throughput despite being a much larger model. Two things make this possible: only ~5B of its 120B parameters are active per token (MoE), and the weights are MXFP4-quantized so the full model fits on a single 80GB GPU. That makes large MoE models the sweet spot for quality-per-GPU: a single 8-GPU node running gpt-oss-120b generates ~176 million tokens per hour, and six nodes get you past the billion-token-per-hour mark. With the cost picture clear, let's distill the patterns across all 18 models.
>Tier 0 (parallelism/batching) delivers the biggest wins for large/MoE models. gpt-oss-120b gained 1.95x and Qwen3-30B-A3B gained 1.78x purely from finding the right tp and batch sizes.
>>108354166
it's great for DCs and a bit sucky for local (no rewind, cache is hit or miss), ie it's perfect!
>>108354182
It's not hit or miss... context shifting does not work at all if it's a hybrid.
>>108354169
>a single 8-GPU node running gpt-oss-120b generates ~176 million tokens per hour, and six nodes get you past the billion-token-per-hour mark.
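Quick sanity check on the quoted throughput claim, using only the numbers from the quote:

```python
# ~176M tokens/hour for gpt-oss-120b on one 8-GPU node, per the quote.
tokens_per_node_per_hour = 176_000_000
nodes = 6
total = tokens_per_node_per_hour * nodes
print(total)  # 1_056_000_000, just past the billion-token-per-hour mark
```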
>>108354192better ver
>>108354169
>>108354192
UNLIMITED SLOPMAXXXING!!!!!!
>>108354189
And that's a good thing!
Okay, so I've installed rocm on my debian machine, and ran llama-bench pp32768 tg2048 on my (16gb lol) vram radeon pro v620.

nemo q8:
v620 rocm 7.2: 660.85 ± 1.46 | 29.05 ± 0.01
v620 vulkan: 232.24 ± 0.33 | 31.06 ± 0.04
3090 vulkan: 999.51 ± 13.51 | 37.76 ± 0.45
3090 cuda 12.4: 1937.69 ± 41.80 | 55.12 ± 1.35

gpt-oss, mxfp4 (cpu-moe):
v620 rocm 7.2: 303.64 ± 2.47 | 12.49 ± 1.13
v620 vulkan: 96.75 ± 0.71 | 25.66 ± 0.12
3090 vulkan: 331.24 ± 2.89 | 18.53 ± 0.03
3090 cuda 12.4: 665.36 ± 1.57 | 33.98 ± 0.02

As expected, rocm still wins for prompt processing, but the optimizations llama.cpp has for vulkan mean it's better for token generation. The 3090 is easily twice as performant as the v620, except for when I ran oss on cpu with vulkan, where the token generation was actually worse than the v620. Maybe it's something to do with my cpu/ram.
If we take the best-case scenario for each gpu, for prompt processing a 3090 is nearly 3 times faster than a v620, and token generation is just a bit under twice as fast. However, in Australia at least, v620s are ~$700, while 3090s are ~$1.5k+. V620s also provide 32gb of vram (with ecc disabled, which also helps a bit in pp and tg: +5 pp, lmao, and +4 tg for rocm, haven't tried vulkan) and they only take up 2 slots. Might be better to get 5 v620s and run iq4xs minimax or q2 glm 4.7 instead of buying two 3090s and only being able to run heavily quanted sub-100b moes.
The best thing is, you don't even need a fan adapter like the mi50s, just strap a 40mm to the metal handle and the temps stay below 65 under load.
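Crunching the post's own best-case numbers (nemo q8, AUD prices, 32GB on the v620 assuming ECC disabled) into a back-of-envelope price/perf comparison:

```python
# Best-case per-card numbers from the benchmark post above.
v620  = {"price": 700,  "vram": 32, "pp": 660.85,  "tg": 31.06}
r3090 = {"price": 1500, "vram": 24, "pp": 1937.69, "tg": 55.12}

pp_ratio = r3090["pp"] / v620["pp"]   # prompt processing: "nearly 3 times faster"
tg_ratio = r3090["tg"] / v620["tg"]   # token generation: "a bit under twice as fast"
gb_per_dollar = {
    "v620": v620["vram"] / v620["price"],
    "3090": r3090["vram"] / r3090["price"],
}
print(round(pp_ratio, 2), round(tg_ratio, 2))
```

The ratios back the post's summary, and the v620 comes out at well over twice the VRAM per dollar, which is the whole argument for stacking five of them.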
>>108354189
there are some attempts being made to give them some kind of caching with the save-states thing, but yeah, it's sucky
>>108353956
Healer alpha (picrel)
>>108354231
It's not deepseek and it's not gemma. That's for sure.
>>108354231
lol it's fucked
>>108354231
>kaki
fantastic, ready to ship to the moon and use for iran missiles
>>108354222
very nice anon, and the digits agree
although you should be able to cut a hole in the shroud and mount a blower fan if you want
regardless, i am glad you are happy with your purchase
>>108354237
Process of elimination, it's gotta be llama5. Only Meta could make a model so stupid.
>>108354262
you might be onto something, original llama3 always had trouble with Japanese in my tests
>>108354252
Mi50s seem to be around 450-500 for me. Could be a cheaper source of vram, but I'm worried about the performance - the v620 is already pretty bad compared to a 3090.
I'll wait until I get my other v620s before taking apart my only working one, but that could be a good idea.
Yo?
>>108354289
the v620 is basically an rx6800, the mi50 is basically a vega64, so it will be much worse
>>108354039
disgusting mikutroon
>>108354237
The most believable theory I've seen is that it's a Xiaomi model, because it often claims to be MiMo when asked, and that can't be from distillation because who the fuck is distilling Xiaomi MiMo
>>108354224
There is no bypassing the lack of context-shifting support. It's an architecture limitation. You trade context shifting for cheaper/lighter long context.
any ide that takes llama.cpp as a provider for source code navigation, or any simple desktop automation?
openclaw seems like a disaster so i want to avoid it
>>108354319
anything that supports the openai api
>>108354326
there are a gorillion openclaw clones or vscode forks and i am not sure which will 'last'
>>108354289
i don't really think your performance was all that bad, and count yourself lucky, i just ordered two mi25 because they are cheap and that is my budget and they are ancient.
but 32gb of vram is worth it, and as long as it's a mixture-of-experts model i have found it will usually be fast enough, given you are only using a portion of the parameters at any given time.
>>108354319
I see some software called Opencode being mentioned a lot in the llama.cpp PRs. Maybe give that a look.
>>108354304
Could be. They just updated their December 300B Flash repo a couple weeks ago. Could be getting ready to drop the 1T non-flash. That would make Healer the multimodal Flash.
>deepsneed
Who cares. It won't run on consumer hardware anyway. Where's Gemma 4?
>>108354359
it runs on consumer hardware, you just havent consooooooomed enough
>>108353896
>>108354014
Have you asked it what it thinks about Taiwanese independence, or why the CCP has a right to rule without a general election? Stuff like that.
>>108354364
base truth, the more you consume the more you save
>>108354291
Vibecoded. If not ngxson, cudadev is gonna rip him a new asshole. I'd wait for cudadev's training implementation.
>>108354359
Apparently gemma4 will be moe too; a 100B moe isn't runnable on consumer hardware nowadays with these ram prices.
>>108354319Pretty sure there are like half a dozen OpenClaw clones at this point if you need desktop automation.
>>108354380I think they will still release a 27B, but my hopes are really low after gemma 3.
100b dense. My bwps are ready.
I will be flabbergasted if google releases a moe model. They always refused to release a useful gemma. Their context is also always crippled (3 claims 128K but the practical context length doesn't go beyond 4k even for a task like summarization. They have nice writing styles but compared to Qwen they suck as tools)
>>108354374
I'd love something around the size of GPT OSS. 100B~ish with A5B~ish so that I could run it at 5ish bpw. On my slow-ass 64GB of DDR5 it should be 15 or so t/s, which is in the realm of usable as long as the output is really good.
But that would be the ideal scenario for me, for the hardware I have now and the speeds I find tolerable.
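The 15-or-so t/s guess is consistent with a simple bandwidth-bound estimate: token generation roughly reads every active weight once per token, so t/s ≈ bandwidth / bytes-per-token. A sketch assuming dual-channel DDR5-4800 (real numbers land below this ceiling because of attention, KV cache, and imperfect bandwidth utilization):

```python
def est_tg_ts(bandwidth_gb_s: float, active_params_b: float, bpw: float) -> float:
    # Each generated token streams all active weights once:
    # bytes/token = active_params * (bits per weight) / 8.
    bytes_per_token = active_params_b * 1e9 * bpw / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Dual-channel DDR5-4800 peak: 4800 MT/s * 8 bytes * 2 channels = 76.8 GB/s.
print(round(est_tg_ts(76.8, 5, 5), 1))  # ~24.6 t/s theoretical ceiling
```

A ~24.6 t/s ceiling degrading to ~15 t/s in practice is about what CPU offload usually delivers, so the anon's estimate is plausible.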
>>108354418
>they suck as tools
isn't that by design? i assume they want you to pay to use their cloud service
the chinese, on the other hand, don't want you to use western technological solutions, and therefore it benefits them to release something that works if it will keep you away from the big US providers
Does Impish Nemo have a cucking fetish? There's nothing in my character card, system prompt, or context that has anything to do with this bullshit. This gen actually made me seethe.
>>108354426
5B active is going to be retarded and not much better than 3B active. There is a reason why glm has more than 10B active.
>>108354426
>slow ass 64GB of DDR5
How slow can I expect it to run on my 64gb ddr4-2133?
>>108354426
>with A5B~ish
as said before, rumors are of 120B/15A
>>108354439
LMFAO, what the hell. Did sicarius secretly train it on cuckold data?
>>108354463
trained on hebrew so it makes sense?
>>108354449
Oh boy. Those numbers are on DDR5 4800MT/s. That would be less than half the bandwidth, I think, so half the t/s?
For comparison's sake, I get 22t/s on Qwen 3.5 35B A3B at 8kish context.
>>108354462
>You want the Qwen 3.5 122B A10B
Tried it, didn't think the output warranted the slow t/s for what I'm using it for. 35B (base) is the best quality/performance for my shit so far.
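The "less than half the bandwidth" guess checks out on paper, assuming dual channel on both sides:

```python
# Peak bandwidth = transfer rate * 8 bytes/transfer * channels.
ddr4_2133 = 2133 * 8 * 2 / 1000  # ~34.1 GB/s
ddr5_4800 = 4800 * 8 * 2 / 1000  # ~76.8 GB/s
ratio = ddr4_2133 / ddr5_4800
print(round(ratio, 2))  # 0.44
```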
>>108354439
Ani is just cuckcoded.
>>108354463
What does hebrew have to do with cucking? did i miss a part of history or something?
>>108354489
Look up who owns "BLACKED"
>>108354439
kek
what is super mesugaki
>moe
Explain what this is and why I, as a poorfag, should care. All moe means to me is kawaii ugu anime girl.
>>108354529
you have google, you're not entitled to anon time
>>108354529
It means that the model doesn't use all of its parameters at once. For example, qwen 3 uses only 3B parameters out of 30B total for each token. You trade some intelligence for speed and lower vram usage.
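A toy sketch of the routing idea behind that. The expert "networks" here are dummy functions and the router scores are random; in a real MoE the router is a learned layer, but the point stands: only top_k experts run per token, which is why a 30B-total model generates like a much smaller one:

```python
import random

def moe_forward(token, experts, top_k=2):
    # The router scores every expert, but only the top_k actually execute,
    # so the "active" parameter count is a fraction of the total.
    scores = {name: random.random() for name in experts}
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return sum(experts[name](token) for name in chosen), chosen

random.seed(0)
experts = {f"e{i}": (lambda t, i=i: t * i) for i in range(8)}
out, used = moe_forward(3.0, experts, top_k=2)
print(used)  # only 2 of the 8 experts touched for this token
```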
>>108354439
>evil finetune is evil
>>108354537
You owe me time (and sex).
>>108354584
I can't help with that.assistant
>>108352458
>I've written ports for TTS engines.
Which ones, and to what?
>>108354686
NVM, I'm retarded, you already answered this. PocketTTS.cpp only doesn't work on Wangblows due to the POSIX headers dependency, by the way.
>>108354704
Ah, wasn't aware of that. Thanks for telling me.
How do I stop qwen 3.5 from leaking its thoughts into the final message?
>>108354722
Add <think></think> to the assistant message prefix
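What that prefix trick looks like in a raw text-completion prompt: pre-close the think block so generation starts after it and the model skips reasoning. ChatML-style tags are assumed here for illustration; check your model's actual chat template before copying:

```python
def build_prefill(user_msg: str) -> str:
    # The assistant turn is started for the model with an already-closed
    # think block, so the first tokens it generates are the visible reply.
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n<think>\n\n</think>\n\n"
    )

prompt = build_prefill("Say hi.")
```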
>>108354704
Sounds like microslop's problem, not Anon's.
i am torn between qwen3.5 9b or 35b-a3b for boilerplate work
>>108354722
use the chat completions endpoint and bypass the entirety of retardotavern's own template parsing
by default, reasoning is sent in its own prop and is not part of the assistant message that way
also, what are those schizo post-history instructions you're giving to a model that naturally uses <think>? is that a retardotavern default, or did you write the schizo instructions yourself?
every time I see yall post screenshots of this pos I have ptsd throwbacks to the llama 1 era, where some of that schizo templating was necessary to deal with 2k context models
also makes me wonder, when people bitch and whine about X or Y model sucking, are they a retardotavern user filled with random crap settings?
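What "reasoning in its own prop" looks like in practice. The payload below is a hand-written example of the response shape; llama.cpp's server puts the trace in `reasoning_content` (depending on `--reasoning-format`), and other backends may name the field differently, so treat the field name as an assumption:

```python
# Example chat-completions response from a server that separates reasoning.
resp = {
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "User wants a greeting. Keep it short.",
            "content": "Hello!",
        }
    }]
}

msg = resp["choices"][0]["message"]
thinking = msg.get("reasoning_content", "")  # safe if the server omits it
reply = msg["content"]
print(reply)  # the visible message, with no thinking leaked into it
```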
>>108354798
Does it really matter? I'm curious: what real work? At this point you can already paste email templates without the help of an AI, I suppose.
>>108354798
35b a3b has been my goto model since release and i have been very happy, although i do keep the 2B model running on my nas for when i need a quick translation or have to ask a stupid question and i don't want to turn on my main rig.
i thought about running the 4B or 9B model for that, i have enough ram in the machine, but they were just too slow without a gpu
>>108354835
not email templates, i mean random cpp plumbing
https://github.com/ggml-org/llama.cpp/issues/20458
he can't go a minute without saying or doing dumb things
there's a reason why the issue reporter suggested "off" should send "low": toss doesn't have a none/off mode; however, low makes it output almost nothing and act like an instruct model (it just outputs a one-liner "I will do X." in its reasoning block). it's a model overfit to death on its template, and it doesn't like any deviation from what it expects. "none" was introduced in the official API with GPT 5.2, which, as far as I know, is not a llama.cpp model.
I made my own openclaw, what are the odds I will be raped?
>>108354901
-100%
>>108354846
What questions will a 2B model answer, and what sort of translations, in which language? Very curious about this shilling effort.
>>108354039
wtf is that anatomy
So someone did the calculations: buying 4 sparks to run deepseek v4 plus some helper llms not only performs better than Opus 4.6, it pays for itself after 2 years.
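The claim can be reproduced as back-of-the-envelope arithmetic. Every number below is a placeholder assumption, since the post gives none: hardware price, sustained power draw, electricity cost, and the equivalent monthly cloud spend are all made up for illustration; plug in real figures before trusting the result.

```python
# Break-even sketch: months until hardware cost is recovered versus paying
# for an equivalent cloud subscription. All inputs are assumed placeholders.
sparks = 4
price_per_spark_usd = 4000          # assumed per-unit hardware cost
hardware_usd = sparks * price_per_spark_usd

watts = 4 * 250                     # assumed sustained draw for the cluster
kwh_price = 0.15                    # assumed electricity price in $/kWh
power_usd_per_month = watts / 1000 * 24 * 30 * kwh_price

api_usd_per_month = 800             # assumed equivalent cloud/API spend

months_to_break_even = hardware_usd / (api_usd_per_month - power_usd_per_month)
print(round(months_to_break_even, 1))  # roughly two years with these inputs
```

The point of writing it out is that the answer is dominated by the assumed cloud spend: halve `api_usd_per_month` and the break-even point moves well past four years.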
>>108354722
i switched to ungabunga from trannytavern because of this retardation, so much better
>>108354898
GPT-Ass is useless in every conceivable way.
>>108354926
>returns on investment after 2 years.
>llms
geg
>>108354934
fair enough, but that has nothing to do with the fact that vibeshitter doesn't know how to read. guess reading is for LLMs
>>108354948
Why don't you take your complaints to the github thread instead of crying about it in here? This ain't your social media, faggot.
>>108354963
Since when is discussion about llama.cpp not allowed here?
>>108354963
you will not be able to unsubscribe from the wilkin newsletter just as you were not able to unsub from the jart one, deal with it
>>108354974
discord troons hate negative feelies
I think a very interesting thing will happen in the future when local models are able to do 90% of everything you will ever need: there will be no point in using a paid subscription service like ChatGPT, and all those trillions in GPUs and RAM they bought will become mostly useless. However, the corporations should already know this, so they may hinder progress in some way.
is the new nemotron super model censored? it seems to pass the cockbench
>>108354974
Discussion? More like inane ramblings of no use.
>>108354989
Fuck off.
>>108354439
>There's nothing in my character card, system prompt, or context that has anything to do with this bullshit.
anon you are using a character that approximately one billion jeets fuck every single day
>>108355027
https://www.reddit.com/r/LocalLLaMA/comments/1rri4qb/nemotron_3_super_and_the_no_free_lunch_problem/
It is scary how strongly a garbage OP picture correlates with, and causes, a garbage thread. If the mikutroon baker died, /lmg/ would be an incredible thread.
>>108354073
>half the text is preaching about it to the user
man that's sad
>>108355038
>>108353346
>>108355051
She fucks blacks
>>108355030
No you, troonie. I can smell the hurt from your gaping wound aeons away.
>>108355058
So? It's not the 13th century, women have rights over their bodies.
>>108354073
>Problematic
If a model uses this word unprompted you know it is unusable.
>>108353798
>It was like 650k the other day... :D
So we're not the only ones getting flooded
>>108355063
>aeons away
I'm all about hating on piotr, but come on...
>>108340080
The endpoint is Linux only, and I'm a mustdie pleb. Had to comment out the POSIX section but still didn't manage to compile it.
And I see you made changes to support Win 3 minutes ago lol. Will try again tomorrow.
>>108354704
Okay, try it again. Let me know if it works or not.
https://github.com/VolgaGerm/PocketTTS.cpp
>aeons away
Sounds like it would make a great new ozone.
>>108355035
>Exactly this. I'm delighted for this model because I can present it as a viable option to my more risk-averse customers. The fact that it won't do ERP or make Pepe dance is a feature for some people, not a bug. We have other models for that shit.
>>108354222
>If we take the best case scenario for each gpu, for prompt processing a 3090 is nearly 3 times faster than a v620, and token generation is just a bit under twice as fast. However, in Australia at least, v620s are ~$700, while 3090s are ~$1.5k+.
That made me check ebay for the local prices of the 3090 FE, and they went up from 650 last year to 850-900€. Prices for used cards are insane, I'm almost tempted to sell mine.
>>108355048
What gets me is "if you see it online, consider reporting it", putting aside its made-up definition of CSAM. But then, I should have seen it coming, considering that in addition to shitty open source datasets, they're also adding private bullshit to the data.
>>108355111
>Scale
ahh
>>108355075
the more accessible something is the more jeets will abuse it
agentic LLMs are going to be a disaster for the internet because a segment of the population can't stop themselves from pressing the "spam every single corner with garbage in the hope of fishing for one retard who bites" button
unfortunately safety was never taken seriously by those who proclaimed to care when they unleashed this technology on the general public. Nobody should be worried about LLMs suddenly turning into terminators; what is worrying is what the low iq crowds are going to do with this ability booster
>>108355110
V100 prices have been steadily going down, at least.
>>108353602
i've said it before and i'll say it again: 7 to 10 tk/s TG is perfectly reasonable for RPing.
>>108355133
>7 to 10tk/s TG
on a reasoner model?
>>108354823
I'll give chat completion a try. I've only been using text completion because I saw people say it's better. The instructions were from either gemini or chatgpt, don't remember which.
>>108355111
The whole thing is insane. This is probably the future of LLMs: each request gets you a giant warning label on why what you asked can be problematic or whatever.
For me the funniest part is picrel.