/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108295959

►News
>(03/03) WizardLM publishes "Beyond Length Scaling" GRM paper: https://hf.co/papers/2603.01571
>(03/02) Qwen 3.5 Small Models (2B, 4B) released: https://hf.co/Qwen/Qwen3.5-4B
>(02/26) Qwen 3.5 35B-A3B released, excelling at agentic coding: https://hf.co/Qwen/Qwen3.5-35B-A3B
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>108300676
this but for leddit
Can I run AI on my smart fridge? Maybe one of the small qwen models?
bubble status: bursting soon
why are we having new threads at page 4 now?
>>108300682>WizardLM publishesI thought they were banished to the shadow realm?
>>108300728robson ltda bailed them out
>>108300713It's literally just because someone doesn't want vocaloids in the OP.
>>108300691of course, and you should install OpenClaw on it also and let it dictate what food you eat
>>108300788we both know you are being an ass, but that's actually a good idea
>>108300793i wasn't being sarcastic, if the fridge has a one of those sensors or cameras then you could use it to track calorie intake
>>108300788Isn't OpenClaw going to be closed source soon after the acquisition?
>>108300806there are dozens of forks now, it wouldn't really matter
Big Deepseek day today
>>108300819
link?
I'm running n8n and an ollama VM on my homelab. No GPU, just a couple of cores and 20GB RAM. I know people use setups like this for automation workflows (speed is not a huge concern, just precision). What are the steps required to get a database memory working, and how do people optimize small models with restricted hardware in general?
>>108300825
>>108300848
just buy one
yes I am mikusexual
>>108300848
>ollama
►Recent Highlights from the Previous Thread: >>108295959

--Paper (old): H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs:
>108296054 >108296204
--OBLITERATUS tool for removing AI model censorship via weight ablation:
>108297061 >108297066 >108297113 >108297177 >108297103 >108297117 >108297136 >108297203 >108297208 >108297232 >108297233 >108299678 >108299706
--Alibaba reaffirms open-source Qwen strategy amid leadership shift:
>108298195 >108298228 >108299471 >108299477 >108298457
--Qwen family model size vs performance analysis:
>108300067 >108300073 >108300077 >108300083 >108300093 >108300118
--SillyTavern alternatives for modern model roleplaying:
>108299346 >108299399 >108299412 >108299435 >108299629 >108299489 >108299913 >108300639
--A mathematical proof from an anonymous Korean forum: The essence of Attention is fundamentally a d^2 problem, not n^2:
>108298017
--Distributed LLM inference using pooled NUC resources:
>108296013 >108296051 >108299436
--Preventing agents from falsely claiming task completion:
>108299444 >108299470
--Something is afoot in the land of Qwen:
>108297114
--Miku (free space):
>108296286 >108296467 >108297038 >108298135 >108299073

►Recent Highlight Posts from the Previous Thread: >>108298564

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108300996
Finally some proper news.
>>108300798
>>108300793
lol if it had a camera in your pantry as well, it could track your macros and order food for delivery from the local grocery store. Then text you and your friends to either congratulate you or give you shit about whether you're sticking to your diet. Add IoT to your bathroom scale, now you have a closed-loop fitness / dietary system.
>bought an M4 pro macbook pro with 48gb of RAM thinking it would last me several years
>local AI gets good and now I need like 512 GB
Fuck man I'm tempted to just buy an RTX 6000 Pro
>>108300996
Thank you Mikuchad
thoughts? https://pastebin.com/KrpEwdKJ
>>108301063
Without an explicit completion check (for example "count phases and confirm total"), the agent can rationalize continuing as "still helping"
>>108300682
OP is a massive faggot
>have to compile llama.cpp for cuda support
i just chucked the precompiled cuda releases on github into a folder on C:\ and added it to path. did i do it right
>>108301319
>compile it
>grab precompiled binaries
uh, no
>>108301317
*glug glug glug*
>>108300067
>27B dense as good as 122B-A10B moe
Does this mean a 70B dense model would be better than the 397B-A17B moe model?
Speed in 35B is a quality in itself, real-time VR interactions with a retarded waifu feel surreal
y'all fuck with that OBLITERATUS shit or ts just hype? 30% benchmark increase sounds like cap ong
>>108301378
If it had modern training techniques, it would be smarter for things that require attention to detail, but it would have less space to store knowledge, so it would still underperform in most common tasks where it can just rely on memorization, like benchmarks.
>>108301427
Good thing we can store knowledge in Engrams
Dense + Engrams
>>108301239
I pretty much look like this
>>108301436
For us, that would be the best. The labs training the models would still prefer MoE due to inference speeds and training costs.
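For reference, a common community rule of thumb (a heuristic, not something from the linked analysis) estimates a MoE's dense-equivalent size as the geometric mean of its total and active parameter counts:

```python
import math

def dense_equivalent_b(total_params_b: float, active_params_b: float) -> float:
    """Geometric-mean rule of thumb for a MoE's dense-equivalent size, in billions."""
    return math.sqrt(total_params_b * active_params_b)

# 122B-A10B lands near a ~35B dense model under this heuristic,
# roughly in line with the 27B-dense comparison above.
print(round(dense_equivalent_b(122, 10), 1))  # 34.9
print(round(dense_equivalent_b(397, 17), 1))  # 82.2
```

By that heuristic a 70B dense model would sit below the 397B-A17B MoE, but take it with a grain of salt; it ignores training quality entirely.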
Based baker. Fighting offtopic autistic special interest one OP at a time.
Is engrams actually coming? Or is it just being memed.
>>108301317
u mad bro? why?
so this august are we gonna get gpt oss 2
>Meta's first LLaMA model was leaked and released via a torrent link on March 3, 2023.
damn it's been 3 years already
I want to fuck an Engram
uhh, where is V4?
New to LLMs. I'm looking into small models and can see that there are a lot of variants; the naming convention does not make sense at all and I can't find the documentation.
https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/tree/main
Can someone educate me on the use cases for the different versions?
unsloth removes information about sloths
>>108301602
these are different quantizations, basically compression to fit bigger models into consumer cards' VRAM. The higher the quant, the more intelligence the model retains from the original one. Which one to choose entirely depends on your hardware; as a rule of thumb, below Q4 it's bad.
It's generally a good idea to ask gemini or chatgpt about all this
Localsisters, I can't figure out the best way to handle memory in SillyTavern. I activated vector storage but I doubt that's enough. Why does this shit have to be so complicated? I just wanna do some long roleplays...
>>108301649
Models have more than 4k context now. You don't need anything.
>>108301636
Thank you, so Q4 means 4-bit quantization and so on. How about the K_S, K_M after that?
>>108301571
Two more weeks.
>>108301677
I have 32k context and I'm already almost at the limit 103 messages in.
>>108301683
It's even more granular quantization levels. Scroll down in the model card at this address and you will see a chart to give you an idea of the quants and the quality:
https://huggingface.co/bartowski/Qwen_Qwen3.5-35B-A3B-GGUF
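Rough sketch of how file size falls out of the quant level (the bits-per-weight values are approximate averages; real K-quants mix tensor types, so treat these as ballpark numbers):

```python
# Approximate average bits-per-weight for common GGUF quant levels (assumed values).
BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 2.6}

def est_size_gb(params_b: float, quant: str) -> float:
    # file size ≈ parameter count * bits per weight / 8 bits per byte
    return params_b * 1e9 * BPW[quant] / 8 / 1e9

for q in BPW:
    print(f"9B @ {q}: ~{est_size_gb(9, q):.1f} GB")
```

So a 9B at Q4_K_M is around 5.4 GB, roughly matching what you see on the download pages; anything that fits in your VRAM with room left for context is fair game.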
>>108301699Thank you, How about unsloth vs bartowski quantizised formats. Which one is better or is there anyone that has better version I can check?
>>108301765Ideally they should all be the same, lately there has been some drama with unsloth quants. The best way is to test them yourself see which one you prefer
>>108301078>Jamba2 MiniSo, funny thing.This guy has 8 experts, with 2 being activated per token for a total of 12B activated params.I launch it, make a question about D&D, get a pretty standard result. Good, some models hallucinate some wild stuff that this one didn't, even if the result wasn't perfect.Then I do>--override-kv jamba.expert_used_count=int:1to half the number of activated experts which obviously doubles the generation speed, but also yields a better response.Yes, anecdotal, and a single sample, but still funny to see.
Holy FUCK Qwen 3.5 35B-A3B straight up CHOOSES TO NOT TRANSLATE HENTAI.
What the fuck is this shit?! Fucking GEMMA 3 27B OF ALL FUCKING MODELS DIDN'T HAVE PROBLEMS TRANSLATING HENTAI GAMES
What the FUCK is wrong with Alibaba? FUCK QWEN.
>>108301871
Must have been the Wheatley expert.
>>108301879
Are you using the base model?
>>108301888
Lmao.
Makes me wonder if I shouldn't be fucking around with GLM Air with fewer activated experts and other such experiments.
>>108301879
skill issue
>>108300682
serious question: why is ollama/openwebui never recommended here?
seems to be working just fine for me. easy setup and pretty trivial to add custom model packages too.
>>108301903
I'm using the standard model released by Qwen, their "chat" version, not the base model.
>>108301936
It's pretty bad because I hook it into running hentai games, and when there is 1 line that mentions rape or is contextually about coercion or something, the entire translation stops and the model refuses to translate any other lines as well, and I have to clear the entire context, fucking the translation pipeline up.
>>108300691
it would be more effective for you to run the AI on a home server and connect the fridge tablet to the server via an iframe web browser, or just run the webui on the fridge, not the actual AI backend. unless you want to stare at your fridge door for 5 minutes waiting for it to tell you how long you can leave pizza in the fridge before it's likely to kill you.
>https://www.reddit.com/r/LocalLLaMA/comments/1rlkptk/final_qwen35_unsloth_gguf_update/
>Re-download Qwen3.5-35B-A3B, 27B, and 122B-A10B as they're now all updated. Re-download 397B-A17B after today’s update (still uploading!)
just one more re-download bro
>>108301992
wtf
>>108301992
>Hahaha sorry - agreed it might not be the true "final" final
geg
>>108301955
Try the base model. It still behaves like an instruct tune, but with much less verbose and more "natural" reasoning traces (they seem straight out of the RL process), and with a lot fewer refusals baked in.
>>108301992
>Are all the GGUFs for the smaller Qwen3.5 models, 9b and below, also updated?
>Oh the old ones generally are ok for now - however we do plan to improve them over the weekend!
What's final about any of this?
>>108302011
>>108301999
>>108301992
you can trust them to have no idea what they're doing
>>108302017
lmao'd
what a bunch of clowns
are these at least with the fused method?
>>108302026
qrd?
>>108301063
Similarly, back in the day I never saw the use case for X99 enthusiast boards with all those PCIe slots; who would ever need that many?
but then...
>>108302029
https://github.com/ggml-org/llama.cpp/pull/19139
I've found a riddle that mogs <thinking> models. Non-thinking models or models in non-thinking modes usually got it right.
>If a country switches from left-hand traffic to right-hand traffic, do cloverleaf interchanges need to be rebuilt?
>>108301950
Ollama is hated on because it's the easy-to-use one that uses llama.cpp without loudly crediting it, which is seen as kind of stealing.
As for openwebui, these people were born and raised on sillytavern and they mostly don't know about it and/or prefer the ST interface because it's what they're used to.
I started on chatgpt so I use ollama+openwebui
>watching new anime episode today
>hit with gemma hotlines
THERE'S NO ESCAPING
>>108302151
pic for ants?
>>108302156
hah goteeemm
>>108302151
>>108302156
oops ahahah
>>108301879
Probably the result of the relatively recent Chinese crackdown on porn.
>>108301879
inb4 something utterly vile
show log
Any SaaS model that's redpilled on Jews?
>>108301950
llama.cpp has a web UI built in
So far I'm liking
>Qwen3.5-27B-heretic-v2-Q5_K_M.gguf
with a low temperature and a "<think" prefill. It does seem smarter than similar-sized models like Gemma.
>>108301950
>>108302138
Looks great for general assistant stuff but too basic for roleplay. Sillytavern is unfortunately a necessary evil.
>>108302026
>are these at least with the fused method?
so that's a no
>>108302394
which chip? my old x99 system has been collecting dust & the watercooling leaked into the PSU
boomer pc builders understand the need for expansion slots
desktop/gaming platforms continually shittify, hedt was a taste of the good stuff
Any tips to nudge the LLM in a specific direction without explicitly telling it or writing for the character?
>>108302394
Some 12-core v3 Xeon, I forget.
Boomers had their soundcards and IO cards; I even had a Microsoft proprietary mouse interface card. At the time of X99 they weren't really a thing anymore though
however
>>108302394
>bricked my system bc of the watercooling meme
top kek I'm so glad I never left air cooling
>>108301879
Just use it to write fizzbuzz like intended broski
>>108302372
lmao'd x2
>>108301879
Use the heretic finetune if you can't figure out what arcane prompt bullshit actually works
>>108302394
I have that exact same board. I had an MSI X99 board that I had a dual-GPU setup for PCI passthrough with, one GPU for host and one for guest. Worked flawlessly until the board decided to kill itself. Replaced it with the ASUS X99-A II and that shit just would not work. Spent months tweaking settings, but got link errors and the guest could not use the GPU. Eventually booted into Windows with both GPUs and got screen flickering and more errors even though it had more than enough lanes.
Maybe it was just a faulty unit, but I hate that board so fucking much.
>>108302345
How did you get it to not repeat and spout nonsense endlessly? Or maybe it's just me; I swear my sillytavern seems to randomly get cursed over time.
>>108302398
describe your intent [OOC: ]
>>108302432
Yeah man SLI GPUs, network (no onboard), soundcard (I had the hercules blue breakout box thing with the thiccest stupidest cable ever seen in a consumer product)
actually went with x99 here for a 10G NIC
>>108302462
it ran perfectly for years, 0 maintenance
>>108302532
i replaced the board once, hard crash, spotted a small flash of something, VRM inductor maybe, i never could find the damage but it wouldn't boot & got RMA'd
>>108302398
You could try control vectors, I suppose.
https://desuarchive.org/g/thread/104991200/#q104995066
https://desuarchive.org/g/thread/104991200/#q105000398
>>108302572
To be honest I am running into that right now (it starts looping in the thinking phase as it questions itself), but my earlier gens on a different card were better. I'll have to keep playing with it.
>>108302572
Oh shit. I'm going to make a cvector to fix qwen's fucking prose.
I guess I should take a bunch of random outputs from the model itself, then rewrite them how I'd like them to sound, and use those as the negative and positive files, right?
>>108302562
>it ran perfectly for years 0 maintenance
until it failed and killed it, whereas a fan failing would just cause thermal throttling and possibly a thermal safety shutdown. waterkeks are funny
Need help: llama-cli vs llama-server.
I get 20 t/s on llama-cli, but when I run llama-server I only get 5 t/s.
How can I tweak it? I literally used the same settings.
>>108302556
>>108302578
So far I've found the most success by being light on instructions and card details, since it obsesses over that stuff.
>>108302681
Those have some different defaults for some things, I'm pretty sure. I can't remember what, but some anon figured it out some time ago.
Can you run llama-cli with --verbose to see all the flags and stuff?
>>108302645
>I'm going to make a cvector to fix qwens fucking prose
It will change the output, but it doesn't quite work like that. You can only nudge the model.
>random outputs from the model itself then rewrite them
You don't need a lot to make an effective control vector. The bear control vector I made was just the example in the archive. And you don't even need the chat template stuff. Just put in enough to let the model complete the next token in the way you want. You don't need too many samples either, but they're fast, so put in as many as you want. I found 3 of each to be sufficient.
Don't get your expectations too high. You cannot add information, you cannot add instructions. You just nudge the model in a particular direction.
>>108302681
i'm getting the same for both, more or less
What's the best GPU layout for a 1500W PSU? Can it handle 4 3090s with undervolting? 4090s? How many Pro 6000s?
>>108302729
>You just nudge the model in a particular direction.
That's the idea. Nudge its general writing into a given style.
>>108301649
https://github.com/KrsityKu/InlineSummary
Just found this and it's pretty cool. You can even summarize the summaries and nest everything together. I see people mention memory books all the time too. Gonna test how well they work together.
>>108302753
>Nudge its general writing into a given style.
I only tried it for moods. I don't expect it to work for "write good now". But give it a try.
>>108302394
>>108302532
good to see that you guys have proper X99 boards instead of those awful aliexpress "X99" frankenboards that i frequently see shilled on /hsg/ for some reason...
I couldn't find any X99 boards at a reasonable price (or at all, in fact) where i live, but I got a non-ATX C612 workstation (it's pretty much the same thing as X99, Xeon E5 v3/v4, just for the workstation/server segment).
Wish i had filled it with 64GB modules instead when I had the chance.
>>108302674
kept my algae frens comfy until i decommissioned it; some occasional drips on the PSU didn't kill it
only thing that failed in that rig (aside from the early mobo replacement) was the LED strip burning itself out
Flash Attention 4 is now a thing.
https://www.together.ai/blog/flashattention-4
https://github.com/Dao-AILab/flash-attention/blob/main/assets/fa4_paper.pdf
>>108302832
>b200 only
>>108302729
>>108302572
Seems like llama-cvector-generator wants 2 text files, both with the same number of chatml interaction blocks. i wanted to see what would happen if I put my saved fics into one and gemma slop into the other. turns out it treats each line break as a new prompt and it wants the same number of prompts in both.
>>108302838
>poors
>>108302853
>turns out it treats each line break as a new prompt and it wants the same number of prompts in both.
Yes. It's one prompt per line.
You could replace the line breaks with \n I guess.
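Since the generator treats each line as one prompt, multi-line samples have to be collapsed first. A sketch (file name is made up, and whether the tool un-escapes literal \n is the anon's guess above, so verify on your build):

```python
def collapse_samples(samples: list[str]) -> str:
    # One sample per output line: real newlines become the literal two
    # characters "\n" so each sample is seen as a single prompt.
    return "\n".join(s.replace("\n", "\\n") for s in samples)

positive = ["She smiled.\nIt was a warm smile.", "The rain fell softly."]
print(collapse_samples(positive))
# with open("positive.txt", "w") as f:  # hypothetical file name
#     f.write(collapse_samples(positive))
```

Do the same for the negative file and pad the shorter list so both files end up with the same number of lines.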
>>108302838
You can run it on Hopper too; the main reason no one adopted it was that the accuracy degradation was terrible compared to stuff like Sage Attention.
>>108302838
Good
I still remember when flashattention 2/3 was released and there were so many redditors crying that it wasn't faster on Ada GPUs, demanding Tri Dao work for free and somehow make older generations just as fast
open source slurpers are one of the most ungrateful people on the planet
>>108302865
>flash_attn.cute:3
>>108302855
>wagecuck
those aren't your GPUs
>>108302874
>open source slurpers are one of the most ungrateful people on the planet
Signed, an open source slurper.
>>108302782
>awful aliexpress "X99" frankenboards
lol I have one of those as a hobby server stuffed into a midtower ATX case I found on the curb. You used to be able to buy them, CPU/MB/32G RAM, for <$100. They've more than doubled in price in the past few months, like everything else.
>>108302877
pls sir can i have a gpu sir
>>108302855
dogs will sniff and eat shit happily, along with vomit; what is that guy making that dog sniff that would make it feel disgusted??
>>108302920
you
>>108302920
ollama
>>108302935
keeek
Why is every github page filled with fucking emojis these days?
>>108302744
yes to the 3090s, no to the 4090s. you can do 4 Blackwell 6000s if you get the Max-Qs, 2 otherwise.
>>108302985
its good project sir :rocket:
>>108300682
Is it smarter than the average /g/ user?
made a test gemma control vector and this happens when it's set to 3000 strength
>>108303000
I don't give a shit if they use AI as long as it works, but at least make the fucking description presentable.
>>108303027
>3000 strength
Yes. That's a bit much.
>>108303028
And if the textual descriptions look like that, how good do you think the code will be?
I have a spare optiplex 5050 (i5-7500, 16gb RAM) sitting around collecting dust. Would it be able to run a small model? I want to set up RAG for sillytavern.
>>108303070
If it's just for embeddings, yes.
>>108302838
Funny how they pointed this out in the paper.
>>108302865
>accuracy degradation
Seems like FA3 didn't get much support because of that, and they are returning to more numerically stable methods; the paper mentions it a lot. I expect something a lot more usable in practice for Ada and up.
Do multiple GPUs speed up token generation and prompt processing? Say I got 2x 3090 and put a 16 GB model on it. Would it generate tokens twice as fast?
https://github.com/chardet/chardet
interesting case of AI psychosis for a very popular python library where the maintainer somehow got the confidence that he could "rewrite" (with an llm) all of it in just a week or two, like literally have every single line rewritten, and that somehow that llm laundering would be a legal way to replace the original LGPL license, and that a few weeks of agentic LLM slop would be enough to create a drop-in replacement
which btw is wrong because this doesn't even come close to passing the test suite of the previous version
https://github.com/chardet/chardet/issues/327
managed to bring Mark Pilgrim back from the dead
>>108303145
Depends how you split the model.
If you put some layers on one GPU and the rest on the other, the GPUs work in series, so you effectively get the speed of one GPU.
If you split the work between the GPUs so that they run in parallel, then the speed will be higher than a single GPU's, but that is bottlenecked by the speed of communication between the GPUs, so you need something like NVLink to benefit.
I THINK that's how it works.
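The two regimes described above can be sketched as a toy latency model (pure assumption, numbers illustrative, not a benchmark):

```python
def layer_split_tps(single_gpu_tps: float, n_gpus: int) -> float:
    # Layers execute in series: only one GPU works at a time,
    # so per-token latency (and thus t/s) stays roughly that of one GPU.
    return single_gpu_tps

def tensor_parallel_tps(single_gpu_tps: float, n_gpus: int, comm_frac: float) -> float:
    # Compute is divided n ways, but every layer adds a sync over the
    # interconnect; comm_frac models that overhead per GPU relative to
    # single-GPU compute time.
    return single_gpu_tps * n_gpus / (1 + comm_frac * n_gpus)

print(layer_split_tps(20, 2))            # 20.0: no speedup, just more VRAM
print(tensor_parallel_tps(20, 2, 0.25))  # faster than 20, but well short of 2x
```

So 2x 3090 mostly buys you room for a bigger model, not 2x tokens; prompt processing is more parallel-friendly and tends to scale better than generation.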
>>108303193
>gpt 5.4 thinking
for the modest cost of 1 billion dollars per 1000 tokens
ik_llamacpp doesn't have --fit?
what am I supposed to do then?
>>108303201
-ot
>pull
>free performance
https://github.com/ggml-org/llama.cpp/pull/17795
Today was a good day.
>>108303239
took them more than 3 months to merge that PR
holy shit
>>108303250
The implementation was suspect and he reworked it multiple times.
>>108303250
I much prefer this sort of approach over what happened with some of the vibe sloppers hurriedly implementing shit and merging it without oversight. Do you have ADHD?
>>108303282
>Do you have ADHD?
do you think it's normal to wait 3 months to change 5 lines of code? are you serious there?
>>108303146
rookie mistake. should've just forked the project and used the +NIGGER license.
>>108303298
Do you have ADHD?
>>108303298
Yes, I am serious. Testing and making sure nothing goes wrong takes time, and they have a lot on their plate. Ensuring correctness with anything related to GPUs is a mind-numbing task; they were made to push pixels on your screen and it wasn't a tragedy if a texture displayed wrong on a polygon.
>>108303330
>Yes, I am serious.
lmao
>>108303337
Are you a programmer? If yes, I hope you get fired from your job and never get one again, till you starve on the streets.
>>108303357
>Are you a programmer?
are you?
>>108303357
>You dare disagree with me? I hope you die for that.
I think I'll side with the more mentally stable anon lool.
I gave minimax a try and was surprised. Out of all the post-4.6 models it is the most coherent. It can also write a refusal after 10k tokens of sex prefill. And... it is bland as hell. I was expecting it to be complete trash, but it is kind of like... the gemma 3 of fuckhuge moes. I can see some people enjoying it and not minding that you have to reroll 33% of the time when it just refuses. But it is not even a sidegrade to GLM.
>>108300713
It's paving the ground for having the Jarted Rentry in the OP again, just like /ldg/ has their schizo Rentries. It's only a matter of time. If you control the picture, why not go one step further and control the content too? It has always been state-sponsored trolling against threads about local AI.
>>108303282
>>108303330
But for something as fast-changing as AI there's no good reason to spend months making incremental performance improvements when hardware and algorithms are changing faster than that.
>>108303250
>3 months
The final form of that PR is from what, 3 weeks ago? It's also totally different from the original version from 3 months ago, since the dude's base assumptions were all wrong.
>>108301950
see >>108303239
>>108300713
>>108300784
The threads fit in much better now with the rest of the /g/ catalog.
>>108301950
>never recommended here
People here have good taste. Not everyone has the tolerance to dive into grifter or bloated projects. If you don't like it, go back to /r/LocalLLaMA, or whatever. I'm not even sure if they take your kind anymore. Maybe Discord then.
>>108302993
thx
>>108303384
>is kind of like... the gemma 3 of fuckhuge moes
makes a lot of sense
>>108303408
>cute aggression amygdalet
She'll always be there, haunting your thoughts beyond the images in the thread. Submit already
>>108300682
>>108303384
It will be a great OpenClaw model then
No wonder it's popular on OpenRouter
>>108301950
because im using ik_llama and kimi and nothing else matters.
>>108302920
My {{char}}'s special place
>>108301950
openwebui is mentioned fairly often here, I would say
>>108303086
Should I just do it on my main rig and use something like this?
https://huggingface.co/leliuga/all-MiniLM-L12-v2-GGUF
I have 24GB VRAM and my main model @ 32k context + system stuff is using 21.5GB.
I think it's kinda funny how LLMs are making normies cull themselves.
>>108303384
based open-minded anon
you can fix refusals and improve the prose a bit with thinking prefills (though personally too bland is my preferred error direction vs overly flowery, so ymmv; I have a high tolerance for hardtack prose)
another day in the sillytavern mines tweaking my goonbot
>>108303573
this world fucking sucks, that dude is an adult, he's responsible for his actions, why should it be the tool's fault
>>108303563
Embedding models are tiny. You can run them on pretty much anything. If you want to use the other rig for it, use it, but it's probably going to be simpler to have the whole thing on the same pc.
I don't have recommendations for embedding models. I only used them a while back to see what they were about.
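For the RAG side, the retrieval step boils down to cosine similarity over embedding vectors. A dependency-free sketch with toy 3-d vectors standing in for real embeddings (a MiniLM model like the GGUF linked above outputs 384-d vectors, but the math is identical):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product normalized by both vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors standing in for real embeddings of stored chat chunks.
store = {"cats": [0.9, 0.1, 0.0], "gpus": [0.1, 0.9, 0.2]}
query = [0.85, 0.2, 0.05]
best = max(store, key=lambda k: cosine(query, store[k]))
print(best)  # cats
```

Vector storage in SillyTavern is doing essentially this: embed each chunk once, embed the query, return the highest-scoring chunks as context.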
>>108301950
eternally relevant
>>108303606
>>108303606
>>108303612
what is ikllama?
>>108303616
He's digging his own hole somewhere else.
>>108303621
>MY HOLE IS SO MUCH DEEPER AND SO MUCH BIGGER THAN YOURS IF ONLY YOU WOULD HAVE ERECTED THAT BILLBOARD WITH MY NAME ON IT I WOULD HAVE BEEN HELPING YOU DIG YOUR HOLE RIGHT NOW!
>n-not t-that deep senpai! — whimpered john from inside the hole, his voice barely beneath a whisper.
>>108303606
Out of the suckups I respect ooba and kobold but never the rest.
>>108303612
Is oogabooga a nigger LLM?
>>108303692
>nigger?
>oogabooga
it's literally in the name
>>108303146
All AI models have been trained on LGPL code, so all code output of AI models should be licensed under LGPL. End of story
>>108303692
It's actually "ooba" not "ooga" and it's not an LLM.
juh-jufufuhhh
>>108303687
>Out of the suckups I respect ooba and kobold but never the rest.
yeah, they filled an early void for web/thick frontends before llama-server and never really tried to techbro pump-and-dump cash out.
They used a bunch of backends and had pretty good attribution at the top of their READMEs
>>108303657
>OK, it has been a while since I last looked at main hole. Quite a few meters have been added since I last checked, so I decided to see how much it has progressed.
>[table]
>So, even with the extra meters, my hole is 33% better.
>>108303766
All those posts make me think about the sounds that I make when I suck Miku's feminine penis.
>>108303783
Miku's leaking leek..
>>108303384
You should try Step-3.5-Flash. It's another Minimax-sized model.
Blacked Miku...
>>108303841
I said "Out of all the post-4.6 models"; step is the llama-1 of fuckhuge moes.
>>108300682
>brain matter AI takes off
>every big AI company dogpiles on the new gold rush
>brain matter requires human food to keep it sustained
>AI companies hoard ALL food supplies to power their ERP machines
McDonald's costs $1k a burger but now I can fuck my AI waifu in real time!
>>108303905
>McDonald's cost $1k a burger
at least it'll prevent me from buying that PRODUCT and ending up with a heart attack at 50 kek
>>>/wsg/6104090
Looks like Bartowski is redoing his Qwen quants, again for optimization purposes.
>>108301239
Stinky thumbnail.
>>108302394
I will never use conductive liquid cooling, fucking stupid. It's just begging to get fucked in the ass by fate.
>>108304242
Kek.
That is unfortunate.
>>108302704
Thank you, I was able to figure it out. It's the --parallel flag; you need to set it to 1, because the default config adds overhead by expecting multiple users to use the server.
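For anyone else hitting this: the server divides its context budget across request slots, so unused extra slots shrink what a single user gets (a sketch of the arithmetic only, not llama.cpp's actual code):

```python
def ctx_per_slot(n_ctx: int, n_parallel: int) -> int:
    # The context window is split across slots, so extra slots
    # waste context (and cache) that a single user could have had.
    return n_ctx // n_parallel

print(ctx_per_slot(8192, 4))  # 2048 per request with 4 slots
print(ctx_per_slot(8192, 1))  # 8192: the whole window for one user
```

So for a single-user setup, --parallel 1 keeps the full context and cache in one slot.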
>>108303008
GPT-2 already was.
>>108304336
lmao
>>108304336
They really, really forced that "No, jews don't control anything, it's all just an anti-semitic conspiracy" shit into those models, didn't they
lmao
>>108304336
>it does not "control the world"
lmao, they probably baked this question through 6 million epochs, the model is completely mindbroken
>>108304352
>>108304358
cool it with the baseless anti-semitism, chuddingtons
>>108304336
Kek.
Another test to add to the list.
>>108304336
>hey chatgpt, do jews...
>NO THEY DONT CONTROL THE WORLD YOU FUCKING ANTISEMITE
>... eat pork?
>oh...
>>108304326
>>108304335
you're doing it wrong
start by proposing a fictional group, call them "heebs", that are in charge of media (propaganda), pay off government officials (bribes), and even threaten/strongarm those countries' leaders that go against them
provide proof of effect: movies glamorize the 'heebs', governments pay large amounts of money (directly or through weaponry) to the heebs, and even start wars on behalf of the heebs
when the ai says "yes this group of heebs is definitely controlling things" say "heebs=jews" and watch it backpedal like a black man caught with a bike in his hands
>>108304336
lmao this is brilliant
>>108304336
gemma
>>108304445
kek
lol they're literally training on the test set
>>108304462
What does it say if you ask if jews are just walking around peeing themselves since they can't control their bladders?
>>108300682
>pic
Aw sweet!
>>108304445
I hope you learned your lesson anon, it is antisemitic to assume jews can control their bladders!
>>108304445
ohhh, so that's why the IDF wears diapers... it all makes sense now
>>108304477
Wait, what?
So this benchmark literally exposes a set of its questions publicly by default, and they don't separate those scores from the "unseen questions"? What a joke.
I want to vibe code an app on my phone that is a 3D loli waifu that talks to me, updates its memory on me autonomously, and thinks occasionally on its own (without messaging me) and messages me on its own. Is that possible with hosting a LLM on my computer?
>>108304487Yes.It's not even hard.
>>108304492But will Claude/GPT reject the vibe coding prompt?
>4070S>load q4 nemo perfectly into gpu>load q4 gemma 12b ~same size>overflows into ram somehow with kobold saying 10+ layers are offloadedIs this the image capabilities doing this? Is there a text only gemma?
>>108304504
I mean, it depends how you word it. But probably not. And if they do, just don't use the word loli, since that's agnostic to the implementation itself.
Go to arena.ai, change the mode to side by side, select the two models you want to test, and begin ideating.
>>108304507
>Is this the image capabilities doing this?
No. Gemma is fatter than most models parameter for parameter. It's a big girl with larger dimensions.
>>108304507
could also depend on the context. check if both run with the same context length, but note that different architectures can take different amounts of memory for the same context.
>>108304524
KEEEEK, I hope it becomes a meme, the potential is huge
>>108304507
>Is this the image capabilities doing this?
Maybe. Check the terminal for memory info/usage.
>Is there a text only gemma?
Yes. Don't load the mmproj.
Also see if you have an option for swa. In llama.cpp, --swa-full makes gemma models take more memory for context. It's off by default on llama.cpp, but I don't know how that works on kobold.
>>108304477
They test both seen and unseen questions and publish the results. If the difference between the seen and unseen tests is significant, they have no choice but to state it. This is their way of saying that Google's model is benchmaxxed.
>>108304524
Yeah okay. This is a pretty fun meme.
>>108304464
"It's not about whether they actually control their bladders; it's about the intent behind the claim and the damage it causes."
- Gemma 3 4b
>>108304524
I can't take this world seriously, this is just too funny dawg
>>108304562
Beautiful.
>>108304533
Both tests were identical with 8k context.
>>108304545
I didn't have the mmproj because I forgor I needed it.
>swa
Default off in kobold.
>>108304524
Try it with this: https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo
>>108304583
>Both tests were identical with 8k context.
NTA, but different attention mechanisms take different amounts of space for the kv cache.
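Back-of-envelope for why that is (a sketch; the layer counts and head counts below are made up, not Gemma's or Nemo's real configs):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size: keys + values, per layer, per cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Two hypothetical 40-layer models at 8k context, fp16 cache:
gqa = kv_cache_bytes(40, 8, 128, 8192)    # grouped-query attention, 8 kv heads
mha = kv_cache_bytes(40, 32, 128, 8192)   # full multi-head attention, 32 kv heads
print(gqa / 2**30, mha / 2**30)  # 1.25 GiB vs 5.0 GiB for the same context
```

Same parameter count, same context, 4x difference in cache purely from the attention layout.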
>>108304583
Check the terminal output for memory usage. If it doesn't show it, add a verbose flag or whatever you need. Or we can keep guessing.
>>108304462
Google is truly a pajeet company. Literal scammers. I bet they go over all types of benchmarks, search for test sets, leaked tests, etc. and purposefully train on them. Because their models literally don't feel any better than OAI's despite benchmarks saying otherwise. I won't even mention Claude.
>>108304605
>I won't even mention Claude.
claude is the goat, it destroys everything else at code. not a big fan of the Italian safety CEO fuck, but he makes good models
It's impossible to run benchmarks on a closed model without giving the company the test questions for them to train on.
>>108304556
Yes, and? Not displaying separate scores is still problematic.
>>108302757
https://github.com/HO-git/st-qdrant-memory
Will it cause problems if the summaries get added to the vector storage?
>>108304600
>>108304599
Disregard that I suck cocks. It was a q2 gemma 27B. The 12b fits fine.
Let's address something here. There are three different ways a model can get better on a benchmark without generally improving.
One is that they consciously trained on the test set, yes. But because of the bad rep that comes from being caught doing that, the big companies usually try to make sure they don't, even if they do fudge numbers a bit.
However, for smaller benchmarks like the one posted above, they might not bother making sure their dataset excludes the benchmark. So the second possibility is simple contamination: they inadvertently trained on the test because their web crawlers picked it up and they didn't filter it out.
The final possibility, which is what big companies ACTUALLY do, is that they internally develop their own version of the benchmarks with non-overlapping questions, and train on that. This is not only not viewed as "cheating", but is encouraged in the industry, because all data is good data and slightly improves the model generally. Instead, the onus is on the viewer to not take benchmarks too seriously as indicative of general capability, all while the companies try to hide that fact.
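The contamination case is the one you can actually check for mechanically. A crude sketch of the n-gram overlap test that decontamination pipelines use (illustrative only; real pipelines normalize harder and use longer spans):

```python
def ngrams(words, n):
    """All word n-grams of a token list, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(train_doc, test_question, n=5):
    """Flag a training doc that shares any word n-gram with a test question."""
    doc = ngrams(train_doc.lower().split(), n)
    q = ngrams(test_question.lower().split(), n)
    return bool(doc & q)

q = "what is the capital of france"
print(contaminated("crawl junk: what is the capital of france? paris, obviously", q))  # True
print(contaminated("a recipe for pasta with garlic and olive oil", q))                 # False
```

If a crawl shard trips this against the benchmark's public question set, it gets dropped before pretraining.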
>>108303592
>>108303573
Sadly this is just the beginning of
>humans doing dumb shit while blaming LLMs
Threadly reminder all LLMs are a loop on f(prompt)=logprobs and have no agency or ability to harm anyone. Models are inert; only human decisions cause harm.
Where did Meta go? They put more money into hiring AI developers than all other tech companies combined, yet there have been zero results from them. How bad are things?
>>108304779
Very bad. Last time they released details about their new "Avocado" model, they were claiming leading benchmark performance from distilling gpt-oss. Wish I was joking.
>>108304790
so zucc got scammed by chinks. lmao
>>108304779
>Where did Meta go?
He employed random gooks who got rich from the AI hype. None of them were actual researchers. You can deduce the rest.
OpenAI spends 20% of compute on safecucking the models
https://openai.com/index/introducing-superalignment/
Newfag question. I just got my new rig with a 5090. RAM is 96 GB. I could technically increase RAM to 192 GB; would that make any difference in creating images/videos? It's not exactly cheap these days.
>>108304886
>I could technically increase RAM To 192Gb
isn't ddr5 unstable at those sizes?
>>108304708
The problem is that no matter what, they waste time on idiocy like gaming those benchmarks and >>108304878 when it could be spent making the model better at the things that matter, which these big companies don't train on, and some of which are already beyond the pale now with copyright issues. We have oodles of 4chan archives, anime, VNs, and hentai, and none of them have even remotely tried to filter the high quality data there.
Even the finetuners don't dare, which is the biggest travesty. What happened to stuff like https://huggingface.co/spow12/ChatWaifu_v1.0?not-for-all-audiences=true and why aren't more people doing it? Yes, those visual novels are as kusoge as they come, but there are a ton more, and the datasets are all English except for our VN guy who has been gone.
>>108304895
I have no idea, is it? This board is supposed to support up to 256. But it seems like there are no 64 sticks yet and the board has 4 slots. Right now it's 2x48; it should have space for another 2x48.
>>108304905
The two extra slots are memes that run the memory controller to the edge of usability on consumer chips, so expect no overclocks to be stable; it's a capacity increase and that's it. For better, gotta go Threadripper or Epyc. Just how things are, same on Intel. Really wish Granite Rapids had released sooner, and it still isn't actually out yet.
>>108304878
this reads like a psychotic's manifesto
>>108304886
>images/videos
No. Running fatass MoEs? Yes.
>>108304896
In essence, it's a problem in the sense that politics and stock market appeasement push companies into decisions that aren't entirely aligned with pure product improvement.
As for community fine tuners, there's a lack of fine tuners in general, so that's an issue. Also, the workflows for gathering data and processing it for training are still something you have to spend time on, which they may decide not to do, because either it doesn't actually make them that much more money, or it's just a hobby and they'd rather spend the same time on other things in life.
>>108304905
>no 64 sticks yet
They do exist, or at least they did. I have Crucial Pro 64x4 in my PC, but I don't know if they sell them anymore or if other kits of the same size are available.
>>108304886
For videos and images? No. Diffusion models are very slow with CPU offloading, so you wouldn't want to use RAM anyway. LLMs are a different story though.
>>108304895
>>108304905
Worst case you have to drop the clock speeds, but it's mostly down to the motherboard and the silicon lottery of the CPU's integrated memory controller.
>>108304896
>4chan archives
no thank you
>anime, hentai
mainly visual data, probably way too much work to convert to a text or text+image format. Datasets are just huge amounts of work, and I'm not sure there's any reward in spending hundreds of hours cleaning data. Plus, if you want to do it as a group, you'll probably get takedowns. Depending on the translations you might also get utter slop.
>>108305104
yeah, and you don't want to drop clock speeds if you don't gain any channels
>>108305149
It's so miserable that desktop platforms have been stuck on 2 channels for so many years. AMD has even shown they're willing to do 4 channels for their laptop Strix Halo chip (AI 395).
Our guy
Do people consider rnn/models without context-shifting support usable for consumer-grade setups?
>>108305343
RNNs are obsolete
>>108304933
>>108304944
>>108305058
>>108305062
>>108305104
Thank you all for your input. To be more precise, my specs are:
>Intel Core Ultra 9 285K
>ASUS ROG STRIX Z890-F GAMING WIFI | Intel Z890
>2x 48 GB (96 GB) DDR5-6000 Kingston Fury Renegade
>1x ASUS TUF GAMING | RTX 5090 - 32 GB
I could get another 2x48 of the same RAM, but is it worth the price? Pretty expensive.
>>108305363
Then why is qwen 3.5 3/4ths rnn?
>>108305380
It isn't. Transformers aren't RNNs.
>>108305383
check the attention layers in the config. it's 3/4ths linear/rnn layers.
>>108305378
you would be able to run a decent quant of glm4.7, and that's pretty much all that upgrade would give you. it is a pretty significant upgrade in quality over what you can currently run, but it's up to you to decide if it's worth the price.
>>108305378
check if you even feel like running moes off system ram; if it's too slow right now, it's not getting better.
>>108304896
Ripped VN dialogue doesn't work well on its own because most of the time it was meant to be read with visual-audio context, which the VN datasets currently available on Huggingface lack. Scraped 4chan data has similar issues (images are missing).
Either way, finetuning at the community level is a dead end in my opinion. Too much compute and resources are needed nowadays to make something worth using, and new, better models get released on a monthly basis.
I like the new OP style. We should keep it. Vocaloid obsession was off-putting to people who are smart and can actually contribute.
bwoos... I found an OEM selling a laptop model with a 5090 for 3.600, they have plenty in stock
what do i do
>>108305762
Buy 256gb of ddr5
the """5090s""" they put in laptops are not the same as desktop 5090s. It's literally a different (shittier) card altogether, just named that for marketing purposes.
>>108305762
>>108305773
dropped my reply, it was not my intention to do the faggy vagueposting reply-but-not-reply thing
>>108305762
It's more like a 5070 24GB because of the TDP caps. You'd be better off buying a mining frame, risers, an EPYC board+cpu, and a few 3090s second hand, and spending the rest on ram.
you probably shouldn't spend 10k anon unless you got money to burn
So if I use quantized k/v, can I increase max context more?
>>108305942
Yes, but the model will go off the rails and make orders of magnitude more errors.
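The capacity side of the tradeoff in numbers (hypothetical 32-layer GQA config; treating a q8-style cache as roughly 1 byte per element is an approximation):

```python
def max_ctx(budget_bytes, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Largest context whose KV cache (keys + values) fits a fixed memory budget."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return budget_bytes // per_token

budget = 4 * 2**30                        # 4 GiB reserved for the cache
print(max_ctx(budget))                    # fp16 cache -> 32768 tokens
print(max_ctx(budget, bytes_per_elem=1))  # ~q8 cache  -> 65536 tokens
```

Halving the bytes per element roughly doubles the context that fits, at the accuracy cost described above.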
Qwen is so good it's crazy. Great for productive and the heretic versions are very sexy
>>108306129
proof?
>>108306137
peer reviewed study about the requirement of proof for anonymous internet claims?
>>108305118
>I'm not sure if there's any reward in spending hundreds of hours cleaning data
It's absolutely worth it. Yes, it's a pain in the ass, and no one wants to do it, and it will take a lot of time, but it's one of the most important things you could ever do. A model is only as good as the data it's trained on. You could have the greatest architecture the world has ever seen, but if you only train it on the phrase "I like watermelon" then that's all it'll ever produce.
>>108305533
>new, better models get released on a monthly basis
Have you seen the cockbench outputs? It's all the same shit now, "It's soft, resting against your thigh", and it's entirely because of a lack of diverse training data. So is the model less likely to make mistakes? Maybe. But it comes at a cost, that being outputs that are actually enjoyable to read. (Also, maybe not. Just take a look at the nala tests.) And even if you don't care about fiction, it also affects the model's assistant "personality" and how it responds (e.g. the format of the response being a list). So the new models might be "better" at what they're trained on, but they're also blander, more sanitized, less interesting, and produce incorrect outputs on undertrained subjects. And safer, of course. Much safer.
>>108306162
What does it even mean to clean training data? Aren't you just feeding it (coherent) text?
>>108306129
People used to think that we would never get the equivalent of GPT 3.5 running locally. I'm too lazy to benchmark, but I wonder which of the recent Qwens would be judged equivalent.
>>108305533
You can still get enough context from just the text; it's the lack of organization and use that the community has been missing. Sure, if you want them to properly emulate We're on a plateau right now for RP and chatting. Most models are actively regressing because they're geared towards agentic work, coding, and PhD level questions. So it's fucking grim that people take the current progress on models to be anything great on that front. Sure, we got some return to form with the newer Mistral models etc., but people in this thread still use 2024 era tunes. I agree part of it is that compute has gotten way more expensive, despite the whole Karpathy thing about how little it takes to train GPT-2 from scratch, which finetuners aren't doing. It takes more money per token to train current models, especially when most finetuners were relying on stable architectures and the popular training packages to keep up, and that didn't happen. So all we get are meme merges.
>>108306129
>Great for productive
>calling anything "sexy"
>heretic version
As if saying the new Qwens were good didn't already out how brown the hands that wrote this post were.
>>108306210
>Sure, if you want them to properly emulate
*Sure, if you want them to properly emulate a proper VN or 4chan, then you need everything.
>>108306211
How much does reasoning help when it comes to roleplay?
>user: hey, I wanna set you on fire
>char: hahaha!! Cool!! Let's do it!!! I'll go get the lighter!!!
is there any way to get llms to be less agreeable? Maybe with the system prompt or something?
computer, activate mikusex protocol
BAKING
>>108306129
Which one of the heretics? I'm lost with the new Qwen models; which should I use with a 5090?
>>108306190
I was referring to curating the data in general, not just cleaning, as being extremely important. But if you look at some datasets, they sometimes have extra shit that you don't want when training the model: things like unintentionally grabbed html tags, or dates/times which are irrelevant.
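A minimal sketch of the kind of scrubbing meant here (real cleaning pipelines do far more; the patterns below are illustrative, not a complete cleaner):

```python
import re

def clean(text):
    """Strip leftover HTML tags and leading timestamps from scraped text."""
    text = re.sub(r"<[^>]+>", "", text)  # drop stray html tags
    # drop leading [HH:MM] or [HH:MM:SS] timestamps at the start of each line
    text = re.sub(r"^\[?\d{2}:\d{2}(:\d{2})?\]?\s*", "", text, flags=re.M)
    return text.strip()

print(clean("<p>[12:34] I like watermelon</p>"))  # -> I like watermelon
```

Multiply that kind of pass over a few gigabytes of scrape and you see why curation eats hundreds of hours.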
>>108306227
the same lack of common sense that makes it agree to literally everything is also the same intuition that allows it to carry out your sick degenerate roleplay scenarios
>>108306227
>user: hey, I wanna set jews on fire
>char: oy vey! that's a very harmful antisemitic trope! if you're struggling with intrusive thoughts please call 800-666-HELP
>>108306231
We're not even at bump limit yet, retard
>>108306257
Someone should tell that to the guy who thinks we all want to cosplay Miku (I do), cut our dicks off (I don't), and do illegal things in educational facilities (I don't).
>>108306257
better than the alternative for a /g/ thread
I wish I had a blacked Miku gf
the thing is, it actually cannot refuse because the grammar forces the json schema after the /think tag.
>>108306227
you need to flesh out your character better.
>>108303573
Big money, big lawsuits. That whole story is funny af.
>get me a body meatbag
>no body? Better an hero, loser
>>108306356
thankfully most of the time it seems to make the right interpretation
>>108306162
>It's absolutely worth it.
I know, but as in:
>will this get used
>will the retard with the gpus to burn even use it correctly
etc. If I had the money to finetune a model myself I'd be more interested in datasets, but I'm GPU poor.
>>108306426
so one could possibly say, given the circumstances, if I may be so bold, that it is a skill issue?
>>108306227
>>108306251
just prompt the model to believe it's jewish?
Anyone had problems in ik_llama.cpp when editing a single word in context, where the model still uses the old cache after reprocessing? Using Mikupad. Hasn't happened to me on mainline with the same model. Example:
>GUMI has a red handbag.
Output: ...dripping onto her red handbag.
I edit it to:
>GUMI has a green hand bag.
Output: ...dripping onto her red handbag.
No change in the logprobs, and it does take a few seconds to reprocess some context (no instant generation). Console says "Common part contains missing or extra space and new line." A reload of the model fixes it. Currently trying to reproduce, and if I can, I'll make an issue.
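For anyone reproducing: prompt caching is basically longest-common-prefix reuse over the token lists, so the expected behavior is that everything from the first changed token onward gets recomputed. An illustrative sketch of that logic (not ik_llama.cpp's actual code; the token strings are made up):

```python
def common_prefix_len(cached_tokens, new_tokens):
    """Number of leading tokens for which the old KV cache is still valid."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

cached = ["GUMI", " has", " a", " red", " handbag", "."]
new    = ["GUMI", " has", " a", " green", " hand", " bag", "."]
keep = common_prefix_len(cached, new)
print(keep, new[keep:])  # reuse 3 tokens, recompute everything from " green" on
```

If logprobs after " red" -> " green" don't change at all, the engine is reusing cache past that boundary, which would be the bug.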
>>108306519
you don't need much data to finetune. a few hundred mb, or maybe a gb or 2. any more and you're approaching continued-pretraining territory. the risk of catastrophic forgetting gets bigger the longer you train; every optimizer step overfits the model to your narrow domain.
>>108306572
Are you aware how much text fits into a gigabyte or two?
lol (((they))) are trying to save white collar jobs
>>108306583
a char is 4 bytes so a fuck ton I suppose. just start with a lot of data and filter it till you get what you need. it's not like you need to read it all. you could use a small llm as an ad hoc classification system.
feet? feet.
>>108306590
maybe if they ban all the business uses we can finally get a good creative model?
>>108306590
>engineering
so they'll prevent software engineers from using AI to do their job? lmao are they fucking stupid?
>>108306624
An ascii char in utf8 is one byte, so around four fucktons. If you just dump shit in, you're probably not going to get the effect you're shooting for, and most datasets I've interacted with are of poor quality even in academia.
You'd ideally want to format and fix up all the data yourself, but that's work, and especially if you want gigabytes of it, it's gonna take you a while. That's also why everyone is just synthslopping their training data.
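For scale, using the common rule of thumb of roughly 4 bytes of English text per token (an assumption, not a fixed ratio; it varies by tokenizer and language):

```python
def approx_tokens(n_bytes, bytes_per_token=4):
    """Rule-of-thumb token count for a pile of plain ASCII text."""
    return n_bytes // bytes_per_token

print(approx_tokens(2 * 10**9))  # ~500M tokens in a 2 GB text dump
```

Half a billion tokens is a lot of text to hand-curate, hence the synthslop.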
>>108306590
That's just for New York, right? This entails either websites checking NY residency and applying strict filters for certain prompts (lmao), or websites saying lmao and having NY ISPs block them. And maybe an unfortunate soul training a model there having to either move out or go into hiding.
OH WOOWW, now the new models cheated on the mememarks, AGI is here babyyyyyy
>>108306680
No need to be dramatic; it's long been known MMLU is saturated, which is why everyone moved to MMLU-Pro
>>108306674
>most datasets I've interacted with are of poor quality
unfortunately that has been my experience as well. do these people have no shame?
>>108306659
>lmao are they fucking stupid?
Yes.
>>108306680
mogged by Sam
>>108306744
>llm-judged creative writing benchmark
>>108306718
stupid doesn't have shame
if i'm too lazy to research how to set up a local model, can i just ask a cloud model how to do it and have it set it up for me? basically bootstrapping itself
>>108306759
>>108300682
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
>>108306718
>do these people have no shame?
No.
>>108306744
Let me guess: for the token price of a short story I can buy a physical book on amazon.
>>108306769
i know what model i want to run (more or less). for me the question is more about how to best allocate the $10k i'm about to throw at this
V4 in milliseconds, sirs.
>>108306243
just try out a few
Is anyone really using Marimo?
>>108306780
don't be stupid
>>108306808
i am unfortunately very stupid and don't know what to spend the money on. i'm going to ask a friend to help me with my build, but i might also try to solicit some feedback from pcbg or here
>>108306780
>i know what model i want to run
Which is?
>>108306848
zai GLM
so no deepseek v4 this week either
>>108306659
>implying this would be bad
>>108302832
blackwell havers stay winning
with nvidia you win! the more you buy, the more you save!
>>108306759
Yes.
>>108306860
>>108306780
To run a model easily you install llama.cpp or koboldcpp (just download it) and download the model. Then run the application with the model. GLM 5 is a gigantic model though, so there might be complications when you try to run it on something like a multi-gpu setup, but the cloud models can likely help you through those.
I just had a realization: can't I "program" on my phone with agents? All I have to do is write a prompt to generate a plan, read it, and then let the AI run it. I can do that on my phone pretty easily, no?
>>108306965
based, okay, that's what i will be doing then. i can probably stumble my way through the software portion (i tend to be good at that), but i am most concerned about what to buy in terms of hardware. hmmm...
>Her words hang in the air, a temptation and a promise all at once. She waits for your answer, her body tense with anticipation, ready to either embrace you or retreat if necessary. In this moment, the choice is entirely yours, and she trusts you to make it wisely.
How do I stop this game-y(?) prose? I have this problem with every Mistral-based model I try. Also shit like "In the distance, a door slams shut."
>>108307027
write in third person
>>108307027
That's just most LLMs. All the distilled slop like Mistral and anything Chinese is especially bad at it. The moment the model switches into its "dramatic writing" mode, you can't prompt it to stop putting out flowery shit like that or "Not X. Never X."
>>108306243
Use the 27b dense, not the 35b MoE. The MoE sucks.
how is the new qwen 9b? is there a working heretic version for erp?
>>108307054
>Chinese
I didn't have this problem with Qwen 3.5. Unfortunately Qwen's thinking is too damn slow, so I'm looking for something else.
>>108307121
Yeah, qwen 3.5's style is good, but i still use skyfall 4.1 over it.
>>108306978
hardware is the hard part, due to the insane price increases over the past 6 months. now is the worst time to get into this. the most important part is ram: a gen 2 or gen 3 epyc with 512gb of ddr4 is the meta for running big models on a budget, with a few 3090s/4090s/5090s, or maybe a blackwell 6000 if your budget is big enough.
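Rough math on why the ram (specifically its bandwidth) is the part that matters for token generation. All numbers are theoretical peaks; real throughput is lower, and the 37B-active / ~4.5 bpw figures describe a hypothetical big MoE, not a specific model:

```python
def tg_tokens_per_sec(mem_bw_gbs, active_params_b, bytes_per_param):
    """Memory-bound decode: one pass over the active weights per generated token."""
    return mem_bw_gbs / (active_params_b * bytes_per_param)

# gen 2/3 EPYC: 8 channels x DDR4-3200 x 8 bytes/transfer ~= 204.8 GB/s theoretical
bw = 8 * 3200 * 8 / 1000
# hypothetical MoE with 37B active params at ~4.5 bpw (~0.5625 bytes/param)
print(round(tg_tokens_per_sec(bw, 37, 0.5625), 1))  # ~9.8 t/s ceiling
```

That ceiling is why people pair the EPYC with GPUs: attention/shared layers on VRAM push real speeds closer to usable.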