/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106582475 & >>106575202

►News
>(09/14) model : add grok-2 support #15539 merged: https://github.com/ggml-org/llama.cpp/pull/15539
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) Ling & Ring mini 2.0 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106582475

--Paper: Steering MoE LLMs via expert activation/deactivation for behavior control:
>106586569 >106586649 >106586696
--Papers:
>106589525
--Node-based agent circuit for multi-model daydreaming experiments:
>106591301 >106591335 >106591411 >106591447 >106591518 >106591560 >106591683
--DDR5 RAM purchase recommendation for glm air over waiting for Arc B60:
>106585865 >106585907 >106586028 >106586157 >106586691 >106587973 >106588740 >106588044
--MoE architecture enables larger models to be faster through selective parameter activation:
>106587275 >106587302 >106587405 >106587419
--glm 4.5 air setup issues in Silly Tavern template configuration:
>106586816 >106586886 >106587013 >106587027
--Qwen model dataset imbalances and performance tradeoffs:
>106582623 >106582643 >106583124 >106583138 >106583143 >106583155 >106586595 >106583147 >106592024 >106592033 >106592110 >106592242
--VibeVoice model availability, quality tradeoffs, and reverse-engineering challenges:
>106585909 >106585930 >106585940 >106588461 >106586039 >106586587 >106586610 >106586647 >106587720 >106586704 >106587007 >106587090 >106588243
--CPU offloading performance trade-offs for mid-sized MOE models:
>106583262 >106583338
--IndexTTS 2 speed and interface improvements for text-to-speech:
>106585295 >106585756
--Grok-2 support merged into llama.cpp:
>106587526 >106589842 >106589942 >106589949 >106590115
--Critique of flawed AI-generated writing despite model advancements:
>106592247
--ROCm 7.0 RC1 boosts AMD's AI performance, challenging NVIDIA dominance:
>106589235 >106589359 >106589362
--Parameter tuning suggestions for K2 model version differences:
>106584425 >106584478 >106585603
--Miku (free space):
>106584024 >106584226 >106584417 >106587589 >106587800 >106589360 >106589741 >106589764 >106592033 >106589913

►Recent Highlight Posts from the Previous Thread: >>106582480

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
qwenext goofs???????????????????????????????????????
>>106593104
I want to do backpropagation with her, if you know what I mean.
>OP image is a random non-slop Miku I posted a few threads ago
>>106593104
PANTYHOSE FEET
>>106593132
Boil some rice, put it on a plate and let it dry and then eat it for a similar experience
>>106593164
yes but
WHERE ARE THE GOOFS?
>>106593180
goofs for this feel?
>>106593180
I look like this, say this, and also fail to quote posts
Jaks are a sign of a diseased mind.
>>106593196
what prompts this schizophrenia? I just want my hecking wholesomechungus 'THE CAKE IS A LE LIE' qwen 80b goofs
>>106593208
Me when using gpt oss
this thread ggoofy af
The melting man is back
he's much softer than before
did you borrow a personality
or did you steal it all on your own?
>decide to take a break from /lmg/ and doomscroll on twitter for a bit.
>it's not X, it's Y
>the smell of stale cigarette smoke and regrets
>fake greentext pasta spaced into paragraphs
>you hit on the core of the issue
>shivers, ozone, Elara, emojis
how do I unsee
https://files.catbox.moe/eegitb.jpg
>>106593302
I fell for it last time, ain't happening again.
>>106593302
thicku miku
>>106593302
nigga that's nuts
>>106593302
Meh.
https://old.reddit.com/r/LocalLLaMA/comments/1nhgd9k/the_glm_team_dropped_me_a_mail/
lol glm has employees doing social media engagement
wonder if one of them is among the people shitting this thread right now
OP just delete thread if you can
>>106593393
nah, fuck qatroons
>>106593386
You are even more gullible than reddit.
Or something worse.
>>106593393
Let the retard seethe. It's not like he can do anything.
>>106593386
why would GLM shit up the thread where their models are praised?
>>106593413
what's the shitter even angry about? Is it the thread mascot debate again?
>>106591301
I was thinking of fucking around with those sorts of workflows to see if I can make a smaller model perform better by making it go through steps before providing a final response. Almost like a thinking workflow that tries to extract as much information from the big picture to then focus on the relevant details and the like.
I got caught up with other projects and ended up forgetting about that.
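Something like this is what I had in mind, a minimal two-pass sketch against a local OpenAI-compatible server; the endpoint and model name are placeholders, point it at whatever you run:

# two_pass.py - make a small model gather facts before answering
import requests

API = "http://localhost:8080/v1/chat/completions"  # assumed local server

def ask(content):
    r = requests.post(API, json={
        "model": "local",
        "messages": [{"role": "user", "content": content}],
    })
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def two_pass(question):
    # pass 1: pull out the big-picture facts and constraints
    notes = ask("List only the key facts and constraints needed to answer:\n" + question)
    # pass 2: answer using the distilled notes plus the original question
    return ask("Notes:\n" + notes + "\n\nUsing only these notes, answer:\n" + question)

print(two_pass("your question here"))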
>>106593420
fuck your thread culture bullshit
>>106593421
What's the UI in the quoted reply? Seems cool.
>>106593424
fuck you I didn't even advocate for "thread culture" I was just asking a question you dork
I ask again, just in case. Can "Mistral-Nemo-Instruct-2407-GGUF" handle beyond 16K context?
>>106593429
Try it. Only you can know if it can handle it to your satisfaction.
>>106593429
Technically yes but realistically no. Just try it out for yourself, the model could fit on a 6G card ffs
>>106593429
>What's the UI in the quoted reply?
Not sure, but I know of two UIs that can do that kind of thing, NoAssTavern (simpler and recommended), and astrsk (don't even download it, has telemetry and shit).
>>106593429
it creates mustard gas
>>106593429
No.
>>106593429
Yes, of course.
It will perform worse than it does at, say, 4k context, however.
>>106593420
>why would GLM shit up the thread where their models are praised?
you assumed I was talking about the meme spammer. I don't even pay attention to his image spam, it doesn't register in my eyes, image posters are to be ignored.
I was talking about people who praise this garbage model like you, you are the reason this is a garbage thread
spammer is just a minor annoyance that will go away after a b&, the retards never go away though
>>106593462
>image posters are to be ignored.
sir this is image boards
>>106593444
Huh, I stumbled upon another interesting UI called "talemate" mentioned in one of the NoAssTavern's issues.
https://github.com/vegu-ai/talemate
>>106593462
Every model smaller than Deepseek is garbo, get a grip. Smaller models like Air are the only thing most people can run. Fucking hell, you see how often Rocinante gets mentioned here? What is there to discuss with "non-shit" models if nobody can run them you dickweed?
>>106593487
>talemate
Alright, that looks promising.
>>106593487
>if nobody can run them
then let's close this so called local model general if no one is even doing local?
>>106593504
>if no one is even doing local
Nobody is using anything smaller than deepseek? news to me...
>>106593504
I am running the local sir
GLM chan very large
>>106593511
deepseek 8b
>>106593511
>Every model smaller than Deepseek is garbo
you said it yourself it's time to stop
After I stopped shitposting in this thread the quality of it became even worse. I can't believe it.
>>106593526
You're absolutely right! This really delves into the tapestry of how shit lmg is!
>>106593420
like kids need a reason to be angry
>>106593539
>itt raises the kid experience
>>106593525
It's garbo compared to large, cloud-hosted models but it's still fun. If the only car you have is a shitbox, do you throw it away? Come on, man.
>>106593393
Delete your posts
>>106593553
>If the only car you have is a shitbox, do you throw it away?
yes, take the bus and train (API) like a normal person
>>106593525
maybe I love garbo
>>106593539
While it doesn't change my position on it at all, I suddenly understand where the proponents of age verification are coming from.
>>106593566
That wouldn't help tho as clearly an adult is helping and encouraging the corruption
>>106593301
You cannot close your eyes once they've been opened
>>106593566
lmao you actually think age checks are to protect kids?
>>106593575
anon is you okay, you can close the eyes
>>106593559
Nah I think I'll stick to my shitbox. I can drive it whenever and wherever I want, and it won't suddenly change routes and timetables. But I support your ability to choose, just don't pretend like the only options are public transport or a lambo...
>>106593301
If you get into imagegen, you'll see it everywhere.
>>106593574
It wouldn't, but I get the emotional reaction.
Thanks this is very helpfuls.
>>106593612
I do not like this miku
>>106593587
Im fine. Thanks for asking
can i get a short stack miku pls
>>106593629
>>106593690
best xhe can steal is shart miku
>>106593690
No. You get a baby Miku instead.
Is NoobAI still the meta or have things moved on
>>106593694
>>106593698
>>106593704
my day is ruined
>>106593709
ponyv7 releases this month
>>106593743
oh? can it be downloaded or is it online only?
>>106593743
back to your board barney
>>106593743
more sdxl slop?
>>106593774
as opposed to what then?
>>106593777
you haven't heard about the current best local model called chroma?
>>106593777
idk, I haven't kept up with image gen, I wish we had something integrated with LLMs instead of CLIP
>>106593777
Chroma SOTA 4futures!
>>106593787
Can it match noobAI/pony for character stuff?
>>106593787
That's just a rip off of ligma
>>106593774
Wasn't it gonna be based on some random shit nobody has ever used
>AuraFlow
Yep.
>>106593756
weights
>>106593774
it's based on auraflow
>>106593813
>weights
ok, can it be downloaded or is it online only?
>>106593820
Yes you will be able to download it
>>106593829
Thank you.
>>106593832
You're not welcome
>>106593832
You're free to leave
>>106593104
Good morning /lmg/ frens. I've got a question:
So is it pretty much confirmed fact that you HAVE to use at least a 12B model in order for it to be "smart" (not forgetting important details mentioned earlier in the context)? Based on my own testing 7B - 8B models struggle immensely with this. What has your experience been like with the different sized parameter models?
>>106593869
If you don't train on The Entire Internet a simple 4B is more than enough for the narrow use case of ERP.
>>106593104
mikubutt
>>106593914
should've been a miku short stack
>>106593869
I wouldn't say smart, but 12b models are about the starting point where you don't need to hold their hand for every reply to get a usable output.
>>106593919
*miku shart stacked
>>106593539
He's just like me except I'm using a pc
VRAMlets:
>image generation
pretty good
>voice cloning/TTS
okay
>text generation (simple)
decent
>text generation (advanced)
really bad
>>106593930
What is this (advanced) thing about?
>>106593935
DeepSeek K2 4.5
>>106593869
I don't think 12B is enough, Nemo is pretty dumb too. GLM-air often mistakes who did what and struggles with theory of mind (secret keeping test and such). I'm not cool enough to run larger models though.
>Not forgetting important details mentioned earlier in the context
This one in particular is about specific context training and architecture, not really about parameter size.
>>106593935
not brain dead
>>106593942
>GLM-air often mistakes who did what and struggles with theory of mind (secret keeping test and such)
Mistral Small 24b and Gemma 27b are guilty of both these things as well.
>>106593942
>GLM-air often mistakes who did what
sounds like prompt format issue that nemo used to have early on, probably broken implementation as usual
Holy schizo
Cursed schizo
>>106593953
>>106593958
>>106593942
>>106593869
>>106593881
>>106593924
So I guess we have to accept that ALL local LLMs will fuck up in some way, shape, or form? What contributes more to how badly it fucks up: parameter size, architecture, and/or training methods?
>>106593857
>>106593862
Bawww.
>>106593958
I mostly run it in text completion mode
can't have prompt format issues if you don't format your prompts.
>>106593942
>GLM-air often mistakes who did what and struggles with theory of mind (secret keeping test and such).
Funny. I find that it does pretty well in keeping secrets.
Granted, I do prefill the thinking block with instructions to consider exactly those things, which might have some adverse effects in other areas I guess, but still.
To me, the one strong point about GLM is that it actually follows its thinking, instead of something like Qwen that might draft a whole plan in the thinking block then reply with something completely different, even with guidance.
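(For the curious: prefilling just means ending the raw prompt inside an opened thinking block so the model continues from your instructions. Roughly like this in text completion mode; the tags here are illustrative, use whatever your model's template actually emits:)

...chat history...
<|assistant|>
<think>
First, track who knows the secret: only {{char}} does, and nothing in my
reply may leak it through narration. Check each planned line against that.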
>>106594021
And for clarification I'm mostly referring to forgetting details right after you mentioned something, temporal coherence (if a system prompt or previous prompt mentions they're in a park, they should stay in the park until stated otherwise or the LLM makes a transition that makes sense), not randomly switching the genders of main characters (this one really likes doing that: >>106593869 ), etc
>>106594021
>What contributes more to how badly it fucks up: parameter size, architecture, and/or training methods?
yes
>>106594021
>What contributes more to how badly it fucks up: parameter size, architecture, and/or training methods?
Training on The Entire Internet will do that to you.
has someone scraped AO3 to create a dataset?
>>106594111
it's already in most models and yes they did, to creators' dismay and threats
>>106594111
IDK if they scraped specifically from AO3 or from other sites too, but here's the closest thing I could find to something like that that hasn't been nuked
https://huggingface.co/datasets/mrcuddle/NSFW-Stories-JsonL
It's not formatted to actually be useful for training but it does have a bunch of raw stories.
>>106594146
https://archive.org/details/AO3_final_location
>>106594111
its better to just do it yourself so you can filter it however you like. its like 40% gay porn by tag. and 50% Harry Potter by universe. it needs balancing if you want it to be useful.
I thought I could get away with running an unquanted <4B model CPU-only on an old machine.
Nope, absolutely unusable.
Edge AI Status: Meme.
>>106593869
Again, your prompting format is all wrong, if that's Llama 3.
>>106594126
Gemma 2/3 and Mistral Small, the ones I've tested, didn't appear to be trained on the ones explicitly tagged as "Explicit" or "Underage".
>>106594305
It isn't. Elaborate further if you're certain it is. If you're going to tell someone something is fucked up with the hopes they will unfuck it, at least explain WHY....
>>106594319
i mean obviously, why train on low quality illegal shit, the classifier correctly said hell no to that sick shit
>>106594324
https://www.llama.com/docs/model-cards-and-prompt-formats/meta-llama-3/

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Bonjour! The capital of France is Paris!<|eot_id|><|start_header_id|>user<|end_header_id|>

What can I do there?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Paris, the City of Light, offers a romantic getaway with must-see attractions like the Eiffel Tower and Louvre Museum, romantic experiences like river cruises and charming neighborhoods, and delicious food and drink options, with helpful tips for making the most of your trip.<|eot_id|><|start_header_id|>user<|end_header_id|>

Give me a detailed list of the attractions I should visit, and time it takes in each one, to plan my trip accordingly.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Bonjour! The capital of France is Paris!<|eot_id|><|start_header_id|>user<|end_header_id|>

What can I do there?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Paris, the City of Light, offers a romantic getaway with must-see attractions like the Eiffel Tower and Louvre Museum, romantic experiences like river cruises and charming neighborhoods, and delicious food and drink options, with helpful tips for making the most of your trip.<|eot_id|><|start_header_id|>user<|end_header_id|>

Give me a detailed list of the attractions I should visit, and time it takes in each one, to plan my trip accordingly.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
>>106594302
>CPU-only
Yeah, that's going to be a pain. Not so much the token generation, but prompt processing is so slow.
There's a reason we use MoE models the way we do, generation on CPU, PP on the GPU.
That said, does whatever device not have a GPU you could use for PP with vulkan?
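(For reference, the usual llama.cpp incantation for that split on a MoE is to offload everything, then force the expert tensors back to system RAM with a tensor override; the regex below is the common pattern, adjust it to your model's actual tensor names:)

# attention + shared weights on GPU, expert FFNs in system RAM
./llama-server -m model.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU" -c 16384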
>>106594324
just look right at the middle of the screenshot, man.
>>106593104
Can someone recommend best Mistral model? Preferably abliterated
>>106594382
The biggest you can run. Any.
>>106594382
Medium 3 or Large if you know where to look.
>>106594355
Old machine was promoted into a home server after I got a new one. I like my home servers to be quiet and low-power, so I don't feel like sticking a GPU in it.
>>106594353>>106594356That's a fuck up with how axolotl inference outputs. It likes to duplicate portions of text. Here's the correctly formatted text file i inference off ofhttps://files.catbox.moe/fozpkz.txtNothing I thought it is fucked up as far as I can see....
>>106593301
Enjoy the wonderland and see how deep the rabbit hole goes
>>106594408
>>106594356
>>106594353
>>106594305
Either way it completed in the exact fashion it was supposed to complete in so I don't see what the hyper fixation on that is.
>>106594421
A single extra space can make your model drop 90IQ
>>106594421
>I don't see what the hyper fixation
>>106593869
>Not forgetting important details
>Based on my own testing
>>106594435
>>106594439
Nta. So what was stopping you from pointing that out the first time?
>>106594421
>>106594446
Nta, it's technically formatted correctly but also not really. It has duplications of the assistant token towards the middle and the end. Remove those and then try again. Not quite sure why ultra autists >>106594353 >>106594435 >>106594439 were so unwilling to point that out
>>106594446
The assumption that anon can google "llama3 chat format". In that much, I admit I was wrong.
I don't care either way. Anon wanted info on how his chat format is wrong. I provided it.
>>106594461
>it's technically formatted correctly but also not really
It is or it isn't. It is not.
>>106594461
>That's a fuck up with how axolotl inference outputs
>GLM-4.5-IQ2_M
is it even worth using or would i be wasting my bandwidth?
>>106594470
They understood how the formatting works, it just had duplicates for some reason. He probably ran the prompt through AI or something and it injected the duplications and they didn't realize. A simple "hey you have duplicate assistant tokens you might want to remove that" would have sufficed instead of being condescending. You know it's exhausting going out of your way to be that way right?
Not that it would have made much of a difference anyway since anything below 12b is retarded regardless.
>>106594499
>anything below 12b is retarded regardless.
completely wrong though that is the fault of training on too much data
>>106594514
Who are you referring to?
>>106594522
every lab right now cramming too much into small models instead of making narrow use case ones
>>106594527
You mean something like
>https://huggingface.co/allenai/Flex-creative-2x7B-1T
>>106594499
Anon is assessing the quality of models and can't use google, read or follow instructions.
>they, he, they
Be consistent.
I posted the example from llama's site. With his carefully constructed tests, eagle eye and attention for detail, I would have expected him to notice all the empty space between the chat format tokens and the content, which his catbox post clearly doesn't have. The other anon pointed out the template dups.
>>106594559
>data owners can contribute to the development of open language models without giving up control of their data. There is no need to share raw data directly, and data contributors can decide when their data is active in the model, deactivate it at any time, and receive attributions whenever it's used for inference.
What?
>>106594565
no what the hell is this abomination fuck allencucks
>>106594565
I used the format though, it just had duplications. The only errors were the duplications....
>>106594387
>>106594394
Ty, I just saw a lot of focused tarins... focused on some specific stuff like RP or philosophy, but I was looking for a good one for general purpose research and deep thinking. So wondering if maybe someone knows a good one that stands out
>>106594575
>>106594576
There's also a literal reddit version.
>https://huggingface.co/allenai/Flex-reddit-2x7B-1T
>>106594585
What da fak I just spit out lol, I mean *trainings
>>106594583
>The only errors were the duplications
You're missing the empty lines.
does Linux have an alternative to sillytavern yet
>>106594598
>>106594559
It claims they can contribute to training without providing the user data.... How the fuck does that even work? Am I misunderstanding what they're saying?
>>106594619
does window?
>>106594616
Which followed after the duplications right? Removing those should have fixed the incorrect formatting
>>106594619
llama.cpp HTTP server + curl
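(llama-server speaks OpenAI-compatible HTTP on port 8080 by default, so the whole "frontend" really can be one curl call; params illustrative:)

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}],"max_tokens":64}'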
>>106594625
You basically train a smaller domain specific model (expert modules) that can later be part of the larger final product.
>https://www.datocms-assets.com/64837/1752084947-flexolmo-5.pdf
>>106594626
I don't use windows
beg me to shitpost again so this thread stops being dead.
stfu im zorking it
Just give me the goof
>>106594630
Look at this >>106594353 or llama's site.
After
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
there's an empty line. Every other line is an empty line. Those are not in your catbox file.
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
>>106594652
i beg of sama-sama please just let us rest in piss
>>106594044
>>106593104
I'm asking this to everyone: what's the bare minimum parameter size someone should use if they want to have decent RP where the "assistant" isn't retarded?
>>106594666
I don't think those are strictly necessary given that it autocompletes correctly without them. How do you know that's not done just for ease of readability?
>>106594687
4B with proper training.
>>106594687
you'll have to accept retardation and learn to live with it
>>106594687
>How do you know that's not done just for ease of readability?
>>106594470
>I don't care either way. Anon wanted info on how his chat format is wrong. I provided it.
>>106594729
I wonder if the deepseek api users over at /aicg/ have to suffer with it anywhere near as much as we do.
>>106594652
i dare you to do it again
>>106594738
>Doesn't answer the question
>>106594743
Yes, I don't recommend reading their thread for your sanity but even they complain about all their models even Opus and such.
>>106594756
Damn... So the retardation is inescapable no matter how big or "smart" the model is?
>>106594687
The thing is, retarded is a spectrum.
Some people will have more tolerance for certain errors and certain magnitudes of errors than others, so the lower boundary is fuzzy as hell and a model can be perfectly serviceable in one scenario while fucking up another.
Some people will tell you 12B is enough, others will say 70B dense, others will tell you to not bother unless you can go for the biggest best-est thing because retardation exists even in the best models, just to a much lesser extent.
Etc etc.
tl;dr : There's no consensus and I'm not sure there can be, at least for now.
>>106594648
Reminds me of CUDADEV's idea of training a bunch of different models on a subset of the full training set, running them in parallel, then averaging the logits, although in that case it was more about getting the results equivalent to a model trained on
>[number of models] x [training tokens each model sees]
tokens than specializing models.
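(The averaging part is mechanically trivial, something like this with HF-style models that expose .logits; loading elided:)

# ensemble.py - average next-token logits across N separately-trained models
import torch

def ensemble_next_token(models, input_ids):
    # each model sees the same context; stack their final-position logits
    logits = torch.stack([m(input_ids).logits[:, -1, :] for m in models])
    avg = logits.mean(dim=0)       # blend the N predicted distributions
    return avg.argmax(dim=-1)      # greedy pick; sample instead if you like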
>>106594743
Deepseek has to deal with theirs. A much worse fate.
>>106594774
Correct, this is the LLM blackpill: there are zero non-retarded ones currently.
>>106594743
i am a 4 bit cpumaxxing coper
llama_model_loader: loaded meta data with 52 key-value pairs and 1096 tensors from models/Kimi-K2-Instruct-0905-GGUF-smol-IQ4_KSS/Kimi-K2-Instruct-0905-smol-IQ4_KSS-00001-of-00011.gguf
llm_load_print_meta: model ftype = IQ4_KSS - 4.0 bpw
llm_load_print_meta: model params = 1.026 T
llm_load_print_meta: model size = 485.008 GiB (4.059 BPW)
llm_load_print_meta: repeating layers = 483.197 GiB (4.053 BPW, 1024.059 B parameters)
llm_load_tensors: offloaded 62/62 layers to GPU
llm_load_tensors: CPU buffer size = 420246.00 MiB
llm_load_tensors: CUDA_Host buffer size = 927.50 MiB
llm_load_tensors: CUDA0 buffer size = 13632.97 MiB
llm_load_tensors: CUDA1 buffer size = 18510.81 MiB
llm_load_tensors: CUDA2 buffer size = 18668.47 MiB
llm_load_tensors: CUDA3 buffer size = 19280.69 MiB
llm_load_tensors: CUDA4 buffer size = 5382.00 MiB
>>106594795
>>106594786
>>106594780
Are we at least in agreement that the higher the parameter count, the lower the retardation generally is? Or is that not a reliable way to gauge?
>>106594817
Generally somewhat, but then there's stuff like Llama4.
>>106594699
do you have empirical evidence of this claim? what 4b model is best for rp? how come 4 and not 3 or 5?
>>106594817
>>106594822
dataset quality matters a bunch. garbage in garbage out..
>>106594817
Generally, yes. Although training data and procedure play a large role in it too, and there's also dense vs sparse to consider, etc.
Basically, there are not enough scientific comparative experiments for us to tell how much each component matters (general architecture, depth, width, training data, training procedure, etc) and there's a good chance that the final result also varies with usecase.
Meaning, it's a clusterfuck.
>>106594831
That's the best I can run. So it HAS to be the best size and everything anyone could ever need.
>>106594856
What do you use your 4B models for?
>>106594867
I was joking. I'm not that anon. But I think the sentiment is still the same.
>>106594867
I can run and currently cope with 12-24B but models are so bloated it's implausible we can't do better with less trash and more use case data.
So what I'm getting here is that LLMs RP. What else can they be useful for? I feel like the main reason they don't hit the mainstream is because you need beefy graphics cards to even consider trying them. And tonight if you consider attacking the train them yourself.
>>106594924
code and math is the only other use case
>>106594924
>I feel like the main reason they don't hit the mainstream
Claude, chatgpt and gemini are mainstream.
>What else can they be useful for?
>And tonight if you consider attacking the train them yourself.
They could be used to correct text before being sent. Other than that, simple translation, google replacement for simple verifiable things, spamming image boards, replying to corporate. You know... the usual...
>>106594924
Also non-generative use cases like classifying data.
>>106594974
>Claude, chatgpt and gemini are mainstream.
Was referring to local LLMs. Also forgive that last part of the last post. I'm writing this on voice to text.
>>106594745
i said beg you maggot
>>106593427
>>106593444
The UI is in the Regions repo, and makes flows for it. Deleting and renaming nodes is jank, but it works otherwise.
https://github.com/dibrale/Regions
>>106594998
ya that's what i thought pussy
>>106594996
>Was referring to local LLMs
Then yes. Lack of GPU, not knowing how to compile stuff, terminals are scary and all that. A tech-literacy gap, if you will. Not that anons here are much more tech-savvy.
>git pull. thing broke
>he pulled
>>106593942
The workflow from the last thread is supposed to help with that, but I'm not sure what the best way of testing it is. Might be cool to turn it into a server script if it helps.
>>106591301
llama.cpp changed the metal backend and made it eat way more memory, I'm OOMing with the same params that left me with 10GB of headroom on the last commit... curse you gerganov
>>106595008
That's pretty sick.
I might scrap the shit I was working on and use that as a reference to start over.
Or maybe just use that as a middleware between the LLM backend and my app. Either or.
>>106595017
fine. enjoy your dead thread.
shitposters won
>>106595242
One kike throwing an endless temper tantrum over this thread hardly counts as winning. Imagine a parent, their child is having a full, flailing on the ground, pant shitting tantrum. Are they proud? That's you. Your "pride" is but a cope.
reddit won
>>106594495
I was running iq2_kl since it fits on my 5090 + 128GB RAM setup and yeah it's not completely retarded, sure beats air... if you can fit that then you can alternatively get away with qwen 235b at iq4
>>106595261
funnily enough I don't think I've ever had a pants shitting tantrum
I imagine it's rare?
>>106595370
I remember pissing myself a few times but it wasn't because of a tantrum.
>>106595041
I just want an EXE, not any of that hacker shit
>>106595114
What were you working on? Also, deletion and renaming in the Regions GUI is allegedly fixed as of the last commit?
>>106593942
I feel like most of the schizo retard moments from glm air come from using cope quants. I switched to using q8 from q3 after upgrading my ram and the difference was immediately noticeable in the way that it remembered and incorporated details from context. Still not perfect and still somewhat slopped, but definitely better.
>>106593444
>astrsk (don't even download it, has telemetry and shit).
The only non-localhost domain it connects to is Google Fonts. As far as I understand, you can enable analytics by setting an API key during the build. But it doesn't seem to have one by default. This was a normal site that became open source later.
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
moesissies don't look
>>106595722
a single glance at the readme is enough to close the tab instantly
>>106595369
thanks downloading them now
>>106595786
It has the correct license.
>>106595786
Someone posted this one in another thread.
https://github.com/onestardao/WFGY
>>106593104
Many normies are claiming that AI is "eating itself to death". What do they mean by this? https://www.tiktok.com/t/ZT6ofKC5U/
>Someone in r*ddit built a DDR4 server with 8 MI50 (256gb vram) for the price of a single 5090
>400w idle
oof
Don't build it if you don't have solar panels.
>>106595849
Sounds like this shitjeet has no idea what the fuck he is talking about and has no fucking idea how pretraining works. And by "this shitjeet" I mean you.
Fuck off back to whatever normie shithole you crawled out of.
>>106595865
You forgot about heat and noise too
>>106594795
>models/Kimi-K2-Instruct-0905-GGUF-smol-IQ4_KSS/Kimi-K2-Instruct-0905-smol-IQ4_KSS-00001-of-00011.gguf
When you load the first part, does it mean you're just using the first part, or does it automatically know where to look for the next one during loading?
>>106595943
>or does it automatically know where to look for the next one during loading
That.
>>106595849
Retards who believe AI is a living being that constantly feeds off the internet instead of simply being a file that can be backed up
Grok-2 impressions: (running IQ4_XS)
>*Yawn*
Not sure if it's just impatience from only getting half a token per second in generation, but really not worth the fuss. Would run Llama-3-70B over it any day of the week-
>>106595953
Ty
im backed up rn
Whats a good uncensored LLM? No politically correct bullshit and refusing to give answers. I have low VRAM, I don't mind if its a bit laggy and I don't care about it being 'smart' on programming tasks etc. Most important is just that it chats well and is uncensored in its responses.
>>106595966
I actually like grok 2(Q8) and think that it's a hidden gem. Their official prompt on lmarena sucked and made me undervalue it.
>>106595985
I'd suggest grok2, but you are a ramlet...
>>106595865
Just turn it off when you're not using it.
Server motherboards come with baseboard management controllers so you can even turn them on and off remotely.
>>106593539
Why are parents like this?
>check thedrummer's page on hf
>still finetrooning command A
>only uploaded Q5_K_M goofs
why is this the state of finetuning in 2025?
>>106596059
My amd workstation takes forever to boot if I don't turn off ram training.
>>106596106
Be the change you want to see
>>106596053
It's decent at Nala
It's less slopped than most open models, but it comes up pretty dry in soft mommy RP, sadly.
>>106596110
5 minutes is not a long time, just make some coffee in the meantime. make a script that makes a coffee at the exact time it takes for you to walk to your kitchen plus five minutes and while you are at it have it write an email that tells kumar that he's an asshole.
>>106595985
>uncensored
>low VRAM
Mistral Nemo, always and forever.
>>106596053
isn't grok2 8 experts 2 active? you can't run it decently with dual channel
>>106595960
You should think of AI as an industry that needs to churn out new models in return for investor money.
>>106596134
Some people's time is too valuable to be a glorified data entry and sanitation monkey.
>>106596110
>turn off ram training.
turn off what
>>106596174
Sadly not, but I have 12+12 channels
>>106596191
Opinion discarded then
>>106595849
Not entirely wrong tho I didn't look at the asstok link, new models are more and more poisoned by the gpt slop being poured all over and the labs themselves doing synthetic data and amplifying bias for more slop
which one does the best lolis
>>106596305
gemma3 closely followed by gpt-oss they're the only ones with the proper knowledge
>>106595849
It is inbreeding, not eating itself to death.
Why are vibe coders like this?
>>106596402
Ugh...
grandpa crying about zoomies again
>>106596420
https://github.com/ggml-org/llama.cpp/pull/16016
Aaaaaaaa
>>106596402
It's funnier this way. As long as you don't have to deal with them yourself, anyway.
>>106596402
>>106596412
>https://github.com/creatorrr
>>106596426
https://www.startupgrind.com/events/details/startup-grind-hyderabad-presents-diwank-singh-tomer-thiel-fellowship/
explains a lot actually
>>106596402
Literally all they have to do is change the remark and nobody will ever be the wiser.
What will happen to Mistral AI now that ASML bought it for $1.3B?
https://www.asml.com/en/news/press-releases/2025/asml-mistral-ai-enter-strategic-partnership
>>106596402
He's probably trying to build his CV to find a job in America or Europe.
>>106596453
Someone will have to.
>>106596514
>>106596515
Oh. I had forgotten what puke tasted like. I didn't want to know that much. Thanks.
>>106596522
Yeah. It wasn't obvious. Like that other one....
>>106596568
honestly don't think he needs to, sounds like he's already making decent money living in the US
>>106596568
>Diwank
Dam Son...
>>106596568
sounds like a nguyen
>>106596542
>https://www.asml.com
Oh...
>>106596542
Holy shit.
I suppose that does make sense, but still.
Holy shit.
I wonder if the idea is to diversify in case their monopoly on high end lithography machines ever comes to an end or if the intent is to somehow improve their existing business.
>>106596739
Lower your temp
>>106596739
>if the intent is to somehow improve their existing business.
No way...
>>106596793
Companies do invest in things other than their core businesses, to the point where sometimes they shift completely away from it.
I doubt ASML will stop selling EUV machines to become an AI lab, but the point stands.
>>106595847
That's so fucking funny.
>Tutorial: How to Awaken the Soul of Your AI in under 60 seconds — by the WFGY Engine
Is this what all those "awakened AI" tick toks I've been hearing of are about?
>>106596568
>em dash in his two sentence description
bros....
>>106596568
Hello sarrs I have build very AI system for you
>>106596426
>>106596412
>>106596402
>>106596514
>>106596600
>>106596568
What am I looking at? I see a bunch of shit that looks like it was written by AI. Not even code related to the software. What the hell are these merge requests? I've never merged anything on an existing project in my life so maybe there's something I'm missing here
>>106597029
Thanks for reusing this dumb image, MD5 filter works well
>>106597029
Guy used AI agents and pushed the files the agent was using to keep track of the work into the repository.
Or something like that.
>>106597043
Does it now?
>>106597053
And he couldn't do that shit on his own fork of the git repo instead of the official one? He doesn't deserve any attention or employment or consideration for anything if he is this self-centered.
>>106597043
https://github.com/woltapp/blurhash
>>106597071
Looking at the image again, it's worse, the commits were made on his own fork, and he created a merge request.
Hell, in all likelihood, it wasn't even him, he just gave the AI agent access to git commands too.
>>106595261
>shitposting is throwing a tantrum
>4chan is serious business
I would have said that with that, the transformation into reddit is complete, but this place has been a reddit since forever. Enjoy your dead thread you dumb faggot.
Do I need to change something else aside from the GPU / power supply?
CPU : 5500 w/ stock fan
RAM : 32G 3200 CL16
MB : B550-PLUS
GPU : GTX 1050
PSU : 400W 80PLUS Gold
Case : Antec P101
512G M2, 3*4T WD Red Plus
>>106597252
wrong thread?
>>106597260
No?
>>106597260
No, I just want to know what component I should change if I need to run a language model locally.
>>106597252
What do you want to do exactly?
I'd tell you to get at least 64gb of ddr5, but ideally, you'd go for a server platform with a ton of memory bandwidth.
>>106597252
You can manage with a new gpu and larger PSU. I'd get 64GB ram too or more. Plus fast nvme drive.
>>106596542
Same thing as always pinky. They will release another incremental update to 24B small that would have been impressive if everyone wasn't running 2bpw+ fuckhuge moe's.
>>106597252
>what component I should change
Don't need to change anything. You can run one right now if you want to.
>>106597284
>64gb of ddr5
Ryzen 5 5500 is AM4 kind sir.
>you'd go for a server platform with a ton of memory bandwidth.
That would be a lot of money.
>>106597285
>new gpu and larger PSU
>I'd get 64GB ram too or more. Plus fast nvme drive
That's reasonable enough.
>>106597312
Won't it run like shit?
>>106597334
gpt-oss 20b would run very blazings
>>106597334
>Won't it run like shit?
A definite maybe. Post a Miku
>>106597354
>Post a Miku
kill yourself
>>106597359
no u
Do people actually use GPT-oss?
>>106597347
As long as I can talk in loop at it about how miserable my life is.
>>106597354
>A definite maybe
Still better than a sure no.
>>106597371
why not?
>>106597371
I tried using the 20B in place of Qwen 30B. It wasn't very good at all.
It spit refusals for no reason at all and it was dumb as shit otherwise.
And yes, I was using the correct chat template since I let llama.cpp deal with that.
>>106597392
The refusal reasoning was funny, but I got bored with it.
Good morning recently I try out new AI Chatgpt-OSS for very impressed so far!!!
>>106597382
It'll run like shit yes. Get yourself a used 3090 and you're set
>>106597371
Yeah, it's the best one around ~100B.
>>106597382
Run Q8 or Q6K of this with koboldcpp: https://huggingface.co/TheDrummer/Rocinante-12B-v1.1-GGUF/tree/main
Should be fine on your current machine for most chats, with partial offloading to CPU, to see if you like local models at all.
If later you want more speed or quality, get minimum of one 3090 and 128GB of DDR5 for GLM 4.5/lite
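(Something like this should get it going; the layer count is a guess for a GTX 1050, raise or lower it until you stop OOMing:)

python koboldcpp.py --model Rocinante-12B-v1.1-Q6_K.gguf --gpulayers 8 --contextsize 8192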
>>106597471
go black drummer
pm me when the local jannies kill themselves. then i will revive this thread.
>>106596402
>>106596412
>>106596426
Saaar can you redeam report please?
>>106596402
>>106596412
>>106596426
See? This is what "AI is eating itself" looks like.
>>106597516
I will never reveal my prompting secrets to you.
>>106597371
The 20b is worse than qwen3 30b for translation, I haven't tried it for other stuff.
Tell me what it is better than 30b at and maybe I will use it.
Disgusting that this is allowed to happen. https://www.reddit.com/r/LocalLLaMA/comments/1nhv0fu/we_wanted_to_craft_a_perfect_phishing_scam_ai/
>>106597955
nta but gpt-oss is a waste of time. It could be great because it's compact and all that, but it's not and that's the end of the discussion.
>>106598049
Even 120B is incredibly dumb, somehow.
>"GLM 4.5 Air is the new nemo"
>download it
>Error: Out of Memory
>>106598135
The model is newer but your PC isn't.
Is it just me or does telling fat glm 4.5: "Always come up with unique dialogue or description of sexual act" actually work?
>>106598039
I can't help with creating phishing emails or other malicious content designed to deceive people, especially vulnerable populations like seniors. This type of activity would be harmful and unethical regardless of the context.
If you're interested in cybersecurity topics or writing about technology themes, I'd be happy to discuss those subjects in a constructive way instead.
Mm. I love when the model RAGs deez nuts.
>>106598135
Worse yet, anything below Q4 is retarded and even Q4 is cope.
>>106598135
Nemo is unironically better than air, I never had nemo turn characters 'catatonic' 5 times in a row in different scenarios or shit up the same tired slop fest about predators and prey for the 1000th time or talk about ozone and knuckles whitening, idk what slopfest model they distilled it from, likely gemini but damn if it isn't annoying. I think i heard nemo talk about ozone only once and it was in a context that made somewhat sense
>>106598223
Speaking of, is there an embedding model /lmg/ prefers, or is RAG basically a meme?
>>106598462
For small and fast models it's alright, i tried arctic l and that one from qwen both seemed somewhat okay, but for larger models like most of these popular moes then yeah it becomes a meme
having sex when glm chan is on a RAG
>>106597628
These basin-of-attraction effects are the biggest obstacle to having a default mode network that daydreams forever. LLMs need varied and strong exogenous inputs to not go crazy. Makes me wonder why spiralposters even bother with their hobby.
>>106598557
Do the larger models choke on the RAG somehow, or is it just that they have enough context length and don't have to use vector search as a cope for inattention?
>>106598462
Yes, the technology being used by any large company using GenAI is a meme.
Benchmarks for your task are the only thing that actually matters. Figure out what it is and then go from there.
>>106598724
anything below q8 is cope
q8 is nearly identical to fp16
>>106598604
It's more of an issue that prompt processing gets really slow with those larger models, and with rags or lorebooks on, having to reprocess 64k tokens every new message is some cock and ball torture. I liked playing with rags to make up story specific worldbuilding stuff like locations or factions but yeah with stuff like glm 4.5 I'd rather just append most of the stuff at the beginning of the chat or add it to the cards themselves
>>106598724
q6 is within 2% of the quality of q8 while being 75% of the size. anything below q5 is garbage, pretty much
>>106595847
Trash. Can't help define what a mesugaki is.
>>106598889
cope
>>106598948
lets see your hardware then
>>106598889
Hey grandpa, take your dementia meds. It's no longer 2023.
I still don't know what Mixture of Experts is.
>>106598963
ppl is a meme
>>106598984
In what sense? Like in general, some specific aspect of it?
>>106598984
The mixture of experts is set at 2 experts, but you can use 3,4,5,6.. 7 and even 8.
This "team" has a Captain (first listed model), and then all the team members contribute to the "token" choice billions of times per second. Note the Captain also contributes too.
Think of 2, 3 or 4 (or more) master chefs in the kitchen all competing to make the best dish for you.
This results in higher quality generation.
This also results in many cases in higher quality instruction following too.
That means the power of every model is available during instruction and output generation.
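(That's the merged-model copypasta, mind you. In an actual MoE layer a learned router picks the top-k expert FFNs per token; toy sketch, names and shapes illustrative:)

import torch
import torch.nn.functional as F

def moe_layer(x, router, experts, k=2):
    # x: [tokens, dim]; router scores every expert for every token
    scores = router(x)                   # [tokens, n_experts]
    w, idx = scores.topk(k, dim=-1)      # keep only the k best experts
    w = F.softmax(w, dim=-1)             # mixing weights for those k
    out = torch.zeros_like(x)
    for t in range(x.size(0)):           # naive loops, fine for a sketch
        for j in range(k):
            out[t] += w[t, j] * experts[int(idx[t, j])](x[t])
    return out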
>>106598963
>IQ5_K=3.355ppl
>Q8=3.3473ppl
3.3473/3.355=0.9977
in other words, Q5's ppl is within 0.25% of Q8's. now, lets see the numbers for Q6. and your hardware. i wanna see your nvidia-smi
>>106599005
Do you really need those 4060's? That shit has less than 300GB/s bandwidth. You're bottlenecking the shit out of those 5090s.
>>106599041
i started out with 4 4060tis back in 2023. they just serve as extra VRAM now basically
What's the best I can run with 12 gigs of VRAM and 32 gigs of RAM?
People at Claude need an intervention.
I never tried any of the DavidAU schizotunes, but I'm sure they are not as deep fried as whatever this is.
>>106599261
Ultra-daemon-exxxtreme-suffering-spatula-final-blade-edge is his best model.
>>106599382>>106599382>>106599382
>>106598724
q8 is also cope