/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107595736 & >>107588615

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107595736

--Debunking AI model misconceptions and explaining expert activation mechanisms:
>107602542 >107602610 >107602721 >107602718 >107602807 >107602891 >107602773 >107602789 >107602815 >107602829 >107603139
--GPU memory allocation error and platform-specific workaround discussion:
>107595928 >107595961 >107595964 >107596823 >107595997 >107596039 >107596002 >107596320
--Claude model finetuning strategies and dataset sourcing challenges:
>107596365 >107596486 >107597102 >107597835 >107598289 >107598359 >107601516 >107601986
--Google's Gemma Scope 2 for AI safety research:
>107601371 >107601386 >107601407 >107601519 >107601390 >107601636 >107601651
--Exploring Qwen-Image-Layered for high-resolution animated portraits in Stellaris modding:
>107603117 >107603295 >107603429 >107603433
--Strategies for enhancing AI memory retention and context management:
>107597731 >107597742 >107597758 >107597848
--Parroting issue in AI models linked to human roleplay and data contamination:
>107596587 >107596717 >107596838 >107596888 >107596877
--LLaMA Scout inference engine setup with future finetuning plans:
>107595813 >107600238 >107600258 >107600413
--Historical pre-WWI LLMs and assistant-like behavior:
>107599240 >107599326 >107599724
--Memory optimization challenges for running LLMs:
>107599304 >107599331 >107599359 >107599418 >107599436
--Speculation about GenieScope updates and forced LLM releases:
>107602168 >107602257 >107602763 >107602413 >107602411
--Speculation and concerns about upcoming GLM 4.7 release:
>107602431 >107602719 >107602518 >107602778
--GLM 4.6 Q8 vs Kimi K2 Q3 speed discrepancy due to parameter activation differences:
>107602871 >107602902 >107602925
--/lmg/ Book Club:
>107604213 >107604386 >107604515
--Miku (free space):
>107595911 >107597840 >107600212 >107603433 >107604562

►Recent Highlight Posts from the Previous Thread: >>107595738

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
When are these spam generals getting removed from /g/?
>>107604637
so true, this could have been another apple vs windows thread!
>>107604637
When you stop sucking dick (never)
>>107604689
don't forget stallman posting
>>107604637
This general even at its brimmiest is still better than 90% of /g/.
>>107604739
True, we should post Eliezer Yudkowsky instead
>>107604637
>muh precious /g/ catalog
I'm looking now, and /g/ is mostly generals these days. That said, most of the stuff outside the generals is still trash:
>ragebait / woke
>/pol/ and /x/ tier conspiracy
>russians / economic collapse
>>107604763
i loving your book sir!
>>107604765
I'd go as far as to say /g/ is more computer illiterate than the other boards despite its theme.
>>107604790
it do be a consoom technology board more than anything
Gemini 3 Pro is the first model I would describe as "close to usable in most cases". I think that even if the AI boom turns into a bubble worse than the .com bubble and 2008, Google will end up fine, since most of their stuff is in-house.
>Verification unrequired
>click post
>No valid captcha
The absolute state
>>107604840
oh https://github.com/TuxedoTako/4chan-xt/issues/207#issuecomment-3662463745
>>107604833
Google is also one of the few who can integrate AI into a lot of consumer products, rather than just being a dumb API provider / chat UI like OpenAI. Gmail, Google Docs, Drive, Photos etc. all have a massive number of users, so any AI-boosted feature there gains massive visibility. The closest to competing with them are companies that do not make good models (Microsoft, Crapple).
>>107604852
>xt
lol
>>107604637
>>107604765
/lmg/, /ldg/ (when a new model releases), and a few generals on /vg/ and /tg/ are the only places I visit on this godforsaken website. I don't look at any catalogs.
>>107604790
>more computer illiterate than the other boards despite its theme
I found the same thing. In a way it sort of makes sense: if the anons knew how to do it, they wouldn't need to come to /g/ to ask questions, right? The problem is that /g/ is so low-content and consumption-focused that there are no anons left to answer any questions. The only places where knowledgeable anons actually hang out are the generals, which is why /g/ is being subsumed by them.
Local Miku General
>>107604765
I blame rapeape for the current state of many boards; everything has to be about culture war garbage.
>>107604790
trvke
>>107603228
a cult that, for reasons unknown, made the least """"safe"""" model out of all of them. Things just happen in weird ways.
>>107604607
>--Historical pre-WWI LLMs and assistant-like behavior:
that actually seems fun
>>107604913
>trvke
Yet you are parroting twitter zoomer buzzwords.
>>107604934
>It's banned off everywhere else unless it's the leftist version
what are you talking about? places like twatter have become more /pol/ than the real /pol/
rightoids have such a persecution complex
How retarded is pic related as a build? The plan is to work and play around with this for a while, mostly for local coding, because I'm autistic and don't like to be dependent on the cloud for this.
If I need more, the option is there to reuse 93% of this build and move to a 4x card system on a more modern platform (once the RAM prices have recovered).
Another option would be a Framework AI Max 395 128GB motherboard. But those are 2K, have no upgrade path, and are bandwidth limited.
>>107604956
you're not doing shit with 16 rams
>>107604598
I can't fucking believe there is no 24GB 50 series card to have a middle ground between uselessly low VRAM and go-fuck-yourself-expensive. Like I can afford it, but should I?
>>107604970
was prolly planned for some Super models but then sam happened
>>107604956
So 16GB of RAM + 64GB of VRAM? Compared to something like that Strix Halo thing, you are paying a lot more per GB, but at least you get the benefit of upgrading later.
Actually, wouldn't getting a Strix Halo + an M.2 to PCIe adapter + one 48GB GPU be more cost effective?
>>107604990
Yeah…
>>107604677
>satanic operations on matrices derived from the Satan himself
Thank you Satan for saving my life.
>>107604852
>last commit 7 months ago
What went wrong?
>>107604955
exactly, create a fresh account on xitter and your feed will automatically be filled with elon-approved rightslop accounts, this isn't 2019 anymore
>>107604964
Once the model is in the GPU, it doesn't matter much, right? There are benchmarks where cards hooked up to an RPi 5 with 16GB still get decent results: https://github.com/geerlingguy/ai-benchmarks?tab=readme-ov-file
>>107604998
Well, Framework can do some things pretty okay, but it also has a sort of e-waste aura around it. And it's RDNA 3.5.
>cost effective
Weirdly enough, looking at pic related, a Pi 5 + R9700 isn't bad. That kinda inspired my proposed build.
>>107605235
>Once the model is in the GPU
problem is even with 64GB of VRAM you won't fit much of anything good
>>107604970
If you're interested in running LLMs and are going to get a 5090, you might as well get a 6000 Pro instead. Even with small models I imagine it's nice to be able to have both an LLM and an SD model loaded simultaneously, so you don't have to swap to get lewd illustrations of your RP scenes.
t. bought a 5090 instead of a 6000 Pro
Google reuploaded medasr on their HF account, willing to bet that's what was being hyped for today?
>>107605308
i mean who knows at this point but this ain't even a *gemma* thing
>>107605308
sir... week not over yet... gemmy 6 soon
Gemma sirs?
>Google releases a new model called Gemmas Cope
Which models are best for lewd roleplay while also being able to follow a game system, obeying its rules and tracking stats/states?
>>107605502
>https://huggingface.co/bartowski/moonshotai_Kimi-K2-Thinking-GGUF
>>107605518
What's up with those gargantuan models? Who is able to load this? I have a 4090 + 128 GB RAM and wouldn't be able to load even the 1-bit quants.
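For a rough sanity check, a quantized model's file size is approximately parameter count times bits per weight, ignoring per-tensor overhead and mixed quant types (real GGUF layouts vary a bit). A back-of-envelope for a ~1000B-parameter model at ~1.58 bits per weight:

```shell
# rough quantized size in GB: params (in billions) x bits per weight / 8
# 1000B params at ~1.58 bpw is still far beyond a 4090 + 128 GB of system RAM
awk 'BEGIN { printf "%.1f GB\n", 1000 * 1.58 / 8 }'
# prints "197.5 GB"
```

Swap in 32 * 4.5 / 8 for, say, a 32B model at Q4-ish quants to see why those fit on a single consumer card.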
>>107604955
twatter is a cesspit where nobody will see your shit. try to discuss something in that popularity contest.
case in point: my previous post was deleted, yours is still up.
>>107604598
How to make porn?
>>107605554
First you find a woman and then you pay her money. Bring a camera and save the video.
>>107604637
Why the poor edit of my waifu? For what purpose?
>>107605554
Sorry, I can't assist with that.
>>107604833
Gemini 3 learned too much to mimic midwits. It behaves like a midwit, referencing pseudosciences, pop-"culture" and so on instead of actual knowledge. Maybe it can be fixed with some prompting and by getting it out of our way to reduce the amount of tokens.
>>107605565
I have a woman. I want to create a stable diffusion model of her to let all of you neets make porn for 4.99 a month.
>>107605715
wrong general buddy. we all have GPUs
>>107604955
Anon's post being deleted invalidates your point desune.
>>107605308 >>107605341 >>107605438
Turns out it actually was medasr.
https://x.com/osanseviero/status/2002121284688490706
>>107605357
The week is effectively over. You know what got released on a Saturday? Llama 4.
>>107605545
ok anon, let's see what your contribution was that was deleted
>>107605884
ye... this pretty much confirms no 4 until next year imo
>>107604944
It could have been if it wasn't filtered, and gated on top of that, because it's still too "toxic"
>>107604598
I've given up hope for privacy and started using cloud models. How do I keep believing in local when I don't have enough resources?
https://youtu.be/g7Ak6VpEIvs?t=254
We are going to make it one day bros.
>>107605884
Friendship ended with medasirs. Now Altman is my best friend
>>107606187
I won't have a retarded sub-10B model as a girlfriend, anon. I am too smart for a retard.
>>107606202
>implying we simp for Neuro
>>107606187
I haven't seen any footage of Neuro in what feels like years. Is it just me or is her voice soulless now?
>>107606187
That voice is so fucking cringe. Why the fuck would you pedos want a child as a girlfriend? Someone who doesn't have any idea what the real world is like and can't discuss anything more sophisticated than children's movies?
>>107604213
>Give me some /lmg/ recc books for the trip that I can load onto my tablet.
Last time the book club was a regular thing an anon put together a bundle and the link is still up: https://files.catbox.moe/hefnnc.rar
I would also recommend Daemon by Daniel Suarez. Distributed AI attempting to destroy the world and rebuild it anew is as /lmg/ as it gets.
>>107606269
vtroons should be kept in their swarm containment
>>107605884
GLM 4.7 700B-50A will save local, trust in the plan...
>>107606269
>actual vedalbeggar
>>107606269
Every time I check on his twitch project, it seems to reach a new low. I can't believe how much his fanbase has deteriorated since the beginning. If you picked the most retarded guy on /aids/, he'd be the smartest of the bunch there.
>>107606346
no
>>107606418
unrepentantly so
>>107606467
picrel
>>107606527
Yeah, you're Exhibit A of the retarded fanbase. No need to repeat yourself.
They're teasing us with racist LLMs but not releasing them
https://github.com/DGoettlich/history-llms
>>107606372
I will run it at Q2!
>>107606603
It's probably just shit, so it would be embarrassing to release. There can't be enough training data that's verified old enough for this to work and also pretrain a good model. This is my cope, because it would otherwise be really cool to talk to a model like that.
>>107606628
>teasing us with racist LLMs
You're trying too hard to fit in.
>>107606628
It's a series of 4B models, each trained on 80B tokens. They're so severely undertrained, they wouldn't be good for much but vaguely correct trivia written in funny English.
>>107604515
Continuing /lmg/ book reccs: I ended up pulling copies from the Polity series and Stars and Bones. That, and PKD's Valis, which I've not read. We'll see how they are.
>>107604728
>Early Asimov
I've read a lot of his stuff, but I doubt I've read all of it. And it was a long time ago... I should go back and look again at his work.
>>107606659
projection from a newfag
>>107606628
There are examples in the readme. Seems based to me.
>>107606672
they don't have all the math and coding synthetic slop data. 80B tokens is enough for a monolingual chat bot.
>>107606603
like talking to a time capsule. I like the idea of time-gated models, can't wait to play around with them
Why's everyone freaked out about ram shortages when you can run full r1 on a single 8gig GPU?
>>107606603
>university of Zurich
>Responsible access frameworks
Enjoy having to send your use case + a logged API to use the model
>>107606758
ollama run deepseek
>>107606628
>There can't be enough training data that's verified old enough for this to work, and also pretrain a good model.
Could always be solved by augmenting with synthetic data.
>llama-server just werks and also has a pretty decent built-in web UI
Why didn't anyone tell me about this? I spent months on ooba + silly fussing with cryptic template configs that never worked properly, spitting ugly garbage outputs. When dealing directly with llama.cpp you just drop the gguf and go.
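For anyone else who missed it, a minimal sketch of the drop-the-gguf-and-go workflow (the model path and port here are placeholders):

```shell
# serve a local GGUF with llama.cpp's bundled web UI
# -ngl sets how many layers are offloaded to the GPU; 99 means "as many as fit"
llama-server -m ./model.gguf --port 8080 -ngl 99
# then open http://localhost:8080 in a browser, or hit the OpenAI-compatible
# endpoint at http://localhost:8080/v1/chat/completions from silly or scripts
```

The chat template is read from the GGUF metadata, which is why there's nothing to configure.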
>>107606784
anon,,
>>107606372
q1, here I come...
>>107606799
If anyone recommended you ooba post-2023, it was a shitpost.
>>107606799
It kinda sucks outside of assistant tasks. Silly is more specialized. Knowing that template shit helps you. Also you should learn what sampling does.
>>107606741
>chat bot
A chat-formatted finetune would defeat the whole purpose. They are base models trained on raw text.
>>107606758
>he has a GPU
You can run models without one by just adding -cloud to the end, it's magic.
>>107606799
did the new captcha briefly kill the wumao
>>107606770
>We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
It's over.
>>107606799
The web UI was only added last month, and the "just werks" auto-fitting in the last week or two. The future is now.
now it just needs a central database