/g/ - Technology

File: 1604345226030.jpg (884 KB, 1340x1000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107604598 & >>107595736

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1709494521677.jpg (70 KB, 357x480)
►Recent Highlights from the Previous Thread: >>107604598

--MiMo-V2-Flash release and SWA vs GA debate in long-context model training:
>107610359 >107611212 >107611389 >107611538 >107611607 >107611641 >107611726 >107612033 >107612057 >107612103 >107612118 >107612128 >107612346 >107612451
--Evaluating modern language models for roleplay efficiency vs cost:
>107607970 >107608333 >107608570 >107608582 >107608588 >107608691 >107608373 >107608741 >107608449 >107608489 >107608674 >107608716 >107608757 >107608804 >107608943
--Skepticism about undertrained historical LLM project despite potential for time-gated models:
>107606603 >107606628 >107606672 >107606741 >107606844 >107606718 >107606784 >107606757 >107606770 >107606932
--MiraTTS model critique and audio quality debate:
>107607864 >107607903 >107608117 >107609014 >107609123 >107609241 >107609363
--Cost and performance considerations for local AI builds:
>107604956 >107604998 >107605235 >107605247
--Gemma's hiring for multimodal AI research sparks skepticism about job requirements:
>107613937 >107613974 >107614102 >107614027
--Google MedASR release and Llama 4 speculation:
>107605308 >107605884 >107606028
--LLMs as psychological tools: experiences, limitations, and philosophical debates:
>107607433 >107607442 >107613464 >107607458 >107607522 >107607532 >107607610 >107607651 >107607747 >107607825 >107607872 >107607896 >107607918 >107607942 >107607954 >107607972 >107607912 >107607580 >107607725 >107607757 >107607832 >107607734 >107607907 >107607945 >107608083 >107609148 >107613528 >107613599 >107613646 >107613702 >107613792 >107613896 >107613644 >107613717 >107613821 >107613858 >107613912 >107613990 >107614028 >107614195 >107614243 >107614338 >107614050 >107614222
--/lmg/ Book Club:
>107606330 >107606677
--Miku (free space):
>107604899 >107614611 >107614788

►Recent Highlight Posts from the Previous Thread: >>107604607

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
There were a lot of new TTS models released in the past few weeks; do any of them have a better quality-and-speed-to-system-requirements ratio than GPT-SoVITS yet, with voice cloning? Or a C++ implementation that doesn't need a venv?
>>
>>107614872
Not really, sovits is still the best
>>
So let me get this straight. After a week of anons screenshotting some retard on twitter, google released nothing of note?
>>
Thoughts on Huihui-Qwen3-VL-32B-Thinking-abliterated? Apparently it's the first 100% uncensored model on the UGI leaderboard. Will do literally anything you ask.
>>
>>107614887
As in last thread. Just use 4.6 to destroy your identity and then you won't care about new models not releasing.
>>
>>107614887
MedASR and FunctionGemma sir?
>>
>>107614872
chatterbox/vibevoice/sovits are really the end.
I enjoyed echo-tts but that faggot won't release the cloning part.
>>
>>107614896
>MedASR
English- and Chinese-only Whisper for medical texts. Oh yeah, that's exactly what we've been waiting for.
>>
>>107614872
You don't need more than sovits
>>
>>107614887
which retard?
>>
>>107614882
>>107614903
>>107614930
Now if only someone made a llama.cpp/stable-diffusion.cpp-style single binary that doesn't require all the usual python bloat.
>>
>>107614887
BLOODY VISHNU BASTERDS NO REDEEM
>>107614892
the most embarrassing shit is seeing that leaderboard mentioned here, it is literally faker and gayer than math benchmarks, also the guy running it is a globohomo faggot
>>
>>107614972
I thought koboldcpp supported some of them.
>>
>>107614974
>globohomo faggot
Ahhh thank you anon. /lmg/ really sucked today and it needed that.
>>
>>107614977
I think it uses tts.cpp, which unfortunately only works with a handful of models (Kokoro and a few others I'd never heard of and that don't seem worth it). No GPU or Vulkan support either.
>>
>>107614972
Understandable. I'm running it in docker
>>
File: 1737715573631218.png (173 KB, 828x803)
>>107614887
It seems that anons forgot the golden rule: nothing ever happens
>>
>>107614999
>docker
Was it called docker because of docking?
>>
>>107615001
that's not true. bad things often happen
>>
>>107614887
I have never run or cared about a google llm why start now it's obv m-m-m-monster-cucked out the gate
>>
>>107615001
LLMs didn't exist 10 years ago
>>
>>107615068
Gemmas have the most world knowledge out of all similarly sized models. Her main disadvantage is that she can't say "cock" without help.
>>
>>107615079
I'm sure you're always willing to help guide her knowledge along ey?
>>
File: 1757708122311.png (95 KB, 849x787)
>>107615079
Because they did not filter the pretrain data hard enough. Did you think they would make the same mistake again? Especially when the Gemma team was the one bragging that they were able to reduce the ability to reproduce the training data. Any Gemma 4 would have just been another gpt-oss.
>>
>>107614903
>I enjoyed echo-tts but that faggot won't release the cloning part.
What are you talking about? echo-tts supports cloning out of the box. It's actually the best cloning TTS for English-only.
>>
I'm just wondering, why are we waiting for Gemma 4? Does Gemma excel at anything in particular? The way I see it, most people want local models mainly for RP, but Gemma is cucked, no? Or do you guys use local models for coding too? If so, why?
>>
>>107615290
Just general assistant shit desu, as a ramlet/vramlet.
>>
>>107615290
I think the saars are so plentiful they can shill even here. Never saw a good google model log, they only release tiny shit for vrammies - /lmg/ membership card denied
>>
>>107615290
>local
>why
because its local
>>
>>107615172
>approx
generalization GOOD
>>
>>107615361
yeah but when coding you'd want to use the best tool for the job, why use a local 27B model which will also slow down your system when you can just use gemini 3 pro for free, unlimited.
>>
>>107615377
Approximation is not generalization. God even made two different words to express the distinct concepts.
>>
>>107615418
i only use things i can run myself. small code models are pretty capable but if you need something bigger, llama 3 70b and devstral 2 123b are quite good.
>>
>>107615455
>he can't see the correlation
I bet you can't even visualize apples, retard. go do more psychiatrist sessions with glm, fucking low-iq human waste
>>
>>107615290
Gemma is a prude but she's good at RP and chat, and smarter than anything else around that size.
Mistral Small and Nemo are supposed to be good for ERP but they're too retarded for me. Nemo particularly seems braindead. I can't stand to use them to build up to anything.
>>
>>107615465
devstral is fucking dogshit, the tool calling is fucking SHIT (tested fp8 through cline)
>>
>>107615418
That's how you get yourself killed with a drone.
>>
>>107615465
I guess it depends on what you're doing. I couldn't see myself using local LLMs for coding, as I'd at least want them to incorporate google searches for recent things. Like using the ESP-IDF is way better with gemini 3 pro since its API is constantly updating and deprecating older stuff.
>>
there's really no decent community for talking about LLMs on the internet, huh
this place is mainly coomers, and leddit.. leddit.. just saw this nugget:
https://old.reddit.com/r/LocalLLaMA/comments/1pragtf/open_source_llm_tooling_is_getting_eaten_by_big/
pure llm slop thread, of 99 current comments, only two people even notice it's fucking slop
how retarded of an LLM user do you have to be that you spend hours with those tools and don't notice when someone gens this kind of slop
> This isn't just my stack. The whole ecosystem is churning.
> That's when it clicked.
> We're not choosing tools anymore. We're being sorted into ecosystems.
I kinda wish someone with nerves of steel made a classic vbulletin/phpbb webforum with heavy moderation or something. Instaban slop posters AND people who are too retarded to notice slop.
>>
>>107615478
Someone said they got it better by reducing the default rep pen, otherwise need to wait for a fix either in llama.cpp or for cline to add native tool support.
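For what it's worth, the rep-pen change is just a per-request field on llama.cpp's /completion endpoint. A minimal sketch, assuming a local llama-server (the prompt, port, and n_predict value are placeholder assumptions; repeat_penalty=1.0 means no penalty):

```python
import json

def build_completion_request(prompt: str, repeat_penalty: float = 1.0) -> dict:
    """Build a llama.cpp /completion payload with the repetition penalty
    neutralized (1.0 = no penalty). Tool-call JSON is repetitive by nature,
    so penalizing repeats can corrupt the structured output."""
    return {
        "prompt": prompt,
        "n_predict": 256,          # placeholder generation length
        "repeat_penalty": repeat_penalty,
    }

payload = build_completion_request("<tool_call>", repeat_penalty=1.0)
print(json.dumps(payload))
# To actually send it against a local llama-server (default port 8080):
#   requests.post("http://127.0.0.1:8080/completion", json=payload)
```

The same knob should also be settable globally with llama-server's --repeat-penalty flag instead of per request.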
>>
>>107615270
When it was posted here something was withheld over "safety". Did that change?

>Below are samples, and further below is a more detailed description of the model. We plan on releasing model weights/code, though we are not planning on releasing the speaker-reference transformer weights at this time due to safety concerns.
>>
>>107615478
never tried the tool calling, just coding. devstral 2 suggested a few things that worked where qwen 2.5/3 missed.

>>107615503
for my use i dont need internet search but i'm pretty sure kobold added that a while back. might exist in lcpp too
>>
>>107615504
>want to talk about models
>annoyed when people use models to help put their own thoughts into words
???
>>
>>107615504
I love the dissonance some people have when they get heavily into the hobby and yet hate seeing AI output where they didn't expect it.
>>
>>107615549
>their own thoughts
no, I assure you, no form of human thought was put into this logorrhea
>>107615552
>I love the dissonance some people have when they both get heavily into the hobby and hate it when they see AI output where they didn't expect it.
incredible, people expect human interaction to be with other humans, preferably not the lobotomized kind
>>
>>107615508
forgot to mention I tested the CLOUD model, that's why I specifically said fp8, not Q8.
Anyway, I've been battling with a problem this past week and I've tried a combination of local and remote free models.
Local:
>gemma 27b
>qwen 3 next 80b
>qwen 3 code 30b
>gptoss 120b
Remote
>grok-code-fast
>glm 4.6
>devstral 123b
>minimax m2
I gave them the same instructions, the same traces/observations and even the same pointers as where the problem was located/what to look for
ALL of them failed. Some had really bad tool support (glm4.6 and devstral), and even when the tools worked they couldn't find the solution. I iterated the errors/solutions with them anywhere from 2 to 6 times, then I gave up. Literal days spent tard-wrangling LLMs.
Then I said FUCK IT, let's try Antigravity with gpt3 pro and 1st iteration failed, but it got really close, 2nd iteration I gave it the small push it needed and BAM, bug fixed.
Problem is that I'm a capable programmer, but I got fixated on having the AI fucking solve the bug; if I'd instead just decided to look at the source myself, I would've solved it in maybe ~2 hours (gemini took ~30 mins).
This was a really big reality check for me.
Yes you can tell LLMs to code a lot of stuff for you, I even spend time validating, but SADLY cloud models are at another level. SAD
>>
>>107615571
I feel like if you unfried all the "that is the most important question" preference training and added long-term memory, LLMs (4.6 or better) would be much more interesting to talk to than 99% of people.
>>
>>107615591
>Antigravity with gpt3 pro
??????????????????????????????????????????????????????????????
>>
>>107615644
He means gemini.
>>
>>107615619
>I feel like if you unfried all the "that is the most important question" preference training
Another leaderboard would have to take the place of lmarena first.
>>
>>107615524
https://voca.ro/1kdrd2885gib
>>
>>107615591
I write VBA automation scripts at work and I gave gpt4 a try for like 2 days. Then I realized it would be faster to write it myself. And I didn't ask it for a complete program, just single functions, or rewriting what I'd already written with some explicit pointers I knew it would need. Can't imagine how vibe coders work if they don't know what the model should know before it starts writing.
>>
>>107615770
A lot of vibe coders don't care; to them, whatever the AI produces is a black box, they just look at the result (not the code) and iterate.
Of course they end up with garbage code, but they don't really care.
>>
>>107615770
You used a year-old model, complain about its quality, and then say you can't imagine how others work?
I'd ask if you're retarded, but geez.

>>107615591
>gpt3 pro
>>
>>107615770
You're supposed to do it through an agentic framework like claude code or codex or opencode.
I don't know if Excel has some way to run a macro from a file. It's not the best situation to apply vibecoding.
Vibecoding works best with the sloppiest languages (python, js) and not that well with anything slightly obscure.
Also
>gpt4
Nigger, what?
>>
>>107615797
This is the vibemax way, and it will be much of the code in your future
The hellscape where we can do anything but don't do anything well
>>
>>107615701
I got it from their blog: https://jordandarefsky.com/blog/2025/echo/
>>
>>107615770
>>107615797
if LLMs could really boost productivity or code quality it would be known by now; years of this shit and even the people who make these models are incapable of demonstrating their utility for code lmao
claude code is filled with retarded bugs like this:
https://github.com/anthropics/claude-code/issues/6481
which get no attention because their crap was legacy code from day one (legacy code being code no one on the team understands) and the way they built their TUI is almost unfixable
or how about this gold nugget:
https://github.com/openai/openai-python/issues/2472
that one is legendary for appearing in an openai video stream for the gpt 5 launch, where they asked their model to vibe-fix the issue
months later, nothing happened; the issue still exists, just crickets
if openai and anthropic can't use their own models to produce better code than paying a team of cheap third worlders, what's the use of AI? lmao
>>
>>107615913
The issue is in the chair, not the LLM. If you can't get anything out of current LLMs, you're beyond retarded. Even non-coders are able to get fully functional apps by now.
>>
>>107615949
>If you can't get anything from current LLMs you're beyond retarded
so anthropic and openai employees are beyond retarded, huh
>>
>>107615905
I see. That blogpost is misleading. Even the block-wise diffusion part, which the blogpost references as:
>It is unlikely that we will include the block-wise fine-tuned weights in our initial release.
>>
>>107615913
In reality, Claude Code feels polished. And so do their web apps. They have good design.
>>
>>107615874
Thing is, it knew the functions, and the snippets I gave it basically handed it the correct syntax. It had the correct syntax, but it would just randomly decide to do something retarded that makes zero sense.
>>
huggingface is aware that people don't care about their website, they just like the bandwidth, right?
>>
>>107616009
what do you mean? They make money the same way github does: free for individuals so it becomes an industry standard, while charging teams and corporations.
>>
feet
>>
>>107615913
Ok, now show the alternative to claude code that isn't vibecoded. Oh wait, it doesn't exist, or it has vastly fewer features, to the point that it's not even remotely worth using despite claude code having a few bugs here and there.
But I respect that you've done your homework and brought real examples to the table.
Cross-platform TUI applications that have to support complex behavior like multi-line input and paste support (for example, distinguishing a newline from the keyboard from a newline that comes with the pasted text) are a pain in the ass to build, and they are pretty much bound to have edge cases because a lot of stuff is hard to do without heuristics (if the text arrives in one burst it's a paste; if there are human-scale gaps between keystrokes it's the user typing, etc). And that's without getting into incompatibilities between terminal emulators.
The only thing that probably comes close to that level of complexity might be emacs.
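The burst heuristic described above fits in a few lines. This is a toy illustration under stated assumptions, not code from any real TUI; the 5 ms threshold is arbitrary:

```python
BURST_THRESHOLD_S = 0.005  # assumed cutoff: gaps under 5 ms look like a paste

def classify_input(timestamps: list[float]) -> str:
    """Classify a run of input events as pasted or typed based on
    inter-arrival time: a paste arrives as one near-instant burst,
    while human typing leaves much larger gaps between keystrokes."""
    if len(timestamps) < 2:
        return "typed"
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return "pasted" if max(gaps) < BURST_THRESHOLD_S else "typed"

print(classify_input([0.0, 0.001, 0.002, 0.003]))  # burst -> pasted
print(classify_input([0.0, 0.15, 0.31, 0.44]))     # human speed -> typed
```

Terminals that support bracketed paste mode let the app skip the timing hack entirely; this kind of heuristic is only the fallback.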

>>107616008
The GPT-4 family *sounds* good stylistically but is retarded (by modern standards) when it comes to intelligence. I was roleplaying a teenager in an orphanage with it and it completely changed the names of the residents from one scene to the next.
OpenAI was never able to make their non-thinking models progress past 2024, and their thinking models haven't advanced noticeably since o1.
Both Gemini and Anthropic are kicking OpenAI's ass, as well as the Chinese distilling from those two.
>>
File: mimoo.png (132 KB, 800x477)
MiMo seems alright. Parrots less than GLM but not as smart.
>>
>>107616009
I jerk off to huggingface-chan card at least once a month.
>>
File: c00652.png (405 KB, 1024x1024)
Post membership card local gens on your hardware only
>>
>>107616098
LEANS IN VOICE DROPPING TO A WHISPER A SMIRK PLAYING ON HER LIPS WINKING FORGET YOUR OWN NAME
>>
>>107616118
nothing XTC and some token bans can't fix.
the bar on coherence is really fucking low this year.
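The "token bans" half of that is just masking logits before sampling. A toy sketch, not any particular backend's API (real backends operate on a full vocab tensor, and XTC is a separate sampler that prunes the *most* probable tokens instead):

```python
import math

def ban_tokens(logits: dict[int, float], banned: set[int]) -> dict[int, float]:
    """Set the logit of every banned token id to -inf so it can never
    be sampled, no matter how strongly the model prefers it."""
    return {tok: (-math.inf if tok in banned else score)
            for tok, score in logits.items()}

logits = {101: 3.2, 102: 1.1, 103: 0.4}     # hypothetical token ids/scores
filtered = ban_tokens(logits, banned={101})
print(max(filtered, key=filtered.get))       # 101 can no longer win
```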
>>
I would like to posit a groundbreaking theory: slop in cooming is only a problem if your penis hasn't been touched by a fuckhuge MoE. The second it gets touched by a fuckhuge MoE, slop concerns become secondary. The problem was always the model being retarded. Slop is just a red herring, the most obvious thing you can see in the output; if the model knows what you want and writes unique stuff, the slop becomes bearable, because slop has its source in real written smut. I would like to call this the law of "Real ERP has never been tried".
>>
>>107616115
>>
File: 1735215693894085.jpg (9 KB, 198x206)
>>107615959
They hire literal high-school dropouts and jeets. Is that surprising? I aced my bachelor's thanks to fucking GPT-3.5. The assignment was to build a Java CLI program (a calendar with alarms and stuff) with a minimum of 1k LOC in 4 hours. Barely anyone even knew chatgpt existed back then. And now retards like you act smug because you can't get a single method out of fucking SOTA models? Who are you kidding, dumbass? Just apply for disability.
>>
File: c00652a.png (256 KB, 965x604)
>>107616115
I should've cropped it
>>107616265
Welcome. What is being printed? A sign of the end, when we cannot understand the machinations of our own destruction
>>
>>107616330
Nothing, she just lives there
>>
>>107616299
What turbojeetoid university did you go to that required you to make a final project in Java with a minimum line count in 4 hours, wtf?
>>
File: nagging.png (154 KB, 1518x978)
Does your AI nag you about your menial tasks too?
>>
>>107616299
>now retards like you are acting smug
The irony
>>
>>107616348
Hope she is comfy. Don't leave your Miku in a hot car / oven / 3d printer
>>
>>107616360
A German university, and I won't argue about the pointlessness of the assignment. There is a correct way to prompt LLMs that most programmers can't wrap their heads around. Exhibit A: >>107616399
>>
>>107616386
Honestly I gotta try having an LLM nag me for procrastinating. Maybe that helps.
>>
>>107616386
Well, it's mostly me venting. Still that's a good idea


