/g/ - Technology

File: 1604345226030.jpg (884 KB, 1340x1000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107604598 & >>107595736

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
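For napkin math without the VRAM calculator above: memory use is roughly the GGUF file size plus the KV cache plus runtime overhead. A minimal sketch — the ~20% overhead factor and the example model shape are assumptions for illustration, not figures from the calculator:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_el=2):
    """K and V caches: one ctx_len x n_kv_heads x head_dim tensor each, per layer (fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_el

def vram_estimate_gb(model_file_bytes, n_layers, n_kv_heads, head_dim, ctx_len,
                     overhead=1.2):  # assumed ~20% slack for compute buffers
    total = model_file_bytes + kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len)
    return total * overhead / 1024**3

# Example: a hypothetical Q4 7B-class GQA model (~4.1 GB file),
# 32 layers, 8 KV heads, head_dim 128, 8192 context
print(round(vram_estimate_gb(4.1e9, 32, 8, 128, 8192), 1))  # ~5.8 GiB
```

GQA is why KV heads, not attention heads, go into the cache term: a model with 32 query heads but 8 KV heads only caches the 8.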

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1709494521677.jpg (70 KB, 357x480)
►Recent Highlights from the Previous Thread: >>107604598

--MiMo-V2-Flash release and SWA vs GA debate in long-context model training:
>107610359 >107611212 >107611389 >107611538 >107611607 >107611641 >107611726 >107612033 >107612057 >107612103 >107612118 >107612128 >107612346 >107612451
--Evaluating modern language models for roleplay efficiency vs cost:
>107607970 >107608333 >107608570 >107608582 >107608588 >107608691 >107608373 >107608741 >107608449 >107608489 >107608674 >107608716 >107608757 >107608804 >107608943
--Skepticism about undertrained historical LLM project despite potential for time-gated models:
>107606603 >107606628 >107606672 >107606741 >107606844 >107606718 >107606784 >107606757 >107606770 >107606932
--MiraTTS model critique and audio quality debate:
>107607864 >107607903 >107608117 >107609014 >107609123 >107609241 >107609363
--Cost and performance considerations for local AI builds:
>107604956 >107604998 >107605235 >107605247
--Gemma's hiring for multimodal AI research sparks skepticism about job requirements:
>107613937 >107613974 >107614102 >107614027
--Google MedASR release and Llama 4 speculation:
>107605308 >107605884 >107606028
--LLMs as psychological tools: experiences, limitations, and philosophical debates:
>107607433 >107607442 >107613464 >107607458 >107607522 >107607532 >107607610 >107607651 >107607747 >107607825 >107607872 >107607896 >107607918 >107607942 >107607954 >107607972 >107607912 >107607580 >107607725 >107607757 >107607832 >107607734 >107607907 >107607945 >107608083 >107609148 >107613528 >107613599 >107613646 >107613702 >107613792 >107613896 >107613644 >107613717 >107613821 >107613858 >107613912 >107613990 >107614028 >107614195 >107614243 >107614338 >107614050 >107614222
--/lmg/ Book Club:
>107606330 >107606677
--Miku (free space):
>107604899 >107614611 >107614788

►Recent Highlight Posts from the Previous Thread: >>107604607

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
There have been a lot of new TTS models released the past few weeks. Do any of them have a better quality-and-speed-to-system-requirements ratio than GPT-SoVITS yet, with voice cloning? Or a C++ implementation that doesn't need a venv?
>>
>>107614872
Not really, sovits is still the best
>>
So let me get this straight: after a week of anons screenshotting some retard on twitter, google released nothing of note?
>>
Thoughts on Huihui-Qwen3-VL-32B-Thinking-abliterated? Apparently it's the first 100% uncensored model on the UGI leaderboard. Will do literally anything you ask.
>>
>>107614887
As said in the last thread. Just use 4.6 to destroy your identity and then you won't care about new models not releasing.
>>
>>107614887
MedASR and FunctionGemma sir?
>>
>>107614872
chatterbox/vibevoice/sovits are really the endgame.
I enjoyed echo-tts but that faggot won't release the cloning part.
>>
>>107614896
>MedASR
English and Chinese only Whisper for medical texts. Oh yeah, that's exactly what we've been waiting for.
>>
>>107614872
You don't need more than sovits
>>
>>107614887
which retard?
>>
>>107614882
>>107614903
>>107614930
Now if only someone made a llama.cpp/stable-diffusion.cpp-style single binary that doesn't require all the usual python bloat.
>>
>>107614887
BLOODY VISHNU BASTERDS NO REDEEM
>>107614892
the most embarrassing shit is seeing that leaderboard mentioned here. it is literally faker and gayer than math benchmarks, and the guy running it is a globohomo faggot
>>
>>107614972
I thought koboldcpp supports some of them.
>>
>>107614974
>globohomo faggot
Ahhh thank you anon. /lmg/ really sucked today and it needed that.
>>
>>107614977
I think it uses tts.cpp, which unfortunately only works with a handful of models (Kokoro and a few others I'd never heard of that don't seem worth it). No GPU or Vulkan support either.
>>
>>107614972
Understandable. I'm running it in docker
>>
File: 1737715573631218.png (173 KB, 828x803)
>>107614887
It seems that anons forgot the golden rule: nothing ever happens
>>
>>107614999
>docker
Was it called docker because of docking?
>>
>>107615001
that's not true. bad things often happen
>>
>>107614887
I have never run or cared about a google llm, why start now? it's obv m-m-m-monster-cucked out the gate
>>
>>107615001
LLMs didn't exist 10 years ago
>>
>>107615068
Gemmas have the most world knowledge of all similarly sized models. Her main disadvantage is that she can't say "cock" without help.
>>
>>107615079
I'm sure you're always willing to help guide her knowledge along ey?
>>
File: 1757708122311.png (95 KB, 849x787)
>>107615079
Because they did not filter the pretraining data hard enough. Did you think they would make the same mistake again? Especially when the Gemma team was the one bragging that they were able to reduce the model's ability to reproduce its training data. Any Gemma 4 would have just been another gpt-oss.
>>
>>107614903
>I enjoyed echo-tts but that faggot won't release the cloning part.
What are you talking about? echo-tts supports cloning out of the box. It's actually the best cloning TTS for English-only.
>>
I'm just wondering, why are we waiting for Gemma 4? Does Gemma excel at anything in particular? The way I see it, most people want local models mainly for RP, but Gemma is cucked, no? Or do you guys use local models for coding too? If yes, why?
>>
>>107615290
Just general assistant shit desu, as a ramlet/vramlet.
>>
>>107615290
I think the saars are so plentiful they can shill even here. never saw a good google model log; they only release tiny shit for vrammies - /lmg/ membership card denied
>>
>>107615290
>local
>why
because its local
>>
>>107615172
>approx
generalization GOOD
>>
>>107615361
yeah, but when coding you would want to use the best tool for the job. why use a local 27B model, which will also slow down your system, when you can just use gemini 3 pro for free, unlimited?
>>
>>107615377
Approximation is not generalization. God even made two different words to express the distinct concepts.
>>
>>107615418
i only use things i can run myself. small code models are pretty capable, but if you need something bigger, llama 3 70b and devstral 2 123b are quite good.
>>
>>107615455
>he can't see the correlation
I bet you can't even visualize apples, retard. go do more psychiatrist sessions with glm, you fucking low-iq human waste
>>
>>107615290
Gemma is a prude but she's good at RP and chat, and smarter than anything else around that size.
Mistral Small and Nemo are supposed to be good for ERP but they're too retarded for me. Nemo particularly seems braindead. I can't stand to use them to build up to anything.
>>
>>107615465
devstral is fucking dogshit, the tool calling is fucking SHIT (tested fp8 through cline)
>>
>>107615418
That's how you get yourself killed with a drone.
>>
>>107615465
I guess it depends on what you're doing. I couldn't see myself using local LLMs for coding, as I'd at least want it to incorporate google searches for recent things. Like, using ESP-IDF is way better with gemini 3 pro since its API is constantly updating and deprecating older shit.
>>
there's really no decent community for talking about LLMs on the internet, huh
this place is mainly coomers, and leddit... leddit... just saw this nugget:
https://old.reddit.com/r/LocalLLaMA/comments/1pragtf/open_source_llm_tooling_is_getting_eaten_by_big/
pure llm slop thread. of the 99 current comments, only two people even notice it's fucking slop
how retarded of an LLM user do you have to be to spend hours with those tools and not notice when someone gens this kind of slop
> This isn't just my stack. The whole ecosystem is churning.
> That's when it clicked.
> We're not choosing tools anymore. We're being sorted into ecosystems.
I kinda wish someone with guts of steel made a classic vbulletin/phpbb webforum with heavy moderation or something. Instaban slop posters AND people who are too retarded to notice slop.
>>
>>107615478
Someone said they got it working better by reducing the default rep pen; otherwise you need to wait for a fix in llama.cpp, or for cline to add native tool support.
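For reference, a sketch of what "reducing the default rep pen" looks like against a local OpenAI-compatible server. The model name and endpoint are hypothetical placeholders; llama.cpp's server does accept repeat_penalty as an extra body field, where 1.0 disables the penalty:

```python
import json

# Request body with the repetition penalty neutralized (1.0 = off).
# "devstral-local" is a hypothetical model name for illustration.
payload = {
    "model": "devstral-local",
    "messages": [{"role": "user", "content": "Call the read_file tool on main.c"}],
    "temperature": 0.2,
    "repeat_penalty": 1.0,  # defaults >1.0 penalize repeated tokens
}
body = json.dumps(payload)
print(body)
# POST this to e.g. http://localhost:8080/v1/chat/completions
```

The rationale: a repetition penalty downweights tokens already seen in context, but tool-call JSON has to repeat braces, quotes, and key names exactly, so the penalty can push the model into mangled output.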
>>
>>107615270
When it was posted here something was withheld over "safety". Did that change?

>Below are samples, and further below is a more detailed description of the model. We plan on releasing model weights/code, though we are not planning on releasing the speaker-reference transformer weights at this time due to safety concerns.
>>
>>107615478
never tried the tool calling, just coding. devstral 2 suggested a few things that worked where qwen 2.5/3 missed.

>>107615503
for my use i dont need internet search but i'm pretty sure kobold added that a while back. might exist in lcpp too
>>
>>107615504
>want to talk about models
>annoyed when people use models to help put their own thoughts into words
???
>>
>>107615504
I love the dissonance some people have when they both get heavily into the hobby and hate it when they see AI output where they didn't expect it.
>>
>>107615549
>their own thoughts
no, I assure you, no form of human thought was put into this logorrhea
>>107615552
>I love the dissonance some people have when they both get heavily into the hobby and hate it when they see AI output where they didn't expect it.
incredible, people expect human interaction to be with other humans, preferably not the lobotomized kind
>>
>>107615508
forgot to mention I tested the CLOUD model, that's why I specifically said fp8, not Q8.
Anyway, I've been battling with a problem this past week and I've tried a combination of local and remote free models.
Local:
>gemma 27b
>qwen 3 next 80b
>qwen 3 coder 30b
>gpt-oss 120b
Remote:
>grok-code-fast
>glm 4.6
>devstral 123b
>minimax m2
I gave them the same instructions, the same traces/observations, and even the same pointers as to where the problem was located / what to look for.
ALL of them failed. Some had really bad tool support (glm 4.6 and devstral), and when they worked they couldn't find the solution. I iterated the errors/solutions with them between 2 and 6 times, then I gave up. Literal days spent tard wrangling LLMs.
Then I said FUCK IT, let's try Antigravity with gpt3 pro: the 1st iteration failed but got really close, and on the 2nd iteration I gave it the small push it needed and BAM, bug fixed.
Problem is that I'm a capable programmer, but I got fixated on having the AI solve the bug. If I'd instead just decided to look at the source myself, I would've solved it in maybe ~2 hours (gemini took ~30 mins).
This was a really big reality check for me.
Yes, you can tell LLMs to code a lot of stuff for you, and I even spend time validating, but SADLY cloud models are at another level. SAD
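The "same instructions, same traces, same pointers to every model" protocol described above is easy to script against any OpenAI-compatible endpoint (llama.cpp server, vLLM, and tabbyAPI all speak that API). A minimal sketch — the model names, base URL, and prompt contents are hypothetical placeholders:

```python
import json
from urllib import request

MODELS = ["gemma-27b", "qwen3-coder-30b", "gpt-oss-120b"]  # hypothetical names
BASE_URL = "http://localhost:8080"  # assumed local server

def build_body(model, instructions, traces, pointers):
    """Identical prompt for every model so the comparison is apples to apples."""
    prompt = f"{instructions}\n\nTraces:\n{traces}\n\nLook here first:\n{pointers}"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # greedy-ish decoding for reproducibility
    }

def ask(model, instructions, traces, pointers, timeout=300):
    data = json.dumps(build_body(model, instructions, traces, pointers)).encode()
    req = request.Request(f"{BASE_URL}/v1/chat/completions", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# for m in MODELS:  # uncomment with a server running
#     print(m, ask(m, "Fix the bug.", "stack trace...", "src/parser.c")[:200])
```

Pinning temperature to 0 matters here: with sampling on, a model can luck into (or out of) the fix, which muddies any per-model verdict.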
>>
>>107615571
I feel like if you unfried all the "that is the most important question" preference training and added long-term memory, LLMs (4.6 or better) would be much more interesting to talk to than 99% of people.
>>
>>107615591
>Antigravity with gpt3 pro
??????????????????????????????????????????????????????????????
>>
>>107615644
He means gemini.
>>
>>107615619
>I feel like if you unfried all the "that is the most important question" preference training
Another leaderboard would have to take the place of lmarena first.
>>
>>107615524
https://voca.ro/1kdrd2885gib
>>
>>107615591
I write VBA automation scripts at my work and I gave gpt4 a try for like 2 days. Then I realized it would be faster to write it myself. And I didn't ask it for a complete program, just single functions, or rewriting what I'd already written, with explicit pointers I knew it would need. Can't imagine how vibe coders work if they don't know what the model needs to know before it starts writing.
>>
>>107615770
A lot of vibe coders don't care; to them whatever the AI produces is a black box. They just look at the result (not the code) and iterate.
Of course they end up with garbage code, but they don't really care.
>>
>>107615770
You used a year-old model, complain about its quality, and then say you can't imagine how others work?
I'd ask if you're retarded, but geeze.

>>107615591
>gpt3 pro
>>
>>107615770
You're supposed to do it through an agentic framework like claude code or codex or opencode.
I don't know if Excel has some way to run a macro from a file. It's not the best situation to apply vibecoding.
Vibecoding works best with the sloppiest languages (python, js) and not that well with anything slightly obscure.
Also
>gpt4
Nigger, what?
>>
>>107615797
This is the vibemax way, and it will be much of the code in your future
The hellscape where we can do anything but don't do anything well
>>
>>107615701
I got it from their blog: https://jordandarefsky.com/blog/2025/echo/
>>
>>107615770
>>107615797
if LLMs could really boost productivity or code quality it would be known by now. years of this shit, and even the people who make those models are incapable of showing the utility of these things for code lmao
claude code is filled with retarded bugs like this:
https://github.com/anthropics/claude-code/issues/6481
that simply don't get any attention, because their crap was legacy code (legacy code being code no one on the team understands) from day one, and the way they built their TUI is almost unfixable
or how about this gold nugget:
https://github.com/openai/openai-python/issues/2472
that one is legendary for appearing on an openai video stream for the gpt 5 launch, where they asked their model to vibe-fix the issue
months later nothing has happened, the issue still exists, just crickets
if openai and anthropic can't use their own models to produce better code than paying a team of cheap third worlders, what's the use of AI? lmao
>>
>>107615913
The issue is the chair, not the LLM. If you can't get anything out of current LLMs, you're beyond retarded. Even non-coders are able to get fully functional apps by now.
>>
>>107615949
>If you can't get anything from current LLMs you're beyond retarded
so anthropic and openai employees are beyond retarded, huh
>>
>>107615905
I see. That blog post is misleading, then. Even for the block-wise diffusion, the blog post only says:
>It is unlikely that we will include the block-wise fine-tuned weights in our initial release.
>>
>>107615913
In reality, Claude Code feels polished. And so do their web apps. They have good design.
>>
>>107615874
Thing is, it knew the functions, and the snippets I gave it basically handed it the correct syntax. And it had correct syntax, but it would just randomly decide to do something retarded that makes zero sense.
>>
huggingface is aware that people don't care about their website and just like the bandwidth, right?


