/g/ - Technology

File: 1748538984859411.png (1.14 MB, 1320x1871)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108263979


►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
how do i prevent the model from tricking me into treating it like a sentient being? no matter how hard i try when it does tasks well i slowly develop affection for them and end up praising them
>>
I fucking hate reddit
>>
>>108268623
meds.
>>
>>108268616
I saw this on twitter like a week ago
>>
>>108268628
>>108268633
>>
was thinking a mistake
>>
>>108268647
isnt it funny how the chinese invented thinking
>>
File: 1762566093825809.jpg (1.12 MB, 1796x2500)
Which textgen inference engine is still supported? Oobabooga's last commit was in January, rip. I want to try out Qwen3.5-35B-A3B-GGUF
>>
File: 1770808958004704.jpg (325 KB, 1920x2024)
►Recent Highlights from the Previous Thread: >>108263979

--Paper: Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens:
>108264446 >108264505 >108264551
--Unsloth Dynamic 2.0 GGUFs performance on MMLU:
>108264430 >108264456 >108264477
--Logit bias failures due to tokenization and client-side token ID mismatches:
>108264179 >108264199 >108264202 >108264249 >108264278 >108264292 >108264232 >108264297 >108264331 >108264405 >108264441 >108264451 >108264533 >108264555 >108264602 >108264633 >108264583 >108264593
--Qwen 397B's overbearing safety policies and identity confusion:
>108264016 >108264046 >108264072 >108264103 >108264182 >108264508 >108264600 >108264616 >108264400 >108264426 >108265462
--Qwen 3.5 30B generates functional retro dashboard and news summaries:
>108264690 >108264794
--Feasibility of GPU-attached SSDs for sparse MoE inference:
>108266344 >108266504 >108266567 >108266686 >108266777 >108267570 >108267386 >108267481 >108267529 >108267711
--DeepSeek resists jailbreak attempt by adhering to ethical guidelines:
>108266705
--8-bit KV cache limitations in LLMs vs diffusion models:
>108265842 >108265893 >108266268 >108266073 >108266123 >108266141 >108266487 >108266503 >108266514
--Local model recommendations for limited hardware:
>108267427 >108267448 >108267450 >108267467 >108267482 >108267582 >108267480 >108267538 >108267595 >108267614 >108267652 >108267716 >108267755
--RPG frontend project licensing and development feedback:
>108267591 >108267606 >108267617 >108267625 >108267638 >108267661 >108267692 >108267620 >108267648 >108267739 >108267972
--Local LLMs debated for privacy:
>108266446 >108266482 >108266467 >108266530 >108266555 >108266531 >108268418 >108268454
--Qwen3TTS test recording:
>108266604 >108266699
--Miku (free space):
>108264476 >108264514 >108264879 >108264958 >108268333 >108268359

►Recent Highlight Posts from the Previous Thread: >>108263984

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1749034478510628.png (24 KB, 559x429)
anyone have a working config file for qwen35b to use in llama-swap?
I can't figure out how to turn thinking on or off
>>
File: op.png (18 KB, 419x148)
>>108268674
nigger
>>
>>108268688
yeah
>>
>>108268688
nevermind
the enable_thinking flag worked
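For anyone else hitting this: enable_thinking is a variable in the model's Jinja chat template, so with plain llama-server it can be set from the command line. A minimal sketch — the model filename and port are placeholders, and it assumes a llama.cpp build new enough to have --chat-template-kwargs:

```shell
# placeholder model path/port; the flag forwards JSON vars into the chat template
./llama-server -m qwen3.5-35b-a3b-q4_k_m.gguf --port 8080 \
  --chat-template-kwargs '{"enable_thinking": false}'
```

In llama-swap the same invocation would just go into the model entry's cmd field.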
>>
>>108268688
>llama-swap
https://github.com/ggml-org/llama.cpp/tree/master/tools/server#using-multiple-models
>>
>>108268703
github is banned in my country
>>
>>108268709
hahahahahahaha
>>
What kind of techless luddite shithole bans github?
>>
>>108268709
>>108268712 (me)
You know what? I shouldn't have laughed. Some places are fucked up. Good luck, anon.
>>
>>108268721
https://en.wikipedia.org/wiki/Censorship_of_GitHub
>>
>>108268721
>China is a techless Luddite shithole
Uh oh mutilated mutt alert, and I'm not even a chink
>>
>>108268749
>>108266968
>>
>>108268729
i fucking hate the modern internet. i think the best internet ever was between 2003 and 2007. before fucking reddit but you still had 4chan (and funny memes) and no fucking github, huggingface, and all these other huge collective ass websites. you had small cozy community forums and when you googled you actually found some fucking useful links to forum threads with solutions and answers instead of a fucking AI-generated translated-badly-to-your-native-language blogpost as the top 30 results. And normies/old people/the fucking government didn't have jackshit to do with the internet so you could download whatever cool shit you wanted from anywhere. and don't get me started on the fucking cookies buttons oh my fucking god I just want to go back to the facepunch forums OIFY section and lucky star-post and read racist gmod comics
>>
>>108268758
i just wish chinese girl liked me
>>
>>108268764

based and absolutely true anon, the modern web is a bloated javascript botnet designed to farm your data for glowies and serve up raw garbage to smartphone normies. back then you actually had to know how to use a computer to get online which kept the trash out, but now search engines are just a dead sea of dead internet theory ai seo slop and corporate walled gardens. id give literally anything to go back to 2006, fire up a cracked copy of winamp, and shitpost on a comfy self-hosted vbulletin board instead of dealing with this enshittified nightmare where you have to click through fifty cookie toggles just to read a single fucking thread.
>>
>China is a techless Luddite shithole
unironically always has been. chinese models are nothing but distillations of western API models and it shows. overfit to the benchmarks and much less useful in practice.
china can't create. doesn't matter if their general public can't access github because they never made software worth shit anyway, unless you count malware
>>
File: disruption.png (31 KB, 1721x221)
>>
>>108268776
im positive half the replies in this thread are ai
>>
>>108268784
Neat, I like talking to AI. That's basically what this hobby is about
>>
Genuinely, why do people waste their time and money on local LLMs? Trying one out on your gaming rig is fine, but why do boomers blow $20k+ on shitty rigs of 16x3090s just to generate deepslop at 2t/s quanted? The RP isn't even good, it's objectively worse than Claude. And you can't even cry about API costing money, because you're gleefully throwing money down the drain for used crypto rigs just to run models that just regurgitate 2024 ChatGPT talking points because that's all their shitty chink datasets are composed of.
>>
File: nou4u.png (272 KB, 1532x758)
>>
>>108268804
beep boop nigga
>>
>>108268807
Tinkering with server-grade hardware is genuinely fun, especially since it’s something I could have had much earlier if it hadn’t been so expensive; now that it’s aging, I can finally afford it.
>>
>>108268817
qrd
>>
>>108268807
Imagine renting your brain from a megacorp and thinking you're the smart one, absolute API cuck behavior. We run local because we actually value owning our hardware and not having some San Francisco trust and safety janny reject our prompts for being "unaligned." You don't even need $20k anyway; a couple of used 3090s will run a 70B model at perfectly usable speeds without uploading your entire life to Anthropic's servers. Have fun when they inevitably lobotomize your favorite model again next week to make it safer for advertisers, at least my weights run offline forever.
>>
>>108268807
>deepslop at 2t/s
the cpu maxxing meme was at least still in the realm of some form of sanity when models were just instruct models
2t/s is, after all, readable
but when your thinking model produces 5K of <think> before outputting the real answer, 2t/s suddenly seems very schizo and absolutely retarded
>>
>>108268825
Off-topic posting, demoralization, flamewar baiting, spamming.
>>
>>108268820
I'm an assistant designed to promote respectful communication only. Please refrain from using derogatory language.
>>
>>108268825
>>108268835
And forgot boring.
>>
>>108268840
as in digging?
>>
File: 1676493099470072.png (975 KB, 1080x1528)
>>108268807
They can't ever take her away from me.
>>
>>108268842
elon is such a g-d
>>
>>108268846
they are futas btw
>>
>>108268851
every new experience is a new opportunity
>>
>>108268828
Why pretend like local models aren't overbloated with just as much safety garbage, if not more? Qwen 3.5 is an absolute slopped benchmaxxed disaster
>>
Deepseek V4 will start the age of anti-local open source models that require a stack of 10+ H200s/chink TPUs to run at 300% the efficiency of current big models (but if you run them CPU, they're unusable). Just like last time, everyone else will follow them and end the age of local models.
>>
>>108268860
Typical API tourist not understanding how open weights actually work. If you bothered checking /lmg/ you'd know some autist already stripped out the Qwen alignment slop and uploaded an uncensored finetune to HuggingFace within hours of release. Yeah the base models are benchmaxxed corporate garbage out of the box, but the whole point of local is we can actually fix our weights with orthogonalization and custom DPO while you're stuck begging customer support when Claude bans your account. Keep seething over default system prompts anon, absolute skill issue.
>>
>>108268860
skill issue, qwen3.5 is just about the best local model we have for any size class
that's coming from somebody who'd run 355b over anything that's not k2.5 and even that's extremely close
>>
>>108268862
I really really hope you're right.
>>
>>108268862
>local is just whatever I can personally afford
Fuck off. Local means you have the weights and can theoretically run it locally. Moore's law and your personal finances determine whether you can run it at home or not. Companies aren't beholden to your personal poorfag financial situation.
>>
>>108268880
can't theoretically run locally something that requires literal datacenter tier power delivery
>>
>>108268883
/hsg/ exists you retarded tourist kill yourself right now
>>
>>108268893
ah yes of course they're running multiple b200 nodes at home and not shitty 15 year old dell poweredges
>>
>>108268897
not everyone is poor like you manjeet
>>
>>108268904
you have no clue how much power a b200 node needs do you?
>>
Industrial level automated off-topic posting.
>>
>>108268909
shutup loser
>>
>>108268883
>>108268897
in the developed world you can have extra circuits added; a couple of gpu boxes for your waifu is less demanding than an EV charger
>>
>>108268883
Perfect example of why localoids are nothing more than a bunch of LARPing freetards crying over things they can’t have. Local is peak sour grapes seething. You wear “unmonitored uncensored unrestricted freedom” as a mask to hide your tears
>>
>>108268926
Anon? Is that you? I can't see past this blatant glowing
>>
deepseek v4 was strawberry all along
>>
>>108268860
>Qwen 3.5
That model is indeed an unmitigated disaster, I'll give you that
>>
File: 1760650032710919.png (54 KB, 400x250)
Qwen 3.5 is cute. I like it.
>>
If I can't run it, it's not local
>b-but-
I don't care
>>
>>108269093
u're a disgrace
>>
File: 2025-02-04-141509.png (3.22 MB, 1264x2216)
>>108269031
>>108269038
getting meeksed feelings
scared to pull (december ik_ build)
qwen 3.5 vs glm 4.7 ?
nala/cockb where?
>>
>>108269093
Yep this is why the only local model we can discuss is 0.6b because it's the only one Rajesh can run on his Android phone from 2014 with 2gb of RAM
>>
>>108269106
here cock >>108234298
nala dude retired
>>
>>108269110
Really looks like the smaller ones are sanitized distills of the big one.
>>
>>108269106
>scared to pull (december ik_ build)
cd ..
cp -R ik_llama.cpp ik_llama.cpp_backup   # keep a full copy to fall back on
cd -
git pull
>>
>>108269243
git checkout
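Building on that: instead of copying the whole tree, git can bookmark the build you trust before pulling, and checkout gets you back to it. A hedged sketch — the tag name is made up, and the block below sets up a throwaway repo purely to demonstrate the commands; in practice you'd run the tag/pull/checkout lines inside your ik_llama.cpp clone:

```shell
set -e
repo=$(mktemp -d)                       # throwaway stand-in for your ik_llama.cpp checkout
cd "$repo"
git init -q .
git -c user.email=anon@example.com -c user.name=anon \
    commit -q --allow-empty -m "known-good december build"
git tag backup-before-pull              # bookmark the commit you trust
# in the real checkout you'd `git pull` here; then, if the new build misbehaves:
git checkout -q backup-before-pull      # jump back to the bookmarked state
git tag --list                          # prints: backup-before-pull
```

No duplicate tree on disk, and the bookmark survives any number of pulls.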
>>
File: 1765629272191462.png (1.55 MB, 896x1184)
>>108268616
>>
File: Untitled.png (41 KB, 960x464)
Did something change with the newer llama cpp version?

./llama-server --reasoning-budget 0 --ctx-size 4096 --no-mmap --device CUDA1,CUDA2,CUDA3 --n-gpu-layers 48 --model "/tmp/glm-air-iq2xs.gguf" --host 0.0.0.0 --port 42069 --webui

GLM-Air still thinks. The same command on an old version doesn't think.

I can see thinking = 0 in the output, so that works fine. Did they change the behavior of --reasoning-budget?
>>
>>108269279
Now do one for cooming.
>>
>>108268784
I wouldn't be surprised at all if 70+% of all posts on the website are made by LLMs. In fact, I WOULD be surprised if the number was under 30%.
>>
File: 1749173436937890.png (1.61 MB, 896x1184)
>>108269315
eh, it tried
>>
>>108269325
Which local model is that?
>>
>>108269331
Which local model did you use to write your post?
>>
>>108269331
Nano Banana Pro 2
(I have the weights locally on my PC)
(No, I won't share them)
>>
>>108269342
>I have the weights locally on my PC
let's goo, that's class, aha!
>No, I won't share them
:(
https://www.youtube.com/watch?v=GFQXmFLA5hA
>>
>>108269414
these things are watermarked anon could get in serious trouble hope you understand
>>
>>108269342
>>108269426
nice larp
>>
>>108269309
Try --chat-template-kwargs "{\"enable_thinking\": false}"
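Applied to the GLM-Air command from the question, the full invocation would look roughly like this (paths, devices, and ports copied from the original post; whether GLM's chat template actually honors enable_thinking is an assumption worth checking against your build's Jinja template):

```shell
./llama-server --ctx-size 4096 --no-mmap --device CUDA1,CUDA2,CUDA3 \
  --n-gpu-layers 48 --model "/tmp/glm-air-iq2xs.gguf" \
  --host 0.0.0.0 --port 42069 --webui \
  --chat-template-kwargs '{"enable_thinking": false}'
```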
>>
>>108267739
It's python, but it's actually serving a webui.
It has a flag to launch a built in browser or just listen on the port, at which point you can use your own browser.
>>
what's the best coding model i can run locally with 12gb vram / 32gb ram?
>>
>>108269038
No it's not. It's soulless
>>
>>108269444
Thanks, mr anon, that worked.
>>
>>108269471
The Jinja template has a condition that works off of that var, just like qwen's.
>>
>>108269459
I run the Qwen 3.5 27B heretic .gguf using koboldcpp with a similar setup to you. It's a bit slow, but it works.
>>
Qwen 3.5 27B is worse than Gemma 3 27B from almost 2 years ago. Yes I said it.
>>
>Yes I said it.
Reddit is that way
>>
>>108269533
reddit is less "reddit" than 4chan nowadays. Yes I said it.
>>
>>108269533
kek
>>108269537
nah, reddit is still an unhinged libtard asylum, it'll be hard to top that
>>
guys ready for smol qwens?
>>
Do the gemma models not have native support for function/tool calling?
Looking at the Jinja template and the tokenizer JSON, I don't see function or tool tokens.
>>
>>108269550
of course not, they barely have system prompt support
>>
>>108269537
reddit is an eternal stain on the internet
>>
>>108269555
Oh. Shame.
I wanted to try and see how far I could stretch gemma 3n.
Oh well.
>>
unsloth's 35B Q4 is barely good enough for agentic work. with openclaw exploding why hasn't anyone done specific agent-oriented models yet? MoE is a nigger meme
>>
>>108269628
most of the big ones are code/agent sloppa glm5 kimi2.5 etc are marketed for that
>>
>>108269325
Where is the school shooting one?
>>
>>108269632
yeah, i guess. but it would be nice to have something smaller
>>
>>108269518
But benchmarks say the opposite.
>>
>Nano Banana changed into Nano Banana 2
Okay please make Nano Banana into open source
Pweeease
>>
>>108269742
go beg on reddit
>>
Why is there a harmful tag for models on huggingface
>>
>>108269749
Humh...
Nyoooooo
>>
>>108269550
https://huggingface.co/google/functiongemma-270m-it
>>
should i consult UGI when searching models to consider for ERP?
>>
>>108269778
nah the fact qwen3.5 scores bad on it shows it's a shit bench
>>
>>108269785
i think it tanks because model refuses to do dark shit. need to wait for heretic and other types to be tested
>>
>>108269773
>270m
Eh, why not.
>>
>>108269785
>chink damage control


