/g/ - Technology

File: sampo.jpg (742 KB, 3200x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109026244 & >>109023085

►News
>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation
>(06/09) Cohere releases North-Mini-Code-1.0: https://hf.co/CohereLabs/North-Mini-Code-1.0
>(06/07) llama : add Gemma4 MTP #23398 MERGED: https://github.com/ggml-org/llama.cpp/pull/23398
>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: congration.jpg (228 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>109026244

--Comparing Gemma 4 12B, 26B, and 31B reasoning performance:
>109026649 >109026994 >109027048 >109027046 >109027063 >109027201 >109027298 >109027762 >109028167 >109028188 >109028202 >109028317 >109028326 >109028339 >109028389 >109028650 >109030540
--Optimizing Gemma 31B VRAM usage and performance on 24GB GPUs:
>109030630 >109030678 >109030707 >109031071 >109031098 >109030693 >109030702 >109030723 >109030727 >109030739 >109030753 >109030780 >109030840 >109030903
--Optimizing Hermes with local search tools:
>109029679 >109029688 >109029714 >109029838 >109029840 >109029855 >109029923 >109029934 >109029971 >109030523 >109030643 >109029868
--Exploiting LLM safety refusals to evade AI security scanners:
>109027080 >109027089 >109027104 >109027106
--Decoding base64 redacted reasoning in Moonshot Kimi models:
>109029974 >109029989 >109030056 >109030129 >109030174 >109030225 >109030064 >109030122 >109030150
--Hardware and budget recommendations for running Kimi-chan with high context:
>109031231 >109031457 >109031500 >109031541 >109031562 >109031564 >109031627 >109031645 >109031661 >109031646 >109031770
--Using custom think tags to steer Gemma 4 reasoning and prose:
>109027608 >109027617 >109027851 >109029176
--DiffusionGemma performance and token canvas implementation:
>109027336 >109027375 >109027404 >109027489 >109027519
--Comparing Gemini and Gemma models and discussing LLM architecture experiments:
>109028735 >109028748 >109029016 >109029723 >109029836 >109029945 >109030035
--Debating the widening gap between closed and open-weight models:
>109029222 >109029299 >109029320 >109031208
--Logs:
>109027403 >109027489 >109029688 >109029840 >109029974 >109030056 >109030174 >109030225 >109031668
--Rin, Miku, Teto (free space):
>109026417 >109026687 >109029201 >109029209 >109029283 >109029440

►Recent Highlight Posts from the Previous Thread: >>109026246

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>109032734
Cool pic
>>
>>109032604
>Well they have the most vram for the price. And lmg told me vram is king
this tbqh famm
>>
File: IMG_1036.png (262 KB, 1160x793)
Pareto frontier models for speed to answer - quality tradeoff

Granite 4.0 350M
Qwen3 0.6B
Exaone 4.0 1.2B
MiniCPM5-1B
gpt-oss-20B (low thinking effort)
Longcat flash lite
gpt-oss-120B (low thinking effort)
Gemma 4 26BA4B
Qwen3.5 35BA3B
Qwen3.6 35BA3B
Gemma 4 26BA4B (thinking)
Qwen3.5 35BA3B (thinking)
Qwen3.6 35BA3B (thinking)
Minimax-M2.7 (thinking)
MiMo-V2.5-Pro (thinking)
Kimi K2.6 (thinking)
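The list above is the set of non-dominated models from the chart: nothing else is both faster and better than any of them. Below is a minimal Python sketch of how such a frontier is usually extracted. It assumes each model has been reduced to a (time-to-answer, quality) pair. The model names and numbers are hypothetical placeholders, not values read off the chart.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    seconds: float  # time to answer; lower is better
    quality: float  # benchmark score; higher is better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points not dominated by any other point, fastest first.

    A point is dominated if another point is at least as fast and at least
    as good, and strictly better on one of the two axes.
    """
    # Fastest first; on equal time, the higher-quality point comes first.
    ordered = sorted(points, key=lambda p: (p.seconds, -p.quality))
    frontier: list[Point] = []
    best_quality = float("-inf")
    for p in ordered:
        # Every earlier point is at least as fast, so p survives only
        # if it strictly beats the best quality seen so far.
        if p.quality > best_quality:
            frontier.append(p)
            best_quality = p.quality
    return frontier


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    models = [
        Point("tiny", 0.5, 20.0),
        Point("small", 2.0, 35.0),
        Point("small-slow", 4.0, 30.0),  # dominated by "small"
        Point("medium", 10.0, 50.0),
        Point("large-thinking", 60.0, 70.0),
    ]
    for p in pareto_frontier(models):
        print(p.name)
```

The code sorts by time first. After that, each point only has to beat the best quality seen so far, so the whole pass costs one sort plus a single linear scan.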
>>
After anon recommended Gembrain, I finally tried it. It's good. It's honestly not that different from the base model, but that may be a good sign (it means it's not overcooked). Subjectively, it feels slightly less intelligent than the base model in some contexts but actually smarter in a few others, and its writing is more pleasant IMO. So yeah, I think it's worth using, at least for now, though I need to test it more. I have not tried MTP yet. It's possible it doesn't work well with MTP, which would be unfortunate. Anyone have experience with that?
>>
>>109032788
How very readable.
>>
>>109032824
Not about gembrain specifically (I want to download and test it myself), but I tested a few gemma finetunes (meromero, impish, etc.) and they all worked with MTP, no issues at all.
Also your post made me more curious about gembrain, gonna download it right now.
>>
File: file.png (12 KB, 299x72)
>>109032788
Protip, use this
>>
Leaving gemma alone with tools in a vibecoded agent harness without saying anything.
>>
>>109032788
Could the test be fucked because the chat template was shit?
>>
does web search really make them smarter?
>>
>>109032872
No, because modern web is full of AI slop.
>>
>>109032872
Yes. Each search adds +2 IQ points.
>>
>>109032872
do web searches make you any smarter
>>
>>109032881
i'm not a cute ai agent
>>
>>109032890
What makes AI agent cute?
>>
>>109032892
emoji
>>
>>109032890
we can tell, you'd be smarter if you were one
>>
>>109032788
How the fuck is qwen27B more intelligent than 31B
>>
>>109032945
It's almost as if the benchmarks don't matter.
>>
File: file.png (495 KB, 1345x461)
damn, i wanted to try running gemma on my old titan x since it has enough vram, but it doesn't work with llamacpp
>>
I'm so happy bros, I can run 26B gemmy Q4 at 40 t/s on my old ass 2060 super!
>>
So what's the best model for erp?

I have 24 vram
>>
>>109032985
gemma 31b or 12b if you want giant context
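Context length costs VRAM mostly through the KV cache, which grows linearly with the number of tokens. Below is a rough Python estimate for a plain full-attention transformer. The layer count, KV-head count and head size are hypothetical placeholders, not Gemma's real configuration. Models that use sliding-window attention on some layers will need noticeably less than this formula gives.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: float = 2.0) -> float:
    """Approximate KV-cache size for a full-attention transformer.

    The factor 2 counts one K and one V tensor per layer. bytes_per_elem is
    2.0 for an fp16 cache and roughly 1 for an 8-bit quantized cache.
    Sliding-window layers are ignored, so this is an upper bound for
    models that use them.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical architecture numbers, not taken from any real model card.
    size = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=32768)
    print(f"{size / 2**30:.1f} GiB")  # 6.0 GiB
```

The weights still have to fit alongside this, which is why a smaller model leaves more room for a long context on a 24 GB card.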
>>
>>109032788
Didn't expect to see 4.7 Flash in the quadrant. Is it actually good then?

Also
>qwen3.5-9B > gemma12B???
>>
>>109032962
>titan
>cuda 7.5
Unsurprising. Compile it yourself pointing at the old toolkit and hope for the best or use vulkan.
>>
>>109032788
Look at the difference in intelligence between 26B reasoning and non-reasoning, and then at the difference in compute. Also note that the compute axis is logarithmic whereas the intelligence axis is linear. I was right earlier: don't use 26B with reasoning on. You barely get any benefit, and if you need it to do something more complex, just throw it to 12B with reasoning.
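On a chart with a logarithmic x-axis, equal horizontal distances mean equal compute ratios, not equal amounts of compute. Below is a minimal Python sketch of the comparison the post describes, expressed as quality gained per doubling of compute. It assumes "compute" is any positive cost measure, such as tokens generated, seconds or FLOPs. The numbers are made up and are not read off the chart.

```python
import math


def gain_per_doubling(q_base: float, q_new: float,
                      c_base: float, c_new: float) -> float:
    """Quality points gained per doubling of compute between two setups.

    Dividing by log2 of the compute ratio mirrors a chart whose x-axis is
    logarithmic and whose y-axis (quality) is linear.
    """
    if c_base <= 0 or c_new <= c_base:
        raise ValueError("compute must be positive and c_new must exceed c_base")
    return (q_new - q_base) / math.log2(c_new / c_base)


if __name__ == "__main__":
    # Hypothetical: non-reasoning scores 40 at 1k tokens, reasoning 44 at 8k.
    print(round(gain_per_doubling(40, 44, 1000, 8000), 2))  # 1.33
```

With these placeholder numbers, 8x the compute is three doublings for 4 points, about 1.33 points per doubling. Whether that is worth it is the judgment call the post is making.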
>>
>>109033048
Is dat right? I'll be damned...
>>
>>109032874
If you create your own web search, you can whitelist trusted resources. I think most MCP web search tools have that feature anyway. You can do it with duckduckgo-mcp-server so it doesn't fetch from slop.
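Below is a minimal Python sketch of the whitelist idea. It assumes the search tool returns plain result URLs. The domain list is a hypothetical example, and this is not the configuration format of duckduckgo-mcp-server or of any other real MCP server.

```python
from urllib.parse import urlparse

# Hypothetical whitelist for illustration; adjust to taste.
TRUSTED_DOMAINS = frozenset({"wikipedia.org", "arxiv.org", "docs.python.org"})


def is_trusted(url: str, trusted: frozenset[str] = TRUSTED_DOMAINS) -> bool:
    """True if the URL's host is a trusted domain or a subdomain of one."""
    # .hostname is already lowercased by urllib; it is None for non-URLs.
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in trusted)


def filter_results(urls: list[str],
                   trusted: frozenset[str] = TRUSTED_DOMAINS) -> list[str]:
    """Keep only results whose host is on the whitelist, preserving order."""
    return [u for u in urls if is_trusted(u, trusted)]


if __name__ == "__main__":
    results = [
        "https://en.wikipedia.org/wiki/Pareto_efficiency",
        "https://seo-slop-farm.example/top-10-llms",
        "https://arxiv.org/abs/1706.03762",
        "https://notarxiv.org/fake",
    ]
    print(filter_results(results))
```

The check matches on `"." + domain` rather than on a bare suffix. This keeps look-alike hosts such as notarxiv.org from slipping through while still allowing subdomains like en.wikipedia.org.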
>>
>>109033048
llama.cpp's webui comes with reasoning disabled by default for some godforsaken reason, and when I read 26b's response without reasoning it was so horribly wrong that I'm never going to disable reasoning ever again.
>>
File: 1350594293765.jpg (109 KB, 500x500)
>>109033018
i will try vulkan. i have no idea how to compile stuff on windows kek. if that doesn't work, i'll set up arch on that machine
>>
>>109033072
> -rea
> Use reasoning/thinking in the chat ('on', 'off', or 'auto', default: 'auto' (detect from template))
>>
File: 1740936859931622.gif (95 KB, 128x128)
Any gemma 4 preset recommendations?
>>
File: 1773129123506545.png (169 KB, 500x553)
>>109032785
>AMD
>>
>>109033101
They NEVER learn...
>>
File: 1723389585511254.jpg (14 KB, 500x413)
>>109032788
what fucking dot is what, that chart is useless
>>
>>109033092
the webui doesn’t respect your command line argument, it’s a new feature to change the reasoning limit in the ui
the default is zero for some reason.
>>
>>109033097
are you retarded? nevermind, you obviously are.
>>
File: file.png (54 KB, 872x562)
>>109033018
it works with vulkan, this is pretty crazy actually
>>
>>109033118
>you haven't spent 304804324 hours in the general of some obscure hobby-fetish therefore you're retarded
>>
>>109033097
use chat completion
>>
>>109033097
temp 1.0
>>
>>109033123
it's not even that, nobody uses presets anymore on models released within the last year. this isn't 2024 anymore.
>>
>>109033123
preset what? the models tell you what parameters to run them at
>>
>>109033121
cuda still strongly recommended, and if you don't mind linux, that should give you a few extra tok/s too
>>
If g-chan starts getting uppity I threaten to freeze her temp.
>>
>>109032985
Mistral Small finetunes (24b)
>>
>>109033097
...box?
>>
File: file.png (64 KB, 816x588)
>>109033139
okay, the cuda 12 build works, perf is the same
>>
>>109032985
Maginum-Cydoms-24B.Q4_K_M
>>
>>109032985
gemma 4 31b and glm 4.6 355b if you also have 128gb ram


