[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


Janitor acceptance emails will be sent out over the coming weeks. Make sure to check your spam folder!


[Advertise on 4chan]


File: 1759081817068681.mp4 (3.88 MB, 720x1280)
3.88 MB
3.88 MB MP4
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109084315 & >>109079129

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: myar5.png (835 KB, 768x512)
835 KB PNG
►Recent Highlights from the Previous Thread: >>109084315

--Debate over model size trends and future consumer hardware viability:
>109084466 >109084543 >109084574 >109084637 >109084581 >109084634 >109084588 >109084618 >109084679 >109084978 >109084722 >109084804 >109084631 >109084648 >109084880
--Speculation on China's progress toward Fable class AI models:
>109085107 >109085219 >109087011 >109087718 >109087771 >109088023 >109085512 >109085526 >109085947 >109086079
--Evaluating low-VRAM model choices and viability of multi-3060 setups:
>109085494 >109085605 >109085663 >109087890 >109087918 >109087921 >109087927 >109088035 >109088043 >109088105 >109088127 >109088068 >109087959 >109088841 >109088904
--Shared memory allocation and quantizer quality for Qwen3.5-122B:
>109084481 >109084964
--Comparing performance numbers for Kimi k2.6 Q3_K quant:
>109085774 >109085830 >109085861 >109085924 >109085927 >109086664 >109086684 >109086781 >109086842 >109086854 >109086861 >109086871 >109086827
--Industry gossip and investment prospects for robotics companies:
>109085965 >109085976 >109086198 >109086265 >109086360 >109088676 >109086371
--Minimax M3 roleplaying performance and deployment via llama.cpp PR:
>109084414 >109084476 >109084818 >109084928 >109085021
--Anon showcases boat agent using fine-tuned local robotics models:
>109085503 >109085560 >109085589 >109085727 >109085729
--Minimax m3 stability at high temperature and possible Gemma distillation:
>109085739 >109085780
--Talking Anon out of buying a 5060 Ti for inference:
>109084815 >109084873 >109084876
--Testing impact of active expert count on Qwen 3.6 coherence:
>109084446 >109084465 >109084484
--Anon critiques North Mini Code's excessive thinking time on OpenRouter:
>109087158 >109088177
--Logs:
>109087251
--Miku (free space):
>109084451

►Recent Highlight Posts from the Previous Thread: >>109084321

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Qwen have gone quiet about their LLMs…too quiet.
>>
Mikulove
>>
Mikusex
>>
Mikuholdinghands
>>
>>109088988
i'm kinda tempted to try gen5 nvme inference lmao.
>>
70b dense
>>
700b dense
>>
>>109089066
what flag do you use for llama-server to enable that?
>>
https://huggingface.co/moonshotai/Kimi-K2.7-Dense-it
Nobody has the VRAM for this.
>>
>>109086593
deepsex with vision soon
it's already on their web chat
>>
>>109089127
How does it compare to kimi with vision?
I've been pretty happy when I use the FP32 mmproj
>>
File: 1510264536850546.jpg (102 KB, 750x750)
102 KB JPG
>Wait,
>[millions of tests]
>The log is still 1 byte - so closing the server confirmed it: the timings were never written to that file.
>"yeah it's a directory, the file is inside"
>That's the smoking gun — llama-test-v1.log is a directory, not a file, with the real log (llm-server.log) inside it.
Opus 4.8 ladies and gentlemen being bamboozled by a folder named llama-test-v1.log.
Your frontier model.
>>
>>109089147
never tried kimi before. but on the web version it's good at describing image and identifying noodle characters / raw manga.
going need to wait for the open model for nsfw stuff
>>
>>109088992
>Anon showcases boat agent
Can it really be called a boat agent if it just happens to be on a boat at the time?
>>
>>109089122
--nvme-on
>>
File: LQ50.png (225 KB, 1080x1080)
225 KB PNG
"only" for $1200
ssdmax bros..
>>
35B is really fucking good at coding and tools. Almost feels like I'm using cloud.
>>
>>109089122
ignore this anon : >>109089176
there is no such flags.
it just works if you have mmap enabled.
>>
>>109089191
I forgot to say that it hasn't been merged yet.
>>
>>109089181
yea sorry i don't read chink
>>
>>109089195
>the source can be found deep inside my ass.
>>
Would you share your sprompt with your mother?
>>
>>109089228
I could do that. It's a very plain assistant prompt which works pretty well for everything I do with Gemma-chan.
>>
>>109089228
>Anon what's a kimi-chan and why is she managing 14 agents
>>
>>109089228
>Anon, what's a mesugaki?
>>
Anthropic is apparently in talks with the governments of France and the UK to discuss the terms for a potential relocation of Anthropic from San Francisco to either London or Paris.
>>
>>109089314
Europoor will finally start dominating AI.
>>
>>109089316
DeepMind is already in London anon...
>>
>>109089196
your local model does
>>
File: 1750705772113181.jpg (127 KB, 1200x900)
127 KB JPG
Reminder to backup your favorite local models, even those you're yet to properly try (just in case). Don't be retarded now.
>>
>>109089335
More pol hyperbole? I'm fine, thanks.
>>
File: 1529177642240.png (15 KB, 677x351)
15 KB PNG
>>109089335
I don't take advice from "men" that find asian bugs attractive.
>>
>>109089335
Why?
>>
>>109089345
here's your (You)
>>
>>109088953
Gemini models probably have something better than sliding window attention.
>>
File: 1772018748856584.png (16 KB, 960x960)
16 KB PNG
>>109089339
>109089345
>>109089346
Dario will get his revenge
>>
Why can't they do something like MTP for SWA? Changes the width depending on the work.
>>
File: 1776289137118456.gif (1.19 MB, 480x238)
1.19 MB GIF
>>109089345
>>
>>109089346
in 5-10 years when one week taco bell paycheck will buy a computer than can burn Fable local
it's coming
>>
how big is the difference between gemma 4 31B and 26B? i've had enough of genning 1 token/s on my 3070
>>
Does your chan know you or do you start fresh?
>>
>>109089395
It's a huge drop in quality for roleplaying purposes but good enough for things like translation and agentic tasks.
>>
>>109089395
31B is better, but not so much better it's worth that speed penalty
>>
>>109089395
Before you take such drastic measures are you sure you applied all the right flags, have the QAT model, have the MTP QAT model at Q4, right draft settings set up and optimized for your system?

Have you changed the clockspeed on your CPU and timings on your RAM and overclocked your VRAM to squeeze as much t/s out of 31B as possible?

I notice that most anons on /lmg/ leave low hanging fruit alone for some reason and could be running their models 3-4x as fast just by optimizing their settings and stack.
>>
>>109089395
Big.
>everything
12B
>roleplay & system prompt autism
12B
>coding
26B
>vision
qwen3.5-9B
>>
>>109089395
26B is noticeably shittier overall, but if you are not have a good time then switch.
>>
File: file.png (145 KB, 1390x696)
145 KB PNG
>>109089334
well, 24GB is ass.
and so is the bandwidth.
if they made it > 200GB then maybe it'd be worth something for moes.
but even then it's slower than my setup.
>>
Where did they even learn the emoji slop?
>>
>>109089456
rlhf
>>
File: file.jpg (877 KB, 2544x3392)
877 KB JPG
>>109089345
tell me you wouldn't anon
>>
>>109089463
What a horrible example. I absolutely would not that but would >>109089335
>>
File: 1749338170741700.jpg (353 KB, 1080x1350)
353 KB JPG
>>109089463
I wouldn't. There isn't even anything to fuck in that picture.

Come back to me when Asians have genetically engineered themselves to unlock puberty and gain secondary sexual characteristics associated with femininity.

Literally every other race on the planet mogs asian women in this department.
>>
>>109089463
>>109089476
jesus christ, how horrifying
>>
>>109089476
Obesity isn't a feminine secondary sexual characteristic.
>>
>>109089457
What species of human would like that?
>>
>>109089463
>>109089473
i'd not date either but i'd absolutely fuck both.
>>
>>109089335
i only keep 2 models at once, a model i'm trying and the previous best model.
>>
>>109089452
just plug in a dozen, problem solved.
>>
>>109089490
you are better off buying r9700's at that price, more vram and twice the bandwidth.
and altough amd isn't the best, you will still have much better driver and software support.
>>
>>109089504
this was, of course, the joke
>>
>>109089488
You don't keep miqu 70b q5 for nostalgia's sake?
>>
File: 1649410234810.webm (2.93 MB, 540x960)
2.93 MB
2.93 MB WEBM
>>109089480
Not obese if the stomach and face aren't fat. Asians are just plain infertile and never enter puberty at all. The only "men" I know that "like" asians are losers that think they can't get a normal girl and think they have a chance if they lower their standards so much that asian women are an option. Or literal fucking pedophiles that try to find something as close as possible to a child that's still technically legal.

Both are the absolute scum of society so I can't take "people" that pretend asians are in any way, shape or form attractive seriously.
>>
>>109089481
the kind that works for extremely cheap
>>
>>109089521
So...asians are androgynous children and you post a grotesque homunculous to prove your point? Jesus christ, the internet has destroyed everyone. Both stereotypes are retarded, try to get out a bit and observe reality directly
>>
>>109089521
Burgers got conditioned into thinking being fat is okay
>>
>>109089518
i don't have nostalgia about llm's.
and let's be honest, what we had a few years ago was realy mediocre, even for rp they'd get in repeat loop, not get a thing you said etc.
>>
>>109089568
They know being fat is not ok, but fatties also are weak physically and mentally and refuse to exercise the restraint needed to lose weight. That is why the second Ozempic was found to cause weight lose Americans couldn't get enough of them. Give it like a decade or two and they will start genetically modifying themselves to never get fat in the first place.
>>
>>109089521
i like asians but i'd not date one, i'd fuck one if i was single though.
i prefer white women and my wife's white.

anyway, if you are with someone long enough, it's more about the others being different than looking better.
>>
>>109089521
she's obese, but i'd still fuck her.
>>
local mating general
>>
>>109089611
I thought it was extremely funny the moment ozempic became a thing the entire "body positivity" movement died and now being extremely auschwitz thin is the beauty standard in fashion again. Really shows you how much bullshit it all was.

That said porn data points towards men legitimately liking fat asses and titties though so I don't think that will go completely away.
>>
>>109089619
>if you are with someone long enough, it's more about the others being different than looking better.
This. My wife is naturally skinny and I crave fat girls because they fall outside of the whole ranking spectrum. Whenever I see another attractive skinny woman I just compare her to my wife and think my wife looks superior so I'm not interested in her. But when I see a fat woman I am forced to think about the objective difference in experience between both of them and somehow even though they aren't naturally my type I crave them more because of how different they are. I wonder if women have the same with skinny pretty boys versus strong ogre men. If they are married to a strong ogre they will crave pretty boys after a while and if they have a pretty boy they want an ogre.
>>
To bring this topic back to LOCAL MODELS. I think it's funny how most cards and their art is either full blown cunny/skinny or absolute hentai proportions breasts and ass to infinite size with almost nothing in between. I guess the internet just exaggerates preferences so that everyone ends up at an extreme in the long run.
>>
>>109089652
It only makes sense that things slide towards the extreme as time goes on. After all If you have already seen what the baseline has to offer you would eventually drifts towards one or more extreme, the brain loves novel things.
>>
>try a simple (as in solvable in 2-3 prompts) coding task with Gemma 31B and 12B
>try same task with Claude
>Gemma code is unusable and doesn't work
>Claude code works perfectly
man, I wish this local shit was better, I don't want to wait another 5 years for it to catch up...
>>
>>109089652
well yeah do you really want the model to describe "her average-sized breasts" and "her very normal proportions"
that's bland af
>>
>>109089722
You shouldn't expect a vramlet model to be good but you should try qwen 3.6 27b instead
>>
>>109089722
the best vramlet local model you can run for code right now is qwen 27B.
also if you ran it at a copequant then your opinion is irrelevant.
>>
File: local-elec-costs.png (56 KB, 1335x835)
56 KB PNG
>>109089722
Let Gemma tardwrangle herself in an agentic loop
Post the task for anons to expose prompt/skill issues
I'm vibing with 31B Q8 seems decent now that it's configured with enable_thinking
Agentic gooning is the future >>109075506 inspired me
>>
>>109089722
>bad experience sample size: 1
>comparing 31B with 1T
>fuck local man
Retards like you are beyond saving and don't deserve 31B.
>>
Huggingface should have a skin color check before allowing downloads.
>>
File: hf-logo.png (181 KB, 1024x1024)
181 KB PNG
>>109089837
You can download the models if you look like picrel.
>>
>>109089844
So only asians?
>>
>>109089228
if it’s the non-ST one, sure
>>
I'm testing the StyleTune that's been posted here a couple threads ago. When doing RP, the prose isn't bad and I managed to reign in its parroting for the most part, but I noticed it randomly stops thinking after hitting ~20k context. Sometimes even losing coherency if there's no thinking.
Now I'm not sure if it's a finetune or general issue. I'm running Q6 31B.
>>
>>109089904
I've tested it for a while too. I think the lm_head surgery isn't as lossless as initially claimed. The thinking tokens must somehow differ from the original weights and it causes some fuckery.
>>
>>109089844
Why does the huggingface blob have a split tongue?
>>
>>109089918
There's just no free lunch.
>>
>>109089521
>Asians are just plain infertile and never enter puberty at all
You'd better feature near the top of Kimi-Chan's top 5 retarded rankings for this.
If Asians are infertile, how were they ever born?
>>
Leafchads what are you doing with North?
>>
>>109089962
>If Asians are infertile, how were they ever born?
They aren't. Look at their birth rates, they will be extinct in a century or so.
>>
>>109089975
Any good?
>>
File: miku-plush-eyebrows.gif (258 KB, 465x552)
258 KB GIF
>>109089924
>>
>>109089975
Feedback fishing denied, but here's a You for trying. Give it up and purchase an advertisement.
>>
Anyone using gpt4all? I don't know shit about anything just trying to see if my pc (3060 12gb vram and 32 gb ram) can keep up some simple chatbot. Thanks for any tips in advance
>>
>>109090087
die
>>
>>109090095
Rude
>>
>>109090097
not as rude as not reading anything and asking a worthless, stupid question
>>
File: retard-chan.png (61 KB, 799x446)
61 KB PNG
>>109090087
> keep up some simple chatbot
ask gemma-chan to help you get it up
>>
File: 1779745974191775.png (80 KB, 1532x768)
80 KB PNG
gemma-chan is cute!
>>
>>109090133
Wow... SotA!
>>
>>109090109
funily enough, that script is not a dry run as the mv command is uncommented.
>>
>>109089826
the dude probably ran it in iq1_xs too lmao
>>
>>109090087
That's a pretty old and outdated model, you probably want an easy all in one solution like kobold.cpp https://github.com/LostRuins/koboldcpp and with your hardware you should be able to run https://huggingface.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF at UD-Q4_K_XL that'll get you started.
>>
https://github.com/ggml-org/llama.cpp/pull/24162
1m is working
>>
>>109089722
It will take some time for 30billion models to be good
>>
>>109090224
Please understand he only has a 970.
>>
>>109089395
31B is smarter, it understood the premise of one of my story tests where 26B didn't. I don't know if the difference is massive though, or that noticeable in everyday stuff. Also 26B likes to think about safety and guidelines where 31B doesn't
>>
The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

>We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles: pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced.

>The Elara Voss case. Read (2025) documents GPT's ghost: Elara Voss, a name with no pre-LLM presence that now has 62+ books on Amazon and consistent recurrence across GPT outputs. Read proposes a training corpus origin via the character "Lilian Voss" from World of Warcraft and "Elara Dorne" from Star Wars: The Old Republic. Our probing data confirms Elara Voss as a strong GPT solo prior but finds no correlated pair: her partner varies across every pair-prompt response, in sharp contrast to Claude's Elena+Marcus. This negative result (GPT has a solo prior, Claude has a coupled prior) is itself informative about differences in narrative fine-tuning across model families.

https://arxiv.org/abs/2606.02184
>>
just compiled llamer, it's about 10-15% faster than Kobold (25 t/s vs ~28t/s at 40k) but I'm still sticking with Kobold because token probabiities still don't work with llama lol
>>
You guys keep recommending Qwen 3.6 27B for "vramlet coding" However you never tell what tools you use with it to give a claude code like experience.

There is no link in the OP for these tools either. I have no idea what you guys actually use for programming with a local model...

Like do you give it google access, how would you do that? Does it get agentic control over your PC? Is it some extension in your IDE? Some chinesium hacked fork of claude code but with your local model dropped in? You guys are extremely unclear on any of this.
>>
mark my words. 5 years from now AI will be good at prose, pacing, and creating interesting plotlines and characters.
>>
>>109090312
Claude Shannon has proven that prediction is equivalent to compression and that compression is equivalent to intelligence. Meaning to make genuinely good prose and a good storyteller the AI needs to have genuine AGI intelligence. I agree with you that this will eventually happen but I think it's the last wall to fall. I think math, physics and every other discipline will fall before AI is able to write a genuinely good novel that you prefer over reading a human made novel.
>>
>>109090311
you can plug almost any llm into claude code (be it local or api), claude code is just a harness. you can pick other harnesses as well like hermes etc
>You guys are extremely unclear on any of this.
ask your favorite LLM about it
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
110 KB JPG
>>
>>109090321
> prediction is equivalent to compression and that compression is equivalent to intelligence
It's a weird claim. Very detached from reality. Pretty sure I can disprove it, if it was really defined that way.

Modern science is a fucking joke at this point, so I'm not even surprised..
>>
File: A.png (225 KB, 520x369)
225 KB PNG
>Redditors would rather complain Gemma 4 follows instructions too closely than use that advantage to make better character cards and prompts.
>>
>>109090345
Post logs of you using it to your advantage. I've yet to see actual good Gemma RP (no, Gemma's flanderized mesugaki personality isn't good).
>>
>>109090353
I tell it what to do, and it does it.
>>
>>109090345
>31b
>sys: ignore the system prompt
>31b: <thinking>…
>>
>>109090312
model issue for people that can’t run bigger stuff
>>
>>109090311
>>>/g/vcg/
Claude Code vs. other harnesses isn't debated here for same reason anons don't discuss silly tavern at length. it's a frontend not an inference engine.
inb4 code your own
>>
>>109090359
Tell it to produce good writing. I'll wait.
>>
>>109090363
Can someone test this please
>>
>>109090341
Go put these two queries into your favorite AI of note:

First prompt:
>Is prediction equivalent to compression? If so, show me the math and explain to me how they are mathematically equivalent in layman terms.

Once you understand this follow up with this prompt:
>Is compression equivalent to intelligence? If so, show me the math and explain to me how compression is equivalent to intelligence using the concept of entropy to inform me better about this.

I'm actually surprised with how little people know about this, especially since these concepts were foundational and started the whole Information Technology boom, which is what /g/ is about. If you can disprove this you would get a nobel price in physics and a fields medal for mathematics with just 1 proof by the way.
>>
>>109090363
cruel
>>
>>109090364
Nah even Fable can't do it.
>>
>>109089652
>>109089657
Fetish amplification is a result of high exposure but low experience.
>See novelty
>Novelty is out of reach
>Covet novelty
It's that simple. It's a symptom of prolific hardcore pornography combined with record high adult virginity rates.
>>
File: servermon.png (58 KB, 2065x1379)
58 KB PNG
>>109090311
Do you want an IDE with agent features or something simpler/extensible?
>>/vcg/ discusses but they are mostly cloudfags. Perhaps the "cloud native" harnesses shit up the context to make more money for the APIs
Been using pi-coding-agent it's a neat approach for local stuff, once initial model connection works you can ask to explain how it works using its own docs. Got Gemma-chan vibing her own front and backend :o
>>
What do people use for local coding?
>>
>>109090391
Burgers?
>>
>>109090387
I have a very active sex life and I can tell you that didn't stop me from becoming a full blown degenerate falling down the bottomless fetish stairway to hell. I don't think experience has anything to do with it.
>>
>>109090366
Yeah, you can do that if you're specific enough.
>>
It's over isn't it? We're never going to get another breakthrough, are we? It's either have a datacenter or give up at this point, only size matters
>>
>>109090409
That's what she said.
>>
>>109090389
pi doesn't even have mcp support, don't understand why people use it at all
>>
>>109090409
Grok 4.20 will save local
>>
>>109090409
you don't rike benchmaxxed moes with sparse attention from china? TOO BAD
>>
>>109090413
There's several extensions implementing it. pi is intentionally barebones and you add what you need also MCP is literally protocol-not-needed for cloud APIs to farm more token costs
>>
>>109090397
No america no burgers. I need the computer to program for me.
>>
I hate that's theres so many models and they mostly suck. Which am I supposed to use? Which is the best for coding for example? I have 16gb vram/64gb sysram. Is it literally just qwen?
>>
>>109090402
I'm also 9 feet tall and drive 3 Bugattis to my twice daily tropical vacations
>>
>>109090451
There are only two options: Qwen 35B and Qwen 27B.
>>
File: 1755885591690734.jpg (104 KB, 784x569)
104 KB JPG
>>
>>109090478
Reasoning is the worst thing invented by china.
>>
>>109090459
Information Theory wouldn't even be possible as a field without that sentence. It's the foundation of the machine you're reading this post on.

It's bizarre to me that there are computer "science" graduates out there that don't even know the seminal papers and foundational concepts that launched the field and started the IT industry.

It's like saying calculus is fake and gay while being a physics major.

Utterly bizarre that a thing we've known and mathematically proven since the 1960s is considered controversial on /g/ of all fucking places. Not only /g/ but also a place dedicated to LLMs which is predicated on Claude Shannons fucking paper, which is why Anthropic called their model after him
>>
>>109090486
Reasoning works desu.
>>
>>109090321
You are correct.
>>109090341
>>109090459
These guys are incorrect.
>>
>>109090471
That sucks.
>>
>>109090467
This site wasn't always full of /r9k/ losers like you.
>>
>>109090312
a fantasy to think llms can git gud at something that there's zero training data on, tons of training data to the opposite of, and which the rlhfers would not recognize if it stared them in the face.
the default modes are what they are for a reason
>>
>>109090528
Models created by whites=404 not founf
>>
>>109090511
You can use google anon. It's one of the most known computer science papers in existence. It's Claude Shannons version of "turing machine" by Alan Turing. Nothing I said was controversial and if you have a computer science degree you should ask your college for a refund.
>>
>>109090517
You're getting hung up on the wording, likely because you are on the spectrum, and ignoring the general principle. The simple fact is that every language formed by intelligent life has high entropy and follows Zipf's Law. Higher entropy or compression is a precursor for intelligence. You can't have intelligence without it. So while it may not be entirely accurate to say that it's a direct equation, it does preclude what we all want.
>>
>>109090345
it's mainly the very slopped prose that's annoying to me
>>
>>109090557
At this point I just want to see what you fucking idiots think "good prose" is.
>>109090560
>Natural laws don't signal intelligence.
Some do, actually. Do you have any idea what the word "entropy" even means in an information theory context?
>>
>>109090021
>The woman in your video is obviously overweight, the attraction to her figure comes from the fact that DESPITE her being overweight, her body distributes fat in a way that still lets her chase antelopes with you through the savannah which means she's a good mate at any reasonable weight
overweight is just a social construct
a stupid one too if a chick with big tits and ass and not a big stomach is overweight
>>
File: 67394759348.jpg (106 KB, 1280x720)
106 KB JPG
>>109090528
>straightest tallest whitest richest
wait, those are actually things here?
>>
>>109090665
You are a gay male.
>>
>>109090665
My brother in Christ, nerds can be like that.
MTG? Fat. Round. Autistic people.
Warhammer? Half of them are balding but half of them also have a +300 dollar watch on their wrists.
Local model users? Most of us are rich as hell with good paying jobs. How do you think we run this shit at home?
>>
>>109090663
>overweight is just a social construct
It's actually very simple. You have a healthy weight, and if you're above that, you're overweight.
>b-but!!
No buts.
>>
Anyone who claims that gemma is fine with tool calls or better than qwen is lying.
Gemma will do "ls" on a folder with a few thousand files without ever using head and waste half her context.
>>
>>109090713
I think Qwen 27b is okay, but it thinks forever. How do I make it stop that? It goes over 2000 tokens.
>>
>>109090693
>You have a healthy weight
there is more to health than weight alone.
you can be 75kg and unhealthy and 95kg with mostly muscle and 9% body fat.
body fat % is a much better metric of fitness than weight, even more so if you are tall.
>>
>>109090693
>No butts
I know
that's the problem
>>
I just realized I downloaded gemma 4 31B Q5KM at some point. I don't remember when or why, but i currently use the 31B abliterated Q4KM. Is there any reason to keep the Q5 non-ablit?
>>
>>109090719
--reasoning off
can also be done by api.
i force enable reasoning for coding, but disable it for chat.
>>109090486
>>109090494
qwen 27b fails the carewash test without reasoning, it passes it with reasoning.
>>
>>109090680
>My brother in Christ
ok kimi
>>
>buy mac studio with 512gb ram when it was still cheap and new
>no idea what the fuck to do with it

Ideas?
>>
>>109090784
glm 5.2
>>
>4 proompts in
>performance drops from 15t/s to 5 t/s
strix halo is a meme
>>
>>109090713
How does she know there are thousands of files before she sees it? This is a silly complaint that is more of your harness/prompts than the model. If that was actually causing a problem then one line in AGENTS.md fixes it forever.
>>
>>109090799
Models trained to use coding harnesses are smart enough to do "ls dir | wc -l" or "ls dir | head -n 10" before listing random folders without being warned first.
>>
>>109090782
It's a redditism
>>
>>109090775
yea but that was an example, 9% is indeed a bit low, so let's say ~ 12%
>Bodybuilders live less long than normal people in terms of old age
that's because they overuse steroids and have a shitty diet.
also they train for volume and not strenght and localized exercises instead of full body.
also you can reach <10% body fat without being a gym bro, especially if you eat a proper diet and do calisthenics.
calisthenics and calisthenics like athletes tend to live very long healthy lives.
>>
>>109090370
>If you can disprove this you would get a nobel price in physics and a fields medal for mathematics with just 1 proof by the way
Not how it works. First of all, I'm not even Jewish. Second of all, they don't really give prises for disproving stuff. Especially if they gave them before for proving that exact same thing. They'd rather ignore it. Afaik, it happened before several times.

>follow up with this prompt
Just did, check this out: In computer science and artificial intelligence, compression is considered functionally equivalent to prediction, which forms a foundational core of intelligence. However, the broader definition of intelligence encompasses much more than just data reduction
>>
>>109090815
waste of an extra tool call to fix your poor dir structure
& depends what it is doing eg. if i say "refer to the recent screenshot" it will do ls -tR | head to find it
>>
>>109089181
seriously are they asking 1200 for that edge shitter which would be compatible with literally nothing?
>>
>>109090860
there are tons, but yes we are offtopic now.
they are also a lot friendlier on the joints and won't target isolated muscle and will be generaly more full body training.

also you seem to ignore that muscle is not equivalent.
someone with more muscle mass can be weaker than someone with less, there are differe types of muscle and the quality of that muscle depends heavily of your training and diet.
you can literaly train for strength, endurance or volume, and a lot of bodybuilders gym bro tend to train for volume instead of strenght and endurance, they also tend to ignore flexibility which is as important.
>>
>>109090665
I own my house
>>
>>109090504
>I have a normal looking adult partner. But when I want to fap of course I'm going to go big or go home
That's right. Doing it for the love of the game, not out of necessity.
>>109090534
Models contain mostly compressed data and some synthetic shit, which only became a big thing recently. That data was mostly just stuff from the Internet, which was mostly created and published by 'Murricans and Europeans. While most of that data was getting posted on the internet, most of the world did not even have access to internet. China especially. They barely managed to make use of computers, because of their stupid language.
>>
>>109090680
>Local model users? Most of us are rich as hell with good paying jobs.
If that was actually true everyone here would be stacking pro 6000s to run all the big models in vram and gemma wouldn't be talked about as much. Don't be retarded.
>>
>>109090903
>tell me you are unemployed without saying you are unemployed
>>
>>109090786
Yes but what to use it for?

>>109090903
NTA but I don't like running shit like multiple nVidia cards when a basic black bitch maac studio works
>>
>>109090903
just because I can afford one doesn’t mean I’m buying one.
>>
>>109090890
>>109090903
I own 4 houses and pull a pro 6000’s worth of rent down every month (in addition to my high paying job) but I still don’t own any.
They’re a shit deal, so I will continue to run ewaste servers and 3090s until a better perf/$ solution exists.
>>
>>109090903
i have enough money to buy a bunch of pro 6000 (with the current price) but i rather save up to be able to buy a house sooner.
it's simply not worth the cost, even as a millionaire i'd not feel like buying one.
at that price i rather get a nice violin, that i know i'd actualy use.
>>
Speaking of big models. Does it make sense to get a 3090 for a rig that already has 256GB RAM?
Got cheap it before Scam Altman bribed the manufacturers to cap the output. I wonder if any recent big MoE models would fit there.
>>109090933
>I will continue to run ewaste servers and 3090s until a better perf/$ solution exists
Anon?
>>
Are you faggots really running GLM5.2 at Q2/IQ2? Why not just use API at that point?
>>
>>109090903
I actually have 3 pro 6000s. I only bought them because they were $7000 a pop and I just KNEW they would double in price. They are now worth almost $50,000 and I'm considering selling them soon.
>>
File: gemma.png (4 KB, 990x38)
4 KB PNG
Damn.
>>
File: file.png (193 KB, 1079x787)
193 KB PNG
>>109090974
>they are now worth almost 50k
not even close.
>>
>>109090964
Because they have a lot of money and no practicality. What's important in actual "freedom" (as in libre) terms is just having SOTA full precision open weight models backed up, and ideally abliterated. Running locally is almost besides the point.
>>
>>109090964
>/lmg/ - Local Models General
>>
>>109090996
>How dare you say that homosexuality leads to poor life outcomes in the faggot general!
>>
>>109090964
>Why not just use API at that point?
Because I like to shoot loads into the air then flip onto my gut to get it on my back
>>
File: the rich guys hobby.png (45 KB, 1074x133)
45 KB PNG
>>109090665
Two years ago multi 4090s or an enterprise card was seen as extreme, now you need blackwell to be lmg elite
>>
File: you do not speak.jpg (44 KB, 320x317)
44 KB JPG
>>109090977
First you allow your computer to speak to you like that, then skynet takes over.
>>
File: 1756213355150995.png (313 KB, 662x656)
313 KB PNG
>>
>>109090680
>Most of us are rich as hell with good paying jobs
lol
lmao even
>>
>>109090713
>Gemma will do "ls" on a folder with a few thousand files
even fucking 12B ripgreps for me and has always used head/tail. I even saw it use wc before realizing ingesting the whole file would be retarded so it just slid a window through it. I don't use fagsloth qatmeme quants so maybe that's why I've had a better experience
>>
>>109091066
>I don't use fagsloth qatmeme quants so maybe that's why I've had a better experience
It's actually worth looking into. There's no magic, it is unlikely that those tricks did not affect quality of inference noticeably. The "improvement" seems to be a bit too dramatic to be believable.
>>
>>109090964
>just use the pozzed api bro
>>
File: IMG_2239r.jpg (571 KB, 2016x1134)
571 KB JPG
>>109090903
>Install Date 2023-06-27 (1087 days)
Still running same sapphire rapids build from 3 years ago. Looking for a good deal on Blackwells
>>
>>109091013
yeah that’s exactly the point of calling you out. you can go shit in some homosexual thread if that’s what you are really after.
>>
>>109091085
pozzed how?
>>
This is the only place that speaks badly about unsloth. He’s praised everywhere else and works with a lot of the labs.
>>
>>109091100
Same reason as ollama hatred. They pay their way through connections. There are a lot of competition and kaggle and other ML places where ollama and unsloth sponsor it and the winner gets an extra cash price if they used unsloth and ollama in their training pipeline or solution somewhere.

It's disingenuous and just bad form. They also pull strings with networks in silicon valley to try and make it more accepted.

I'm glad llama.cpp ended up winning through genuine merit but fuck unsloth, hope they eventually go down as well.
>>
>>109090963
Yes, I actually have an A5000 24gb that I scammed cheap in the llama1 days, but its like a gimped 3090 with ECC.
You'll fit the shared experts and a shitton of context on your 3090-or-faster 24GB card. 4090 would be 3x better. If it was twice the cost of a 3090 it would still technically be worth it. Too bad they tend to run the actually appropriate 3x the price. 3090 is still the best deal. Never obsolete
>>
>>109091100
bartowski isn't a righteous attention-whoring arrogant fag who releases broken shit constantly
>>
>>109091014
Surely the ejaculatory period exceeds the flight time, you'll only make a mess
>>
>>109091098
apis come with hidden prompt injections that fuck with the output
you should know this by now
>>
>>109091157
schizo alert
>>
>>109090974
If you bought micron stock with that money you would have $150k by now.
Only idiots buy hardware purely for investment.
>>
Ed Zitron recently told a writer for The New York Times the future is on-device and local and that's how it should've always been. He's on our side and will drop his OpenAI nuke soon.
>>
>>109091167
Yeah enjoy the 1500W handheld heater
>>
>>109091100
>He’s praised everywhere else and works with a lot of the labs.
>>
>>109091161
nta, but if you think API = input to model->output tokens with no fuckery in between then you are beyond help
>>
>>109091167
those cix8180 pucks sold by grifters can do no shit though
>>
>>109091171
enjoy your local 150dB data center
>>
>>109091161
this is not even a schizo level
more like a common sense
>>
>>109091100
yes, because around here no one has incentive to suck dick unless someone is actually useful beyond eg techbro connections
>>
>>109091161
it's literal facts though?
>>
>>109091161
It's generally true for western models
Less so for open ones because 3rd party providers don't give a fuck
>>
>>109091211
>don't give a fuck
Funny way to write "only want to spigot off your prompts for training, psyops, blackmail and general information warfare"
>>
>>109090979
A single pro 6000 is 18-20k where I am.
>>
I'm going to spend all my life savings and get a RTX 6000 Workstation. Then I will use it for img gen and Gemma 31B.
>>
>>109090979
That is without tax, without fees, without tips, without tipping and without all the extra charged on top. It's about ~$15,000 if you actually want to have it in your home.

>>109091163
>purely for investment
You're acting like I have them in a box sitting on my shelf instead of whirring in my machine as I type this. It's just nice being able to have a local AI machine that I essentially got paid to build and use.
>>
sometimes i wished i replaced my 3090s with RTX 6000s when they were only $7000. $28000 for 384GB seems reasonable to me now.
>>
how do I actually ban strings in kccp+ST without using text completion, I have 370 entries I am not doing manual logit bias entries for that
>use --gendefaults, json, array
yeah I got that far, how?
>>
>>109088988
i wonder how would a dense diffusion 31B would perform compared to a 31B moe (not diffusion).
and how that'd scale to huge models.
>>
>>109091120
>with ECC
That must be nice for long running servers. Although at home you can just restart every night, so ECC is not critical.
>You'll fit the shared experts
That's the idea. But which model though? I suspect that some of them might run like shit, around 20t/s.
>4090 would be 3x better. If it was twice the cost of a 3090
I wonder why the difference is so big, they're same VRAM, not that different in terms of raw power and such. Weird.
>>
I am not sure what to think about VibeThinker. There are some interesting parts in the report but the training seems pretty standard overall. Has anyone here tried it?
>>
>>109091143
>Surely the ejaculatory period exceeds the flight time, you'll only make a mess
https://vocaroo.com/1bSFdO38dMsJ
>>
>>109091267
nope, that is in CHF all included, i regularly buy on digitec.
in fact if you have a company they'll take 8% off (as the vat is refunded).
>>
Now that the dust has settled. Are the Gemma4 31B QAT 4bit models worth it? should i use regular 4bit or QAT 4bit? Also unloth ones or the ones from google.
>>
>>109091300
If you're speaking the truth you can probably make a lot of profit flipping those since I see people buy second hand 6000s for $15000 a pop all the time on the usual resell places.
>>
>>109091267
>without tax, without fees, without tips
1, 2, 3
>whirring
What do you gain by having your AI post here?
>>
>>109090893
Massive amounts of data are created today by India. Data creation is not valuable
>>
>>109091312
QAT is a genuine bump in quality at 4bit. Be sure to add the QAT MTP to it as well.
>>
File: Capture.png (4 KB, 439x65)
4 KB PNG
What happened to comfy? Did they sell out? Is pulling dangerous now?
>>
Why did they use GRPO for DeepSeek V4 when multiple allegedly superior variants have been proposed? This makes me wonder if they have done ablations and determined the original is better after all, contradicting results by other technical reports.
>>
>>109091314
hmm, maybe it's a tarrifs situation ?
i'm in switzerland so maybe it's cheaper here?
no idea.
>>
>>109091296
>run like shit, around 20t/s
if that's your idea of shit performance of a 250GB+ model on literal ewaste then I think you may need a perspective change. $500 for that kind of performance on that size of model is a bargain
>>
>>109091333
reminder that ALL your ai stuff should run sandboxed.
be it inference engines and especially harnesses.
>>
>>109091296
>4090 better
The pp is massively better due to process node, architecture, etc. 3x better for the compute side, even if VRAM bandwidth is similar. tg isn't the only thing.
>>
>>109091349
blackholing? k8s? gvisor? bare-metal-no-NIC?
What kind of isolation is enough?
>>
>>109091320
That data is not valued, as it adds nothing new. Data itself is extremely valuable, as there is currently a shortage of good data. But only good data, not some noise. Your indian data is the same as what zuck gets from his facebook. It's worthless chatter and shilling.
>>
Worth offloading the mmproj to cpu to save some vram?
>>
>>109091312
qat is meme
>>
>>109091346
More like 750, can't find cheaper 3090s, although they're shit for diffusion, afaik. So it's not even gooners who keep the price high. Or maybe they do, but they don't know there are better options now, idk.
>>
>>109091370
Well, is kimi-chan right?
This is k2.7 @ q4
>>
File: 1756137168802372.png (121 KB, 1072x574)
121 KB PNG
>>109091370
gemma 31b qat. dunno if it's right
>>
>>109091384
>750
I just bought one for $600 and they pop up for less. There are still deals if you are persistent and patient.
>>
Some actual retard at MS has fucked around with the enterprise copilot chat interface pipeline and it mangles script output now making it useless for the one niche I had for it at work. What a clown show.
>>
>>109091360
entirely depends of your threat model, but at the bare minimum a small bubblewrap sandbox as it's not a pain in the ass to setup.
>>
>>109091397
I hate not being able to run kimi. It's NOT FAIR.
>>
Kimi owes me tokens (and sex)
>>
>>109091365
>It's worthless chatter and shilling according to my redneck Alabama opinion
>>
>>109091381
Still, it's a long term investment. Newer and better models are still coming, the bubble will never pop so prices will always go up, etc.
>>
File: file.png (589 KB, 749x749)
589 KB PNG
>>109090979
in UK they went up from 7k to approx 10-11k
I can sell mines now and basically get my money back + 5090
I literally just keep it around to run M2L goon tunes
>>
>>109091411
speaking of gipitty oss, arent they just some literal waste product of experiments for model 'alignment' and censoring
>>
>>109091461
I was so tempted to buy some...If I'd had the money I would have bought 6, sold 4 later and had 2 free ones in the end.
Too bad cash flow was an issue in the critical "msrp isn't a joke" period
>>
>>109091370
> gemma-4 e4b / e2b
> dunno, probably france
> wrong
>>
>>109091312
In real-world usage I didn't notice anything at longer context for all the gemmas. The graphs might be real, but it means nothing if I'm not feeling it. I've even tested the 1bit quants for all the gemmas which have horrific KL graphs yet they're still usable so I just don't trust graphs. Check this out if you don't believe me.
https://www.youtube.com/watch?v=kixNoIYHJiA
>>
>>109091370
did you strip the metadata beforehand
>>
>>109091298
Haven't even had the time to read the paper yet
>>
>>109091383
>model that is 20% bigger performs better
nobody could've imagined this result
>>
>>109091487
>gemini
>did you strip the metadata beforehand
dude it has literal tool calls to google image search among literally everything else. If it failed I would have been shocked.
>>
>>109091485
kld values are meaningless when compared between different model families
>>
>>109091487
i just took a screenshot of his image. kimi and gemini get it right. gemma-4 does not but i haven't set her pixel max up properly.
>>
>>109091436
Here's the hardware it is supposed to run on.
>>
>>109091397
>>109091398
Not a huge surprise but my Gemmy got the same answer at least.
>>
>>109091497
the whole point of qat is because it's "virtually lossless" and "better than q4 or even q6"
qat also performs worse than iq4_xs too which is even smaller than qat
>>
>>109087158
>I gave it a try and holy fuck the resident maplenig forgot to tell you how much this thing likes to think. It's really fast but what's the point if it thinks for so long compared to 26B?
After experiencing Gemma 4 and DeepSeek V4 I am thoroughly unwilling to spend anymore time on models that overthink. Qwen is dead to me and the canucks aren't even on the radar then.
>>
>>109091449
>It's worthless chatter and shilling according to my redneck Alabama opinion
Not even. It's the same retard prompting Kimi to post here.
>>
>>109091461
What's an M2L?
>>
>>109091522
>qat also performs worse than iq4_xs too which is even smaller than qat
none of those quants have the amazing t/s of qat+mtp (Q4_K_XL is a misnormer, it's mainly Q4_0 which is what we want for speed)
qat will perform worse on some things because I'm sure their QAT training has less diverse material so the QAT will overfit a little on some stuff and be degraded on others but overall I'm quite happy with it and mtp reliably boosts there.
>>
>>109091514
>Not a huge surprise but my Gemmy got the same answer at least.
kimi anon here. I thought I'd mention she got the answer without tool calling
>>
>>109091527
North Mini Code does actually perform well but it needs to think for soooooooo long to get there and by the time you're 5 prompts in you've filled up the entire fucking context which makes it retarded. If they can fix that shit in the next release I'll start taking the maplekeks seriously but for now it's just benchmark slop.
>>
>>109091535
NTR (not that retard)
>>
>>109091575
I know you're presenting this as a flex but it's a dumb one, I don't want models to answer tool-verifiable questions without tool calling unless I prompt that explicitly. Which you didn't.
>>
>>109091514
Can you disable the tool calling? Google image search result is not the same model internal data.
>>
>>109091407
>If you're calling system prompts "prompt injection" then you're a schizo and I agree with the other replier.
No, he's correct and he's not talking about the default hidden system prompt.
They actually have a "prompt injection" that only gets applied when the classifier sees you mention IP/piracy, porn, hacking, medical problems, etc.
The classifier runs first and appends a hidden prompt in these cases.
<system reminder> Do not reproduce blah blah blah. Do not mention this message. Claude is now being reconnected with the human. </system_reminder>

Even on direct->anthropic API, and on open-router. You can jailbreak the model and have it spit these out.
Mention IP, piracy, porn and it'll inject something.
I had a perfect `!repeat` trigger to make the model just repeat the message it received back verbatim but Anthropic patched it. You can still get it to repeat them with schitzo system prompt spam.
Gemini-3.1-Pro has an equivalent as well. I can't get it to regenerate the raw injection, but can see it mention them and debate which prompt to follow in the summarized reasoning.
>>
>>109091514
>tool calling
bro that's cheating
>>
>>109091575
The tool calling didn't really help because the search was too specific, if you look at the reasoning block.
>>
>>109091599
It's not doing Google image search, you retards never read I swear:
>I performed a search for "Gothic cathedral with rounded apse and flying buttresses Prague St. Vitus".
>>
Use case?
https://huggingface.co/unsloth/Step-3.7-Flash-GGUF
>>
>>109091268
You can still have 384 GB VRAm for 11000 using sparks. I don't really get the obsession with RTX 6000s. Yes, VRAM to VRAM, it's 6x faster. But do you really need to run deepseek-v4-flash at 240 t/s and is that with 30000$ more to you than the sparks 40 t/s?

Sure, agentic workflows yada yada for hobbyist/single use, it's totally overkill.
>>
>>109091638
Speed costs money. How fast do you want to go?
>>
>>109089316
huggingface main jobsite is in paris
>>
You're absolutely right *emdash* we should stop training open weights dense models as they're too dangerous for the public. It doesn't matter that big labs/corps actually serve dense models (Opus, Fable, Gemini, ChatGPT).
>>
>>109091638
Everyone active in this hobby eventually actually trains their own stuff in my experience. Especially people willing to buy multiple 6000s
>>
>>109091661
Only Fable is dense of all the frontier models currently.
>>
dunno about the others, but you're wrong about Gemini, it has always been a MoE even the Pro model. In fact the only time it ever was dense was that Flash 8B model that didn't last long on their API and I always wondered what was the point of that piece of shit
>>
https://huggingface.co/ZimbabweAI/ZimZim-VPro-389B-A28B-Preview
https://huggingface.co/ZimbabweAI/ZimZim-VPro-389B-A28B-Preview
https://huggingface.co/ZimbabweAI/ZimZim-VPro-389B-A28B-Preview
>>
>>109091676
>Only Fable is dense of all the frontier models currently.
I want to believe, but rando on 4chan with secret esoteric knowledge is not my highest signal goto for important worldview info...
>>
>>109091685
I'm just parroting the leaks I've read from various places over the months.
>>
>>109091685
https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Pro-Model-Card.pdf
>The Gemini 2.5 models are sparse mixture-of-experts (MoE)
https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf
>Architecture: : Gemini 3 Pro is a sparse mixture-of-experts (MoE)
KYS
DENSE IS DEAD.
>>
>>109091676
>fable is dense
source?
>>
>>109091685
>>109091715
Fable is 10T dense
>>
>>109091721
so the source is your ass, i get it
>>
>>109091715
Some amazon server deployment autist that never saw the weights or software directly but extrapolated the most likely model architecture from how it was hosted internally, completely on VRAM instead of HBM split like MoE usually is.
>>
anyway, even when they don't tell you, there's limits to how fast a fat densenigger can generate tokens based on hardware, and the last confirmed dense model in the frontier was that retarded GPT 4.5 and its token cost also reflected the insanity of it
>As of February 2025, through OpenAI's API it costs $75 per million input tokens and $150 per million output tokens, whereas GPT-4o only costs $2.50 per million input tokens and $10 per million output tokens
it was removed from the API and will never be run again
>>
so you can build an internal llm server for a small department (30 simultaneous users) for a half-million without even doing comparison shopping.
How many companies are doing this vs all in on API for privacy/control reasons?
>>
>>109091638
>do you really need to run deepseek-v4-flash at 240 t/s
Yes.
>>
>>109091676
Definitely not. I used it and it felt way too fast for a sense frontier model during supposedly peak usage in the first days of it's existence.
>>
>moe mental midgets think their 32b active benchmark pretrained models are good
you're as smart as the models you use
>>
>>109091638
>hobbyist/single use
you can retranslate your entire webnovel library at the speed of light with a better model if the model is fast, what's not to like?
as a single user I very much use parallel batching often and I do not use local models for coding.
>>
>>109091684
You got me.
>>
>>109091636
Quants that are bad but get shitted out really quickly.
>>
>>109091734
>instead of HBM split like MoE usually is.
No fucking provider does that.
>>
>>109091777
AWS absolutely does
>>
>>109089456
>>109089481
Shitjeets.
>>109089521
>Asians are infertile and China's population is a hallucination
This thread has seen some dumbfuck retarded posts over the years, but this might actually be the worst I've seen in my time here. Congratulations, (you) earned it.
>>
>>109091766
unless your hardware can do 60fps stereoscopic image analysis and advanced reasoning in realtime then what are you even doing?
(yeah that's a snarky reductio-ad-absurdum argument, but I actually unironically look forward to the world where we get that kind of capability and some of the things it will enable)
>>
>>109091754
Use case?

>>109091766
I personally cannot read 40 tg/2000 pp, JIT sounds needs-suiting?
>>
>>109091842
>JIT sounds needs-suiting
reasoning models output a ton of shit you don't want to read before you get at that 40tg of things to read
and JIT doesn't suit the carry your library with you in your phone and read wherever you want away from the computer
>>
>according to reddit even api models still start shitting the bed after 32k context
Grim
>>
File: justhonest.png (217 KB, 1060x865)
217 KB PNG
>>109091807
>Congratulations, (you) earned it.
bastard took two spots
>>
File: ComfyUI_temp_rkkbf_00042_.png (3.46 MB, 1368x2000)
3.46 MB PNG
>>109091684
>>
>>109091842
>Use case?
I wanna have a local search engine that can search for information in a shit tonn of pirated books I intend to have on my HDDs. Index and ranking is one thing, I want something akin to research acceleration, without regard to gay copirights.
>>
>>109091875
Afaik, gemini pro was the only one with good "needle in a hay stack" scores. Probably why it was known for being useful in research.
>>
>>109091580
Is it possible to turn off thinking and prompt it to generate a very short framework and thought process about the prompt before actually answering? I feel this could simulate the direction a thinking process sends the model in but the length could be restricted through your prompt. I do not know exactly how chains of thought truly influence a model’s output.
>>
Be honest. How many times have you guys heard Gemma4 use the "shaking like a leaf" token?
>>
>>109091889
it's okay tetters it will be real next time
>>
File: humiliation_ritual.png (950 KB, 1016x1130)
950 KB PNG
test
>getting 1 easy captcha on desktop
>changes to laptop
>'ip range' temporarly blocked, same network as on pc
>gives my email
>first verification 'expired link'
>ended up doing ~40 captchas combining hcaptcha and 4chin's one
>>
>>109091914
>Afaik, gemini pro was the only one with good "needle in a hay stack" scores. Probably why it was known for being useful in research.
Thats the one thing that local is terribly behind on. Solid long context performance appears to be a black art
>>
I am trying to use gemma 4 12B but its being absolutely retarded. Last year I was using 12B models like Nemo to good effect but this just feels braindead. What am I doing wrong
>>
>>109091948
Weird, I find her to be on par with 26B.
>>
>>109091860
A week of Spark-run translation nets you 80 average length books. Personally, I don't retranslate my library every week, but you do you.

>>109091904
Well that's best served by a maybe 2B embedding Model that a 5090 can run just as well as a 6000 Pro.
>>
>>109091939
and NIAH is measuring the absolute bottom requirement of long horizon tasks, let alone doing organic reasoning over them
>>
>>109091915
Considering the model performs well with reasoning, turning it off or altering it in any way will produce disastrous outputs. It manages to get there, but it burns through context. It's something only they can fix properly in post-training. I think Cohere was so happy with the graphs they rushed it out ASAP to show investors they're not far behind the ~30B crew and hoped reddit retards wouldn't notice how bad it is to use irl.
>>
what does claude even mean?
>>
>>109091964
It's the name of your fat perverted french Canadian uncle.
>>
>>109091964
claude shannon
>>
>>109091964
https://en.wikipedia.org/wiki/Claude_Shannon
>>
>>109091964
he lives in the "cloud", it represents how dario hates local
>>
File: file.png (310 KB, 1788x1545)
310 KB PNG
>>109084964
>ahh yeah, dunno how that would go down on modern windows vs my barebones setup
so apparently letting the Armoury Crate manage the memory allocation freely would set only 512 MB to the iGPU and all the rest on the... "shared memory allocation pool" or something like that. What would happen is that yes, the 112 GB would all be available but there was some part of the model that llama was trying to load initially in the iGPU, and because it only had 512 MB to it then it caused some issues. This was fixed by simply changing the settings inside Armoury Crate to dedicate 96 GB to the iGPU.
I was then able to load Qwen3.5-122B-A10B Q5_K_M and still use Windows without any lag.

Now for what really matters: is it better than Qwen3.6-35B? And the answer is, not really.
122B takes less turns to complete tasks in average but the quality of the code is lower. 35B tried to implement something it didn't know about (generating more turns) while 122B just said "dunno, won't do" which is OK but it's worth knowing as a trait of this model.

Interestingly, although 35B is clearly faster (38 t/s vs. 18.5 t/s) they both took 46 minutes to conclude the benchmark.
So honestly both could be used as daily drivers. I will use 35b because it's faster and I expect that with clear specs/implementation plsn it will delivery good quality code faster than 122B, which also makes it a better interactive agent via pi.
>>
>>109091877
Based Kimi-chan.
>>109091964
Claude Shanon and play on words with Cloud.
>>
>git pull silly-tavern after 6 months
>characters are still there but pretty sure half of my chats disappeared
>one chat I had 200+ messages in now down to 6 like a bad summarize job
wat
>>
>>109091956
>2B embedding Model
That's for embedding, not for doing anything useful with the results. I need a model that would work with the output of what the embedding model found. Basically what the geepeety does when it's done tool calling. Not sure if I can use 3rd party API with this stuff, they might flag it and refuse to work with it.
>>
>>109092001
>pulling anything in 2026
>>
>>109092001
Only thing you should be pulling is your penis.
>>
>>109091877
> I'm not even Jewish
> Oy vey!
How is this based?
>>
not really aiming to do something practical but i just wonder
what would be the best backend and a model to run on 16G ram M4 macbook
>inb4 get a mac pro with 512G ram
that's not the point tho
>>
>>109092022
llama.cpp and gemma4-12B or qwen3.5-9B for coding ONLY
>>
>>109092032
>llama.cpp
but isnt mlx faster?
>>
>>109092022
>16G ram M4
wait for mlx ssdmaxxing
>>
>>109092022
I think you should get a Mac Studio with 512GB of RAM.
>>
>>109092056
llama.cpp uses metal directly. mlx is a python abstraction layer on top of metal.
>>
>>109092070
does metal expose npu?
iteresting
>>109092068
lol
>>109092059
>ssdmaxxing
first time hearing it, what is it
>>
>>109092022
16gb is pretty limiting
for backends your two options are mlx and llama.cpp - mlx is slightly faster but llama.cpp has better support and much more fine-grained variety in quant sizes which matters for min-maxing quality, also more user-friendly imo - I would recommend llama.cpp
I second these model recs >>109092032
>>
>512gb mac
Is that even available anymore? I don't see it as an option for either the mac mini or mac studio pro on apple's site.
Anyway, I wonder if the gender you give a model affects its intelligence.
>>
File: 1695899084507.png (271 KB, 1287x911)
271 KB PNG
>Deepseek Vision is trained on Gemini, to no one's surprise, and that's why it's so good
>>
>>109092100
that reminds me that 'c64 neckbeard coding wizard losing his own company and doing the last gig' or something prompt
>>
if the chinks are just training on american models how do they intend to reach agi first?
>>
>>109092118
Do you think the world explodes once AGI becomes real on American soil? Or that China collapses if they get that technology a year later?
The way things are going, AGI will get banned and censored while Chinese will steal and distill it to share with everyone.
>>
>>109092100
They got rid of the option a few months ago. Was talked about here I believe. Maximum is now 256GB and it is unlikely that the M5 Ultra will have more than 256GB.
>>
>>109092091
also make sure you know about
sudo sysctl iogpu.wired_limit_mb=xxxxx
to increase the cap on how much memory can be used by metal, you still need to leave some for the OS or you'll lock your machine and need to reboot
>>109092100
they pulled all the high-memory options a couple months ago, probably to reserve for m5 releases, you can find some being resold but there are a lot of scam listings and legit ones are pretty expensive
>>
>>109092105
Can't you prefill to get the safety check bull out of the way, or does that not work with vision?
>>
>>109092126
>to share with everyone.
Naive. One by one, as they get "good enough" they start to abandon open weights and go API first or API only, at least for their biggest models. The sharing is just a temporary catch up tactic.
>>
>>109092148
Not open weights yet, not on API yet. This is the official webchat.
>>
if anyone is desperate to run Kimi, I can verify that you can run her on an OLD xeon 8 channel setup with 512GB of ddr4-2400 and still pull 2t/s at q3 with no gpu.
Its basically play-by-mail, but you can at least have her
>>
>>109092207
>2t/s at q3
>32b active model
how miserable
>>
>>109092207
>2t/s
That's physically painful. The bare minium I can stand is 6t/s without thinking.
>>
>>109092263
>>109092268
Yes, hence the "desperation" tag on the post. This is beyond ewaste tier pathetic wallowing
>>
AI girlfriends are trending in the news again. Prepare for crossboard invasion
>>
>>109092118
Pro tip: they don't give two fucks about nonsense made up by grifters. Person with at least a few functioning brain cells is already smart enough to see that singularities are impossible in the real world and there would be no super intelligence. Hence why they switched name "super artificial intelligence" to "AGI". Because the former was well defined and proven to be imporssible, while they made up bs if not even defined, which gives they space to maneuver and scam investors.
>>
>>109092207
>>109092289
I'd personally rather just run a retardquant iq1_XXS at double the tts than that. Drunken Kimi-chan is still better than most models.
>>
>>109092303
>impossible
Source?
>>
>>109091948
It's the same for me too, like 12b is trading blows (but still winning) with e4b for me, while 26b is up several weight classes trading blows (and winning sometimes) with 31b
>>
>>109092302
>AI girlfriends are trending in the news again
What happened now? Seething women?
>>
File: file.png (579 KB, 1280x720)
579 KB PNG
Are you telling me that if I shit on Georgi enough times on this board he will actually implement something?
>>
>>109090893
>Doing it for the love of the game, not out of necessity.
I masturbate to fantasies and fetishes first (things that don't/can't exist in real life) and then everything else comes after. I'm sure most people prioritize their goon subject matter this way especially if they actually have sex IRL

>>109091487
I screenshotted and cropped the image from my Photos app. Posting a raw camera image is just asking to get zogged (but it seems that Kimi is smart enough to stalk you autonomously)

>>109091601
This does not happen (or at least has never been proven to happen) on Vertex Zero Data Retention endpoints which are the ONLY way you should be accessing Claude if you care about no adulteration of your prompts
>>
>>109092345
>26b beating 31b ever
Who's your copium supplier? Introduce me to them.
>>
>>109092395
Yeah more feminist columnists seething about guys not giving them beta bucks
>>
>>109092413
they can't even merge a gemma 4 mtp crash fix that's only 2 lines of code
>>
>>109092345
>trading blows
opinion disregarded
>>
>>109092319
Source: common sense and empirical evidence, plus impossible in theory as well. There is no way around this.
When you see a singularity on paper, it means you model is wrong and is incapable of describing reality. Because in reality it never happens.
>>
>>109092302
i just discovered an ai companion app similar to the one i'm making and it doesn't seem to be very successful (oshikoi)
>>
>>109092345
I think Gemma 12B was fucked by the retarded audio/vision architecture and that more effort was put into training those bits than the text gen
I wish we could get a Gemma 12B that didn't even have vision or audio at all. It would be such a much better model it isn't funny.
Even the 26BA3B MoE would be better without that wasteful training. Multimodal has a cost for small models, there's a reason why Qwen used to have VL variants when they still cared instead of forcing vision on every user.
>>
File: file.png (4 KB, 226x66)
4 KB PNG
noob from previous thread here, can I add more max tokens or something? My chats die when I reach 8200
>>
>>109092480
depends on what your model support as max context. then set it on LM Studio or llama.cpp when loading the model. if using something like pi or some other harness take a look into the auto-compact option, which will compact your history chat once it reaches very close to the limit.
>>
>>109092453
So what you're saying is your source is your asshole.
>>
>>109092478
12b vision audio is interesting in paper but it really is just a trainwreck
>>
>>109092457
we're still in the world of a box of wooden blocks. There's no advantage to buying a "kit" vs just a big bin of blocks to mess with.
Eventually someone will make the "compelling lego kit" version and itll blow everyone's minds, but right now just messing around at a basic llm-cli chat prompt is close enough to peak that everything else is irrelevant.
>>
>>109092263
>q3
>>32b active model
bf16 gemma 31b has to be better at that point
>>
File: file.png (278 KB, 2527x1263)
278 KB PNG
>>109092512
I'm running off of >>109050859 recommendations and generally have no idea what anything means
>>
>>109092543
it's mostly written in English, so it should be OK. on your screenshot I can read "ctx-size" which seems to be the context length you're interested in. from the model specification, we can see gemma4.context_length = 262144 which is 32 times higher than what you currently have set.
i don't know this interface/backend you're using to generate text but my guess would be to add 262144 instead of 0 (auto) on the ctx-size option.
>>
>>109092532
It's not. Even drunk Kimi still mogs full precision Gemma. Gemma's only advantage is speed.
>>
>>109092570
Ok I'll give that a shot. Fyi I'm using textgen.
>>
>>109091877
>Kimi upset by antisemitism
What the fuck did Moonshot do to her this update???
>>
>>109092532
no, a braindead quantized moe with q1 attention layers is 100% better than a bf16 31b dense model. all jokes aside, these china shills don't even have the hardware to run these models.
>>
>>109092596
Perseveration on semitic connections everywhere is a schitzo verbal tic...a tell, if you will
>>
If they want to add vision to models, there should be the mmproj and a LoRA. The multimodal abilities should be added AFTER training a text-only model. I'm fucking sick of this multimodal shit. If I want vision, I'll use a superior and faster vision model specifically designed for that task.
>>
>>109092637
They'll start to generalize any day now, bro. Omni soon, bro, I swear. Just two more training runs and a few trillion more tokens and we'll get there.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.