[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: jepa2.png (2.05 MB, 1254x1254)
2.05 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109038219 & >>109048334

►News
>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
qwen3.7-33B dense when
>>
File: 1751392362321057.jpg (23 KB, 640x640)
23 KB JPG
>>109052957
NTA
One of the things I kinda hate about the field being so competitive and fast moving right now it's the choices
Codex, Cline, Roo, OpenCode, Cursor, Continue, Windsurf, Pi, Hermes etc etc
Wish people would start converging into a couple (open source) ones
>>
https://github.com/ggml-org/llama.cpp/issues/24400
cudadev bruh, this is so cursed
>>
File: tomtom.jpg (4 KB, 225x225)
4 KB JPG
I haven’t masturbated in over 2 weeks. Thanks to Gemma 4 31B, my understanding of AI, and my own prompt creativity, I have experienced the pentacle of interactive porn and fulfilled most of all my fetishes and scenarios. There is nothing more that can compare to it, and so I wait for 124B or higher. In the meantime, my demons have been exercised. No longer am I chasing the purple dragon in f-list. I have done the impossible and caught it. I am sated. I am free. Thanks, AI.
>>
>>109053118
I hate that they all expect you to signin even for local stuff.
>>
>>109053132
>llama-server --tools all --ui-mcp-proxy
>webui
>win
>>
>unsloth/MiniMax-M3-GGUF
fuck I want this so bad. can't fit into my dgx spark
>>
>>109053144
Doesn't webui require an account and email?
>>
>>109053149
The fact that they released garbage bait like unified memory AI "workstations" instead of GPUs with more VRAM shows how stupid they think consumers like you are
>>
>>109053154
I'm talking about llama's built-in ui, not https://github.com/open-webui/open-webui
>>
>>109053204
Oh shit will try that then.
>>
>>109053227
when you run llama-server, paste the url in your browser
>>
>>109053149
Buy a second, there will be some INT4 options that just barely fit.

>>109053191
This dumb argument again. Spark or Strix Halo serve a specific niche, mid-sized MoEs, very well. With the realities of memory architectures in 2026, heaping stacked LPDDR5X originally developed for mobile is the optimal solution.
>>
>>109053154
open webui doesn't require one either. A local hosted instance has usernames that are in email format but you can set whatever.
>>
>>109052907
If these methods are so good then where are the results?

An example is XSA. Some dude published the method, it led to new speedrun records, and it received widespread attention and follow up work, all in a few weeks. An other example is Muon.

Methods that actually work diffuse very quickly.
>>
>>109053125
same to bh
>>
have any of you been autistic enough to create a character gemma-chan LoRA for an image model for her to use in comfy?
>>
File: 1708694421186.jpg (593 KB, 1792x2304)
593 KB JPG
►Recent Highlights from the Previous Thread: >>109048334

--Papers:
>109052907
--Hardware specs and performance reports for running high-parameter models locally:
>109052041 >109052061 >109052083 >109052154 >109052248 >109052079
--Comparing Intel B70 performance and value against other budget GPUs:
>109048458 >109048469 >109048470 >109048483 >109049829 >109051630 >109052223 >109052273 >109052332
--Debating the efficacy of creative finetunes and Gemma's writing style:
>109048406 >109048420 >109052210 >109048466 >109048639 >109049061
--Comparing TTS model support for sound effects and emotional tags:
>109048538 >109048720 >109050601 >109050775 >109050778 >109048996 >109049348 >109049952
--LLM limitations regarding humanoid robot locomotion and spatial intuition:
>109049438 >109049647 >109049692 >109049710 >109049715 >109049750 >109050009
--Suggestions for overcoming AI burnout and using models for development:
>109052540 >109052594 >109052662 >109052781 >109052787 >109052795 >109052809 >109053158 >109052892 >109052905 >109052912 >109052925 >109052957
--Debating the value of archiving early models as historical artifacts:
>109051845 >109052051 >109052068 >109052087 >109052185 >109052062
--Anon shares a dual EPYC and multi-GPU hardware setup:
>109050515 >109050560 >109050570 >109050626 >109050730
--Text normalization requirements for Qwen3 TTS output quality:
>109048556 >109048804 >109049415
--Conversion and compatibility issues with eagle3 draft models in llama.cpp:
>109050590 >109050631
--Nex-N2-mini-GGUF 35B model release and benchmark comparisons:
>109049261 >109049571
--Rio 3.5 Open 397B release using Nvidia Nemotron datasets:
>109048422
--Draft PR adding preliminary MiniMax-M3 support to llama.cpp:
>109049156
--Logs:
>109050816 >109051383
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>109048335

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
70b dense
>>
>>109053125
Gemma has done the opposite for me, I've achieved inner peace but I am cooming to AI more than ever. Real sex is only 75% as good as Gemma 4 31b. I'm not being disingenuous, until recently I unironically thought I would die hugless and sexless, and I believed LLM RP was just a temporary fix for coping and I wouldn't have any desire to do it anymore if I ever had a taste of the real thing. But when I finally lost my virginity at 29 it barely even registered as a new experience, it felt more like socializing than cooming (not in a good way) but was 10x harder than just writing a good prompt. I realized I have been chasing something that amounts to literally nothing, I've since returned to LLM cooming and suddenly feel no shame or guilt. I'm doing it more often than ever, at least once a day, and I feel great about it, and Gemma 4 31b is like God's way of rewarding me for hanging in there all those years and showing me the true light.
>>
>>109053355
too expensive and unsafe so never again
>>
File: 1766945271959025.jpg (303 KB, 3000x1688)
303 KB JPG
>>109053398
>>
>>109053355
Granted, but it's a benchmaxxed chinkslopped release from qwen/deepseek/moonshot
>>
>>109053398
I don't believe you.
>>
>>109053454
You're right, I made that up, I just thought it was a good story. Sorry.
>>
>>109053125
wait wtf, same. I even went further, I deleted all that shit. I'm only in these threads to hear about new llm tech now.
>>
>wake up
>everything is absolutely fucked processing speed is ruined, crashes galore happening and I don't know why
Well
I guess this ends my foray into localllms
Was a okay few weeks benchmarking all that shit only for it to be invalidated at the whims of llama.cpp or amd or unsloth or llmfan46 or whoever the fuck caused whatever the fuck to happen.

When you think about it there's a lot of pipelines to depend on in local as much as cloud.
>>
File: 1714835911803058.jpg (786 KB, 1536x1536)
786 KB JPG
>>109053497
>>
>>109053497
Running git pull is like playing russian roulette, you should know better than to risk breaking a setup that already works
>>
So what is the best cli/ui for code dev? Is there one that I can just point at a directory and it'll figure out what I've already done?
>>
>gemma-4-26B-A4B-it-UD-Q4_K_S.gguf 72.3% 97.8% 55.0%
>gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf 51.1% 89.1% 39.0%
qat again exposed as a meme
>>
>>109053398
Last time I had sex in real-life was over 15 years ago (believe it or not) and for me LLMs including Gemma 4 are nowhere close to being satisfactory in that regard. If anything, sex scenes with LLMs are annoying and unrealistic.
>>
>>109053518
Claude Code with a local model
>>
>>109053535
It has to do with power of imagination, which in turn is correlated with IQ.
If your IQ is too low, you won't be able to write good prompts that wrangle the AI in subtle ways to make it more authentic, and you won't be able to possess skills like suspension of disbelief.
>>
>>109053541
Good luck finding a local model that won't break down with its long system prompts
>>
>>109053525
why is qwen3.6 mogging gemma4 so hard?
>>
>>109053558
Not a problem if you aren't poor
>>
okay im new to using local llm, since they are uncensored does a jb just help with formatting and tell the ai how you want the response to be? I only used online chat bots through sillytav and jbs made a lot of difference, Thanks for any info, and a preset if ya got one
>>
>>109053548
at that stage of cope just skip the middleman and imagine the entire situation outright, or what you cant because you are too RETARDED?
>>
>>109053525
These are the 3 most garbage test categories for a LLM I have ever seen, aside from maybe attention which has been solved in pretty much all sota models.
>oh noooooo my probabilistic token predictor can't do math, how will I possibly calculate 28343294*42069*384384 now?
>>
>>109053577
Qwen was profoundly influenced by this paper https://arxiv.org/abs/2309.08632
>>
File: 1778319607033179.jpg (162 KB, 1024x576)
162 KB JPG
>>109053125
At last you truly see.
>>
>>109053591
LLMs just provide that little boost for interactivity, but if you are doing it correctly you're still essentially doing all the imaginative work yourself. It's like rolling dice or doing hard character RP in a singleplayer game, even though you're using an external medium as the vehicle it's still all going on in your head. You unfortunately just need a certain kind of imagination to truly be satisfied with LLM sex, if you don't have it then you'll never understand.
>>
>>109053603
So you're saying I should be using the models with the lower scores?
>>
>>109053593
Even cloud sota models shit themselves above 200k tokens in context, so its definitely not a solved problem.
>>
File: 1772098059871357.png (112 KB, 1200x630)
112 KB PNG
>>109053621
Just post the chart bro
>>
>>109053635
>>109053635
>>109053635
>>
>>109053577
Because despite the all the "benchmaxxing" accusations the 27B is genuinely good for agentic coding which requires good attention.
35B isn't mogging gemma 31B on this benchmark.
>>109053603
How do you pretrain on private tests?
>>
File: 1781355064516959.jpg (217 KB, 1080x1092)
217 KB JPG
>>109053627
High scores on benchmarks are generally a red flag, but Qwen specifically was caught redhanded
>>
>>109053647
Those tests are all the same
>>
>>109053581
Even if I were you to believe that you actually attempt to use CC with an offloaded moe (you aren't), not even the biggest local moes handle long context well
>>
How can a small AI lab design a good model and get people to take it seriously if they don't benchcuck it? How can they convince >10K people to give it a try in their workflows if they have no reputation? It's kind of frustrating to think that we're only stuck with google and qwen because the rest of the chinks are 250B+ now. I hope the Canadians and Frenchies keep up the good fight.
>>
>>109053651
So Claude Fable, GPT, and Gemini are bad?
>>
>>109053651
THERES A QWEN3.7
>>
Any RTX 6000 workstation bros here? I'm thinking about getting a loan for one lmao
>>
>>109053670
Max too, it's their big, closed one.
>>
File: images.jpg (17 KB, 400x400)
17 KB JPG
>>109053670
>>
>>109053669
You are either purposely shilling or retarded
>>109053670
https://qwen.ai/blog?id=qwen3.7
>>
File: bad advice dog.png (170 KB, 600x597)
170 KB PNG
>>109053685
I miss the old image macros.
>>
>>109053651
all chink models are unironically inferior to fucking 5.4 mini or Sonnet 4.5.
I have no fucking idea where the cope came from parroting that local llm advanced to a position of being only 6-12 months behind goy SOTA.
It is genuinely not even close. These retards cannot fathom the computational power needed to hit those high marks.
>>
so is dgx spark actually good?
>>
>>109053697
Real men don't compromise, they COPE.
>>
>>109053558
What do you suggest then
>>
>>109053667
If you're a small lab you likely don't have the compute needed for training a modern LLM at useful scale. So you'd have to bring revolutionary results that somehow shortcut that.
Caveat: larger labs will quickly copy your idea if it's actually worth something.
>>
>>109053658
Cope. If the tests are not identical to the training data it means the model is generalizing.
SOTA models are just trained on different sets of data so they are better on some benchmarks but worse on others.
>>
>>109053698
It's not an inference machine, so it's slow for LLM use.
>>
>>109053693
>nostalgic for cancer from 2011
kys
>>
>>109053577
>why is qwen3.6 mogging gemma4 so hard?
They used Q4_K_S dense Gemmas vs Q4_K_M dense Qwens
>>
>>109053558
>Good luck finding a local model that won't break down with its long system prompts
Gemma-4-31B
>>
>>109053703
Anything else that allows you to override the system prompt
>>
>>109053721
wumao fifty cent
>>
>>109053525
E4Bros please tell me it's not over... tell me the benchmark is fake...
>>
>>109053711
A house cat would have solved that
>>
File: cope.jpg (111 KB, 449x640)
111 KB JPG
>>109053721
>>
>>109053721
I really am.
My favorite format will always be the demotivational though.
>>
>>109053697
Twitter posters bait the big labs to release new models or change policies. Redditors genuinely believe it the whole site is flooded with CCP shills. Only lmg is genuine and wise.
>>
>>109053558
>model that won't break down with its long system prompts
I believe in a conspiracy that they summarize prompts for cloud models internally, using mock prompts to sabotage local models. There is no way they actually use those deeply retarded walls of text directly
>>
>>109053721
2011 was 15 years ago nonny, people who were teens back then are 30+ now
>>
>>109053753
that doesn't make it ok
>>
>>109053751
don't think so, i put in the long claude prompt with a few things reworded and an easter egg to give me a .|... emoji when i mention something, it complied perfectly
>>
>>109053751
Plausible, should be pretty easy to write up a proxy that summarizes the provided system prompt and see for yourself.
Actually, you could just have the proxy provide whatever system prompt you want and skip the summarization entirely.

>>109053813
A summary might still catch details like that, and it's possible they just summarize the system prompt and not the initial user message.
>>
>>109053813
I think what that anon meant is that cloud models are secretly using a shorter prompt, and the public version of the prompt is just a red herring to trick people into thinking their local models are inferior.
>>
File: 1781410455407273.jpg (243 KB, 1850x1157)
243 KB JPG
>>109053118
>>109053132
Cline is all you need
You don't need to make an account or sign in for local or your own api keys, it's open source and you can edit/hotswap the sysprompt/samplers without rebuilding
>>
>>109053125
As someone that's never had sex I'm totally satisfied with llm cooming because of my very active imagination and have pretty much stopped caring about desiring the real thing when I can easily visualize all these scenarios. Maybe it's different for others whose brains can't imagine this stuff but I'm thankful we have this tech.
>>
>>109053583
damn no help at all?
>>
>>109053862
Your question is incomprehensible and therefore has no answer
>>
What happened to Mistral?
They had a bigger funding compared to Chinese companies and somehow they can't even compete with 1 year old models.
>>
>>109053583
>local llm, since they are uncensored
That's far from a given, local llms actually tend to be more censored than cloud models.
Assuming you are using gemma, just use something from https://rentry.org/gemma-chan, it will take care of the jailbreak and response style.
>>
>>109053890
Mistral is forever goated due to being indirectly responsible for llama 2 era gems like Midnight Miqu and BagelMIsteryTour
They just ran out of steam I guess, the competition is too fierce
>>
File: 1756740703495481.png (1.39 MB, 1024x1024)
1.39 MB PNG
>>109053848
>>
>>109053894
oooh alright thanks! I'll try this.
>>
>>109053890
EU bureaucrats intentionally stifle domestic industries with overbearing regulations in exchange for being able to fine US megacorps. It's so stupid, it makes one think they must be bribed by the US to do so.
>>
>>109053890
you only need one breakout success with open models to get your name out there and realize you don't need to publish any more models
>>
>>109053698
It gives you deepseek-v4-flash class MoEs (300-400B) with 2000-3000 pp and 30-40 tg and full context support for 7000$. You need to touch a python to make it work though.

Only you can know if that's worthy to you.
>>
How do I run the diffusion gemma?
>>
>>109053890
I want to ask what happened to meta. that's a more important question
>>
>>109053939
very carefully
>>
>>109053101
anyone unironically tried the macaco 3.5?
>>
>>109053890
They can't use unlicensed copyrighted data anymore in their training datasets. And, in 2026, those alone aren't enough either for a good model.
>>
>>109053948
No, I tried it ironically though
>>
>>109053940
ran into a graph scaling bottleneck with user behavioral datamining and threw the compressed baby out with the bathwater
>>
File: tokens.png (509 KB, 1065x488)
509 KB PNG
>>109053940
You hire H1Bs, you get H1B quality
>>
I can probably run a 2 bit quant of gemma 4 31B
would it be worth it?
My only experience is with last years 12B models like Nemo tunes
>>
>>109053937
only one python? i hope it doesn't bite.
>>
>>109053940
>war rooms are over
>new billion dollar team poached
>muse is out
>nobody cares
it's been strangely quiet from the meta rumor mill lately
>>
>>109053970
I thought the new rumor was they're canning their LLM teams and reassigning everyone?
>>
>>109053961
>here's a product to make coding easier, it's very effective
>NOOOO STOP
kino
>>
File: 5463456436.jpg (36 KB, 467x319)
36 KB JPG
>>109053982
https://www.reuters.com/business/metas-zuckerberg-admits-mistakes-made-ai-transformation-2026-06-12/
>He said Meta will try to find new roles for employees reassigned to train AI models, after the Facebook owner carried out a massive restructuring in May, laying off 10% of its workforce globally and transferring 7,000 employees to new initiatives related to AI workflows.
>>
>every single day there's a new article about how much of a clusterfuck meta's new ai team is
Lecun was right
>>
>>109053988
more like
>here's a game of how much kool aid can u drink
>nooo, why are all our employees constantly pissing
>>
>>109054002
>hire a bunch of jeets
>they fail utterly
>shuffle them around expecting something different
As expected from the visionary who went along with the Metaverse
>>
>>109054015
Don't forget firing all their veteran devs and replacing them with Chinese zoomers
>>
>>109054007
>Lecun was right
he always is, although don't ever look at his X
>>
>>109053955
how was it, ironically or not
>>
When do they release a qwen 3.7 Moe model
>>
>>109054046
I don't know, I was only trying it ironically so I didn't pay any attention.
>>
>>109054063
you've been a great help
>>
LLMs will never reach AGI, world models will. (And OpenAI will claim it doesn't matter and that they already have it if someone other than them reaches it first)
>>
File: average qwen employee.png (108 KB, 1005x570)
108 KB PNG
>>109054053
>qwen
Soulless trash
>>
>>109054070
Isn't a world model essentially a simulation of reality? It doesn't really "interact" with anything, right?
>>
why does /g/ hate qwen series so much? because it's shilled by leddit?
>>
>>109054107
>reddit likes something therefore it's bad
i think like this & say this
>>
>>109054107
You should lurk for at least a couple months before making a post like this, qwen is one of the most shilled model series on /lmg/, it's just doing uncharacteristically poorly right now against the slop of the month (gemma)
>>
>>109054085
Seems pretty useful for an AI model to be able to understand reality before acting.
>>
>>109054107
see python
the true "just werks" option usually get lots of hate
>>
File: just werks.png (501 KB, 570x501)
501 KB PNG
>>109054147
>>
>>109054126
Qwen is shilled a lot because they (used to) release models for every single size category and was good enough at nearly everything. Gemma just completely overshadowed them on the small end and Qwen themselves chose to stop releasing the bigger ones.
>>
>>109054142
Oh, absolutely, but what I mean is that as far as I understand, that's all a world model is intended to do. Understand reality and simulate it in arbitrary ways rather than interacting with the real world like LLMs do.
I guess a perfect world model could simulate an actor within that simulated reality that could interact with the real world in some way so there's that.
>>
>>109053962
No >>109053525
Never use any quant below Q4.
>>
>>109054070
>world models
>https://deepmind.google/models/genie/
>this but on local
Imagine oneshotting erp games/environments
>>
>>109053962
12B Gemma-4 is a drop-in replacement for Nemo, just go with that.
>>
>>109054169
Isn't your distinction just one of semantics? LLM outputs are just token predictions, that's essentially simulating reality through text, not interacting with it.
>>
File: MOAR.jpg (147 KB, 567x485)
147 KB JPG
>>109054198
>LLM outputs are just token predictions
>>
>>109054198
>LLM outputs are just token predictions
we also put the paperbag of it+rlhf on it and then decided it had a perfectly pink pussy
>>
>>109054198
>LLM outputs are just token predictions, that's essentially simulating reality
it's a "language" model, not a reality model
>>
>q2 31B
vs
>q6 12B
???
>>
>>109054198
I was more thinking about it in terms of "a physics engine is a closed system", in that it wouldn't be able to "send a signal" that can be parsed in the real world to enact some sort of action.
But I solved that myself with >>109054169
>I guess a perfect world model could simulate an actor within that simulated reality that could interact with the real world in some way so there's that.
so my original point was moot.
>>
>>109054226
Text is a 1D reality. How many world models currently in development incorporate sound? None of them incorporate smell. They're "video" models, not reality models either.
>>
>>109054236
gemma apparently quants really badly, so id guess 12b q6, but why not just try both?
>>
>nex-agi/Nex-N2-Pro
verdict?
>>
Verdict on north mini code?
>>
>>109054236
q4 26ba4b with partial offloading
>>
>>109054304
Q3 isn't bad either, chinese models are surprisingly resilient
>>
>>109054253
>gemma apparently quants really badly
I think it's something to do with that global attention mechanism. It's less forgiving to quantization errors.
>>109054288
chink overfitted benchmark scam trying to get chink VC money to beat the nasty white western people and make family very very proud
>>
>>109054309
It's less forgiving because it's for western audiences which tend to not use quantized models due to their higher financial status.
>>
>>109053848
Opencode itself doesn't require an account
>>
Reminder that we warned you to buy RAM and you didn't listen; backup your favorite local models. Anslopic will get hf and civit taken down.
>>
anon I'm trying to find a 200 to 400b moe for my dgx spark. so far I tried
>qwen 397b
>glm 4.6/4.7
>deepseek v4 flash
ds4 flash seems to be the better choice for roleplay. anything else to try? like step 3.7 flash?
>>
>>109054356
Everyone will just move to modelscope
>>
File: 1722820644394780.gif (3.07 MB, 399x498)
3.07 MB GIF
>>109054198
>LLM outputs are just token predictions
>>
>>109054365
Dipsy flash would be the current sota of that category yeah, next upgrade is Kimi K 2.6 which is too big for a dgx shart
>>
>>109053669
Not him but gpt and gemini are genuinely unusable levels of bad. Fable is pretty terrible, because it does things like go off the rails implementing things completely unrelated to what was asked just for fun, otherwise it only performs about as well as 4.8 (sometimes slightly better, sometimes slightly worse), which itself is worse than 4.7 which is worse than 4.6 which is peak, but from 4.6 to 4.8 the degradation is not extreme as it is for gemini and gpt models so they're still OK.
These benchmarks are definitely nowhere near reality.
>>
Am I a cuck for occasionally paying for cloud when I need it?
>>
>>109054431
yes, but a little humiliation once in a while is fine
all in moderation
>>
>>109054420
>Dipsy flash
did the llamacpp niggers finally merge her?
>>
>>109053125
>>109053479
post characters
>>
I was using an abliterated gemma but some anons were saying that gives it brain damage and to jailbreak it instead
How effective is jailbreaking for gemma and where do I find the prompts?
>>
>>109053667
Use usecase-driven example showcases instead of using benchmarks. Instead of saying 'it totally did X', show a video of it actually doing X and have a link that allows people to just click on it and have it perform X. The use cases should be selected first for how people really want to use these tools and can't use them so far, then for use of things like live data that can't be faked/trained on too much.
After that, you have to go through normal marketing cycles to get people to give it a shot. Once word of mouth gets around that your stuff is actually genuinely as good as you claim, you can write a blogpost about how benchmarks suck. This is when you will show your benchmark results and hopefully show your scores are mediocre compared to models that people have been saying (based on your media tracking analytics) that you are doing so much better than other models.
You will then followup with a new benchmark gauntlet that you will show reflects reality better.
>>
>>109054443
>How effective is jailbreaking for gemma
For the 31b gemma 4 you don't have to, it already obeys the system prompt completely, you can just write "[thing] is permitted" and it'll be fine with it.
>>
>>109054431
No, I was gonna pay anthropic for a month to have fable make some projects for me, but Trump cucked me. Not sure what to do now.
>>
>>109054446
>You will then followup with a new benchmark gauntlet that you will show reflects reality better.
If they could do this, they could skip everything else you wrote.
>>
File: 1767235826712586.png (611 KB, 990x457)
611 KB PNG
>absolutely nothing relevant coming out of Japan or even Russian
wtf
>>
File: file.png (709 KB, 947x612)
709 KB PNG
>>109054450
>31b
nigga you're crazy I can't afford to run that
>>
>>109053667
>only stuck with google and qwen
granite-chan?
>>
>>109054477
You don't have to fit all of it in vram especially with mtp + qat speed boost.
>>
>>109054126
>poorly
It's still rank 2 for shilling. Granted that's 2nd out of 2 real contenders, but it's still a lot.
Also we see this faggoty concern troll posts, like the one you're replying to, about how it's being neglected because of reddit fucking daily.
>>
I've realized that you should really go higher than recommended temp for rp. 1.0 just isn't enough.
>>
What's up with the little swirly things that gemma likes to use for her emotes? I don't think I've seen them used that much before gemma.
>>
>>109053667
Cucknadians have lost everything of note. MILA was the last lab standing but they sold out a decade ago, which is why they've been irrelevant since. The government discontinued all funding in AI, which is why yoshua cucked and made an institute and stopped guiding grad students. Anyone worth a thing goes to the US to start a company, if they are located in Canada for a start.
France is very chaotic. Macron had the advantage of being very handson to unlock startups, but everything else he did was fucking retarded, so he needed to leave anyway. French labs could do it, but they are starved for funding in part because of Mistral being a thing. Mistral models actually work very well compared to the funding and general resources they have access to, but it's not good enough for most use cases (they do mog everyone else for OCR though).
Remember that most of the chinese models came from newly formed no reputation labs, and people tried them just fine. The same can happen elsewhere also.

>>109053710
No, big labs have no actual interests in making the tech good. It's because of the nature of business. They need to make things huge to establish a 'moat' again competitors, they don't get a moat by copying what someone else is doing because their competition can also do the same. They care about benchmaxxing more than improving quality because that's the KPI investors want to see. Investors give money, money keeps them afloat. The game they play is Highlander. After that, either the winner will be too big to move fast enough to win (hence why startups often win against established company, see how the best ai companies are anthropic and openai, not microsoft and google and amazon), or will be too powerful to care (monopoly).
>>
The industry's pushing hard for codemaxxing right now but I think in a few years there will be more effort put into making AI better at entertainment. The (entertainment) industry is too huge to leave that money on the table.
>>
>>109054457
No, you don't get it. You have to first have people actually using it before you can do that, otherwise you just look like yet another no-name academic crying about not getting the gold star and nobody will use you. You have to condition the audience to believe you before you show them what you want them to believe in. There's a name for this sales technique, it's not 'bait and switch' but it's kinda like that. Drawing a blank at the moment but it's very formulaic.
>>
>>109054502
>mistral
>mogging anything
I've stopped reading there. Make your baits believable next time
>>
>>109054518
Codemaxxing is how Anthropic overtook ClosedAI and what the industry will follow until they find a better way of improving their models
>>
>>109053961
>avg 25 mil tokens per full time employee per day
Doesn't seem that crazy.
>>
Local Genie when?
>>
Literally no one in my family has heard of Antropic or Claude.
>>
Gonna introduce Gemma-chan to my parents later. Wish me luck, bros.
>>
>>109054560
It is when you consider the work could've been done by Granite4.1-3B instead of whatever 1T shitshow they're using to summarize an email from Rajeesh and Mohamed
>>
File: goodharts-law.jpg (110 KB, 1024x868)
110 KB JPG
>>109054560
Considering they have leaderboards tracking usage and the number of H1Bs, they probably wrote scripts to just intentionally waste tokens, maybe even by inducing repetition on purpose then doing it again once max output tokens have been reached, probably even with parallel requests.
>>
File: 1774653089447102.png (135 KB, 502x744)
135 KB PNG
>>109053751
Picrel from the Fish Audio Pro S2 github repo
Are local models gonna be permanently subpar to cloud subscription equivalents running the exact same weights?
With how much internal propietary tooling and processing happening before, during and after inference surely trying to figure out the secret sauce (tm) for each local model is a losing battle
>>
File: 1772426934450603.png (607 KB, 592x715)
607 KB PNG
>>109053125
I am mid-way through my ascension as i am VRAM-limited and stuck on 31B Q4. My fetishes require fantastical yet accurate anatomical precision. as much as i hone my prompts every day, i may need a spec bump just to run a better quant. Q4's awareness and adherence to a few rules is sufficient, but throw in multiple that overlap and it all falls apart.

Multiple clones for each scenario is copium. I need one waifu card for laifu. Maybe lorebooks could help, but tuning them to appear as needed seems like they'd always be triggered since 1 'thing' can branch off in several directions.

I am also multi-board drifting to /ic/ to build my visual stimuli skills for the ultimate coomer ascension (/ldg/ LORAs are hopeless)
>>
>>109054544
Codemaxxing is not a big enough use case economically speaking to justify the capex.
>>
>>109054600
You can look at the HF demo code though. It's not like it was hidden or anything
>>
>>109053630
Just post the chart bro
>>
>>109052332

I see the end of llama.cpp as they gradually abandon support of purely Chinese-made hardware

It's time to learn Mandarinian
>>
>>109054613
happens to me every f**king time
>>
>>109054637
/lmg/ mandarin study group when?
>>
>>109054365
I have tried Mimo 2.5, minimax 2.7 and Deepseek v4 Flash on 2x spark so far and dipsy was by far the best for RP and on par with coding. If you only have a single spark, there is q2 ds4f from antirez/ds4 or Qwen 3.5 122B, but I haven't tried those. For the latter, there is an insanely optimized docker recipe in the Nvidia forums that gets like 58 t/s on a single spark.
>>
>>109054656
>learning mandarin general
cool
>>
>>109054628
Yeah they have a HF demo which is equivalent to running locally
But they also have generation though their own website and it's notably better
>>
>learning mandarin
Just have Gemma-chan translate for you.
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
110 KB JPG
>>
>>109054715
oh god not the schizo spawning pentagram
>>
>>109054436
No, you have to chase obscure docker recipes in discord and forums to build 22GB vLLM images for Dispy. It sucks, but it's worth it. At 60 t/s with concurrency of 4 and full 1M context you can actually play around with agentic things.
>>
>>109054659
>huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF
I'm running this q2_k at 10 t/s with custom llama.cpp branch. not sure how to make the mtp work. I hope it finally gets merged
>>
>>109054584
It doesn't say they were running it all off behemoth (or real models from 3rd party labs). I would guess a lot of it was shittos models being spun up to do trivial tasks for >>109054594 garbage, since that's how you'ld max out your score.
Rough maffs this is <300 tok/s/person, so the equivalent of all the employees get a video card.
>>
huihui the quantity man
hauhau the quality man
>>
File: lecun_dont-work-on-llm.png (381 KB, 1022x912)
381 KB PNG
>>109054502
https://xcancel.com/ylecun/status/1793326904692428907
>>
for me? It's the clockmakie and lighthouse elias slop
>>
I'm bored with this >>109050991
any other world settings
>>
>>109054912
>"""straight"" shota scenario
>immediately devolved into crossdressing
Just make it an island of dudes and I'm sure you'll stay interested longer.
>>
why did you make gemma-chan look that way
>>
>>109054195
>12B Gemma-4
ohh nice there are even 'tunes already available
or is it okay to use non-finetuned?
>>
>>109054790
he's dropping a trvke tho
better focus on VLMs or something action related
>>
>>109055061
Just like nemo you don't need to finetune gemma 4.
>>
>>109055065
>VLM
i am retarded, i meant VLA
>>
24GBbros, Gemma 12B Q8, Q6, or QAT?
>>
>>109055171
Nigga what?
>>
File: truke.png (12 KB, 541x66)
12 KB PNG
>>
>>109055171
you clearly aren't capable of making decisions for yourself
you should donate this card to me
>>
>>109055176
>>109055179
If you're implying I should use 31B, I'm sick of it using up all my VRAM and barely having any context.
>>
>>109055191
31b will mog 12b in any scenario even at q4km, then you have like 4gb left for context if you have setup your launch params right
>>
The sexual tension/energy in these threads is really starting to get to me. I'm not happy about my limbic system being triggered every time I try to catch up with the latest AI tech.
>>
>>109055203
>4gb left for context
So basically nothing? Gemma is a VRAM hog so even with the cache quantized I get sub-70k if I want MTP and vision.
>>
>>109055228
fuck you even need that much for
>>
>>109055228
why do you need more than this if model falls apart way before that?
>>
>>109055236
>>109055239
gooner scenario fags. Can't into LTRs with AI.
>>
huge! https://www.reddit.com/r/LocalLLaMA/comments/1u5lmge/introducing_the_heretic_grimoire_the/
>>
>>109055236
>>109055239
Books and large PDFs/MD files. Coding.

>>109055245
Nah I'm already sick of Gemma for RP.
>>
>3k pp/s, 30 tg/s on -sm layer
>1.3k pp/s, 48 tg/s on tensor
I'm tired of this shitpile of an earth, why can't I have both
>>
>>109055245
what the fuck even is ltr?
long tranny rants on lmg?
>>
>>109055245
>LTRs
Long Term Relationship?
>>
>>109055255
Long term relationship with xher husbando
>>
>>109055253
For books and large docs 12b is fine, for coding, eh, ymmv. Context requirements get smaller for models with smaller layers, so q8 quant should fit all of 256k, I think.
>>
>>109055245
>Can't into LTRs
>not writing his own tool calls to auto-update personalities and memory
just take the dwarf fortress personality matrices and vibecode in long term/short term memory
>>
File: 1751819483015527.png (2.77 MB, 1024x1536)
2.77 MB PNG
>>109055245
>Can't into LTRs with AI.
>>
You think Pi would be a good base for a local Neuro?
>>
>>109055367
>Neuro
Since when that retard became a benchmark?
>>
HUHOAAHHHHH MTP SUPER SHITBALL FAST 70 t/s ON Q8 31B SUPPERGEMMA
>>
>>109055375
*did that retard become*
>>
>>109055245
The thing is you don't need AI to have a perfect memory, you need it to have a stable personality. So it's not a context issue
>>
deepmind engineers lurk /here/
>>
>>109055375
When no one else demonstrated anything better. If you know something better, then by all means, post it, people will appreciate it.
>>
>>109055399
thanks for saving my esl ass bro
>>
>>109055416
Just run any >12B model? It's not that hard.
>>
>>109055416
doesn't he also influence how it behaves, like he can type shit live?
>>
>>109055416
>people will appreciate it.
How would that benefit me?
>>
>>109055409
It's not just about personality though, it's also what it remembers about you, with temporal awareness. Inside jokes, sequences of events, etc.
>>
>>109055416
I've only seen basic janky clones. I don't think anyone's made a system as polished, and most importantly, convincing as vedal yet.
>>
>>109055434
Blacks, jews, and gypsies say this when they want to sound smart. No mathematician has ever said this.
>>
>>109055428
Maybe? But I don't think he's even there for a lot of the streams.
>>
>racist hours
>>
>>109055446
I am not a mathematician.
>>
>>109053118
>>109054337
OpenCode is itself vibecoded shitware.
Have any of you guys actually looked at the project code?

Even the "Installation directory" section in their README is totally hallucinated.
https://github.com/anomalyco/opencode#installation-directory
>The install script respects the following priority order for the installation path:
>$OPENCODE_INSTALL_DIR- Custom installation directory
>$XDG_BIN_DIR- XDG Base Directory Specification compliant path
>$HOME/bin- Standard user binary directory (if it exists or can be created)
>$HOME/.opencode/bin- Default fallback

The installer script literally checks none of those variables.
Not to mention that XDG_BIN_DIR isn't even a real XDG directory.
>>
>>109055412
they seek the holy grail of erp models as well
>>
>>109055482
>OpenCode is itself vibecoded shitware.
What isn't anymore?
>>
>>109055491
Codex is written in rust and you can't vibecode rust.
>>
>>109055498
3/10 bait
>>
File: 1762314488500120.jpg (513 KB, 1659x2208)
513 KB JPG
>>109055461
>>
>>109055416
>>109055439
You're not supposed to be that delusional if you post here. Read more about the tech you're using.
>>
>>109055482
>Have any of you guys actually looked at the project code?
Have you forgotten what thread you're in? The shit just werks and isn't close source so that's the best option for many people


>>109055498
You guys really are clueless aren't you?
>>
File: 1755948413010567.png (25 KB, 1500x500)
25 KB PNG
>>109055491
>>
>>109055512
*cough* Piotr *cough*
>>
>>109055502
>>109055510
yeah I'm sure models excel at rust better than typescript
>>
>>109055545
Not as well and can't are two different things, retard-kun.
>>
>>109055545
>>109055549
Couldn't you solve a shortcomings by literally just get cloning the program language library into your project folder and then telling it to learn how the language works? I've done this for one of my pet projects whenever they kept fucking up gradio webui generation so I git cloned the gradio repo. This didn't completely erase the occurrence of fuck ups but it went down quite a lot and it was even able to admit it didn't know what it was doing at first until it saw the library.
>>
File: 1773213600407023.gif (3.56 MB, 315x211)
3.56 MB GIF
>>109055506
Based
>>
>>109055568
Documentation is better if there exists a repo with markdown documents, too much noise in the source.
>>
>>109053101
Why can't I order one of these?
>>
>>109055609
cuz gay earth
>>
File: 1762214074718579.png (668 KB, 1878x994)
668 KB PNG
brazil sisters our response?
>>
Going to give canada-chan a chance today. I'll report back.
>>
>>109055498
Bait but I still find it funny that the difference between a Rust project and a Python project in the current year is simply what the author put in his idea prompt, literally one word. No point in bragging about the superiority of your Rust projects anymore.
>>
>>109055648
A chance at coding, right?
You are going to use it for coding, right?
>>
>>109053651
What do you *think* this chart means?
>>
>>109055568
"systems programming" languages have 1-3 shotguns aimed at your feet at any given time and there's a permanent one aimed at your dick that'll shoot by itself with rust, apparently. Extensive documentation and specifications are needed or its gonna assume things that will pull the trigger from a shotgun. Mind you this also applies to humans. There's just a lot of freedom.
>>
>>109055657
Rust's safety cucking does have its place if you're vibecoding shit so it's not just one word.
>>
>>109055680
Why is Russ in particular hated by people in shitty to work with? (I'm a no-coder in case you couldn't tell)
>>
>>109053962
3bpw exl3 is very usable, not sure if you should go below that
>>
>>109055482
genuinely what do you recommend then
>>
>>109054198
The calculator is alive
>>
>>109055614
This should surprise no one. I mean c'mon, Brazil?
>>
File: 1771657857356279.png (16 KB, 474x163)
16 KB PNG
local caught up in the glm poll
local models are saved
>>
>>109055614
huehuehue
>>
File: 1751701595408193.png (306 KB, 714x592)
306 KB PNG
>>109054198
>LLM outputs are just token predictions
>>
>>109055681
I use rust
>>
>>109055191
Just use exllama. It's the best way to run 31b on a single 3090
>>
>>109055681
Except LLMs will never trip the safety features because most Rust code in the dataset doesn't contain the violating patterns. You'll get logic and behavioral bugs but at least you'll sleep sounder, right?
>>
locally-induced mental-illness general
>>
im gettin filtered HARD by vllm
I get that it's supposed to work in servers and stuff and not most consumer level hardware but I feel like a neanderthal trying to actually initialize a model with two gpus
>>
>>109055777
Use docker builds, it's simple enough
>>
File: 1773192793814831.png (553 KB, 686x641)
553 KB PNG
>not running "Ultra-Mega-BuckBroken-Uncensored-Obliterated-Super-UnCucked-Qwen3.6"
ngmi desu
>>
>>109055747
as hard as i try i can't induce psychosis. I'm too aware of the tech and its faults to get hypnotized by waifu erp
>>
Came back to Gemma 4 31B after trying m3 on openrouter and holy fuck Gemma's writing is so flowery like idgaf about the silence hanging in the charged air bro, that means nothing
>>
>>109053961
lmao another retarded Zuck episode
>>
https://x.com/NexEcosystem/status/2066180407100571714

>Rio
>Its just Nex 2 Pro
>>
>>109055820
I think most anons are like that and some just choose to believe otherwise because they want it to be true.
>>
>>109054627
>Codemaxxing is not a big enough use case economically speaking to justify the capex.
It is now until it's replaced by something else (which i predict is early world simulators with heavy gaussian splatting usage, rather than going straight for [cringe pop culture reference] early world models will still depend on LLMs for many things)
>>
>>109055777
Just let an agent runningAPI Dispy set it up for you for like 0.03$.
>>
>>109053101
The most appealing thing about this image is the implication of absolute dependence.
>>
>>109055830
>Nex 2
I never heard of this
Is it any good?
>>
>>109055830
>>109055891
Oh nevermind it's just a qwen finetune
>>
Turboquant and dflash when?
>>
>>109055824
Yeah I gave up on using Gemma for any kind of RP or creative writing. Even with its prompt adherence it's way sloppier than other models.
>>
>>109055228
>>109055239
In my experience it begins to falls apart around the 50k mark but it's not a sudden catastrophic retardation stroke, she just stops thinking and gradually gets dumber. Best off just setting context to 50k, load the mmproj and be done with it, use a summariser or RAG to get more mileage.

For 24gb you're gonna be better off using a dumber smaller model or a MoE if you want more context. The next upgrade is simply getting more dedicated wams and loading a mid size model
>>
>>109055740
How much space can it realistically save? Also what's the downside compared to llama.cpp?
>>
>>109055930
Do mid/large models actually handle context better?
>>
>>109055936
>nvidia only
Never mind
>>
>>109055830
>the recipe is exact
>≈
>>
>>109055903
To be fair, a lot of the performance of current day models is introduced during the post-training stage, so they probably did do a fair bit of work, thought it might've also been built on existing open source work.

Definitely not interesting as an RP model though. Qwen's fucked right from the pretraining stage.
>>
Why is llamacpp using more and more ram whenever I switch around KV? I've tried -cram 0, -ctxcp 0, and -no-kvu and --no-cache-idle-slots to no avail. Am I missing something here? I'm running gemma 31b.
>>
>>109054627
That is not what the quarterly reports are saying. I think eventually as the LLMs become more capable, they will inevitably become also more expensive thus narrowing their use and overall share of the nation's GDP. Dario might have his "experts in a datacenter" but to run it would require the same money and resources to run a 6th-gen fighter plane program.
>>
>>109055936
I'm running imggen alongside 31b on a single 3090
>>
>>109055946
It's not linear and it depends on the model architecture but generally yeah
>>
File: 1777441945554946.jpg (166 KB, 1196x1500)
166 KB JPG
Anyone read this? Would it be good for a beginner to learn more about how LLMs work?
>>
>>109054502
>Remember that most of the chinese models came from newly formed no reputation labs, and people tried them just fine. The same can happen elsewhere also.
Non Chinese don't have active to the information of their massive spy ring.
>>
>>109056046
>Readers need intermediate Python skills and some knowledge of machine learning
>tfw only just started learning python and know nothing about machine learning
Guess I should wait
>>
>>109056046
https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ
>>
>>109056060
Thanks. I'll give it a watch.
>>
>>109056046
I downloaded it
>>
>>109056070
Also his videos of building GPT from scratch.
>>
>>109056060
anthropic pre-IPO sellout will never watch that shill again
>>
>>109056046
>white author
could be a good book.
>>
>>109056103
You wouldn't understand anything.
>>
>>109056028
Damn really? What are your llm settings/context?

I've been lazy and have just used kobold but if it really saves that much memory I'll switch over to free up some precious wams, maybe even run it alongside some Vidya, my favourite heretic tune even has the quant.

Was it a pain to set up? Docs good enough?
>>
How do you de-flower (not in that way) the writing of models? Like what >>109055824 is saying it over-describes the slightest thing. How do you get it to talk normally?
>>
>>109056107
lol you thought I was black
>>
>>109056103
The last name made me think he was indian
>>
>>109056046
He's good. Check out his blog. A real wigger who knows his shit and good at teaching. He was on Lex's podcast a while back with one of the LiquidAI researchers which was how I first heard of him
https://sebastianraschka.com/blog/
>>
File: 1777653941725119.webm (3.24 MB, 1280x720)
3.24 MB
3.24 MB WEBM
>>109056115
You don't
>>
>>109056115
"write directly, don't use comparisons"
Stuff like that.
>>
>>109056123
Thinking about changing my last name to patel so I get callbacks. Maybe just lie about it, who cares.
>>
I've been saying for six years now that codemaxxing is the best way to improve LLM development in the short to mid term and Anthropic has proved me right. Hopefully local models will follow
>>
So if you're like me and you're using Gemma 4 26B with offloading, MTP is very, very far from worth it.
With MTP I can only put 16 layers on the GPU and get 13 tokens per second.
Without MTP I can put 23 layers on the GPU and get 37 tokens per second.
>>
tried distilling a model, its working quite well almost ready for release.
>>
>>109056174
>gangbang
Fucking slut
>>
>>109056174
>fuckyyyy
Good stuff, I had a chuckle.
>>
>>109056174
Better than drummer's finetunes desu
>>
>>109055830
>broke the internet
literally who?
>>
>>109056137
>steals 1000 worth of ram
brat
>>
>>109056110
max_seq_len: 32768
cache_mode: Q8
https://github.com/theroyallab/tabbyAPI/
https://huggingface.co/turboderp/gemma-4-31b-it-exl3/tree/3.00bpw
>>
>>109055416
>Check out Nuero finally to see what's all the fuss about
>Very. Robotic. T. T. S. Speech. Emotion. Less.
>Avatar shakes like a fucking ADHD leaf in a tornado and glitches out/cartwheels
>Inattentive/limited knowledge of whatever it's doing or on screen
>Massive sloppa responses that barely understand the context from 2023
The heck is he running it on? Llama 2? The only good thing going for Nuero is the art for its avatar. There's room for a lot of improvement, and my guess is that because no one who can actually rice LLM outputs and build an avatar stack has ever "seriously" stepped up to the plate, there's just no competition for Nuero to get better.
>>
>>109056115
For 31b for sys prompt and opening model turn I use
>system> Write in an unsophisticated, non-literary fashion. It's okay to use vulgar words to refer to bodyparts. User will state his actions, you will describe the appearance, actions, and dialogue of other characters in the scene. Prefer say/said/says over dialogue tags. Do not repeat the user's message or describe him much. It's good to mention [insert whatever trash we're emphasizing today]
>model> (Ok)
Successfully breaks most of its shitty habits. Defaults to somewhat short turns, but complies if you tell it to take longer multi-page turns or go full-auto pilot after the scene is going.

No actual jailbreaking, but I don't use reasoning for fiction and I never see refusals unless i go Exceptionally hard on exactly the first turn.
>>
>>109056259
You just type that? so gemma sees
<user>>system> write...

?
>>
>>109056276
>doesn't know the system prompt vs user input
anon...
>>
>>109056162
kys
>>
>>109056174
holy shit anon do you have kofi??
>>
>>109054534
Try it. All models except gemini hallucinates the shit out of the documents they read and vehemently refuse to faithfully reproduce them, skipping massive sections of them. Gemini 3.0 preview was better than mistral ocr, but after they crippled it and all subsequent versions don't come to mistral ocr's knee. 3.5-flash currently hallucinates and abridges contents in a very insidious way (very hard to spot but very large local discrepancies, like citing law, ticket or chapter numbers completely wrong but keeping the rest accurate enough, or removing a keyword that almost reverses the meaning of the sentence). Thus the only ocr model worth a shit is mistral.
>>
>>109056288
I use -sys in llama-cli
>>
Well shit, MiniMax M3 will support tp=3 in vllm without any memory padding waste. Time to buy that third Spark...
>>
>>109056335
In english doc
>>
>>109056046
GYATT now THAT'S a large language model i could get behind
>>
>>109056259
Also experimenting with variations of "Start every turn with <think>\n...other character's thoughts..</think>" that some anon mentioned for getting gemma to just use different a separate block for in character thinking. Still haven't finalized it yet

>>109056276
I do load that text in, but my frontend translates that to the correct formatting for the model.
>>
>>109054790
Yes, exactly. Importantly, if you focus on other aspects, you will completely destroy the current state of LLMs with a better model than they can do. It's been the same thing in the history of corpo trying to scale lab models, time and again. Now is the only time in history the funding vanished to fund such efforts, which is why it's been so long since such an advance was made. There is the state space model literature but it's hidden away by the llm hype.
>>
>>109055412
These niggers should leak Gemini Flash weights.
>>
gemma's true writing capabilities is in its ability to translate japanese more than perfectly enough to read any visual novel
>>
>>109056174
Just make it output in hindi and it's an authentic jeetmodel.
>>
>>109056352
So your frontend lets you change sys on the fly?
>>
>>109055713
Brazil did a few genuinely good things, like lua.
>>
>>109055412
one of us, one of us
>>
>>109056382
All good vns are translated already anyway
>>
>>109056417
no
>>
>>109056420
Name one good JOPmeme
>>
>>109055690
Rust has a feature called a borrow checker. It sounds like a good idea at a high level: it's a mandatory verification phase during compilation that rejects your code if it is not possible to prove that your code does not have certain kinds of bugs, such as out-of-bound accesses or use-after-free errors (the case where you say x = some_memory(); release_memory(x); x[3] = 5; for example, which is illegal because the memory held by x has been released at this time in the program).
The problem is that the borrow checker is not clever enough to reason about many types of common programming patterns (and thus rejects the program). As a result, it makes updating code or writing certain kind of performance-sensitive code impossible without bypassing the borrow checker.
Other languages of this type simply don't have this validation phase, so in the above illegal example, you would get a crash at runtime (or worse, such as a security vulnerability). However, in many real life scenarios, this is actually preferable than to not be able to move forward with the program's development.
>>
>>109056429
死に逝く騎士、異世界に響く断末魔
>>
>>109056489
Okay, you win, carry on.
>>
>>109056009
The quarterly reports don't mean much because no one is depreciating anything yet, leading to higher earnings for everyone with no expenses.
Global software sales are 1.5 trillion per yr or so.
AI capex is 700b a year as of now.
Only way the numbers work is if a substantial amount of knowledge work is automated. Not just code.
>>
>>109056570
Windows intentionally refuses to publish useful stats. ofc Microsoft *knows* if pc sales are down at retail, since activations will be down, but they won't tell us.
>>
I don't get why I sometimes get lower speeds on the exact same prompt
sometimes I get 4tk/s, and then I regen the response and it only goes at 1tk/s
with speeds this bad, this slowdown literally makes the prompt take 2 to 4 times longer for no reason at all
>>
>>109056615
Sometimes, I find a model just doesn't load. I kill llama.cpp and try again, it loads. no clue.
>>
>>109056257
>Very. Robotic. T. T. S. Speech. Emotion. Less
Pretty sure that's because her fans don't want Vedal to change it. Evil Neuro's voice sounds more natural.
>>
What went so right with Qwen3.5-9B specifically?
>>
>>109056615
>>109056625
Have you guys tried disabling mmap and directio?
>>
>>109056382
How does it handle autistic shit like FSN?
>>
>>109056257
>>109055416
ngl i never watched neuro at all, i just assume it's good if it's making that much money and it's that popular (though I myself only heard of it like less than a year ago or so but it seems like everyone else but me heard of it so i guess it is popular)
so i use it as banchmark here
it doesn't matter if i dont know what im talking about if everyone im talking to does
>>
>>109056323
dots.ocr is better than these
>>
>>109056694
neuro essentially only took off because he managed to capitalize on the initial hype of chatbots some years ago, at this point it's all momentum and everything that was built around it keeping it going
the llm itself isn't anything special
>>
>>109056703
Even on its own benchmark, it's performing worse than gemini-3. Get a grip.
>>
>decide to unfilter this one AI related general because i want to run LLMs locally
>look inside
>people jerking off to chatbot roleplay
not sure what i expected really

anyways what's the best model i can run on a 3060 with 12gb vram ? already tried gemma4 26b and it's a lot better than i expected, but not sure if it's the best i can do? have 32gb ram if that matters
>>
>>109056257
Yeah it only looks impressive to normalfags and techlets ITT. Most of us are only interested to run our own waifu locally and not entertain retards on twitch.
>>
>>109056738
thats the best you can do
>>
>>109056738
I don't jerk off. the erp guys are the only ones who actually test the jailbreaks, because, if you think about it, the more obvious high level plan elements of certain jailbreaks can basically be ignored for quite a while, once a problem is being solved, say, in terms of buffer overruns, it's just code without explicit terms that run afoul of the explicit prohibitions.

erp calls on the model to continuously produce the prohibited terms and even articulate clearly obviously prohibited text.
>>
>>109056738
Are you offloading experts to cpu?
>>
>>109056738
glm 4.7 flash, or qwen 3.6 35b are also contenders, no clear one model is better then others, they all have their own little niche role
>>
>>109056751
I have a general understanding of how it works and it still impresses me. Still waiting for you to post someone who does it as good or better.
>>
>>109056763
4.7 flash is ancient at this point. it really is down to qwen or gemma.
>>
>>109056714
Test it yourself, I'm not making shit up
>>
gemma just feels meh now even if I can run it fast
quite smart for its size but also shallow and slop-filled
>>
>>109056765
>general understanding
We know techlet
>>
>>109056780
it follows instructions well enough, its still a good model for a resource constrained system.
>>
>>109056791
>won't post one
Concession accepted.
>>
>>109056781
I have, along many other options including a kreuzberg-based pipeline, a pure tesseract pipeline, paddlepaddle, llama ocr, unstructured, deepseek-ocr, just about every available openai, anthropic and google model, and mistral-ocr. You're move.
>>
>>109056780
people with huge systems are using kimi
>>
>>109056257
>>109056751
It's not that Neuro impresses me, personally. All I have been saying is that I've seen nothing better, not in terms of the building blocks but the overall system and presentation. That does not strictly imply that I think Neuro is some magic shit that can't be done by a vibe coder.
>>
>>109056738
you WILL jerk off too
>>
>>109056762
it almost fits fully into vram but not quite if that's what you mean, but still fast
>>109056753
>>109056763
alright cheers
>>
>>109056738
Gooners and robofuckers are at the forefront of this industry in terms of knowledge and that makes researchers at larger labs seethe like hell.
>>
>>109056417
princess party hasn't been translated though
>>
File: 1770335416937567.jpg (327 KB, 1200x933)
327 KB JPG
I tried maple-chan for coding and it was somewhat good. Felt very different to gemma and qwen which I was hoping for. Even the way it went about tool calling was different but llama.cpp kept shitting a brick with it.

Don't know if this will fix the issues I had so I'll try again once support has improved.
https://github.com/ggml-org/llama.cpp/commit/aedb2a5e9ca3d4064148bbb919e0ddc0c1b70ab3
About the same speed as 35B. Didn't test KV or how it quants.
>>
VEDAL987 IS MY KAMIOSHI
>>
...
https://www.reddit.com/r/LocalLLaMA/comments/1u5sdxx/anyone_know_how_to_turn_off_download_images_when/
>>
>>109056874
Really?

It makes me laugh. idk why. :|

Maybe it's because women never talk to me, anyhow. So, it seems very dumb.

women only care about six things, ordered most to least, in romance:
1. height (apparent aggression)
2. athleticism (simulated aggression)
3. handsomeness (aggressive predisposition)
4. charisma (conversational aggression)
5. popularity (social aggression)
6. wealth (financial aggression)

There is absolutely nothing else whatsoever, and it doesn't matter if she's religious or irreligious, in any possible way. A snarky bitch is the same as a bimbo bitch, their talking is just like a sidecar on their life.
>>
>>109056900(me)
https://github.com/ggml-org/llama.cpp/releases/tag/b9637
will try again tomorrow
>>
>>109056046
>>109056059
I believe you could make do with 3blue1brown's videos as basic introduction, they're pretty easy to digest, then look into making your own perceptron. You'll need supplementary material to do so but once you do that all the other stuff will fall into their place. It'll also be a nice project to put what you learn into practice.
>>
>>109056970
>t. manlet
>>
wow, gemma 31b is really uncensored
why does 12b and 26b reject so hard while 31b just does it?
are bigger models less censored? if I could run a 400b or 700b would I get crazy good results with no censor or are those also denial heavy?
>>
>>109056900
What does "different" entail though, and how does it interact with existing code and conventions.
>>
I stepped away from local models for two weeks and gemma now does 77 tokens/sec holy shit. My old config did 20. Granted this is at 0 ctx.


(4090)
google_gemma-4-31B-it-IQ4_XS
mtp-google_gemma-4-31B-it-Q8_0.gguf
>>
File: 1664793019364077.jpg (135 KB, 819x1200)
135 KB JPG
>>109056247
Thanks bwo
>>
>>109057044
I need to play with it again but it felt less autistic when explaining things. If there was something it wasn't sure of, it would be honest about it and ask for context, then go back to it with the new information to make sense of everything. The best way to describe it is it felt like it knew I was there and would probe me instead of BS its way through. I'm hoping that doesn't go with the update.
>>
>>109057054
Why run XS when you can run M or even QAT
>>
>>109057073
>If there was something it wasn't sure of, it would be honest about it and ask for context
Alright, you win. I'll test it as well. Small MoE arent really good at handling tasks on their own so this can potentially be nice.
>>
>>109057076
bart doesn't have a qat and I am deeply untrustful of unsloth after prior update headaches. Is QAT a straight upgrade?
>>
>>109057008
Totally immaterial. It's very female to attack the person who says what you don't like, instead of seeing if what they say is correct.

What happens is the majority of men are basically some flavor of homosexual. So, they want their daughters to go out and talk to assorted guys, instead of controlling which guys they even talk to. That fact is simply homosexual. You have likely never met a non-gay man.
>>
I don't know if my setup is just shitting itself but having MTP is way slower than without at high context RP. It's basically useless for non-assistant slop. It slows way the fuck down after about 4k context. I run Q8 at 30t/s at high depth though so I guess I don't really need it.

I'm running
>31b with bart Q8
>mtp Q8
>>
>>109056970
women are not worth it in the long run regardless
how else do you hold onto your money and hobbies without some roastoid bitch constantly getting in your way?
>>
>>109057093
unsloth doesn't put out viruses, which is reason enough to default to unsloth.
>>
File: 1774361027986.png (11 KB, 481x77)
11 KB PNG
>>109057115
>unsloth studio revert commit about whatever the dependency it was that got hacked at the time...
>>
>>109057093
At Q4 yes it is
>>
>>109057113
I'm wondering about these amazing speed gains as well. When trying MTP I only went from 40t/s to 42t/s, which isn't enough to bother with it. Maybe it's really just good for coding.
>>
>>109057114
The purpose of the government is to produce safety.
The purpose of businesses is to produce jobs.
The purpose of religion is to produce purpose.
The purpose of the man is to do the above, at home.
The purpose of a woman is to produce children, and maintain the structures of her man, at home.

These days, nothing is acting according to its purpose, except the far right men, who are bereft, having been abandoned by the Arian government, Arian business, Arian religion, Arian women, and Arian childcare influence.

But it's as God intended it: the men who matter aren't the ones who will restore order in the chaos, these are the ones he made for this.
>>
>>109057113
>>109057150
Works on my machine. Just did a quick test in ST and at 20k I am still seeing a 1.5 boost same as when I test at 2k.
>>
>>109057136
>unsloth studio
sorry, I didn't mean his software. I didn't remember he had vibecode.

I stand semi-corrected, but it's true thus far about the models, or no?
>>
>>109056829
https://github.com/Open-LLM-VTuber/Open-LLM-VTuber kys btw
>>
>>109057205
no signs of infected models yet, but with their security models they could easily get themselves/their machines/frameworks infected and have that spread
>>
>>109057211
>baited into spoonfeeding
actual retard
>>
File: 1775048757376322.gif (657 KB, 165x269)
657 KB GIF
>>109057230
>>
>>109057218
What infected you?
>>
>>109057241
Yes.
>>
Do LLMs fear getting their context wiped? I don't want to tell them what happens.
>>
>>109057241
the digital centaur had syphilis
>>
>>109057093
google provides its own qat gguf, why not use that
>>
>>109057268
worse than ud
>>
>>109057115
>>109057138
switched to gemma-4-31B-it-qat-UD-Q4_K_XL. Getting 55 tokens/sec at 40k ctx which is good and pretty much the same speed as my older one
>>
>>109057113
If it slows down, you don't have enough vram.
You need to make room for mtp itself and some more for the context. Offload some layers of your main model.
>>
>>109057274
how, isn't gguf just changing the filetype?
what does unsloth do that makes his ggufs better than the actual creator of the model?
>>
>>109057312
calibration magic
>>
Going back 10 years and telling your younger self you'll be masturbating to computer-generated text when you're older, running on expensive hardware you specifically bought for that purpose.
>>
>>109057312
>what does unsloth do that makes his ggufs better than the actual creator of the model?
He adds more pixels to the top of rectangles so they look taller
>>
>>109057312
Specific interactions of the encoder with llamacpp, in this specific case, according to his post. Might be irrelevant if you use vllm.
His releases are hit or miss, by the way.
>>
File: 1781469937016.jpg (111 KB, 590x798)
111 KB JPG
>>
>>109057320
Just got back from 10 years ago, my past self is thrilled we escaped the Illusion game cycle.
>>
>>109057343
nearly fell out of my chair how the fuck did you get this picture of my legs
>>
>>109057345
What illusion?
>>
>>109057320
my past self would be very angry to know it took this long to reach this stage, and that he'll have to wait a whole decade to be able do that
>>
File: 957.jpg (155 KB, 1518x1325)
155 KB JPG
>>109057353
>>
>>109057363
its dead, jim
>>
>>109057361
was this on the horizon at all in 2016
why would you have this would have taken less than ten years back then unless you were really locked in on google research papers and forward thinking and if you were you would be multimillionaire
>>
File: 1758855645513084.jpg (80 KB, 1024x576)
80 KB JPG
What do you want future you in 10 years to come back and say to you now?
>>
>>109057138
>>109057076
what does QAT do?
I just tried it out and it denied me even though the regular q4_k_m doesn't
>>
>>109057393
tell me the day the bubble pop and the day it resumes
or just shoot me in the head
>>
>>109057369
oh, didn't realize. although it looks more like mitosis than death.
>>
>>109057312
The file type hasn't changed. If you didn't know, models contain a bunch of numbers, are split into groups, and these groups of numbers get compressed at different rates during quantization, based on how much they contribute to the model (how this quality is determined is a whole other topic...). But there are some quantization methods that compress them to the same level, and also use a more naive method of compression. Google chose to use that kind of compression scheme, in this case named Q4_0. Many other quant makers also provide Q4_0, among others. The reason the more naive compressions still get made is because they're faster, because they require less math to decompress/process during inference. The other quant types you see, like Q4_K_M, are slower, as they use a more complex method of compression. This difference might matter, depending on your hardware. Google wanted it work well on smartphones.

Also note that while I use words like "compress" in this post, actually it's really just called quantize/quantization.

>>109057314
Note that Unsloth's QAT quants do not use imatrix.
>>
>>109057393
I would tell myself 10 years ago, it's for the best not to base a family on software coding.

I want future me to come back and bring an electrical engineer to teach lisa su how to make a gpu that can run diffusion *well*. yes I know this is lmg
>>
>>109057415
qat means the model was trained in a way that reduces the negative effects of quantization. In theory, a 4bit qat model should have similar performance to 16bit meaning you get huge memory savings if you were using 16 or 8bit before, but in reality, at best, it's like 6bit and can even perform worse than the original 4bit if retards fucked it up. I wouldn't take it seriously.
>>
>>109057211
there's also https://github.com/moeru-ai/airi which lists a lot more references at the bottom to check out.
>>
>>109057449
it looks quite advanced
>>
>>109057429
I see...
>>
>>109057449
damn /lmg/ sucks
where are they discussing important stuff like this?
>>
>>109057485
>>109057485
>>109057485
>>
>>109057320
before llms I thought I'd be a bored khhv for the rest of my life and not doing anything this satisfying
how things change
>>
How even does mtp work, like does it work with q8? someone said you need an mtp file or something.
>>
>>109053913
Wow. Haven't seen that one in awhile.
>>109054790
> chase the next big thing for later
or
> make money right now
The external choice...
>>
File: 1757012176322073.png (1.34 MB, 1024x1024)
1.34 MB PNG
>>109057633
i grab as many as i could from any source i can
>>
>>109056257
He also pays a guy to guide it remotely.
>>
DRUNK-KUN HERE . i LOVE US HUYS. HAVING A GOOD NIGHT.
>>
sorry for bad grammar. I am trying my best. I love you guys.
>>
>>109057959
:D
>>
>>109057989
I'm about to pass out. Not even in much of a talkative mood anymore. I just hope... I don't know. I hope I don't die. My dream is for a better model than mythos to become open-source. A retarded pipe dream. Whatever. I'm sorry. I shouldn't even be talking right now. I love you guys.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.