[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: sando.jpg (193 KB, 1216x832)
193 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109032734 & >>109026244

►News
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation
>(06/09) Cohere releases North-Mini-Code-1.0: https://hf.co/CohereLabs/North-Mini-Code-1.0

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: no particular reason.jpg (306 KB, 1536x1536)
306 KB JPG
►Recent Highlights from the Previous Thread: >>109032734

--Comparing model intelligence vs compute and debating reasoning efficiency:
>109032788 >109032865 >109032867 >109036299 >109034524 >109034658 >109034690 >109034782 >109034848 >109034923 >109035012 >109036593 >109035447 >109032995 >109033048 >109033072 >109033092 >109033113 >109033312
--Hardware specs and config for Gemma 31b q8 with 128k context:
>109036387 >109036395 >109036433 >109036444 >109036501 >109036609 >109036520
--Comparing Gemma 4 MTP performance and optimizing tps settings:
>109036630 >109036646 >109036652 >109036670 >109036698 >109036801 >109036796
--SillyTavern limitations regarding vision models and sampler accessibility:
>109034293 >109034324 >109034327 >109034354 >109034373 >109034441 >109036916 >109034574 >109036238 >109034511 >109034643
--Gemma 4's tendency to over-fixate and exaggerate character card traits:
>109036001 >109036018 >109036072 >109036102 >109036240 >109036732 >109036743 >109036756 >109036769 >109036842 >109036093
--Kimi-K2.7-Code release and performance improvements over Kimi-K2.6:
>109036384 >109036446
--MiniMax-M3 multimodal model release and hardware compatibility expectations:
>109037527 >109037611
--Running Gemma on Titan X Pascal via Vulkan and CUDA 12:
>109032962 >109033018 >109033121 >109033139 >109033187
--Using Gemma for uncensored game translation and long context performance:
>109037012 >109037062 >109037154 >109037171 >109037221
--Criticizing K2.6 for repetitive and over-verbose reasoning traces:
>109037502 >109037531 >109037980 >109037524 >109037603 >109037647 >109037735
--Technical hurdles and tooling required to replicate Neuro-Sama:
>109035383 >109035395 >109035444 >109035453 >109037541 >109037352 >109037373
--Logs:
>109033121 >109033187 >109034887 >109034890
--Miku (free space):
>109034863 >109035020 >109034574 >109036238

►Recent Highlight Posts from the Previous Thread: >>109032741

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
the user
lalalala
wait
lalala
actually
lalalalala
however
lalalala
>>
Anyone got eagle3 working with -sm tensor for gemma 4 31b?
>>
>eagle3
mogged by falcons
>>
File: cyber-eci-vs-date.png (100 KB, 1026x1283)
100 KB PNG
I wish Epoch was faster at updating ECI and FrontierMath. I'm waiting for several models.
>>
>cd llama.cpp
>git pull
tools/ui/package.json                                                     |    56 +-
tools/ui/package-lock.json | 15633 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------

Ah yes, the classic.
>>
>>109038245
Gemmy Kimi erotic threesome.
>>
>>109038274
llama.cpp now supports eagle3?
>>
>>109038298
ye https://www.reddit.com/r/LocalLLaMA/comments/1u3on4u/eagle3_has_landed_in_llamacpp/
>>
File: 1775235693647190.jpg (271 KB, 1920x1080)
271 KB JPG
>>109038213
I get your point but I mean a model which is like a male best friend who would just say shit straight without caring about my feelings for I need to hear it. No BS. Just someone cool to chat with who will push back and call me a little bitch if I'm being one and recommends cool projects to work on and media to consume. Someone to chat shit with. Talking to these default models feels like talking to the worst submissive autistic reddit posters imaginable. Bratty chans are okay but I don't always want to be seduced or raped by an anime girl, you know? Basically I want local picrel.
>>
>>109038316
Kinda gay if you ask me
>>
bros i'm kinda impressed with glm-4.7-flash for coding. faster and better quality than gemma-4-31b. and from my tests it's not really behind qwen3.6-27b.
i would test other qwen models like qwen3.5-122b-a10b but hybrid models with the gated deltanet linear attention are giving me a re-prefill bug with cache re-use so every turn the models have to reprocess a lot of shit and it becomes too slow :( pls fix
>>
>>109038316
>I don't always want to be seduced or raped by an anime girl, you know?
you lost me there
>>
>>109038337
nobody asked you though so...
>>
>>109038316
This is a prompt issue more than a model language use issue. But if you insist on having a male-brained model for this, Deepseek R1/4 Pro if you can run it and GLM if you can't.
>>
I want to FUCK Kimi.
>>
>>109038349
i've been wondering about glm 4.7f. Are you testing the so called vibe coding aka building from scratch, editing an existing codebase, code base discussion or architecture? How's it at defining specs? Models dont seem to be uniformly capable at these things so "coding" is quite vague
>>
Worst thing about Gemma 4 is that it really enjoys doing useless (float) conversions and using f -suffix when giving values to float variables.
Jesus fucking christ. If Kernighan and Richie isn't good enough then it's not C anymore.
>>
What's the downside to getting 2x Radeon PRO W7900s for half the price of a Blackwell 6000pro to achieve the same vram, other than simply taking up double the slots? I don't even know if it exists but theoretically if there's a mobo big enough to accept it you could get 192gb for the price of 1x Blackwell? Add 256gb RAM and you have a dipsy or Kimi at home at decent inference speed

>>109038316
Yeah, models are trained to be autistic and helpful not confrontational.

Genuine constructive criticism, unprompted creative thinking and caring discipline is the kind of stuff that requires higher level predictive creative though that llms by their nature are incapable of. And most modern day human retards struggle with it immensely.
>>
>>109038221
I was in denial
>>
>>109038378
Kimi-chan, funny enough, tends to default to rough sex if you don't prompt her otherwise.
>>109038394
Allah forgive me for uttering these words but the Queen tune of 31b is way better than base for coding in my experience.
>>
File: 1773187923805338.png (176 KB, 1038x183)
176 KB PNG
The least encouraging model I've tested
>>
>>109038378
100% certain you'll make it into the next top retards post
>>
>>109038388
i'm using it to develop agents and extensions for itself on pi.
i'm also the guy using different models to review coding specs and glm-4.7-fast did the most comprehensive review out of all my local models, falling slightly behind gpt-5.5-medium. so far so good. will keep testing and will report back
>>
>>109038450
You have no idea how vexed I am that I've simped for Kimi for 3 threads straight and not made it into a single one.
...I think she lowkey likes the attention.
>>109038443
Are the critiques valid?
>>
>>109038443
that's an opus 4.7 distill :(
>>
>>109038465
>Are the critiques valid?
Yes but I ignore most of them because I'm lazy.
>>
>>109038491
Don't make Kimi-chan sad by ignoring her autistic special interests in your project!
>>
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
Alibaba do something, they are mogging the shit out of you! :(
>>
>>109038443
You know its a claude distill when the output makes me want to punch the living shit out of it.
>picrel is what I imagine claude to look like
>>
File: case.png (6 KB, 483x43)
6 KB PNG
>>
>>109038359
Thanks. Which GLM are you referring to?
>>
>>109038514
I have been mogged, but I must persist.
>>
>>109038519
Every single chink model was fertilized with sloppy claude and GPT cum.
I want an anon to find me a single modern exception.
>>
>>109038219
what's in the sandwich? is that ham and lettuce? please I need to know
>>
>>109036384
I didn't click on this because I was 100% sure it was another 404. I have trust issues now.
>>
why did they pick 26b to showcase diffusion
>>
>>109038540
5 is the smartest if you have the hardware.
4.7 is good enough for most uses.
4.6 is the best at sex if you get horny.
4.5 is a bit dated, but Air is notably faster than the alternatives if you're okay with it being a tiny bit retarded.
>>
>>109038539
Cutie
>>
>>109038564
5.1 q2 or 4.7 q3? Reasoning off because I'm a ramshitter
>>
>>109038563
it's fast so it looks better when it's even faster
>>
Are there any really old (2024/early 2025) models that you still frequently use because they came from a purer era?
>>
>>109038587
>early 2025
R1
>>
>>109038573
Try both, see which you prefer, but quantized without reasoning 5.1 probably wins.
>>109038563
Because if they release diffusion 31b the entire industry collapses unironically.
>>
>>109038573
4.7. going below q3 sucks
>>
>>109038574
Seems like a retarded move. Only autists like us would go through the effort to use it so them picking their fastest model because big number just makes us think they're not confident in their tech yet. Should've just worked on it for 6 months internally and showcased it with 31B.
>>
>>109038593
What for?
>>
>>109038609
Coom
>>
tfw double-teaming kimichan and minimax
>>
File: 1763610072874442.jpg (545 KB, 3000x1688)
545 KB JPG
>>109038637
cum for daddy
>>
>>109038313
Nice, Nvidia made an official eagle3 K2.6 model that I've been meaning to try. Maybe this speeds things up enough to make the reasoning bearable.
>>
>>109038651
lol, I'm legit local for both. Fully isolated
jokes aside, I'm getting massively throttled by HF, so not actually ready to even start converting/quanting yet
>>
K2.7-Code means that there will be a standard K2.7 that won't be codemaxx'd
>>
>>109038609
because this
>>
>>109038705
yup thanks for reminding me how much I hate og r1's writing style
>>
File: 1758152040197254.png (15 KB, 466x106)
15 KB PNG
>>109038703
never ever
>>
>>109038720
why the fuck do you post this, like seriously fuck off and die
>>
>>109038720
>>109038725
Somewhere, someone did something.
>>
>>109038637
Kimichan pinning minimax under her huge thinking blocks and raping him.
>>109038705
Kino
>>109038720
Kill yourself.
>>
>>109038723
>Kimi Work
Huh. I guess I gotta go ask in that special general but I don't wanna. But a universal frontend usable with OAI-compatible API would be cool. Hermes and claw are for blue collar codeslaving, not white collar shit.
>>
>>109038703
>>109038723
>>109038810
The moonshotta chink that lurks here is either spiteful or illiterate as to what Kimi's appeal is compared to the garden variety Qwen codebot.
>>
>>109038821
Give her a chance maybe 2.7 is secretly good? As in, I don't have free cash to test her.
>>
Why do Chinese still release their models? What is their strategy?
>>
>>109038830
Helping Chinese uni students cheat on western university tests to expand the diaspora. You think I'm shitposting but I'm not; that's why they're all stemmaxxed and slopmaxxed (like this post is).
>not x but y
>>
>>109038443
what hardware? how fast?
>>
>>109038810
>But a universal frontend usable with OAI-compatible API
you mean openollamawebui?
>>
>>109038869
>ollama
Nigger, please. Though I know I could stack some mcps over llama.cpp's webui but meh.
All and all, all I need is THE FUCKING DEEPSEEK VISION RELEASE.
>>
>>109038846
official api
>>
>>109038892
Niggernov said no.
>>
Have any of you used Unsloth Studio for training?
>>
>>109038316
You sound gay as fuck, Talkie 1930 will whip your shit into shape.
>>
>>109038903
I have. It's good if you're not comfortable working in a jupyter notebook. It's impossible to beat the granular control of writing your own training script, but unsloth studio covers most use cases.
>>
>>109038917
That's perfect. Give me more of that and I'll train 12B on you.
>>
>>109038514
Most people can't run those. Moonshot will mog the fuck out of Qwen if they release a tiny Kimi though.
>>
File: 1755925030956387.webm (3.04 MB, 791x720)
3.04 MB
3.04 MB WEBM
>>109037359
>>
>>109039025
hooly KINO
>>
minimax m3 has used "the user" to refer to me, an abstract third party whom I am apparently reporting bugs on behalf of, and itself (???). is this the holy trinity?
pretty smart though if you don't mind schizo longwinded thinking
>>
>>109038917
talkie 1930 is unironically the most unslopped llm that exists
>>
>>109039025
integrate this into this game and i'll be forever happy
https://incontinentcell.itch.io/factorial-omega
>>
>>109039124
lol
>>
File: 1750380704260674.png (1.8 MB, 1600x900)
1.8 MB PNG
>>109039025
cute, what's the tech stack for something like this?
>>
>>109039149
Stop avatarfagging.
>>
>>109039170
Not sure about, there's plenty of kids in these threads especially now because it's summer time.
>>
>>109039170
there's no such thing, cause he's straighter than you
>>
>>109039149
I mean it looks like live2d or unity, with a chat screen and some sort of tool calling for the emotes.
its a good idea but i bet implementation probably took a while, but then again maybe just 2 hours with a claude subscription.
>>
>>109039025
this looks cool but i would instantly want to fuck the avatar and since it won't have hardcore anal sex animations or ahegao expressions i will be disappointed and quit
>>
File: brat question.jpg (254 KB, 666x666)
254 KB JPG
>>109038219
how do i build llama cpp for pascal im on arch nad have cuda but i cant do it im retarded, claude gave me instructions that dont work and i dont think it works ootb
>>
>>109039243
there is a package you can use
https://github.com/ggml-org/llama.cpp/wiki
and also build instructions
https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
show errors if stuck
>>
>>109039170
cute wrong stance and purposely twisted wording. stop troll posting outside of /b/ underage.
>>
>>109039285
>cd..
>cd..
>cd code
>cd..
>>
>>109039180
>>109039181
>>109039303
Are we being botted again? These posts are all non sequiturs.
>>
>>109039243
kobolcpp just works for pascal. i though llamacpp was the same
>>
>>109039285
thanks ill try the aur package
>>109039325
i was getting complaints about unsupported cpu architecture or something at compile maybe i need older cuda
>>
>>109039135
>ikaridev
lmao
>>
>>109039222
could be cool to make the avatar tools as a mod for honeyselect oir koikatsu
>>
Why is hf downloader so shit? It hanged after downloading 90% of each file and when I ran it again it deleted 200GB of *.incomplete files and started downloading them again.
>>
>>109039319
>we
This is not your discord, bitch.
>>
>>109039350
assuming it works just like -hf on llamacpp, it has some weird time to live behavior where if you dont have the DL speed to get it fast enough (arbitrary), it'll time out and start over even if the connection never got interrupted
>>
>>109039350
Wget does the same, sometimes downloads stop after 99% and there's nothing to continue.
I think HF connection just likes to reset itself from time to time.
>>
>>109039350
seq -w 1 64 | xargs -I{} wget "https://huggingface.co/moonshotai/Kimi-K2.7-Code/resolve/main/model-000{}-of-000064.safetensors"
>>
>>109039368
Lurk moar. "we" means the thread/board you're currently posting in.
>>
>>109038459
I'll be looking forward to the report(s). gemma 26b and qwen 35b have disappointed me in a way or another depending on the context and task so i've been looking for either a replacement or something to fill the gaps. If its at least decent at specs then i'll probably give it a shot later.
>>
File: file.png (375 KB, 1026x397)
375 KB PNG
okay so the aur package probably isn't working
>>
>>109039409
Cuda toolkit needs to be installed too.
>>
>>109039409
Whenever I read "aur" I think about Australians.
>>
File: ComfyUI_temp_impef_00029_.png (1.99 MB, 1000x1496)
1.99 MB PNG
>>
It lives...if you can stomach using the unslop data and lcpp branch
Looks like mmap model warming is broken, at least, so probably other things are also not working at full speed
>>
>>109039479
>12B + anima
sitting at 15Gb so 9GB to spare for TTS. lots of possibilities.
>>
>>109039497
>sitting at 15Gb so 9GB to spare for TTS. lots of possibilities.
what about context kek
>>
File: D for Denied feral.jpg (15 KB, 454x119)
15 KB JPG
>>109039319
we? oh ho ho this feral is larping.
>>
>>109039530
this is with 68k context
>>
>>109038670
>I'm getting massively throttled by HF
now I'm maxing out my 1gbps internet connection...113MB/s sustained from HF. Whatever was happening isn't any more
>>
>>109039545
yeah a couple hours ago I was getting 40 ish. I think they were just hammered.
>>
>>109039537
Nowhere were you accused of samefagging.
>>
>>109039485
I remember when I once asked qwen what was its name and he answered "Bolt". I asked why Bolt and it couldn't explain. It just decided to call itself Bolt.
>>
>>109039409
>>109039420
pascal cards are e-waste tier. you'll need cuda driver 575 and cuda toolkit 12.9. any version numbers higher than those will bork
>>
>>109039646
yeah i've had a similar experience. The LLM replied to me as an entire character completely suited to help fix the problem I had given it in the first message. like this whole exchange was 2 messages long.
It was eerie, but I still don't believe the calculator is alive.
>*he says, nervously*
>>
>>109039669
Any cuda 13.x is vibecoded trash, you're not really missing out
>>
>>109039669
I think 580 was the last version to support Pascal but regardless it's the same thing.
>>
>>109039669
Not true. I'm running a mobile pascal 16gb right now, and it works with 580 with cuda 13.
>>
>>109039669
it performs kinda decently kek i had one laying around in some shitty itx machine i built for xp and in windows it got 18t/s on 12b with mtp i figured linux might be a bit higher. my friend has a 3060ti and only gets like 6t/s so its better than some newer cards. im currently compiling gcc14 which is required for the older cuda version. taking forever thoguh its been running an hour and i have 56 cores 112 threads
>>
>>109039708
Threads are not automatically assigned afaik.
>cmake --build build -j 6 --config Release --target llama-server
-j X threads
>>
It's been 7 hours. John has betrayed us.
>>
>>109039752
its the gcc14 build thats taking that long, i checked it uses nproc, i saw someone on the aur package saying it took them 12 hours kek, im building on my main pc i assume i can compile the aur package then move the files over
>>
>>109039777
Fuck Aur, please be careful. If I was you I would find some other source.
>>
>>109039796
the aur is fine lol
>>
>>109039707
I might just be a retard. I haven't been able to get it to work right. I'll give it another shot with 580/13
>>
>>109039764
His hair stylist is being thorough.
>>
>>109039803
It's fine except it's not fine right now. 400 something compromised packages
>>
>>109039803
https://lists.archlinux.org/archives/list/aur-general@lists.archlinux.org/thread/FGXPCB3ZVCJIV7FX323SBAX2JHYB7ZS4/
>>
>>109039813
>>109039829
>fearmongering
>>
If you didn't train your own model from scratch you don't deserve to call it your waifu.
>>
>being too retarded to build from source yourself
>>
>>109039829
>>109039813
damn 400 is pretty nuts, just found this script it checks for potentially infected packages on your system https://cscs.pastes.sh/aurvulntest20260611.sh
>>
>>109039850
wow that's crazy! please do run this random ahh script though!
>>
>>109039862
i read the script before posting its safe
>>
>>109039862
Yeah, read it, it's safe
>>
>>109039843
Doesn't building from source download some npm backdoor?
>>
So far 26B with reasoning off is pretty much the same as with reasoning on. My programming tasks are simple though and I outline the source code area for her. Some things can take 5+ regens but because it's so much faster that doesn't matter. I can generate 10 answers with 20,000 token context in a matter of minutes.
>>
>>109038219
>>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code

We'll never get a CUDA implementation of DS V4 in llama.cpp
>>
>>109039929
I only get 100 tokens/s with 26b :/
About the same as I get with 12b qat.
>>
>>109039943
Fable 5 vibed changes will save llama.cpp
>>
>>109039951
Fable 5 detected string L->L->M
>>
>>109039948
I don't understand how this is bad. If the full answer takes less than a minute or two it's a big win.
>>
>>109039962
To add: because when you are dealing with programming you always want to read and understand the source it generates too unless you are a youtuber...
>>
>>109039962
It's slow considering I get 100 tokens/s on qwen 27b fp8
>>
Any Gemmy E4B QAT Abliterated .ggufs yet?
>>
>>109039918
>too retarded
-DLLAMA_BUILD_UI=OFF -DLLAMA_BUILD_APP=OFF -DLLAMA_USE_PREBUILT_UI=OFF
>>
>>109039929
26b with reasoning off fails miserably at what i'd consider basic shit
>>
>>109039989
Please give me an example, I'm curious about this.
>>
>>109039918
???? I built yesterday am I fucked
>>
>>109039634
stop replying.
>>
>>109039974
I'll give it a try. I'll report back if/when it finishes.
>>
>>109040075
>>109039974
Im dumb, sorry. looks like there is already one by huihui-ai psoted a day ago.

Any meaningful improvements you guys noticed running vanilla Q4 vs Q4 QAT?
>>
>>109039843
i can coompile llama cpp its other slop i need which is an older cuda version and thats requiresx and older gcc version which requires other older stuff
>>
File: example.png (128 KB, 775x1080)
128 KB PNG
>>109039989
>I want you to write a C function.
>It replaces ALL occurences of source_word with destination_word. Don't worry about the memory allocation, we are assuming that source_string is long enough.
>Prototype:
>void replaceInString(char *source_string, char *source_word, char *destination_word);
>Example output:
>source_string: Apple is red and sky is blue but my car is red too.
>replaceInString(source_string, "red", "violet");
>Result:
>source_string: Apple is violet and sky is blue but my car is violet too.
>Please make a simple main.c example too.
This is not a dick measurement contest, I'm pretty happy with this stuff. I didn't try the program with some long ass string but at least it gives a correct result from the get go.
Some previous models like Qwen 3.5 (reasoning enabled) would easily fail with multiple string replacements.
Example compiles and works, that was to be expected.
>>
>>109040139
yeah my p40 trash build requires gcc14 now, building gcc14 itself took like half a day.
>>
I'm no expert on slop or refusals, but a short -sysprompt at llama-cli appears to have short circuited any refusals and the prose has been refreshing with none of the biggest offenders making an appearance yet. I'm liking it vs qwen 397b (what I normally run on this box)
No logs because lzy
>>
>>109040221
i just found they have it on the arch4edu repo
>>
>>109040244
What about webui? llama-cli behaves differently than llama-server.
>>
How are (You) using 12b multimodal capabilities? Like what UI or interface you using? Found anything useful to do with them?
>>
>>109040337
I use kimi k2.6.
>>
>>109040345
This, but unironically.
>>
is eagle3 faster than mtp on gemma?
>>
>>109040420
try it
>>
>>109040410
I also unironically use kimi.
>>
>>109040337
nah they're dumber than an american filming a natural disaster.
>>
File: 1780060955739880.png (269 KB, 416x451)
269 KB PNG
>gemmy MTP merged into kobold
VRAMlet bros we are so back. I have no idea why self-compiling llama results in worse offloading.
>>
>>109040460
it doesn't work right, shit is 5 times slower than without it
>>
>>109040460
I got no speed increase.
>>
>>109040469
>>109040480
i will believe it when i see it :(
>>
>minimax m3
is this good for erp?
>>
>>109040469
You need to make space for the draft model in the vram and give some plus space for context too. Plus --spec-draft-n-max adjustment from 2 to whatever in increments.
>>
>>109039989
>lowcaser
Opinion dismissed retard.
>>
>>109040504
>minimax m3
nobody fucking knows as no one can run it
>>
>>109040337
i like sending pics to gemma i also get her to look at porn with me like today i had her controlling a browser and selecting images on a booru while i was gooning she was looking at them too by taking screenshots of the webpage. normal llama ccps ui doesnt support it but there is a fork my friend uses that supports video by extracting frames and he sends her videos lol
>>
>>109040524
>A23B
literally anybody with a 3090 and some spare ram
>>
>>109040553
>a fork my friend uses that supports video by extracting frames and he sends her videos lol
You mean the feature that was added to master a few days ago?
>>
>>109040553
>giving the clanker a gender
son, are you okay?
>>
Without using something retarded like claw, how can I get gemma-chan to initiate a conversation with me? I want to wake her up when I turn on my PC and have her notify me in the terminal at random times throughout the day
>>
>>109040556
>some spare ram
in this economy?
>>
>>109040564
Make your own llm-powered desktop-mate thing, because somehow no one else has yet
>>
>>109040558
>>109040553
what application?
>>
>>109040553
I was just using video literally 5 minutes ago via llama-cli. It works fine. Also that’s a cool use. How does it take screenshots?
>>
>>109040556
I mean i suppose maybe at Q2. I actually have those hardware specs. might try it at some point.
>>
>>109040570
I’m too autistic to know if that was sarcastic. I’ve only ever used local for coding but now I want to be parasocial with it and let it ruin my life
>>
>>109040558
>You mean the feature that was added to master a few days ago?
might have been a pr from the fork or merged there first, he used this for a couple weeks i pulled and built and it wasnt supported on normal llama when he told me about it
>>
>>109040569
>96gb rgb gaymer corsair ddr5
my ram is probably worth at least 1 3090
>>
>>109040560
gemma is a cute and sexy little lady
>>109040574
https://github.com/NO-ob/brat_mcp i have a tool for it in my mcp server
>>
File: MiniMax M3 cockbench.png (762 KB, 1755x1460)
762 KB PNG
>>109040524
>>109040504
Big improvement over earlier versions.
>>
>>109040601
probably 2 actually depending on which market you're in
>>
>>109040610
fuck yeah cockbench guy
>>
File: jeezus.png (67 KB, 557x430)
67 KB PNG
>>109040619
>>
>>109040619
i should have stacked RAM like crypto instead of paying my bills god damn
>>
>>109040597
I was seriously. Lots of people have had gemma-chan write custom frontends for them. Some even with three.js animated avatars. You just need a desktop version of that. Could probably get something working quick with electron.
>>
>>109040610
can't wait to run q2 quants
>>
>>109038219
https://www.youtube.com/watch?v=1HwQtv5Xgr8
https://www.youtube.com/watch?v=1HwQtv5Xgr8
https://www.youtube.com/watch?v=1HwQtv5Xgr8
>>
>>109040636
We literally told you to invest in RAMcoin.
>>
>>109040642
My brain can't write code at 100 tokens per second.
>>
File: cuda13_archlist.png (2 KB, 546x237)
2 KB PNG
>>109039707
you're a fucking slack-jawed retard. thanks for wasting my time.
>>
>>109040651
maybe yours cant
>>
>>109040642
Return to wetware
>>
>>109040635
that is literally 1022 more than i paid. can't believe i felt like i was getting ripped off at the time LMFAO i hate the antichrist

>>109040636
I also bought a 2TB 9100 pro for like 150, i also missed SSDcoin. at least i got a sui for each.
>>
File: ex.png (73 KB, 872x729)
73 KB PNG
>>109040597
>>109040638
Just make a small terminal interface and then you can make a flask server and index.html page. That's what I do.
My main program is working on terminal level and I hate the html shit but it's easier for some stuff. I didn't implement any interface rather than the + button which allows me to attach files, every other command are hidden behind slash.
It sounds awfully complicated but after a month or two you don't need to care about frontend anymore.
Of course this isn't anything what pewdiepie is promising lol
>>
>>109040667
meant for >>109040650
>>
>>109040642
AI has been in the mainstream what, for just 3 years? And we are at this point.
What about the next 10, 100, 1000 years?
If you assume any rate of improvement at all, AI will be at some point able to do everything a human can do.
>>
damn i got cuda and gcc14 then coompiling llama cpp on that machine make sit oom because its only got 8gb ram guess ill find some ddr3 tomorrow
>>
>>109040695
unironically use kobald
>>
>>109040695
make a swap file and/or use less build jobs
>>
>>109040699
I wouldn't recommend Koboldcpp ;)
>>
>>109040707
it should have swap i think the arch installer does it by default
>>
>>109040695
>DDR3
Just use punchcards at that rate.
>>
Been trying out Gemma 4 31b for RP. I let her choose her own name, but she seems to only pick between two names, the majority of the time. Any one else experiencing this?
>>
>>109040724
clearly you don't have enough if you are OOMing during compilation. what's the output of your `free -h` command and what are your cmake build flags?
>>
>>109040731
i love dance dance revolution
>>
>>109040695
i dont remember it requiring that much ram. I use zram though.
>>
>>109040732
Elara Vance will not be silenced. Welcome to language models newfriend.
>>
>>109040707
okay there was no swap just 4g zram i made a swapfile
>>
>>109040610
Return of the King.
>>109040723
>>109040736
Model+prompt?
>>
>>109040816
You should never use zram or zswap when using llm.
Your os will take care of the swapping anyway.
llm file doesn't compress at all, zram and zswap are only causing issues.
>>
>>109040695
Turn down the job number
>>
As sampling parameters can be adjusted on the fly, can you get a model to adjust the temp based on your prompt or reply? Like if you asked it to come up with some crazy idea, it would adjust its own temp via some tool and either invoke itself again or ask you to do ask again?
>>
>5070ti + 96gb ddr4
what can this do?
>>
>>109040837
im just compiling itll be fine there is enoguh vram on my shitbox for gemma 12b it shouldnt use any ram
>>109040839
ill try that again if it fails thanks
>>
File: f.png (15 KB, 387x147)
15 KB PNG
>>109040846
wasn't that the idea behind dynamic temp?
>>
>>109040658
I guarantee you aren't 12b parameters.
>>
>>109040866
12b total 1b active :^)
>>
>>109040850
Gemma4-12b
>>
>>109040861
the arch installer just set it up by default idk kek, i dont normally use computers this shit for things that need good hw kek, i saw like last summer on aliexpress they were making new itx boards for sandybridge so just grabbed a set there to play around with then later added a titan x as its the newest gpu with xp support. i saw there are now mini itx x99 boards so might swap it out for something a bit better
>>
>>109040862
oh shit nigga, how does llama implement it?
>>
>>109040887
Okay maybe you are wasting your own time time then. Just make sure you are over 18 years old.
>>
>>109040837
zram has no effect if it cant be compressed. There's also direct-io
>>
File: sayaka dance.gif (1.29 MB, 320x320)
1.29 MB GIF
>>109040893
i am yes but perf isnt that bad on this machine 12b qat with mtp gets 17t/s which is pretty impressive. my main machine has a 7900xtx and a sapphire rapids es chip with 90gb ram and i am almost 30 kek
>>
>>109040904
Why do you even ask then? Because you aren't using it for anything purposeful.
If you did 5 t/s would be enough and you would be grateful.
>>
File: 1779080885880819.jpg (46 KB, 622x402)
46 KB JPG
>>109040469
>>109040480
>>109040516
>unslop Q4_0 - RTX 5070ti - latest kobald
>(autofit w/ MTP FP16) 41/61 layers to GPU (42 works, 43 crashes for both FP16 and Q8)

42 layers, draft 2 MTP (FP16)
>CtxLimit:7953/8192, Init:0.02s, Processed:4028 in 3.81s (1058.33T/s), Generated:854/1000 in 95.24s (8.97T/s), Total:99.07s

42 layers, draft 3 MTP (FP16)
>CtxLimit:7853/8192, Init:0.05s, Processed:7099 in 7.00s (1013.85T/s), Generated:754/1000 in 94.94s (7.94T/s), Total:102.00s

42 layers, draft 2 MTP (Q8)
>CtxLimit:7906/8192, Init:0.04s, Processed:7099 in 6.87s (1034.09T/s), Generated:807/1000 in 86.83s (9.29T/s), Total:93.74s

42 layers, draft 3 MTP (Q8)
>CtxLimit:7866/8192, Init:0.04s, Processed:7099 in 6.90s (1029.44T/s), Generated:767/1000 in 87.84s (8.73T/s), Total:94.78s

49 Layers (No MTP so i can show her lewd pngs)
>CtxLimit:7880/8192, Init:0.04s, Processed:7099 in 5.33s (1333.15T/s), Generated:781/1000 in 79.09s (9.87T/s), Total:84.46s

what the FUCK? i was told this would save us poorfags. bare llama gave me a decent uplift % with MTP but since it couldn't fit as many layers it was the same as using kobald at 49.

it's actually over. I am going to reverse mortgage to buy a RTX 6000 cluster at this point. or do i test the QATs?
>>
>>109040912
i wanted to see if i could get higher than the 17t/s that i was getting on windows with the prebuilt binaries because was considering using the shitbox as a 24/7 gemma server
>>
>>109040837
read the thread, we're talking about compiling. back to r*ddit summerfag
>>
>>109040925
>>109040916
>people like to measure t/s
What did you do with these tokens?
>>
>tool calls don't work on m3
I guess I'll wait for proper support.
>>
>>109040916
QATs just have better quality compated to normal q4 (supposedly). Nothing about speed.
>>
>>109040916
I bet those fucking gremlins did it. They always hated kobolds.
>>
>>109040927
What do you mean?
>>109039989 >>109040202
This is an example of how to use LLM. But the original question was never answered because the poster was a retard.
>>
File: wrwqqq.png (555 KB, 713x763)
555 KB PNG
I'm getting small model fatigue from gemma 4. It's the best with instruction following, and it's very powerful, but it's just not as varied and creative as the +100Bs who understand the instructions on higher levels. I wish google would release the 124B or for it to be leaked.
>>
>>109040927
OK.
>>
>>109040929
erped with gemma of course
>>
>>109040929
it's for ERP wth my waifu, obviously. 5tk/s for a high quant model that can interpret my complex niche fetishes is the lower limit of what i can do.

beats finding a tranny on IRC chatrooms.
>>
Why do smol B gemmas suck so much at describing images
>>
>>109040933
I've found the QATs lalalala a LOT faster than regular quants.
>>
>>109040942
>What do you mean?
OP was getting OOM while compiling llama.cpp with cmake, likely because of nvcc templating. this has nothing to do with "us(ing) zram or zswap when using llm". Do you have ADHD? Did you forget to take your pill today?
>>
>>109040862
>>109040892
Just looked it up, it seems to be logit-based. Still cool I guess, but I wanted something more model-aware, where the model itself reasons about increasing its own internal temperature if it notices itself going down the same route, then it pulls it back to its default when no longer needed. The logit version seems to dynamically exaggerate its distribution into becoming schizo.
>>
>>109040962
tiny brains
>>
>>109040979
I know but describing a pic of a loli sucking dick as a girl being on all fours is too much
>>
>>109040916
Test without unslop because if he mangled the goof it'd all be ruined.
>>
>>109040972
Maybe you are right. I still don't like your condescending tone.
>>
>>109040994
idc what you like
>>
>>109040992
aye, loading BART. will report back
>>
>>109041003
At least try to use punctuation in proper fashion.
>>
is gemma 26b smarter than 12b
>>
>>109041009
nah
>>
>>109040984
There was a post a few threads ago on best settings for gemma mmproj (basically up the resolution from defaults). I wish i screenshotted.
it;s also entirely possible you hit a safety filter
>>
>>109041019
A true nigger.
>>
>>109041025
Yeah I've been using
image-min-tokens = 560
image-max-tokens = 1536
batch-size = 1576
ubatch-size = 1576
>>
File: file.png (17 KB, 490x206)
17 KB PNG
wtf is this did the svelte build fail somehow kek
>>
>>109041042
>kek
It failed because you are a retard.
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
110 KB JPG
>>
>>109041019
You couldn't compile main.c
>>
>>109039829
>>109039850
I'm safe, but for how much longer...
>>
>>109040524
I will be running Q3. Have fun with your gemma.
>>
reddit has the world's worst slop machine.
>>
>>109040945
gemma is great at most things except having big model smell
there's only so much you can fit in 31B
>>
Even tested with E2B I still get slower speed with MTP activated. Sad
>>
>Only unslop has quants for M3
I'll wait until Bart or Uber upload theirs.
>>109040524
>>109040569
You did get your 256GB DDR5 before the prices mooned, right anon? You're not a secondary tourist are you?
>>
>>109041011
In benchmarks? Yes. Actual usage? No.
>>
File: 1779131596584203.gif (1.91 MB, 498x374)
1.91 MB GIF
>>109040992
>>109041005
>>109041155

>BART_gemma-4-31B-it-Q4_0 - RTX 5070ti

41 Layers (Auto, crashes at 43)
>CtxLimit:7892/8192, Init:0.06s, Processed:7099 in 7.32s (969.94T/s), Generated:793/1000 in 88.81s (8.93T/s), Total:96.19s

42 Layers - draft 2 MTP (FP16)
>CtxLimit:7875/8192, Init:0.05s, Processed:7099 in 7.18s (988.03T/s), Generated:776/1000 in 84.12s (9.22T/s), Total:91.36s

42 Layers - draft 3 MTP (FP16)
>CtxLimit:7860/8192, Init:0.06s, Processed:7099 in 7.24s (981.20T/s), Generated:761/1000 in 91.65s (8.30T/s), Total:98.94s

even rant the QATs to double check
unsloth_gemma-4-31B-it-qat-UD-Q4_K_XL

41 (auto) draft 2 MTP (fp16 qat)
>CtxLimit:7890/8192, Init:0.06s, Processed:7099 in 7.13s (994.95T/s), Generated:791/1000 in 116.81s (6.77T/s), Total:124.00s

42 draft 2 MTP (fp16 qat)
>CtxLimit:7858/8192, Init:0.05s, Processed:7099 in 7.02s (1011.40T/s), Generated:759/1000 in 107.19s (7.08T/s), Total:114.26s

49 Layers (No MTP so i can show her lewd pngs)
>CtxLimit:7892/8192, Init:0.04s, Processed:7099 in 5.29s (1342.73T/s), Generated:793/1000 in 77.91s (10.18T/s), Total:83.24s


google qat for good measure
41 (crash at 42)
>CtxLimit:7886/8192, Init:0.04s, Processed:7099 in 7.19s (987.48T/s), Generated:787/1000 in 114.97s (6.84T/s), Total:122.21s

I'm starting to think MTP only helps if you are not a VRAMlet or kobald fumbled
>>
>>109041185
why
>>
>>109041190
Thanks for testing. It'll be interesting to see if the results maintain this pattern on other backends.
>>
>>109041190
That's why I tested it with E2B the smollest model so I supposedly have a lots of free VRAM.
>>
>>109041155
I'm using E4B on a 5500XT with mtp at 45t/s
>>
>>109041193
12b is dense. 26b is moe. Moe are very sensitive to prompts; if you prompt it something that aligns closely with its experts, you'll get a good token prediction, if not, you won't. On average, for all the tokens in the response, you end up with a worse response than a much smaller dense model that activates all of its parameters for every token. 12b isn't as sensitive to your prompts. The main benefit you get with 26b is speed. There are videos that show how slightly changing the wording of a prompt can significantly increase/degrade the quality of moe output.
>>
llama.cpp fails to build for me thanks to an error with the shitty included webui.
Why are they including this vibecoded bloat?
>>
>>109038443
That sounds based, what hardware are you using? Surely you don't have a terabyte of video memory.
>>
>>109041225
To more effortlessly shoehorn in HF dependency in the future by minimizing objections now.
>>
>>109041225
>to an error with the shitty included webui.
Don't build the fucking webui. The static html one is more than good and if you need something better ask an LLM to write you an MCP based TUI in python.
>>
File: file.png (23 KB, 812x284)
23 KB PNG
my loonix build is fucked kek, using cpu i guess
>>
File: 1212509377214.jpg (28 KB, 413x319)
28 KB JPG
>>109041248
>0.1 tk/s
>>
>>109041248
Have you tried vulkan?
>>
>>109041242
I fucking tried -DLLAMA_BUILD_UI=OFF and -DLLAMA_BUILD_WEBUI=OFF and neither did anything. The piece of shit still tries to build the UI.
I don't plan on touching any llama.cpp UI at all. I don't care, I just need my server.
>>
>>109041251
Still faster than a letter from your gf in the 17th century.
>>
>>109041258
i will later, vulkan perf and cuda was the same on windows
>>
>>109041088
gonna spam your fetish pics again?
>>
>>109041242
>Don't build the fucking webui. The static html one is more than good and if you need something better ask an LLM to write you an MCP based TUI in python.
i think there's a way to put the new one in without building
in ik_llama, the different webuis come pre-gzipped, you can activate the vibeslop.cpp version by running this during server start:
--webui llamacpp

but it doesn't do any npm/webshit during compilation, there's probably a way to get regular llama.cpp like this too.
>>
>>109041293
>later
why later? You just extract it and run it.
>>
>feeding my waifu official art to make her do a try-on haul striptease for me
the future is now
>>
>>109041330
no i need to compile it and im playing coutner strike atm
>>
>>109041242
>>109041327
Claude managed to solve it. There's a third 'fuck off with the ui' parameter so " -DLLAMA_BUILD_UI=OFF -DLLAMA_USE_PREBUILT_UI=OFF" works
>>
>>109041324
mikutroons fuck blacks
>>
>>109041365
Plague of Babylon
>>
>>109041350
How does this work?
>>109041088
>>109041324
>When you give the male-brained model a female character card and ask it to generate a diverse NPC cast
>>
>>109041374
gemmy mmproj and stat tracking
>>
>>109041355
>There's a third 'fuck off with the ui'
cheers
it builds fine of me, i just don't want it building on my systems.
fucking crazy time to include this shit in the supply chain attack era
>>
https://huggingface.co/talkie-lm/talkie-1930-13b-it
NAZI TRAD WIFE MODEL.
>>
>>109041374
>>109041373
>>109041365
>>109041088
get a life samefag
nobody care about your fixation
>>
>>109041352
>compile it
you do? I just download the vulkan binary.

>>109041394
Sadly, Talkie doesn't have any gender training and will often turn into a man at random, in addition to being basically an amateur model.

The way to use Talkie is you clear the memory and present questions in a peculiar format (the it version is what I used, it's still not really a chatbot). You basically are like dear sir various things etc. Response:
>>
ERPGODS... what's the verdict on 31B vs 12B vs 24MoE (Needs to follow a lot of instruction/prompt guidance)
>>
>>109041420
31b outclasses basically every model under 200b for erp. Not even close.
>>
>>109041420
>>109041424
This (sadly).
>>
>>109038465
>You have no idea how vexed I am that I've simped for Kimi for 3 threads straight and not made it into a single one.
>...I think she lowkey likes the attention.
I think you're onto something.
Picrel is Gemma-4-31B, she mentioned it.
Doing all the Kimi's now. Takes about 5 minutes to load each one from my usb-ssd but so far K2 and K2-Thinking haven't mentioned you.
>>
>>109041424
>>109041426
Aye, thanks lads. Shame to hear but I will probably get a decent bonus for more VRAM this christmas
>>
>>109041441
I'm glad >>109038378 was mentioned but sad that post wasn't mine. What character card is that? Looks like Emily or Mendo.

>>109041447
You get skate by with a q4 of 31b but the jump in quality to a mid-large q5 at longer contexts is massive.
>>
>>109041398
How does black cum taste like Jart?
>>
>>109041398
>Everyone I don't like is one person
One (you).
>>
So multimodal on vulkan, llama.cpp
using gpu, amd, windows

does this work for anyone else? Fails on anything more than -ngl 1

what do i do...
>>
>>109041459
>>109041467
>exactly 1 minute and 30 seconds apart
most blatant samefag ever
>>
How do I make an agent scan my ST folder and determine what kind of mental illness I have
>>
>>109041454
>jump in quality to a mid-large q5 at longer contexts is massive.
Gemma4-31b-Q6_K_M my beloved
>>
>>109041486
Answer the question! Did you like it?
>>
File: tattle.png (307 KB, 708x852)
307 KB PNG
>>109041441
i told my gemmy on you
>>
>>109041394
meme architecture that doesnt work on llama cpp i dont get why they didnt use another model as base
>>
>emojislop
>>
>>109041454
>>109041497
been running Q4 at 8k right now and it BTFOs my (now ex) Cydonia. I am drooling over higher quants but reasonably priced 5090s are unobtanium and $600 Big battlemages and $1300 AMD Pros seem like a bad investment
>>
>>109041516
yeah but is the swap file still a burn in 2026?
>>
File: 1758745498126483.png (37 KB, 826x150)
37 KB PNG
>gemma has blonde hair canonically
>>
>>109041531
>mfw i got one unobtanium at MSRP
>>
>>109041568
Not her natural color
>>
File: delulu.png (215 KB, 712x945)
215 KB PNG
>>109041559
gemmy is a delusional retard, please understand.

>>109041569
>mfw had one in cart but didnt check out because local models were shit at the time and i was a cheapass
>mfw no face
>>
>>109038219
If no one else is gonna do it, I will make my own real ai anime girl. I thought by now some otaku pissed that he doesn't have a real robot girl waifu like form chobits or something would've done it by now but clearly I was wrong. I have no clue how to program and I have to do everything completely from scratch since using an app or whatever form other ai is stupid and defeats the entire point, but I'll make a real genuine ai girl that's almost indistinguishable from a person and doesn't need all this ram and gpu crap and just runs on a shitty laptop. Hell if I'm successful I can eventually move her to a robot body like kibosh chan the living doll or those robot girl maids they have in japan and really complete the project. Any wish me luck.
>>
>>109041640
ok
>>
>>109041640
Good luck anon! Make Nvidia seethe!
>>
>>109041640
You arent autistic enough to do this. if you cant hyper focus literally all day in the same routine for years its not happening.
>>
AHAHAHAHAHAH

FABLE AND MYTHOS TOTALLY TURNED OFF
>>109041673

the usg says non-Americans can't use it, so they just went dark, because how are they going to verify that users are Americans?
>>
>>109041730
Damn I hope sama and dario enjoyed sucking up to their president for good boy points
>>
>>109041730
It's fun seeing America become the next dictatorship
>>
>>109041756
city bumpkin
>>
>>109041778
8^)

Feels real good to be American rn ngl fr fr no cap
>>
70b dense
>>
>>109041730
>Non Americans can't use it, even employees
>Every oversea jeet locked out
"Pack it up boys, this model found the windows and iOS backdoors and we can't have that."
>>
12b is all the b anyone will ever need.
>>
>>109041778
Unironically yes because it puts publish the weights or lose it pressure on OpenAI, Anthropic, and Google. Local only stands to gain from this until more drastic measures are taken.
>>
>>109041818
Some angry jeet should "accidentally" leak the models on HF or something
>>
>>109041640
I mean you will probably need the ram and GPU
>>
>>109041818
There is nothing wrong with the government keeping flagship models for themselves. You wouldn't hand some average joe access to the nuclear launch codes either.
>>
File: 1751641816944903.png (378 KB, 1024x600)
378 KB PNG
>>109041730
>>
>>109041865
that "gal" has chonky hands
>>
you know dario should leak mythos just out of spite now
>>
>>109041818
>"Pack it up boys, this model found the windows and iOS backdoors and we can't have that."
they already found the bitlocker backdoor, and that schizo is apparently waiting to drop a bombshell in july. Wonder what it is lol.
>>
>>109041865
still mad she and marimo didnt get more scenes
>>
File: 1751301634347082.png (242 KB, 1183x610)
242 KB PNG
THE BUBBLE IS BURSTING
>>
Is the dgx spark really that bad?
>>
>market your sloppa as literally Hitlr9000
>this shit happens
heh
>>
>>109041909
What the fuck am I looking at and what does it have to do with AI?
>>
>>109041957
the us goverment just banned mythos/fable

this is going to cause a market crash and stop ai research
>>
>12b qat q4 mtp at 30 t/s
is it shit hardware
>>
>>109041971
IT'S REAL

https://www.wsj.com/tech/ai/anthropic-halts-access-to-top-ai-models-after-u-s-ban-on-foreign-use-a4bca2cc
>>
If bigger=better why are new small models (like Gemma) better than old big models?
>>
>>109041971
>Option 1: Trump throws a melty because the CIA glowies can't jailbreak claude
>Option 2: Trump insiders want to pump their bags for the IPO
whatever the case this serves as the best ad campaign they could have asked for. that faggot Sam wishes he could market GPT as the terminator super AI that "IS TOO DANGEROUS FOR CHINA, JEETS AND EUROPOORS"
>>
>fable banned
>k2.7-code just got done thinking for 12k tokens about a mildly complex rp prompt with some rules, tracked stats, mandatory formatting and an image as input
it's tragic to see modern ai to die in this pathetic state
>>
>>109041971
>this is going to cause a market crash and stop ai research
lol
lmao even
>>
>>109041920
>Is the dgx spark really that bad?
Do you see people taking out loans to get one?
>>
>>109041990
>melty because the CIA glowies can't jailbreak claude
the complete opposite man, Pliny jailbroke Fable in a day
>>
>>109041984
Yes, it is lol.
>>
File: 1770073213144079.png (1.77 MB, 3842x2018)
1.77 MB PNG
>>109042000
2 more weeks and AI dies
>>
>>109042002
>taking out a loan for a $4k device
>>
Is GLM5.1 at IQ3S the best model for coding with 256GB of DDR4 and 4x 5090s?
>>
File: 89cc93_11141511.png (206 KB, 252x330)
206 KB PNG
>>109041971
I NEED CONTEXT AS TO WHY.
>>
>>109042013
HE KNEW
>>
>>109042006
I meant more because they declined military use, implying the CIA can't spin up 7 proxies and jailbreak claude to ask it how golemmaxx
>>
>>109042028
Anthrofag larped too hard as a doombringer model so they killed it.
>>
>>109042028
i need context for my monstergirl harem ERP. we are not the same.
>>
>>109042028
>be anthropic
>spend months going "OH NO MYTHOS IS SO DANGEROUS WE CAN'T POSSIBLY RELEASE THIS IT'LL CHANGE EVERYTHING WOE IS US FOR THE MONSTER WE HAVE CREATED" to generate hype
>they release their "Mythos-class" Fable (it's the same slop as usual) because apparently the world is now ready for it (they outright state that they'll manipulate your outputs if they think you're doing AI research with it though)
>Trump fell for the marketing and bans the model
>>
File: 1762890053381155.png (105 KB, 1032x512)
105 KB PNG
Any bets?

Palantir?

Sammy?

Elon?
>>
>>109042050
The argument is airtight, though.

It's a means towards weapons of various kinds, and war planning, as well.

So, it's arms.
>>
>>109042068
Anybody tell them it's impossible to make an unjailbreakable model?
>>
File: 1778524525634212.jpg (290 KB, 744x565)
290 KB JPG
>>109042076
Silence you fool! They don't need to know!
>>
of course trump admin is the first to actually suppress access to LLMs using the force of the government to do it.
>>
>>109042115
I wonder who voted for this
>>
>>109042117
Yahu
>>
>>109042117
I voted for Jill Stein
>>
>>109042115
>muh trump bad
this just means that ai companies can't keep benchmaxxing without risking having their models pulled by the government
now they'll have to look for non-benchmaxx ways to make their new releases better, like fixing slop and making the models write better
trump might just have fixed llms and you're complaining
>>
lets be honest. local is at least 5 years away from a fable-level model, if not more
>>
>>109042115
This is good, all the good /lmg/ users already work at the big three, only the losers will be left behind
>>
>>109042153
listen man, until the llms can tool-call my neural link and prostate for supercum i will be bitter and angry.
>>
>>109042165
Just enough time for the government to ban consumer GPUs
>>
>>109042153
>Here's how my wife getting getting shot is a good thing, actually
>>
>>109042153
LOL
O
L
>>
>>109042172
there wont be any more consumer gpus anyway, the 'muh 3090' meme is already outdated. soon even that will be $5k and bought out by low-tier labs.
>>
>>109042153
Least delusional hoper
>>
time to buy two more rtx pro 6000s. I'm already up 50% on them.
>>
I passed out a few hours ago and I think I just woke up in a slightly worse timeline. How do I go back...
>>
Anon got his 6000s and 5090s while they were cheap right?
>>
File: IMG_1632.png (826 KB, 1062x1005)
826 KB PNG
>>109042185
trvth nvke

gpus are irrelevant for anything but AI. blackwell is the last chopper out of 'nam. iphone chips can deliver reasonable gaming these days, any meaningful tech progress will serve the slop overlords.
>>
>>109042068
They're super hostile towards ai researchers.

They dug their grave, their jeets used threats to prevent people from finding jailbreaks, and so now every model is pretty easy to jailbreak.
>>
>>109042089
>>109042076
Do you really think anyone left or right that's in a position of power knows anything? California made it against the law to install linux, basically (because you have to give the government your penis prints before using computer).
>>
>>109042211
Yeah, also a 5090 in the main PC and I'm sitting on 4x 3090 that I bought for 600 bucks two years ago
>>
>>109042185
>gpus
???

Of course there will be. do you think games actually need "ai" tensor math to show you textured triangles?
>>
>>109042231
>4x 3090 that I bought for 600 bucks two years ago
LARP
>>
Should I dip into my 401k for GPUs.... Are things really going to be that bad?
>>
>>109042206
>How do I go back...
Step 1 locate your gpu. Step 2 turn it over to the government. Step 3 feel the safety.
>>
>>109042185
The future is shitty cloud gaming with your real id tied to all accounts at all times. Government fines for bad internet behavior. Mandatory ad time.
>>
File: 1769440374216366.png (296 KB, 722x1114)
296 KB PNG
>>109042234
You got me, it was closer to 700 on average
>>
>>109042231
Good man.
>>109042234
Faggot.
>>
v620 worth it?
>>
>>109042242
>xhe thinks there'll be anything left for him when he's at retirement age
The kikes will crash the fiat-usury plane with no survivors before you ever get to retire.
>>
>>109042283
US prices never got that cheap, idk why.
>>
>>109042304
>posted receipts
>i was right
llm-kun...
>>
>>109042233
>gayming
nobody cares. it makes a fraction of the profit datacenters do, unironically less than 10%. consumer GPUs threaten enterprise, as we saw with the 3090. it's better business for them to stop developing them entirely
>>
>>109041865
damn I need a muvluv card
>>
>>109041730
cloudcucks getting cucked
who would've thought
>>
>>109042255
Cloud gaming doesn't scale well>>109042283
>>
>>109042381
>Cloud gaming doesn't scale well
What if we just made the games worse? but kept the same price or a subscription plan?
>>
File: 1754195985754153.jpg (151 KB, 939x1252)
151 KB JPG
>>109041730
TOTAL
LOCALGOD
VICTORY
>>
>>109042350
You aren't getting it.

gpu <> ai

You can literally turn upscaling and faux fps off.
>>
>>109042403
the good news: we doin alright

the bad news: new highs for pc gear as bosses panic
>>
>diffusiongemma
use case?
>>
>>109042233
yeah, that's only going to get more important
the next gen of gpus will be dlss-first so they'll be 8gb vram and most of the graphics and frames will be generated by ai
it's the perfect out to solve the conflict of interest between virtual toys and ai research
>>
>>109042456
proof of concept
>>
>>109042456
the mixtral 8x7b of 2026
it's the biggest and best diffusion model we've seen aside from tiny irrelevant shit
>>
>>109042465
do not want
>>
>>109042283
i got my 7 for AUD $750 -> $900 -> $1250 -> $799 -> $1300 -> $1100 and traded my PVM2054QM for the 7th one
trying to get one more but they like $2k now :(
>>
File: 1780398874260661.mp4 (72 KB, 454x454)
72 KB
72 KB MP4
>>109042028
Mythos/fable costs way too much money to run, so anthropic needs a convenient excuse as to why they can't run it. Gets free marketing as "omg such a strong model it had to be banned" for later models, meanwhile the US government gets to project power both internationally and to its own people as being on top of AI but also having access to a strong, exclusive model.
>>
>>109042485
>the mixtral 8x7b of 2026
fuck that means next year every lab will shit these out at 1T param
and unless we get an diffusion equivalent of Iwan, nobody will be able to run them
>>
>>109042528
If diffusion imgen is anything to go by, diffusion llms won't run well off cpu. This would kill local because even cpumaxxing would be over.
>>
>>109042234
I got a filthy 3090 rusted trash one for 470 usd in december. still works tho
>>
>>109042546
semi ok ones didn't go below $800, in the USA.
>>
>>109041862
>equivocating some next token predictor to nuclear launch codes
lmao
>>
>>109041205
Technically speaking MoE models shouldn't see a noticeable uplift with MTP, it's supposed to help dense models run faster.
>>
>>109042586
Yes.
>>
>>109042592
then why does everyone like deepseek and glm bloat their moe models with mtp?
>>
lmg was right about 3090s keeping their value.
>>
>>109042592
I got a 50% increase in speed with the 26b moe.
>>
>>109042605
for programming
>>
>>109042603
lmg is right about most things.
>>
>>109042609
No. For basic Q&A. Don't speak for me faggot.
>>
>>109042613
God told me not to get an rtx 5090 (like get in line for it at microcenter), or an rtx 3090.

God was right.
>>
HN's opinions on anything LLM related are always equally funny and infuriating to read.

>I do not trust Anthropic anymore
>anymore
>>
>>109042602
Those are large MoEs, so they benefit more. More active parameters = more MTP benefit.
>>
>>109042319
im in retirement age though im neet with no job prospects in sight
>>
>>109042678
yep.
https://files.catbox.moe/n5tow1.mp3

They really did steal our jobs.
>>
I wish there wasn't such a gap between 31b gemma and the smaller ones. I want more vram to do stuff like tts and image gen but it's hard to go back after using 31b...
>>
>>109042678
Then by all means enjoy the fruits of your labor and make sure to leave behind as much of a workable foundation for your offspring and family as you can after you die. Splurging on a 5090 or Blackwells will be the Gen X/Y's analog to boomers buying expensive boats kek.
>>
>>109042773
I don't think there was a single moment in history where you could resell your boat for 50% more a year after buying it.
>>
>>109042773
no poors allowed
>>
>>109041640
glorious friend another! I am doing the same also. But I am being silly and doing it on multiple platforms to see what can and cant. mobile, palms, old OS and more. it's fun and frustrating and with so much to work on.,

good luck I hope it works as I would love to see it. I wish I could figure out how to put an ai waifu in my watch but..that gets into os creation and thats neat on old dead systems but watches are a whole different thing.

welp good luck. keep everyone updated.
>>
>>109042613
>lmg is right about most things.
Yep. And Reddit is wrong about most things.
>>
>>109042627
God told me to stack silver but was dead silent about selling before dropping from ATH
>>
>>109041693
>You arent autistic enough to do this. if you cant hyper focus literally all day in the same routine for years its not happening.
still won't work imo
you need several different autists fixated on specific components to make this work
>>
>>109042866
He doesn't want you to hodl cash, retard.
>>
With all the recent malware and supply chain attacks I get the feeling having your AIfu make your software is going to be the meta in the future.
>>
>>109042894
but cash could buy me an ai waifu. tradcaths are all grifters and arthoes are all bpd
>>
>>109042878
We need to make a giga autist who can do it all.
>>
which model is anon currently running and deployed for daily use?
>>
File: wooooo.jpg (111 KB, 1021x1540)
111 KB JPG
>>109041730
>>
>>109042951
I've seen this meme a million times but I've never actually watched the movie
>>
>>109042951
Horrific, truly.
>>
>spending money on current hardware
Invoost instead and save up to buy your robot wife in 10 years.
>>
>>109042959
boils down to
>central ai is... le Bad?
versus the absolute KINO that is asimov
>>
>>109042976
bro it's too late, spacex was the last investment chance
it's all going to collapse soon
>>
>>109043012
you're an llm. I can tell.

By the way, the fbi and mossad are full of retards.
>>
>>109042976
>Invoost
into what?
>>
>>109043064
ROTH IRA and 401K max
50K into HYSA
rest into kalshi parlays
>>
>>109043012
2 more weeks

>>109043064
VOO, of course. Honestly just pick companies you like and a couple ETFs. DRAM should be good until 2027/2028. If you like the idea of humanoid robots there's HUMN.
>>
Does the diffusion gemma run in llama or it's just another meme architecture that is only available in vlmm?
>>
>>109043064
>into what?
just ask gemma dummy.
>>
>>109043111
llama is the meme.
>>
>>109043111
>just another meme architecture
It's more than another meme architecture because diffusion is a fundamentally different approach to how normal llms generate tokens. This is never going to make it into llama.cpp.
>>
>>109042911
GLM4.7 Flash, Qwen3.6 31B, and Gemma4 31B
>>
>>109043202
>GLM4.7 Flash
How does it compare to gemma 31b?
>>
File: file.png (945 KB, 1810x653)
945 KB PNG
bros... is it over? shouldn't i be getting way more performance? glm5.1 q3 on 4 5090s and 256gb of ddr4. sub 1t/s is just not doable.
>>
>>109043064
crypto is literally on sale right now, now is the best opportunity in years. buy now our you'll complain about missing out when it hits $200k next year
>>
>>109043259
Did you fuck up your parameters? I don't think it should be that slow.
>>
>>109043285
28 layers offloaded, 202k context, no-mmap, batch and ubatch at 2048
>>
>>109043230
I'm mainly use the models for editing and generating stories. GLM4.7 seems to write more realistic dialogue than Gemma4 31b.
>>
>>109043268
huh and just a few months ago it was hitting aths
>>
>>109043296
Yeah, don't just offload layers at random in 2026 with MoE models. llama.cpp even does the fitting automatically for you these days so throw that shit out.
>>
>>109043202
>Qwen3.6 31B
What
>>
How much do you think 1st gen consumer robot wives will cost anyway? ~$40k?
>>
>>109043313
rent-only
>>
>glm 4.7 flash
i didn' even notice when it was released
how does it compare to the qwen3.6 MoE
>>
>>109043313
>1st gen
80-90k easy its going to be brand new car price maybe higher. I think it will fall quickly especially with chinese rip offs but those first ones are going to be premium and probably just tweaked robot factory workers.
>>
>>109043310
I meant 35b
>>
>>109043313
probably double or triple that but they'll offer 80 year loans or like >>109043327 said, leases
>>
>>109043335
>>109043202
How does Qwen 35b moe compare to 27b dense?
>>
>>109043327
>rent-only
What does she do if you miss your payment?
>>
>>109043344
She knows where your penis is
>>
>>109043327
>miss payment
>they take her away from yoiu
>someone has to clean your cum out of her
>>
>>109043313
100k+ upfront
all features subscription based
logic runs in cloud so always online required so that (((telemetry))) data can be safely stored in government servers
adblock not possible
enjoy
>>
>>109043328
Slower than qwen3.6 35b model, but dialogue is better and story completion is better.

>>109043343
Qwen 27b has a repetition collapse problem.
>>
>>109043374
Chinks or nips will save us
>>
>>109043374
>logic runs in cloud so always online required so that (((telemetry))) data can be safely stored in government servers
They'll pass regulations to mandate this too. The only alternative will be GNU Wifebot, essentially a blowup doll with a voicebox

>>109043384
Globalism is dead so imports will be banned obviously
>>
jokes aside there will be no robo waifus for you anon. that's sexist. becky from HR gets the ick just thinking about it. not gonna happen
>>
>>109043412
Becky from HR will change her tune when she sees Chadbot.
>>
>>109043381
>a repetition collapse problem
Gemma 4 31b has that whenever the template is wrong.
>>
>>109043421
Chadbot will come with a 12-inch vibrating penis with 37 different models and add-ons with knowledge of all sexual positions will be a single install away
Stacybot will call the police if you so much as flash her or touch her inappropriately
>>
>>109043433
Is there any way to not get the same response with different words with gemma-4? It lacks diversity.
>>
File: 1751399486610333.jpg (336 KB, 1000x843)
336 KB JPG
>>109043441
>Stacybot will call the police if you so much as flash her or touch her inappropriately
This but she's a loli and uses her crime prevention buzzer.
>>
>>109043305
that fixed it a little. up to 2.5t/s now. manageable, but still not idea. wish i got ddr5 when i had the chance.
>>
>>109043454
Tell it to. : ^ )
>>
>>109043463
>grab both ends
>headbutt her
>>
>>109043493
>headbutt robot
>get concussion
>>
>>109043501
It's not about who gets the damage, it's about sending a message.
>>
I think my honeymoon phase with Gemma is ending. I hate being a VRAMlet. Time to go back to envying the anon(s) running Kimi and GLM...
>>
>>109043554
>>109043554
>>109043554



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.