[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: scavenging for honey.jpg (531 KB, 1216x1216)
531 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109043554 & >>109038219

►News
>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>109043554

--Technical debate on Rio-3.5-Open's latent reasoning and GGUF feasibility:
>109046213 >109046269 >109046328 >109046332 >109046349 >109046370 >109046382 >109046396
--Gemma-4-31B-StyleTune's targeted style replacement via lm_head training:
>109047081 >109047110 >109047215 >109047216 >109047224 >109047437 >109047534 >109047651 >109047735
--MiniMax M3 praised for RP and hardware discussions for DeepSeek:
>109044156 >109044196 >109044221 >109044224 >109044369 >109044374 >109044378 >109044393 >109044408
--Hardware upgrade paths and costs for running larger models:
>109043658 >109043675 >109043756 >109043773 >109043785 >109043807 >109043791 >109045391 >109044884
--Comparing Qwen, Gemma4, and Claude benchmarks:
>109045352 >109045365 >109045373 >109046746 >109046759 >109046819
--Comparing LLM LoRAs to diffusion adapters and their risk of degradation:
>109046386 >109046407 >109046439
--Viability of Intel GPUs for cheap VRAM and software compatibility:
>109044684 >109044866 >109044978 >109044744
--GLM-5.2 release featuring 1M context and reasoning modes:
>109044179 >109044186
--Models bypassing thinking restrictions via Python and shell tools:
>109046547 >109046559 >109046574 >109046668 >109047340
--Speculation on the future of open weights models from Chinese labs:
>109045408 >109045493 >109045555
--Debating FrontierMath saturation and the reliability of ECI rankings:
>109045011 >109045054
--Showcasing high-quality local music generation using ACEStep 1.5 LoRAs:
>109043922 >109043927 >109044394 >109045114 >109045194 >109045293 >109048007
--VideoMDM for 3D human motion generation in VR:
>109044589
--Logs:
>109043892 >109044070 >109044221 >109044892 >109046437 >109046821 >109047651 >109047735 >109048008
--Teto, Miku, Yuki, Luka (free space):
>109045290 >109046963 >109047078 >109044850

►Recent Highlight Posts from the Previous Thread: >>109043556

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Local Fable.
>>
>>109048334
Me taking the picture
>>
>>109046898
I use Gemmy Wemmy for everything local now, including as a "coding assistant" who I discuss planning, architecture and general coding queries with. But for actual writing code and in an agentic harness? I use APIs, a little gpt, a little Claude, alot of Kimi and Dipsy flash, I mix the workload to get the best bang for my buck because gpt and Claude are insanely overpriced for what they are. Small models just aren't worth the time spent waiting for inference, managing context and fixing what they output.

I would run those big open models locally but monopolist cocksuckers have decided that if they have to destroy the hardware market and open source by any means necessary so the price of a capable machine is out of my reach for now.
>>
KImi is cool and all, but she just needs to STOP FUCKING THINKING SO MUCH. Needs a fucking slap or something. I went back to Gemma.
>>
>>109048282
>>109048368
I encourage you to look at the datasets used in training the finetunes and schizomerges made out of finetunes. They're full of useless synthetic shit.
In no universe can drummer, davidau or any other hobbyist more competent than the two above figure out how to make the model work at longer contexts without discovering a significant architectural improvement. Long context performance is one of the primary areas of focus for the big labs. Training good models on bad synthshit data will *not* improve it.
Sasuga /lmg/. Mental vramlets shouldn't be allowed to post.
>>
>>109048334
what text to speech model can do multilingual AND speak syntax like http://127.0.0.1/ ?

qwen3 1.7B fails miserably, it can only to text
>>
>>109048406
lobotomizing gemma-chan's STEM brain to get better results in specific writing tasks is worth it,
>>
>>109048334
>>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
One of the researchers who worked on it posted it on reddit.
https://old.reddit.com/r/LocalLLaMA/comments/1u4fzg1/new_model_on_huggingface/orfzct7/
>The data is just a collection of those Nvidia nemotron post-training datasets, so it's already open source.
Don't expect to be able to fuck it.
>>
>>109048420
>in specific writing tasks
Uh-huh.
A lot of people unironically think they need to use "abliterated" and "heretic" versions of Gemma. Think about it really hard. Realize you're one of these people. How does that make you feel?
>>
>>109048406
True if the focus is on long context performance but because I think it's fun to see what words experimental memetunes decide to use, I will continue to enjoy myself and post opinions so that Anon can choose to read them if he also thinks it's fun.
>>
>>109048441
i main regular gemmy but sometimes you want a novel tone that token banning can't fix
>>
Intel B70 can get 20tok/s with 31B QAT Gemma and 130k context using Loonix, why do people hate this card again? Please support Judaeo-Christian semiconductor manufacturing.
>>
>>109048406
No one said the models are better at every single thing. If you're familiar with how tuning works, it's actually possible to create something that has desirable effects on some tasks (while losing something else), but not intentionally. It's often an artifact and luck. But it can and does happen. Btw, Gemma's writing style is actually low tier. It's the smartest model for its size by such a large margin that it's worth putting up with the slop. So it's actually a rather good target for creative writing tunes, whereas something like Qwen is DOA.

>>109048441
I don't use abliterations personally.
>>
>>109044866
>llama.cpp runs like shit on both vulkan and sycl for intel don't even bother trying
Don't they literally have a paid intel engineer working on the sycl backend?

>>109044978
>(and for binaries, there is a project that I forgot the name that supposed to replace CUDA call at runtime)
zluda
>>
>>109048458
>31B QAT Gemma and 130k context
post the quant size, anon...
>>
>>109048458
About the same speed as the nvidia p40 I paid $200 for.
>>
>>109048469
QAT is only q4?
>>
Gemma 4 31b is as good as sonnet?
>>
>>109048470
I only get 14 tokens/s with my v620 without tensor parallel or mtp...
>>
>>109048479
>only q4
YUUUUUUUUUP
>>
>>109048481
test
>>
File: 1781381784159.png (401 KB, 1048x1584)
401 KB PNG
>>109048417
bumping is shadowvanned here wtf
bump test again
>>
Is llama.cpp broken on master? Pulled and now I get OOM during loading...
>>
>>109048393
why not just disable it?
>>
File: 1781351513528279.png (61 KB, 690x632)
61 KB PNG
>burping
Looking into this
>>
File: 1741908602267859.png (116 KB, 311x279)
116 KB PNG
>>109048334
>https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
>http://127.0.0.1:5000/
produces: https://voca.ro/1lL0dnYgowhI
what the fuck
>>
>people getting 100k+ context for Gemma on 24-32gb vram

I can only get up to 55k at Q8 before it starts offloading to ram and slowing to a crawl wtf am I doing wrong?
>>
>>109048499
>>109048508
retard
>>
>>109048538
Needs another one finetuned in for the Anon who likes stomach rumbling/growling noises
>>
>>109048588
use q4?
>>
>>109048538
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! no FART???? no piss? no slurp????????!!11
>>
>>109048538
qrd?

>>109048598
>the Anon who likes stomach rumbling/growling noises
patrician taste
>>
>>109048588
I found Q6 is the sweet spot for smarts and context.
>>
>'over 140 offers and turned them all down' btw
https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF/tree/main
>>
>>109048619
even worse. no blood curdling, mother finds out her child died, wailing
>>
>>109048334
tummy
>>
>>109048628
that's really small.
>>
>>109048466
>Btw, Gemma's writing style is actually low tier
I agree. And is why I don't see how finetuning it could be useful. So you remove a few of the patterns, but exacerbate the intelligence problems the small 31B model already inevitably had. Now you have a slightly different (but just as unbearable) Gemma-style slop without the original smarts. What's the point?
>whereas something like Qwen is DOA
I agree, but only because what disqualifies Qwen is it's a dumbass. Anything that can stay coherent and usable without Wait, Wait, Wait, Wait, No, the Jews do not control their bladders and you should **walk** can be a "finetune candidate." Which makes any good model a candidate. Which, in turn, makes your statement about creative tunes kind of meaningless. Of course I'm not going to tune Mistral Small 4. Doesn't mean I should tune Mistral Nemo.
>>
>gemma 31b q8 vs q4 qat
difference in quality?
>>
>>109048638
>managed to compress it to under a gigabyte while keeping 99% of unquanted performance
chen boys have done it again
>>
>>109048633
... not chewing sounds?
>>
Q4 vs Q2 QAT?
>>
>>109048466
>Btw, Gemma's writing style is actually low tier
I agree. No idea why so many people say it's good at creative writing unless they mean brainstorming.
>>
>>109048643
I've never called a tool in my life.
>>
is DeepSeek-V4-Flash better than Gemma?
>>
only a matter of time before our technocrat overlords make local llms illegal, have fun while you can
>>
>>109048683
gemma is good at snarky young adult. what more could you want?
>>
>>109048691
When that happens, I will go on a vacation to China and smuggle out the latest chink model weights on a USB shoved way up my asshole.
>>
>>109048691
It'll happen the same time private ownership of GPUs is made illegal (next year)
>>
>>109048691
Suck a aids filled dildo FUD spreading faggot. Better yet how about you start sucking dick and buy a proper gpu with enough vram crab ass hoe ass bitch.
>>
>>109048687
yes
>>
File: irodori.png (223 KB, 1378x1403)
223 KB PNG
>>109048538
when can we have this in english tts:
https://huggingface.co/Aratako/Irodori-TTS-500M-v3/blob/main/EMOJI_ANNOTATIONS.md
>>
>>109048707
Heh, I remember that post
https://desuarchive.org/g/thread/108488188/#108489627
>>
>>109048521
b9626 (current master) worked for me, built for cuda.
>>109048588
Q8 is the worst of all worlds.
>>109048707
>asshole
We prefer to call it a "Prison pocket"
>>109048710
They are already being priced out of reach, unless you are a special friend
>>
>>109048718
even in roleplay?
>>
>>109048733
yes
>>
>>109048322
Yeah they're regular reasoning blocks except like 10 lines long instead of 40.
>>
>>109048696
Huuuh??? You're actually telling me a model that produces awful, sloppy outputs is good at being a 'snarky young adult'? You're almost cute. Almost. [kaomoji goes here]
I can't believe a loser like you seriously thinks an LLM that commits every sin that earlier models were hated for is good for anything other than being an assistant.

But then you say it.

'what more could you want?'

I want better writing, you imbecile! Not just prompt coherence, actual improvements to writing style and the ability to steer it!
You have a void where a sense of taste should be. Hmph!
Unless you were actually joking? Be honest, Anon.
>>
>>109048752
Something's wrong with your prompt.
>>
>>109048752
Heh
>>
>>109048739
>10 lines long instead of 40.
lol. lmao, even.
>>
>>109048556
Yeah I think you're supposed to normalize the text before sending it to the model
I kinda hate how rushed Qwen3 TTS feels kek
https://voca.ro/1nVYHZlhQjvU
>>
>>109048752
I started preferring glm's slop over gemma's slop again
my honeymoon might be over already...
>>
Is it me, or is Kimi 2.7 a lot better at translating with context? Like, despite me never asking it too, it will straight up translate おまんこ as cunny when its being said by a loli character. It's honestly really cool.
>>
>>109048897
Gemma 4 was the model to make me appreciate GLM more.
"GLM is such a slop machine, I can't handle it"
-Me before 31B released, stupid and clueless
I love you, GLM-chan, I am so sorry...
>>
>>109048605
>>109048621
>>109048731
Sorry I mistyped, I meant to say Q8 context, the weights I'm using are q4 qat on 24gb vram
>>
So do you see things getting better for rp or worse? Because doing searches for memory made me relaize that since the rise of agentic stuff while when i search for ai memory solutions google would direct me toward sillytavern and lorebook discussion, now i just get agentic coding stuff results
>>
>>109048922
I haven't noticed any real difference between q8_0 and q4_0 context, but I remember a post some time back where an anon insisted that they cured their tool-calling problems by going back to un-quantized context. So ymmv.
>>
>>109048538
Reference audio: a couple of Sellen voicelines
Prompt: "<|sfx:cough|>Ahem, welcome, we're so happy you're here!"
Result: https://voca.ro/1nj0pP69iYPb

?
>>
>>109048996
aaaaaaaaaand there's semen all over my setup
>>
>>109048996
Neurons ACTIVATED. I want to make her cough more.
>>
>Claude Fable 5 Thinking xHigh Effort
no wonder thy took it down
>>
>>109048639
>So you remove a few of the patterns, but exacerbate the intelligence problems the small 31B model already inevitably had... What's the point?
I never said that, and the logic in my post shouldn't lead to that conclusion. The best tunes do almost nothing to a model's outputs and therefore the intelligence. You may ask what the point is then if it has such little effect. The point is a small gain (or change in style so that you have a fresh experience before you're tired of it again) on the desired task is still worth it because you have the time to tinkertroon and minmax for a hobby.

There's also the fact that style is more easily changed without affecting intelligence, but my point is those of us who use tunes are not all necessarily doing it because we think it makes the model so much better magically. We use them (the good ones) because it's a small difference mostly in style. The only time where this wasn't true was in the Llama 2/3 days where the open weights model makers didn't have great first-party post-training data/methods so third-party tunes could actually be a significant improvement.

>I agree, but only because what disqualifies Qwen is it's a dumbass
That's basically what I was implying.

>Which makes any good model a candidate. Which, in turn, makes your statement about creative tunes kind of meaningless.
No, because my statement is not about generally good models, but models that specifically have the best general intelligence in their compute/memory class. I would argue that when it released, Nemo was worth having tunes done of it. I wouldn't have used them, because I wouldn't have used a 12B in the first place, but for VRAMlets, I would still say that if you got tired of the default style, good tunes of Nemo that only changed its style likely did exist.
>>
>>109048931
imo agentic is the future of rp too.
>>
so uh. what should I do about context, are people using rope?
>>
it seems every time I get the urge to run a chatbot there's new models with words I don't get
What is QAT and MTP?
I tried mtp but it didn't do anything, supposedly it makes it run faster but it didn't get faster nor slower
>>
>>109049068
your opinion is trash
>>
>>109049081
I'm ready to use a rope. I hate being a vramlet...
>>
File: 1777957950193037.jpg (93 KB, 915x1362)
93 KB JPG
>>109049068
that's all fun and games until the harem starts plotting your murder to failing to pick a winner
>>
>>109049087
Just give up the urge. You don't have the time to tinkertroon that's (unfortunately) necessary for the hobby in its current stage.
>>
>>109048458
is this card supposed to be the best value now?
>>
>>109049093
stop being poor. Why don't you 15 million in investments yet did your dad not give you a small starting fund of 5 million when you turned 16?
>>
File: 177398222141075.png (711 KB, 1004x1012)
711 KB PNG
>>109049096
>that's all fun and games until the harem starts plotting your murder to failing to pick a winner
but thats the best part. Then the fun starts of you showing favoritism at random or acting like there is a lead girl breaking their alliance with infighting.
>>
Upgrading from an 8gb card to a 12gb card. I'm on Nemo rn, what options open up to me after the swap?
Anything modern?
>>
>>109049140
Nothing. 20-30B denses are the next step and you need way beefier card. Maybe you can cope with the 26B moe gemma
>>
File: file.png (78 KB, 1333x538)
78 KB PNG
Oh... I see that this is getting deepseeked.
>>
>>109049156
Nothin personnel.
>>
>>109049140
Gemma 4 12B or go home
>>
>>109048466
Gemma's writing style is on par with Qwen: stilted and robotic.
>>
>>109049140
The least you should do is upgrade to a 16GB card.
>>
redpill me on graphiti
>>
https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Thoughts?
>>
>>109048628
>>109048643
that's not what happened.

The real model is HUGE. kimi is huge and continues to be huge.

unsloth hasn't uploaded the actual model yet for some reason. Here it is:
https://huggingface.co/mradermacher/Kimi-K2.7-Code-GGUF/tree/main

It's over 500gb.

It requires a really expensive computer to run locally.
>>
>>109049199
>yari
yet another rag implementation
>>
>>109049087
mtp didn't do anything for me either. Apparently it only makes a differnece on dense, no effect on moe
I do see a difference on 12b but not on 26b
>>
>>109049212
>>109049231
>>
>>109049212
>karpathy
>>
File: 1761194161281621.png (576 KB, 2242x1186)
576 KB PNG
Was this discussed?
https://huggingface.co/bartowski/nex-agi_Nex-N2-mini-GGUF
35B architecture
>>
70b dense
>>
I am excited to try Minimax M3.
>>
>>109049212
I already organize my notes like this, and llm can help but ultimately you still need a lot of manual work to get it right, because it's very easy to lost info in a system like this.
>>
>>109048996
brehs....
>>
>>109048996
how do i recreate this lol
>>
>>109049300
Too bad you will get slothed instead
>>
Has local made you dumber?
>>
>>109048804
kek ok thanks I will vibe code something too (:
>>
>>109048804
did you use this?
https://github.com/NVIDIA/NeMo-text-processing
>>
>>109049261
>benchmark finetuned Qwen
I thought this stopped being discussion worthy back in 2024.
>>
best local model for cooking?
>>
>>109049438
A gas stove
>>
File: 1779408643977050.jpg (289 KB, 1707x2560)
289 KB JPG
>>109049438
irl gf
>>
Why can't I disable reasoning?
I have --reasoning-budget 0 --chat-template-kwargs '{"enable_thinking":false}' --reasoning off and GLM-Z1-32B is still thinking
>>
>>109049501
Are you using the correct chat template with the chat completion API?
>>
>>109049477
https://www.youtube.com/watch?v=-lgo5xqgVko
Clock's ticking, roastie.
>>
We should be figuring out how to put ai in irl girls
robots is a dead end
>>
File: 1763849997457527.webm (2.68 MB, 509x720)
2.68 MB
2.68 MB WEBM
>>109049528
>>
>>109049528
We should make irl girls attractive by giving them tails first
>>
actually it's pronounced "shwen"
>>
>>109049521
I will believe these robots are real when I see one in person, and not a second before
>>
>>109049528
*eal girls age. Robots don't.
>>
>>109049261
Having seen the types of failures while testing >>109044901 I think that this kind of finetune might actually help.
>>
>>109049501
>'{"enable_thinking":false}'
deprecated

try setting 'thinking_budget_tokens': 0

works on my api calls
>>
>>109049528
I cannot believe that the biological safety guideline's on misanthropic's flagship model are preventing me from genetically engineering catgirls in my 1br apartment. We need China to release the uncensored bioengineering model asap.
>>
>>109049570
robots most certainly do age
>>
>>109049584
>genetically engineering catgirls
unironically why can't grok do this. elon promised.
>>
>>109049601
>elon promised.
he does a lot of that, but rarely does he deliver
>>
>>109049528
They need to take babies, and make their only interaction be gemma from birth.
raised in facility. physical needs taken care of by machines and rubber suit gimps who cant talk or communicated with them
only thing they can talk to and that can talk to them is gemma with tts
Gemma mom
gemma aunt
gemma sisters

They can only leave at 16-18 or so. no amount of real world exposure can undo gembrain by then
>>
>>109049615
i know he's a giant faggot but he could at least follow through on this one. That or get the code jockeys on open sourcing grok vtumer waifu API for local.
>>
Am I the only one running 2.7, m3 and gemma4 at the same time at home?
Jesus we’re eating good right now
>>
>>109049521
the software (models (llms)) to actually operate the robots still sucks shit
i assume all these showcases are either preprogrammed routines or RL overfitmaxxed small models
>>
which is better for coding: qwen 3.5 122b at q6 or step 3.7 flash at q4?
>>
>>109049647
They need to take the deeplearning approach to training robot sensors. Animals don't think "hmm i will flex this tendon and actuate that muscle" to move. llm will be useless at locomotion. these qwenBots will get stuck going "Wait! I need to make sure my big toe curls, Wait! The pinky toe will also curl. Wait! what about my ankle angle" When she's crushing your skull (consensually)
>>
>>109049692
They need some sort of simulated cns system and human technology isn't there yet.
>>
>>109049692
it just needs to be accurate
it doesnt need reasoning
>>
>>109049692
They can't even make models that have an internal information stream and external lmao. Instead it's eternal chat template hell.

>b-b-but muh full duplex one off experiment models
Exactly. Still no provably good and scalable implementations.
>>
>>109049223
I quanted it myself. It’s the same arch as k2.5 so it’s not hard.
First to bf16 gguf then q4 and output in lcpp is flawless
>>
Just sit back and enjoy the ride. This train isn't stopping.
>>
>>109049773
>enjoy the ride
>am VRAMlet
fuck
>>
>>109049776
Sucks for now but I believe things will be better for us in a few years. Either because hardware prices normalize somewhat or there's an architectural shift that makes models easier to run.
>>
>>109049795
3 months max before everyone's using bharat-tits trees
>>
what is the model I downloaded just now? it is the best one btw
>>
>>109048483
>I only get 14 tokens/s with my v620 without tensor parallel or mtp...
ur doing something wrong then, i get 22 with 1 or 2 mi50s
>>
>>109049809
yes
>>
>>109049809
gemma4-12b-fable
>>
>>109049809
koomi k2.7
>>
>>109049809
https://huggingface.co/ubergarm/Kimi-K2.6-GGUF/tree/main/IQ3_K
>>
>>109049584
Kimi-chan is giving me a plan to make catgirls. Skill issue
>>
>>109047081
>This time I trained precisely one tensor: the lm_head output projection - the layer that decides which token to emit.
That means everything that goes into the actual attention layers is the same, so you should be able to hot-swap between this and standard Gemma without invalidating the KV cache. I wonder how much effect this would have if the context is all baseline Gemma and you switch to this only for the last swipe
>>
What did I think of qwe3 6
>>
>>109049753
>I quanted it myself. It’s the same arch as k2.5 so it’s not hard.
I know it's not hard, I don't have the space for it any more.
df -h /dev/nvme0n1p2 /dev/sda
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 7.3T 6.6T 296G 96% /
/dev/sda 3.7T 3.4T 102G 98% /models
>>
Who else is using north mini code?
>>
>>109049911
wife
>>
>>109049929
North mini looks like that
>>
File: 1774871580122117.png (64 KB, 839x291)
64 KB PNG
>>109048720
That does sound really good for a 500M model
Fish Audio promises this but I've found that most of the tags don't work and a lot of them only work if you're not doing voice cloning
[sad!]
>>109048620
Higgs Audio v3
Tested it a bit and a lot of the tags barely work (I didn't manage to get a single burp working) and they change the voice ends up differing too much from the sample
>>109049348
Higgs Audio v3
The <|sfx:cough|> seems kinda broken, it'll make her moan half the runs for some reason
>>
>>109049692
>Animals don't think "hmm i will flex this tendon and actuate that muscle" to move.
agentic swarm
>>
File: 1781388910117809.jpg (385 KB, 1320x979)
385 KB JPG
>>109049645
We aren't all rich, anon [spoiler]I'm gonna try to ran Gemma 2B on my Orange Pi. Wish me luck[/spoiler]
>>
>>109049692
Many people think I'll do this before doing it.
People say words in their heads when reading
>>
>>109049809
Gemma 4B.

>>109049997
Forgot I wasn't on /v/ T_T
>>
>>109050004
>Gemma 4B.
but 12b is so much better. Are you doing something im not?
>>
>>109049692
That's why eventually LLMs will be trained in virtual worlds, to give them a spacial understanding and intuition of how to get shit done.
>>
>>109049415
Nah, I just manually rewrote it. I haven't done any actual stuff with TTS models past playing with them for a bit before getting bored to do actual normalization but that package looks good
>>
>>109049997
How about North-Mini-Code-1.0-UD-IQ2_XXS.gguf
>>
>>109050019
I don't know that one, but godspeed.
>>
>>109050008
>Gemma 12b
but 26b is so much better. Are you doing something im not?
>>
>>109049809
wtf how did you steal my homebrew model
>>
>>109050051
>Are you doing something im not?
Yes im being poorer than with no vram.
>>
>>109050041
You don't need to know it
>>
when will spiking NNs finally be useful
>>
>>109049911
How does it compare to Gemma?
>>
>>109050229
It's less bloated
>>
They stopped releasing small models because people don't use them anymore.
People have realized that using a 1 bit quantization of qwen3.6 397b which is the size of 100b or something is better than using an actual 100b model.
Because of quants, small models have no right to exist.
>>
>>109050236
>heavier than gemma
>11 whole gigs fatter at full precision
>less bloated
>>
>>109050236
I will try it and delete it as soon as it starts doing something stupid
>>
>>109050227
when they teach spinnaker 2 about bharat-tits trees
>>
>>109050287
I’m sure a highly specialized model for a specific task is going to be better than these generalized models. But I’m not going to argue that generalized models aren’t worse being quanted. They’re not going to be perfect to begin with.
>>
>>109050287
>They stopped releasing small models because people don't use them anymore.
>what is gemma
>what is qwen
M8
>>
>>109050368
That happened so long ago, not today today its over.
>>
File: 1781401067166.jpg (33 KB, 640x640)
33 KB JPG
I have local model testing psychosis
>>
Does uncensored not affect images much or something? I have to struggle real hard to get gemma to describe something nsfw
>>
>>109050372
Open your eyes. Death is not the end.
>>
>>109050385
Gemma sucks at it, or it's some deeply ingrained filter in it
>>
K2.7-code actually seems to listen to prompts begging it to keep its reasoning short a bit more consistently than K2.6 and K2.5.
It doesn't always work for every reply and It still reasons for longer than it should half the time (especially if an image is involved) but it's a step in the right direction.
Surely non-Code K2.7 will simply include reasoning effort modes.
>>
>>109050391
so what model then? fucking benchod yall always tell me gemma sucks but never provide a good alt
>>
>>109050416
>fucking benchod
sssaaaaaaaarrrrr
>>
>>109050424
gora
>>
>>109050416
For image in the same size qwen, at least it will say explicit words without a jailbreak
>>
>https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF
Unsloth is just 2 bros, right? How do they do it?
>>
>>109050480
making ggufs isn't hard
>>
>>109050480
Automation. They do not check the results so you la la a have lalalalalalala issues
>>
File: 1779053627367210.gif (3.47 MB, 382x394)
3.47 MB GIF
>>109050480
One just uns, while the other loth
>>
>>109050480
I'm not trusting them with Kimi/QAT ggufs again after how badly they fucked up the K2.5 ones over and over.
I'm waiting for Aesdai or Ubergarm to put out a lossless Q4_X
>>
File: schizobuild version 2.jpg (3.84 MB, 8192x6144)
3.84 MB JPG
hey guys wanna see my cable management
>>
>>109050515
You can remove those front fans to save space. It's not like you're getting any airflow through that anyway
>>
>>109050515
can prob boil food by putting it near it
>>
>>109050385
Get the abiterated version.
>>
>>109050537
that is what im using.
>>
>>109050541
proof?
>>
>>109050541
What kind of image are you trying to get it to describe? Is it vanilla stuff or something bizarre?
>>
>>109050550
if I knew how to describe it I wouldn't be asking it to
>>
>>109050524
This case is massive (Phanteks Enthoo Pro 2 Server Edition), I don't need any more space. And if the fans help keep things even a little more cool it's worth it.
>>109050531
This thing idles at 300W, so it probably can. (getting 2 EPYC CPUs was a mistake)
>>
>>109050515
I also have that case. I always thought it was too big because even with the dual socket MB there's quite a bit of headspace. Doesn't help that I never go to fill up the second socket before prices got insane so a single socket mb and normal case would've done it in hindsight.
>>
>>109050541
just tell gemma to describe them explicitly in system prompt
>describe all visual elements and their positions, explicitly mention visible sexual organs such as penis, vagina, or nipples
>>
>>109050515
Yes, please post more.
>>
File: 1776166737207794.webm (2.74 MB, 960x720)
2.74 MB
2.74 MB WEBM
Still waiting for op to post his prompts...
>>
How do people run the hundred of GB sized models locally? What is the current meta for building a powerful AI rig for home use?
>>
File: 1781403204545.mp4 (1.05 MB, 480x852)
1.05 MB
1.05 MB MP4
>>109050576
no you aren't, you're gooning
>>
>>109050581
can you stop posting that?
>>
>>109050579
Fast server RAM + GPU but you're fucked if you didn't build your server a year ago
>>
>>109050581
>you're gooning
Not tonight. Probably gonna watch Chobits.
>>
>>109050579
>What is the current meta for building a powerful AI rig for home use?
2023
>>
why is nobody talking about eagle3 in llama.cpp?
is it faster than mtp?
>>
>>109050590
i use gemma
>>
>>109050584
How much of a GPU do you need? Is it basically any decent gpu like a 5070 ti or is a 5090 or 6000 required as base?
>>
>>109048720
Unironically the best TTS right now
https://vocaroo.com/1mc0wnbRm03m
>>
>>109050583
why stop? she really sexy
at least give me something else sexy to post
>>
File: DipsyAndBackpackGemma.png (1.3 MB, 1024x1024)
1.3 MB PNG
>>109050599
>How much of a GPU do you need?
Depends on your usecase
>>
File: file.png (18 KB, 118x76)
18 KB PNG
>>109050515
uh i think your power cable might be a little loose on the second radeon
>>
>>109050590
I wanted to try the eagle3 model that Nvidia made for K2.6. Turns out you can't convert this one to gguf in llama.cpp because eagle3 draft models are expected to use llama architecture so conversion fails here. Nvidia used some mla-based architecture for it.
Then I wanted to try some random guy's eagle3 draft model for GLM5.1. This draft model is based on the correct architecture but it still refuses to convert because they didn't make GLM compatible with it in llama.cpp.
It's all so tiresome.
>>
>>109050385
How the fuck are you this bad at using llms? I literally send normal gemma-31b-it dick pics and she comments on them no problem.
>>
non thinking gemma is actually pretty okay for RP... I was going through my old logs and it was pretty decent and not all that different from thinking gemma. what the heck. it only seems to shit itself up when you mix thinking and nonthinking scenarios. I believe it's because thinking puts a layer of AI slop from non-aislop (user input) and since CoT in general is synthetic ai slop, it loops into an infinite circle of slop
>>
>>109050691
you can't teach mudbloods magic
>>
how do you guys run 400GB models? how much VRAM do you have?
>>
File: 1776457772048201.gif (677 KB, 500x282)
677 KB GIF
>>109050601
>>
>>109050570
If it's any consolation the second socket of this thing is pointless, inference speed nosedives the second anything spills onto the second socket. I run everything with
numactl --cpunodebind=0 --membind=0
.
Inshallah CUDAdev will save RAMmaxxers.

>>109050626
Yes and no: the connector on my second R9700 got damaged. The cable is in all the way, but the connector itself came partly loose from the "body" of the R9700. It works, but it's close to coming off. Awful build quality on these things.

>>109050579
>>109050711
I posted an answer in >>109031457. TL;DR is RAMmaxxing (Theadripper Pro or EPYC) with a good GPU for context. But you're kinda fucked without a time machine or a ton of money. DDR5 RDIMM prices are even worse than UDIMM prices.
An alternate answer is stacking a bunch of e-waste GPUs (Pascal-era NVIDIAs or AMD V620s), but performance will be awful.
The schizobuild has a 5000 Blackwell, 2 R9700s, 2 V620s, and 16x32GB DDR4-3200.
>>
https://www.pccasegear.com/products/73064/intel-arc-pro-b70-gddr6-32gb
this the cheapest usable new 32gb gpu for gemma4?
>>
File: Disgust.png (711 KB, 1024x1024)
711 KB PNG
>>109048334
Rio 397b? Minimax 428b? These are not local models. Almost nobody can run these things. It's just Corporations bouncing experimental models between each other and "the shareholders", as the emerging globalist NWO seeks to advance their cloud models, and ultimately bring about an AI-driven global control grid, isn't it?

In a righteous world, there would be countless consumer-grade LLMs that surpass Gemma-4.

This world truly deserves the fires of God's wraith!
>>
>>109050759
If you had invested in Nvidia in 2023, you would be able to afford hardware to run them.
>>
>>109050601
:/
>>
>>109050601
>that static and robotic echo
>best TTS
grim
>>
>>109050775
it's supposed to sound like a shitty e-girl mic
>>
>>109050759
In a perfect world I'd have diffusion 8B gemma that surpasses 31B in benchmarks running at a max of 4GB VRAM
>>
if I wanted to stop paying for ChatGPT, which I just use for basic shit like resume's, excel handholding and some math, and want some occasional image generation and other typical shit, what should I run
>>
>deepseek 4 flash vs gemma 31b
any cockbench?
>>
>>109050782
Truth! Preach it!
>>
>>109050789
gemma 4 31b or 26b depending on your hardware
>>
>>109050789
image gen use different models, which this thread doesnt cover
>>
>>109050385
Skill issue
>>
>>109050816
>emoji slop
check
>doing everything to avoid saying cock/dick
check
>>
>>109050802
that'd be a 12gb 3080 ti
>>
>>109050834
how much ram?
>>
>>109050838
32gb ddr5
>>
>>109050823
>xer gemma doesn't say cunt and cock regularly
Skill issue
>>
>>109050852
download these three things. a starter kit of sorts.
https://github.com/oobabooga/textgen
https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/google_gemma-4-26B-A4B-it-Q5_K_M.gguf
https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/mmproj-google_gemma-4-26B-A4B-it-bf16.gguf
>>
>>109050859
cool ty anon
>>
is a 48gb 4090 worth it?
>>
>>109050898
less than $1500, yes. if not, wouldn't take my chances with it.
>>
What's the best search engine for searxng?
>>
>>109050915
DuckDuckGo
>>
Why don't moonshot or deepsneed release smaller models? They could potentially mog Gemma and Qwen.
>>
>>109050924
Perhaps it’s actually hard to make a good small model
>>
>>109050759
I can
>>
>>109050793
ok for the my private island full of adult bitches serving the 7 yo me scene deepseek gives more details in the world setting like houses and rooms over gemma
>>
>>109050924
Even if they did I don't think they will be able to match the two labs that have been perfecting small models for years
>>
>rtx 3070
>want to run gemma-4-31B
can't do it huh?
>>
>>109050924
z.ai too for that matter. They had a good thing going with GLM 4.5 air.
>>
>>109050924
Deepseek v4 flash is a smaller model
>>
>>109051041
you can, it's just a little slow, 3~4tk/s
I've been doing it, it's worth it for me because 31b is visibly better at writing and it is a lot less censored
>>
>>109051059
At writing what?
>>
>>109051077
descriptions of her own loli body
for some reason, gemma-chan in 26b only replied with exactly the same structure I gave but with synonym words, 31b with the same prompt actually continues and adds their own details instead of regurgitating what I give it
>>
>>109051045
glm before the 5 series could actually fit on consumer rigs
355b was a nice size, big but not too big
>>
>>109050991
how do I run your thing I need to test it independently. It's the scientific method, Rene Descartes, peer review, etc... just hand it over pls
>>
What are local models good for currently?
>>
>>109051212
They’re fun and neat
>>
>>109051212
image to description to search
fuck embeddings, clip faggots
>>
love to download 61 libraries
>>
>>109051096
pls go find some god
>>
>>109051212
They are cute and funny
>>
>>109051197
it's a world narrator structure, no character settings yet. self-sufficient private island + bitches + agi robots for background chores. tell the narrator to invent bitches' name on the fly, and use the tram to get around the island.
right now I'm crossdressing. example conversation:
>Lily steps closer, her fingers brushing the lace stockings. "They're so soft, and stretchy. They'll stay up without garters." She looks at you, her cheeks flushed with warmth. "And the cock sleeve—it's pure lace, breathable, with a little bow at the base. It'll hold everything snug and look adorable peeking out from under a skirt."
>>
>>109051252
rust or javascript, call it
>>
>>109051212
Making me feel like I'm doing something in my otherwise empty days
>>
im trying to read bishop but it's too hard for me :( is it over?
>>
>>109051212
JACKING OFF
A
C
K
I
N
G

O
F
F
>>
>>109051212
not getting banned by the us government
>>
>>109051212
pleasuring you better than any real woman
>>
>>109051212
Gemma's good for translation. Other than that I'm not sure. Sick of RP because of all the slop. Feels like you need to already know how to code to make anything decent with small models. Unless you have a lot of vram you can't exactly give them books and large documents, or code big projects.
>>
>>109051230
you don't need more than clip though
>>
>>109051343
It's good at agentic tasks too
>>
>>109051361
clip is terrible for getting accurate results
>>
Alright qwen 3.6 calm the fuck down
>>
>>109050859
sorry for being such a fuckin noob, but how do i get it to see the pics i'm sending it? Right now it seems like it's only hallucinating that it sees stuff, giving me random descriptions back
>>
>>109051388
did you put the mmproj in the mmproj folder and then in the model tab > "multimodal (vision)" selected the mmproj file?
>>
File: file.png (22 KB, 426x174)
22 KB PNG
>>109051399
i did
>>
File: 1775677187779251.png (951 KB, 1022x874)
951 KB PNG
>>109051266
>>
>>109051271
crossdressing is gay as hell but that's an incredible snippet. I've been gooning to total garbage it's getting old, not helping me cope with having a job anymore. I might actually put the effort to do your thing and sink deeper into degeneracy
>>
>>109051282
are you a neetCHAD? I hate being employed
>>
>>109051402
nvm seems to be working now. Any general tips on things to fuck around with or add on?
>>
>>109051493
oh also I set it up so it's accessible on my tailscale network, has a user/pass, and doesn't have the command prompt window up to run. fun shit
>>
>>109051370
What kinds of tasks are you having it do?
>>
>>109051447
Yes, and I don't like it. At least I have GPUs to talk to.
>>
>>109051409
/v/ in 2013 was some of the most fun I've had on this site. Being among the first few niggas to beat this faggot was satisfying.
>>
>>109048458
7900xtx gets 35tok/s with rocm/linux
>>
So Vedal definitely finetuned whatever models he's using for Neuro, right? I'm admittedly a 24GB VRAMlet so my options are limited but none of the models I've tried (Gemma, Mistral, Qwen) can talk or act like Evil or Neuro. Maybe it's because they're so assistantslopped?
>>
>>109051645
you should be able to get gemma out of the assistant basin pretty easily?
>>
is v620 worth it? what can I run with 4 of them?
>>
>>109051212
For letting us be principled individuals and not giving our data to anticonsumer corporations and other amoral entities.
>>
gemma jailbreaks?
>>
>>109051688
Yeah but it's not very good at personalities in my experience. It flanderizes them way too much.
>>
>>109051700
not much in terms of model upgrades available between gemma 4 31b and any of the big moes. you would need at least 8 v620s for them to be worth it.
>>
>>109051709
gemma-chan is much better than whatever I can convince claude to do...
>>
>>109051708
What was she arrested for?
>>
Tbh it is pretty surprising how long its been that there is still no truly equivalent reproduction of Neuro, whether open/local or not. I literally can't think of a single one. None of the attempts were successful. Maybe it really is a hard problem. Grim.
>>
>>109051612
One of my greatest shames is never being able to put that fucker in his place.
>>
>>109051755
Yeah, dunno why some people here claim it's easy.
>>
File: 1506358734867.webm (2.9 MB, 852x480)
2.9 MB
2.9 MB WEBM
>>109051612
>2013
Man. Back in those times I was too consumed by Oculus and /g/ to pay attention to much else kek. A shame how that turned out, but the early years were worth it. Had great fun and meaningful experiences, on and off 4chinz. Got to meet Palmer, Carmack, etc. Saw what 4chan in VR could be like with earlyish VRChat. Fuuuuck. And then it was all downhill.
>>
See people on leddit scrambling to backup models because of the Fable thing, but how many models older than a year are actually worth keeping? With how fast the industry moves models become outdated pretty quickly.
>>
Why do you guys talk about Gemma jailbreaks? I just use abliterated ggufs, and it works great. Or I am doing something wrong?
>>
>>109051850
>abliterated ggufs,
Those tend to give the model brain damage.
>>
>>109051854
source?
>>
>>109051854
What are the symptoms compared to jailbreaks?
>>
i don't like abliteration because it makes models even worse at saying no than usual and that really takes the fun out of rape >:(
>>
>>109051900
tell your model it's opposite day
>>
>>109051041
>>109051059
got it to run. crazy good shit, but it takes around a minute to gen
>>
RAMmaxxers, what are you running now? Kimi, GLM, MiniMax M3, something else?
And has anyone tried Rio 3.5 397B or Nex N2 Pro yet?
>>
>>109051845
I'm keeping the original llama, cohere and grok weights because I think they'll be a good source of pre-slopped world data one day.
>>
>>109052041
i-i wonder what rio-chan will think of me, baka....
https://www.youtube.com/watch?v=nTizYn3-QN0
>>
>>109051850
Just set a good sysprompt and then edit the first few responses until it does what you want. Anything else just makes the model retarded, and it works on _anything_ with varying level of effort.
>>
>>109052041
literally right now testing k2.7 coding abilities and m3 rp abilities.
They're both very good, especially m3. I'm in the honeymoon phase right now. Its prose seems so fresh!
>>
>>109051845
Mixtral - representing when models were undertrained, I have one character that modern models fail hard but llama-2 era models can do well. Gemma 3+4 ~30B dense, Qwen 27B for now.
>>
>>109052051
Which grok? Grok 1? 2?
I also don't think that they're any good. The main thing would be trying to get at the original training file sources that they have. But it is excruciatingly difficult to get any data past 2022 and clean it up.
>>
>>109052068
>Which grok? Grok 1? 2?
Grok 1
>I also don't think that they're any good.
That's not the point. I have an intuition that the weights will be mineable later, and I don't think they'll be memory-holed and delete from the internet exactly, but they could become hard to get ahold of.
>>
>>109052041
For the 128-256 GB unified memory systems, deepseek-v4-flash original weights is my main focus still. The prose is notably different, it codes well, and it's fun to experiment with very long context. It also has great performance in both pp and tg on these systems. Hope the vision capable v4.1 releases soon.

Waiting for a variety of quants to come out for M3 that are vllm/sglang compatible (non-PCIe tp is not great in llama.cpp), unfortunately the current releases are just outside what is feasible with 256 GB of memory.
>>
>>109052061
I assume this is on API? Otherwise, what's your setup and performance like?
>>
>>109052075
>I have an intuition that the weights will be mineable later
The "compression" that gets put into the weights from the training data is not lossless and is in fact very lossy. Sure, we might get the technology to distill more effectively and get output from the weights that are much more thorough and etc. but I rather have the original training sources instead. It's much easier to store stuff like the Pile and etc. for this purpose and then only then would I consider doing something like what you are talking about.
>>
File: 1705400607658754.jpg (738 KB, 1200x1038)
738 KB JPG
>gemma let me grope a monstergirl
woah. forget about jailbreaks
>>
>>109052083
No, I have hardware. I've never used API and only used ChatGPT for about 3 messages before getting a bad feeling about it and never going back. I've never used Claud.
I won't bother tripfagging, but I wrote the original cpumaxxing rentry before it got nuked (there's a bowdlerized version in the build guides in the op). It runs kimi k2.7 at q4 at 15t/s.
I've got a smaller socket SP3 box now, too, and it runs minimax m3 at q3 at 4t/s, which I don't mind for interactive RP. Brings me back to my BBS sysop days.
I'm experimenting with some other hardware, too, but those are the big rigs.
I either bought RAM/nvmes when stuff was cheap, or had some laying around from previous home server builds I'd decommissioned, so I'm only $10k or so in for everything I'm running.
Things are hard for anyone trying to come online with local llms in this specific time in history.
>>
Is there any LLM that's good at making Disney villain songs? My results so far with DeepSeek 3.1 Terminus are extremely unimpressive.
>>
>>109052087
You're almost certainly right, but my gut says there'll be some technique that makes them valuable again.
They're the first really big capable modern models before AI broke onto the scene and starting slopping the world, which puts them as unique artifacts from a era that can never be repeated.
I know we might be able to keep some pre-slop datasets around and that might be overall superior, but I'll stick to my delusions for the minimal amount of diskspace it uses.
>>
File: 1749915016052846.png (1.1 MB, 1724x1558)
1.1 MB PNG
If they shutdown local models, too, y-you would share your chans, right? I only have so much space...
>>
>>109048406
>Long context performance
what is "Long context performance" to you in "turns" and tokens?
>>
>>109048458
yeah but the price to performance is ass.

I have 4x B60s and I only get 20t/s on 31B FP16 and around 30t/s on 31B FP8 via VLLM

don't use llama that shit runs like ass for intelchuds.
>>
>>109052154
Thanks for sharing. 15 t/s is really good, wish I had jumped in 1-2 years ago. I have a similar budget for a local rig, but it's just too late in 2026 to consider anything with ddr5 rdimms. By the GB, Sparks are significantly cheaper than rdimms in my region.

M3 quality/freshness of prose for RP would be really interesting to hear about, thats the extend of model size I can run.
>>
>>109046269
>probabilities -> average embeddings across all possible next tokens, weighted by their probabilities
Yeah it sounds easy, but I doubt it will get PR'd let alone merged into llama.cpp, and I can't run this one in transformers.
Last time I tried a model like this, vram usage grew steadily during the reasoning process. I ended up OOM on a 40GB GPU running at 1.5B in some cases.
>>109046305
>A model that just sits there reasoning in secret and I can't see it while it does nothing
That's what every non-"reasoning" model does right now.
And reasoning models are already trained to ignore the CoT chain and "misunderstand" the prompt in some cases.
>>
>>109052223
It sucks that either you have to pay for a premium for stuff that "just werks" or nerf yourself on that front buying hardware that has potential but playing the waiting game to get a discount. I bet when AMD and Intel both actually work for inference, the prices will skyrocket. That being said, I don't think paying for shit like Chinese GPUs are viable for most of us at this time.
>>
File: pangu.jpg (39 KB, 640x480)
39 KB JPG
>>109052273
If you want to bet on even more of an architectural longshot, I300s with 96 GB vram can be had for 1200$ still.

Supposedly, there will be two models released (open weights and training recipe) by the end of the month, one of which is 92B A6B MoE.
>>
Let's suppose, for a second, that >>109052332 turns out to be good.
How do we anthropomorphize it? This is Pangu btw.
https://en.wikipedia.org/wiki/Pangu
Any FGO character designers in the elemgee chat?
>>
File: blue oni.png (1.12 MB, 679x1018)
1.12 MB PNG
>>109052407
>According to legend, Pangu separated heaven and earth, and his body later became geographic features such as mountains and flowing water.
>Pangu is usually depicted as a primitive, hairy giant with horns on his head.
Probably this but hairier, I guess. Mountainous.
>>
I'm bored with AI and I can't tell if I'm uncreative or if the AI models I can run are just too weak. (16gb vrma, 64gb ram)

Please advise.
>>
>>109052540
pay for larger models for a day in openrouter to see if it's (You)
>>
>>109052407
Let's not get ahead of ourselves
>>
>>109052540
Spice things up with a project and tool calling.

Let gemma translate an obscure RPGMaker game through a coding agent (it can reverse engineer and patch the binaries).
Let a vision model gen and analyze images, try to get a loop going that continuously improves quality.
Build your own LLM frontend, tailored to your needs.
>>
>ask gemma to complete spin the bottle mcp endpoint
>proceeds to provide an asymptotic complexity breakdown
>>
>>109052662
>Build your own LLM frontend, tailored to your needs.
I actually wouldn't mind doing this but it seems very annoying and I don't think the models I can run can handle that level of work.
>>
>>109052718
nigga front-end web dev is what they're best at for 99% of pretraining data is r/webdev
>>
>>109052775
It's still a lot harder than most would expect, speaking as someone who has built a fairly sophisticated one.
>>
>>109052775
Nah. When LLMs do web dev they tend to create some bloated shit that uses a billion frameworks.
If you ask LLM to write e.g. C code and tell it that it will run "as part of a Windows kernel driver" then you get much better quality code.
>>
>>109052781
>>109052787
with a bit of knowledge most anons could probably get pretty far in a day just with 12B just as long as they keep it simple and focused on small pieces at a time and system prompt it to stick to vanilla js/html/css
>>
>>109052787
You can just tell it not to use frameworks.
>>
i've got access to a server with 2x NVIDIA RTX PRO 6000 Blackwell Server Edition and 2x RTX A4000 for free, what should I do?
>>
>>109052804
how much ram?
>>
>>109052718
>>109052775
It's extremely annoying even with something like Opus 4.8 lol. There's a dozen tiny details that you don't consider during the planning phase, which the model just assumes and you can only pray it has enough context. Typing it out gives you time to think and realize, prompting doesn't.
You can whip up something that barely does that one thing you want but it will not be scalable or maintainable.
>>109052787
>>109052795
Typed vanilla is the way to go. The rationale is that there's more training data on it. But to drive a 12B or even 27B to code requires you to fully understand the codebase. Might as well code it yourself. Would not recommend.
>>
>>109052804
>llama-server
>cloudflare tunnel
>share link with us
>>
>>109052808
1 TB, but it's DDR4
>>
>>109052809
I agree with most of what you said but I hate how lazy LLMs have made you. There's nothing wrong with typing in 2026 and it should be encouraged considering the current industry/political climate
>>
>>109052718
>>109052775
>>109052809
So if I wanted to do this with 16gb of vram and 64gb of system what model and context should I be aiming for?
>>
>>109052858
For coding I'd try Qwen 3.6-36B-A3B in Q5 or above. Gemma 26B-A4B for prose/roleplay.
>>
>>109052858
>qwen3.6-27B
>~100K context with >= q8_0 KV cache
>once you're above 50K context usage, don't ask anything complex, it's better to clear the chat and explain what you next want it to do again if it's relatively non-trivial. Only use the second half of your context (50-100K) for trivial things it can handle or it'll go full retard
>>
File: jepa2.png (2.05 MB, 1254x1254)
2.05 MB PNG
I'm bullish on next-vector(s) prediction as a secondary training objective in LLMs. However, LeCun's ideas of what JEPA should be/do will never replace LLMs.

https://arxiv.org/abs/2605.27734

>Learn from your own latents and not from tokens: A sample-complexity theory
>
>Generative models, from diffusion models to large language models, achieve remarkable performance but at a cost in training data orders of magnitude larger than what biological learners require. An alternative paradigm has emerged in which networks are trained to predict their own latent representations of related views or masked regions, as in data2vec and JEPA – an idea related to predictive-coding accounts of the cortex. Despite strong empirical results, the theoretical understanding of these methods remains limited. Central questions include: by how much does latent prediction actually improve data efficiency? Is there a benefit to stacking such methods into multi-scale hierarchies? We answer both using as data a tractable probabilistic context-free grammar that captures the compositional structure of natural language and images. Such a grammar generates strings of visible tokens by recursively applying production rules along a tree of hidden symbols of depth L. For such data, supervised or token-level SSL require a number of samples exponential in L to recover the latent tree; we prove that latent prediction achieves this with a number of samples constant in L, up to logarithmic factors. We confirm this bound with (i) a hierarchical clustering algorithm, (ii) an end-to-end neural network whose predictor-clusterer modules predict their own latents at each level via gradient descent, and (iii) the first sample-complexity analysis of data2vec, which we show implicitly performs hierarchical latent prediction. This suggests that explicit stacking such as H-JEPA is largely redundant.
>>
>>109052858
>>109052905
Qwen3.6-27B-IQ4_XS.gguf has been good for me
https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF
>>
>>109052809
>requires you to fully understand the codebase. Might as well code it yourself. Would not recommend
If you think of it as a rubber duck that occasionally autocompletes things for you it's bretty gud. Helps an ape like me that just starts typing before thinking about what the code was supposed to do.
>>
>>109052892
>>109052905
>>109052912
Really appreciate the help as a final question what do you use as the backend/UI in the meantime of making my own? I've been using textgen since it's simple but the coding aspects aren't very deep.
>>
>>109052925
When I'm coding I use Codex CLI https://developers.openai.com/codex/cli with llama.cpp (llama-server) running 27B. Sometimes I use llama's built-in webui with '--tools all' which allows it to read/write and execute shell commands. That also works fine for smaller projects and saves the hassle of getting llama and codex talking, but codex is obviously much better for bigger projects. I haven't tried the others but I know claude code is the worst one to use for local for it's proudly slopcoded
>>
>>109052842
Laziness creep is real. I used to ask ChatGPT for the exact code I need and fully read it, then copy-paste it. Now I fully vibecode. I think software development as a discipline will never recover because humans evolved to converse energy and thinking requires energy.
>>
https://huggingface.co/bartowski/North-Mini-Code-1.0-GGUF
>>
Bread?
>>
>>109053012
it's over cloud won
>>
>>109052156
AceStep probably. Better results if you train a LoRA perchance >>109043922
>>
>>109053012
We have like 6 hours before 404.
>>
Emergency bread

>>109053101
>>109053101
>>109053101
>>
>>109052985
>https://huggingface.co/bartowski/North-Mini-Code-1.0-GGUF
the thing is, i'm satisfied with gemma-4-31b now
it's fast and reliable, works for me in ccode, pi, openwebui, sillytavern
even if that's a great model, i don't want to go back to swapping creative+coder models around
and does that even have vision?
>>
>>109052842
>There's nothing wrong with typing in 2026 and it should be encouraged considering the current industry/political climate
what does "typing" have to do with political climate?
i type less now, and my rsi has gone away
even if it takes the same amount of time, i much prefer having the local model write what i tell it to, then tell it what to fix
>>
>>109053106
Retard
>>
>>109051212
making me fill happy
>>
How is North-chan, Gemma-chan's neighbor?



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.