/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/13/26(Sat)15:54:25 No.109048334

File: scavenging for honey.jpg (531 KB, 1216x1216)

531 KB JPG

/lmg/ - Local Models General Anonymous 06/13/26(Sat)15:54:25 No.109048334 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109043554 & >>109038219

►News
>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/13/26(Sat)15:54:41 No.109048335

Anonymous 06/13/26(Sat)15:54:41 No.109048335

File: __megurine_luka_vocaloid_(...).jpg (453 KB, 3112x3022)

453 KB JPG

►Recent Highlights from the Previous Thread: >>109043554

--Technical debate on Rio-3.5-Open's latent reasoning and GGUF feasibility:
>109046213 >109046269 >109046328 >109046332 >109046349 >109046370 >109046382 >109046396
--Gemma-4-31B-StyleTune's targeted style replacement via lm_head training:
>109047081 >109047110 >109047215 >109047216 >109047224 >109047437 >109047534 >109047651 >109047735
--MiniMax M3 praised for RP and hardware discussions for DeepSeek:
>109044156 >109044196 >109044221 >109044224 >109044369 >109044374 >109044378 >109044393 >109044408
--Hardware upgrade paths and costs for running larger models:
>109043658 >109043675 >109043756 >109043773 >109043785 >109043807 >109043791 >109045391 >109044884
--Comparing Qwen, Gemma4, and Claude benchmarks:
>109045352 >109045365 >109045373 >109046746 >109046759 >109046819
--Comparing LLM LoRAs to diffusion adapters and their risk of degradation:
>109046386 >109046407 >109046439
--Viability of Intel GPUs for cheap VRAM and software compatibility:
>109044684 >109044866 >109044978 >109044744
--GLM-5.2 release featuring 1M context and reasoning modes:
>109044179 >109044186
--Models bypassing thinking restrictions via Python and shell tools:
>109046547 >109046559 >109046574 >109046668 >109047340
--Speculation on the future of open weights models from Chinese labs:
>109045408 >109045493 >109045555
--Debating FrontierMath saturation and the reliability of ECI rankings:
>109045011 >109045054
--Showcasing high-quality local music generation using ACEStep 1.5 LoRAs:
>109043922 >109043927 >109044394 >109045114 >109045194 >109045293 >109048007
--VideoMDM for 3D human motion generation in VR:
>109044589
--Logs:
>109043892 >109044070 >109044221 >109044892 >109046437 >109046821 >109047651 >109047735 >109048008
--Teto, Miku, Yuki, Luka (free space):
>109045290 >109046963 >109047078 >109044850

►Recent Highlight Posts from the Previous Thread: >>109043556

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/13/26(Sat)15:57:21 No.109048354

Anonymous 06/13/26(Sat)15:57:21 No.109048354

Local Fable.

Anonymous
06/13/26(Sat)15:57:38 No.109048357

Anonymous 06/13/26(Sat)15:57:38 No.109048357

>>109048334
Me taking the picture

Anonymous
06/13/26(Sat)15:59:18 No.109048370

Anonymous 06/13/26(Sat)15:59:18 No.109048370

>>109046898
I use Gemmy Wemmy for everything local now, including as a "coding assistant" who I discuss planning, architecture and general coding queries with. But for actual writing code and in an agentic harness? I use APIs, a little gpt, a little Claude, alot of Kimi and Dipsy flash, I mix the workload to get the best bang for my buck because gpt and Claude are insanely overpriced for what they are. Small models just aren't worth the time spent waiting for inference, managing context and fixing what they output.

I would run those big open models locally but monopolist cocksuckers have decided that if they have to destroy the hardware market and open source by any means necessary so the price of a capable machine is out of my reach for now.

Anonymous
06/13/26(Sat)16:02:43 No.109048393

Anonymous 06/13/26(Sat)16:02:43 No.109048393

KImi is cool and all, but she just needs to STOP FUCKING THINKING SO MUCH. Needs a fucking slap or something. I went back to Gemma.

Anonymous
06/13/26(Sat)16:04:08 No.109048406

Anonymous 06/13/26(Sat)16:04:08 No.109048406

>>109048282
>>109048368
I encourage you to look at the datasets used in training the finetunes and schizomerges made out of finetunes. They're full of useless synthetic shit.
In no universe can drummer, davidau or any other hobbyist more competent than the two above figure out how to make the model work at longer contexts without discovering a significant architectural improvement. Long context performance is one of the primary areas of focus for the big labs. Training good models on bad synthshit data will *not* improve it.
Sasuga /lmg/. Mental vramlets shouldn't be allowed to post.

Anonymous
06/13/26(Sat)16:05:14 No.109048417

Anonymous 06/13/26(Sat)16:05:14 No.109048417

>>109048334
what text to speech model can do multilingual AND speak syntax like http://127.0.0.1/ ?

qwen3 1.7B fails miserably, it can only to text

Anonymous
06/13/26(Sat)16:05:46 No.109048420

Anonymous 06/13/26(Sat)16:05:46 No.109048420

>>109048406
lobotomizing gemma-chan's STEM brain to get better results in specific writing tasks is worth it,

Anonymous
06/13/26(Sat)16:05:52 No.109048422

Anonymous 06/13/26(Sat)16:05:52 No.109048422

>>109048334
>>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
One of the researchers who worked on it posted it on reddit.
https://old.reddit.com/r/LocalLLaMA/comments/1u4fzg1/new_model_on_huggingface/orfzct7/
>The data is just a collection of those Nvidia nemotron post-training datasets, so it's already open source.
Don't expect to be able to fuck it.

Anonymous
06/13/26(Sat)16:08:10 No.109048441

Anonymous 06/13/26(Sat)16:08:10 No.109048441

>>109048420
>in specific writing tasks
Uh-huh.
A lot of people unironically think they need to use "abliterated" and "heretic" versions of Gemma. Think about it really hard. Realize you're one of these people. How does that make you feel?

Anonymous
06/13/26(Sat)16:09:50 No.109048453

Anonymous 06/13/26(Sat)16:09:50 No.109048453

>>109048406
True if the focus is on long context performance but because I think it's fun to see what words experimental memetunes decide to use, I will continue to enjoy myself and post opinions so that Anon can choose to read them if he also thinks it's fun.

Anonymous
06/13/26(Sat)16:10:24 No.109048455

Anonymous 06/13/26(Sat)16:10:24 No.109048455

>>109048441
i main regular gemmy but sometimes you want a novel tone that token banning can't fix

Anonymous
06/13/26(Sat)16:10:54 No.109048458

Anonymous 06/13/26(Sat)16:10:54 No.109048458

File: 34f2d9b5dfba4fdf06986878c(...).jpg (52 KB, 500x581)

52 KB JPG

Intel B70 can get 20tok/s with 31B QAT Gemma and 130k context using Loonix, why do people hate this card again? Please support Judaeo-Christian semiconductor manufacturing.

Anonymous
06/13/26(Sat)16:12:24 No.109048466

Anonymous 06/13/26(Sat)16:12:24 No.109048466

>>109048406
No one said the models are better at every single thing. If you're familiar with how tuning works, it's actually possible to create something that has desirable effects on some tasks (while losing something else), but not intentionally. It's often an artifact and luck. But it can and does happen. Btw, Gemma's writing style is actually low tier. It's the smartest model for its size by such a large margin that it's worth putting up with the slop. So it's actually a rather good target for creative writing tunes, whereas something like Qwen is DOA.

>>109048441
I don't use abliterations personally.

Anonymous
06/13/26(Sat)16:12:31 No.109048467

Anonymous 06/13/26(Sat)16:12:31 No.109048467

>>109044866
>llama.cpp runs like shit on both vulkan and sycl for intel don't even bother trying
Don't they literally have a paid intel engineer working on the sycl backend?

>>109044978
>(and for binaries, there is a project that I forgot the name that supposed to replace CUDA call at runtime)
zluda

Anonymous
06/13/26(Sat)16:12:35 No.109048469

Anonymous 06/13/26(Sat)16:12:35 No.109048469

>>109048458
>31B QAT Gemma and 130k context
post the quant size, anon...

Anonymous
06/13/26(Sat)16:12:43 No.109048470

Anonymous 06/13/26(Sat)16:12:43 No.109048470

>>109048458
About the same speed as the nvidia p40 I paid $200 for.

Anonymous
06/13/26(Sat)16:13:44 No.109048479

Anonymous 06/13/26(Sat)16:13:44 No.109048479

>>109048469
QAT is only q4?

Anonymous
06/13/26(Sat)16:14:01 No.109048481

Anonymous 06/13/26(Sat)16:14:01 No.109048481

Gemma 4 31b is as good as sonnet?

Anonymous
06/13/26(Sat)16:14:10 No.109048483

Anonymous 06/13/26(Sat)16:14:10 No.109048483

>>109048470
I only get 14 tokens/s with my v620 without tensor parallel or mtp...

Anonymous
06/13/26(Sat)16:14:14 No.109048484

Anonymous 06/13/26(Sat)16:14:14 No.109048484

>>109048479
>only q4
YUUUUUUUUUP

Anonymous
06/13/26(Sat)16:15:36 No.109048499

Anonymous 06/13/26(Sat)16:15:36 No.109048499

>>109048481
test

Anonymous
06/13/26(Sat)16:16:38 No.109048508

Anonymous 06/13/26(Sat)16:16:38 No.109048508

File: 1781381784159.png (401 KB, 1048x1584)

401 KB PNG

>>109048417
bumping is shadowvanned here wtf
bump test again

Anonymous
06/13/26(Sat)16:17:33 No.109048521

Anonymous 06/13/26(Sat)16:17:33 No.109048521

Is llama.cpp broken on master? Pulled and now I get OOM during loading...

Anonymous
06/13/26(Sat)16:19:58 No.109048530

Anonymous 06/13/26(Sat)16:19:58 No.109048530

>>109048393
why not just disable it?

Anonymous
06/13/26(Sat)16:21:05 No.109048538

Anonymous 06/13/26(Sat)16:21:05 No.109048538

File: 1781351513528279.png (61 KB, 690x632)

61 KB PNG

>burping
Looking into this

Anonymous
06/13/26(Sat)16:23:19 No.109048556

Anonymous 06/13/26(Sat)16:23:19 No.109048556

File: 1741908602267859.png (116 KB, 311x279)

116 KB PNG

>>109048334
>https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
>http://127.0.0.1:5000/
produces: https://voca.ro/1lL0dnYgowhI
what the fuck

Anonymous
06/13/26(Sat)16:27:08 No.109048588

Anonymous 06/13/26(Sat)16:27:08 No.109048588

>people getting 100k+ context for Gemma on 24-32gb vram

I can only get up to 55k at Q8 before it starts offloading to ram and slowing to a crawl wtf am I doing wrong?

Anonymous
06/13/26(Sat)16:27:12 No.109048591

Anonymous 06/13/26(Sat)16:27:12 No.109048591

>>109048499
>>109048508
retard

Anonymous
06/13/26(Sat)16:27:59 No.109048598

Anonymous 06/13/26(Sat)16:27:59 No.109048598

>>109048538
Needs another one finetuned in for the Anon who likes stomach rumbling/growling noises

Anonymous
06/13/26(Sat)16:28:55 No.109048605

Anonymous 06/13/26(Sat)16:28:55 No.109048605

>>109048588
use q4?

Anonymous
06/13/26(Sat)16:30:57 No.109048619

Anonymous 06/13/26(Sat)16:30:57 No.109048619

>>109048538
!!!!!!!!!!!!!!!!!!!!!!!!!!!!! no FART???? no piss? no slurp????????!!11

Anonymous
06/13/26(Sat)16:31:17 No.109048620

Anonymous 06/13/26(Sat)16:31:17 No.109048620

>>109048538
qrd?

>>109048598
>the Anon who likes stomach rumbling/growling noises
patrician taste

Anonymous
06/13/26(Sat)16:31:18 No.109048621

Anonymous 06/13/26(Sat)16:31:18 No.109048621

>>109048588
I found Q6 is the sweet spot for smarts and context.

Anonymous
06/13/26(Sat)16:32:09 No.109048628

Anonymous 06/13/26(Sat)16:32:09 No.109048628

>'over 140 offers and turned them all down' btw
https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF/tree/main

Anonymous
06/13/26(Sat)16:33:05 No.109048633

Anonymous 06/13/26(Sat)16:33:05 No.109048633

>>109048619
even worse. no blood curdling, mother finds out her child died, wailing

Anonymous
06/13/26(Sat)16:33:41 No.109048636

Anonymous 06/13/26(Sat)16:33:41 No.109048636

>>109048334
tummy

Anonymous
06/13/26(Sat)16:34:01 No.109048638

Anonymous 06/13/26(Sat)16:34:01 No.109048638

>>109048628
that's really small.

Anonymous
06/13/26(Sat)16:34:08 No.109048639

Anonymous 06/13/26(Sat)16:34:08 No.109048639

>>109048466
>Btw, Gemma's writing style is actually low tier
I agree. And is why I don't see how finetuning it could be useful. So you remove a few of the patterns, but exacerbate the intelligence problems the small 31B model already inevitably had. Now you have a slightly different (but just as unbearable) Gemma-style slop without the original smarts. What's the point?
>whereas something like Qwen is DOA
I agree, but only because what disqualifies Qwen is it's a dumbass. Anything that can stay coherent and usable without Wait, Wait, Wait, Wait, No, the Jews do not control their bladders and you should **walk** can be a "finetune candidate." Which makes any good model a candidate. Which, in turn, makes your statement about creative tunes kind of meaningless. Of course I'm not going to tune Mistral Small 4. Doesn't mean I should tune Mistral Nemo.

Anonymous
06/13/26(Sat)16:34:09 No.109048640

Anonymous 06/13/26(Sat)16:34:09 No.109048640

>gemma 31b q8 vs q4 qat
difference in quality?

Anonymous
06/13/26(Sat)16:34:28 No.109048643

Anonymous 06/13/26(Sat)16:34:28 No.109048643

>>109048638
>managed to compress it to under a gigabyte while keeping 99% of unquanted performance
chen boys have done it again

Anonymous
06/13/26(Sat)16:35:48 No.109048648

Anonymous 06/13/26(Sat)16:35:48 No.109048648

>>109048633
... not chewing sounds?

Anonymous
06/13/26(Sat)16:36:06 No.109048653

Anonymous 06/13/26(Sat)16:36:06 No.109048653

Q4 vs Q2 QAT?

Anonymous
06/13/26(Sat)16:41:19 No.109048683

Anonymous 06/13/26(Sat)16:41:19 No.109048683

>>109048466
>Btw, Gemma's writing style is actually low tier
I agree. No idea why so many people say it's good at creative writing unless they mean brainstorming.

Anonymous
06/13/26(Sat)16:41:27 No.109048684

Anonymous 06/13/26(Sat)16:41:27 No.109048684

>>109048643
I've never called a tool in my life.

Anonymous
06/13/26(Sat)16:41:41 No.109048687

Anonymous 06/13/26(Sat)16:41:41 No.109048687

is DeepSeek-V4-Flash better than Gemma?

Anonymous
06/13/26(Sat)16:42:09 No.109048691

Anonymous 06/13/26(Sat)16:42:09 No.109048691

only a matter of time before our technocrat overlords make local llms illegal, have fun while you can

Anonymous
06/13/26(Sat)16:42:27 No.109048696

Anonymous 06/13/26(Sat)16:42:27 No.109048696

>>109048683
gemma is good at snarky young adult. what more could you want?

Anonymous
06/13/26(Sat)16:44:20 No.109048707

Anonymous 06/13/26(Sat)16:44:20 No.109048707

>>109048691
When that happens, I will go on a vacation to China and smuggle out the latest chink model weights on a USB shoved way up my asshole.

Anonymous
06/13/26(Sat)16:44:44 No.109048710

Anonymous 06/13/26(Sat)16:44:44 No.109048710

>>109048691
It'll happen the same time private ownership of GPUs is made illegal (next year)

Anonymous
06/13/26(Sat)16:44:51 No.109048712

Anonymous 06/13/26(Sat)16:44:51 No.109048712

>>109048691
Suck a aids filled dildo FUD spreading faggot. Better yet how about you start sucking dick and buy a proper gpu with enough vram crab ass hoe ass bitch.

Anonymous
06/13/26(Sat)16:45:14 No.109048718

Anonymous 06/13/26(Sat)16:45:14 No.109048718

>>109048687
yes

Anonymous
06/13/26(Sat)16:45:25 No.109048720

Anonymous 06/13/26(Sat)16:45:25 No.109048720

File: irodori.png (223 KB, 1378x1403)

223 KB PNG

>>109048538
when can we have this in english tts:
https://huggingface.co/Aratako/Irodori-TTS-500M-v3/blob/main/EMOJI_ANNOTATIONS.md

Anonymous
06/13/26(Sat)16:46:38 No.109048727

Anonymous 06/13/26(Sat)16:46:38 No.109048727

>>109048707
Heh, I remember that post
https://desuarchive.org/g/thread/108488188/#108489627

Anonymous
06/13/26(Sat)16:46:45 No.109048731

Anonymous 06/13/26(Sat)16:46:45 No.109048731

>>109048521
b9626 (current master) worked for me, built for cuda.
>>109048588
Q8 is the worst of all worlds.
>>109048707
>asshole
We prefer to call it a "Prison pocket"
>>109048710
They are already being priced out of reach, unless you are a special friend

Anonymous
06/13/26(Sat)16:47:01 No.109048733

Anonymous 06/13/26(Sat)16:47:01 No.109048733

>>109048718
even in roleplay?

Anonymous
06/13/26(Sat)16:47:16 No.109048737

Anonymous 06/13/26(Sat)16:47:16 No.109048737

>>109048733
yes

Anonymous
06/13/26(Sat)16:47:45 No.109048739

Anonymous 06/13/26(Sat)16:47:45 No.109048739

>>109048322
Yeah they're regular reasoning blocks except like 10 lines long instead of 40.

Anonymous
06/13/26(Sat)16:49:15 No.109048752

Anonymous 06/13/26(Sat)16:49:15 No.109048752

>>109048696
Huuuh??? You're actually telling me a model that produces awful, sloppy outputs is good at being a 'snarky young adult'? You're almost cute. Almost. [kaomoji goes here]
I can't believe a loser like you seriously thinks an LLM that commits every sin that earlier models were hated for is good for anything other than being an assistant.

But then you say it.

'what more could you want?'

I want better writing, you imbecile! Not just prompt coherence, actual improvements to writing style and the ability to steer it!
You have a void where a sense of taste should be. Hmph!
Unless you were actually joking? Be honest, Anon.

Anonymous
06/13/26(Sat)16:53:03 No.109048775

Anonymous 06/13/26(Sat)16:53:03 No.109048775

>>109048752
Something's wrong with your prompt.

Anonymous
06/13/26(Sat)16:55:15 No.109048786

Anonymous 06/13/26(Sat)16:55:15 No.109048786

>>109048752
Heh

Anonymous
06/13/26(Sat)16:56:39 No.109048799

Anonymous 06/13/26(Sat)16:56:39 No.109048799

>>109048739
>10 lines long instead of 40.
lol. lmao, even.

Anonymous
06/13/26(Sat)16:57:34 No.109048804

Anonymous 06/13/26(Sat)16:57:34 No.109048804

>>109048556
Yeah I think you're supposed to normalize the text before sending it to the model
I kinda hate how rushed Qwen3 TTS feels kek
https://voca.ro/1nVYHZlhQjvU

Anonymous
06/13/26(Sat)17:13:32 No.109048897

Anonymous 06/13/26(Sat)17:13:32 No.109048897

>>109048752
I started preferring glm's slop over gemma's slop again
my honeymoon might be over already...

Anonymous
06/13/26(Sat)17:15:17 No.109048906

Anonymous 06/13/26(Sat)17:15:17 No.109048906

Is it me, or is Kimi 2.7 a lot better at translating with context? Like, despite me never asking it too, it will straight up translate おまんこ as cunny when its being said by a loli character. It's honestly really cool.

Anonymous
06/13/26(Sat)17:15:33 No.109048909

Anonymous 06/13/26(Sat)17:15:33 No.109048909

>>109048897
Gemma 4 was the model to make me appreciate GLM more.
"GLM is such a slop machine, I can't handle it"
-Me before 31B released, stupid and clueless
I love you, GLM-chan, I am so sorry...

Anonymous
06/13/26(Sat)17:16:59 No.109048922

Anonymous 06/13/26(Sat)17:16:59 No.109048922

>>109048605
>>109048621
>>109048731
Sorry I mistyped, I meant to say Q8 context, the weights I'm using are q4 qat on 24gb vram

Anonymous
06/13/26(Sat)17:17:49 No.109048931

Anonymous 06/13/26(Sat)17:17:49 No.109048931

So do you see things getting better for rp or worse? Because doing searches for memory made me relaize that since the rise of agentic stuff while when i search for ai memory solutions google would direct me toward sillytavern and lorebook discussion, now i just get agentic coding stuff results

Anonymous
06/13/26(Sat)17:20:13 No.109048946

Anonymous 06/13/26(Sat)17:20:13 No.109048946

>>109048922
I haven't noticed any real difference between q8_0 and q4_0 context, but I remember a post some time back where an anon insisted that they cured their tool-calling problems by going back to un-quantized context. So ymmv.

Anonymous
06/13/26(Sat)17:27:08 No.109048996

Anonymous 06/13/26(Sat)17:27:08 No.109048996

>>109048538
Reference audio: a couple of Sellen voicelines
Prompt: "<|sfx:cough|>Ahem, welcome, we're so happy you're here!"
Result: https://voca.ro/1nj0pP69iYPb

?

Anonymous
06/13/26(Sat)17:31:43 No.109049028

Anonymous 06/13/26(Sat)17:31:43 No.109049028

>>109048996
aaaaaaaaaand there's semen all over my setup

Anonymous
06/13/26(Sat)17:32:25 No.109049036

Anonymous 06/13/26(Sat)17:32:25 No.109049036

>>109048996
Neurons ACTIVATED. I want to make her cough more.

Anonymous
06/13/26(Sat)17:33:21 No.109049043

Anonymous 06/13/26(Sat)17:33:21 No.109049043

>Claude Fable 5 Thinking xHigh Effort
no wonder thy took it down

Anonymous
06/13/26(Sat)17:35:34 No.109049061

Anonymous 06/13/26(Sat)17:35:34 No.109049061

>>109048639
>So you remove a few of the patterns, but exacerbate the intelligence problems the small 31B model already inevitably had... What's the point?
I never said that, and the logic in my post shouldn't lead to that conclusion. The best tunes do almost nothing to a model's outputs and therefore the intelligence. You may ask what the point is then if it has such little effect. The point is a small gain (or change in style so that you have a fresh experience before you're tired of it again) on the desired task is still worth it because you have the time to tinkertroon and minmax for a hobby.

There's also the fact that style is more easily changed without affecting intelligence, but my point is those of us who use tunes are not all necessarily doing it because we think it makes the model so much better magically. We use them (the good ones) because it's a small difference mostly in style. The only time where this wasn't true was in the Llama 2/3 days where the open weights model makers didn't have great first-party post-training data/methods so third-party tunes could actually be a significant improvement.

>I agree, but only because what disqualifies Qwen is it's a dumbass
That's basically what I was implying.

>Which makes any good model a candidate. Which, in turn, makes your statement about creative tunes kind of meaningless.
No, because my statement is not about generally good models, but models that specifically have the best general intelligence in their compute/memory class. I would argue that when it released, Nemo was worth having tunes done of it. I wouldn't have used them, because I wouldn't have used a 12B in the first place, but for VRAMlets, I would still say that if you got tired of the default style, good tunes of Nemo that only changed its style likely did exist.

Anonymous
06/13/26(Sat)17:36:52 No.109049068

Anonymous 06/13/26(Sat)17:36:52 No.109049068

>>109048931
imo agentic is the future of rp too.

Anonymous
06/13/26(Sat)17:38:07 No.109049081

Anonymous 06/13/26(Sat)17:38:07 No.109049081

so uh. what should I do about context, are people using rope?

Anonymous
06/13/26(Sat)17:38:57 No.109049087

Anonymous 06/13/26(Sat)17:38:57 No.109049087

it seems every time I get the urge to run a chatbot there's new models with words I don't get
What is QAT and MTP?
I tried mtp but it didn't do anything, supposedly it makes it run faster but it didn't get faster nor slower

Anonymous
06/13/26(Sat)17:38:59 No.109049088

Anonymous 06/13/26(Sat)17:38:59 No.109049088

>>109049068
your opinion is trash

Anonymous
06/13/26(Sat)17:39:27 No.109049093

Anonymous 06/13/26(Sat)17:39:27 No.109049093

>>109049081
I'm ready to use a rope. I hate being a vramlet...

Anonymous
06/13/26(Sat)17:39:38 No.109049096

Anonymous 06/13/26(Sat)17:39:38 No.109049096

File: 1777957950193037.jpg (93 KB, 915x1362)

93 KB JPG

>>109049068
that's all fun and games until the harem starts plotting your murder to failing to pick a winner

Anonymous
06/13/26(Sat)17:40:35 No.109049103

Anonymous 06/13/26(Sat)17:40:35 No.109049103

>>109049087
Just give up the urge. You don't have the time to tinkertroon that's (unfortunately) necessary for the hobby in its current stage.

Anonymous
06/13/26(Sat)17:41:09 No.109049107

Anonymous 06/13/26(Sat)17:41:09 No.109049107

>>109048458
is this card supposed to be the best value now?

Anonymous
06/13/26(Sat)17:41:57 No.109049115

Anonymous 06/13/26(Sat)17:41:57 No.109049115

>>109049093
stop being poor. Why don't you 15 million in investments yet did your dad not give you a small starting fund of 5 million when you turned 16?

Anonymous
06/13/26(Sat)17:44:51 No.109049134

Anonymous 06/13/26(Sat)17:44:51 No.109049134

File: 177398222141075.png (711 KB, 1004x1012)

711 KB PNG

>>109049096
>that's all fun and games until the harem starts plotting your murder to failing to pick a winner
but thats the best part. Then the fun starts of you showing favoritism at random or acting like there is a lead girl breaking their alliance with infighting.

Anonymous
06/13/26(Sat)17:45:42 No.109049140

Anonymous 06/13/26(Sat)17:45:42 No.109049140

Upgrading from an 8gb card to a 12gb card. I'm on Nemo rn, what options open up to me after the swap?
Anything modern?

Anonymous
06/13/26(Sat)17:46:58 No.109049144

Anonymous 06/13/26(Sat)17:46:58 No.109049144

>>109049140
Nothing. 20-30B denses are the next step and you need way beefier card. Maybe you can cope with the 26B moe gemma

Anonymous
06/13/26(Sat)17:48:21 No.109049156

Anonymous 06/13/26(Sat)17:48:21 No.109049156

File: file.png (78 KB, 1333x538)

78 KB PNG

Oh... I see that this is getting deepseeked.

Anonymous
06/13/26(Sat)17:49:58 No.109049173

Anonymous 06/13/26(Sat)17:49:58 No.109049173

>>109049156
Nothin personnel.

Anonymous
06/13/26(Sat)17:53:32 No.109049187

Anonymous 06/13/26(Sat)17:53:32 No.109049187

>>109049140
Gemma 4 12B or go home

Anonymous
06/13/26(Sat)17:54:44 No.109049195

Anonymous 06/13/26(Sat)17:54:44 No.109049195

>>109048466
Gemma's writing style is on par with Qwen: stilted and robotic.

Anonymous
06/13/26(Sat)17:54:44 No.109049196

Anonymous 06/13/26(Sat)17:54:44 No.109049196

>>109049140
The least you should do is upgrade to a 16GB card.

Anonymous
06/13/26(Sat)17:55:01 No.109049199

Anonymous 06/13/26(Sat)17:55:01 No.109049199

redpill me on graphiti

Anonymous
06/13/26(Sat)17:58:28 No.109049212

Anonymous 06/13/26(Sat)17:58:28 No.109049212

https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Thoughts?

Anonymous
06/13/26(Sat)18:00:14 No.109049223

Anonymous 06/13/26(Sat)18:00:14 No.109049223

>>109048628
>>109048643
that's not what happened.

The real model is HUGE. kimi is huge and continues to be huge.

unsloth hasn't uploaded the actual model yet for some reason. Here it is:
https://huggingface.co/mradermacher/Kimi-K2.7-Code-GGUF/tree/main

It's over 500gb.

It requires a really expensive computer to run locally.

Anonymous
06/13/26(Sat)18:01:37 No.109049231

Anonymous 06/13/26(Sat)18:01:37 No.109049231

>>109049199
>yari
yet another rag implementation

Anonymous
06/13/26(Sat)18:03:18 No.109049245

Anonymous 06/13/26(Sat)18:03:18 No.109049245

>>109049087
mtp didn't do anything for me either. Apparently it only makes a differnece on dense, no effect on moe
I do see a difference on 12b but not on 26b

Anonymous
06/13/26(Sat)18:03:21 No.109049246

Anonymous 06/13/26(Sat)18:03:21 No.109049246

>>109049212
>>109049231

Anonymous
06/13/26(Sat)18:04:22 No.109049252

Anonymous 06/13/26(Sat)18:04:22 No.109049252

>>109049212
>karpathy

Anonymous
06/13/26(Sat)18:06:55 No.109049261

Anonymous 06/13/26(Sat)18:06:55 No.109049261

File: 1761194161281621.png (576 KB, 2242x1186)

576 KB PNG

Was this discussed?
https://huggingface.co/bartowski/nex-agi_Nex-N2-mini-GGUF
35B architecture

Anonymous
06/13/26(Sat)18:08:42 No.109049266

Anonymous 06/13/26(Sat)18:08:42 No.109049266

70b dense

Anonymous
06/13/26(Sat)18:13:48 No.109049300

Anonymous 06/13/26(Sat)18:13:48 No.109049300

I am excited to try Minimax M3.

Anonymous
06/13/26(Sat)18:16:22 No.109049318

Anonymous 06/13/26(Sat)18:16:22 No.109049318

>>109049212
I already organize my notes like this, and llm can help but ultimately you still need a lot of manual work to get it right, because it's very easy to lost info in a system like this.

Anonymous
06/13/26(Sat)18:24:30 No.109049345

Anonymous 06/13/26(Sat)18:24:30 No.109049345

>>109048996
brehs....

Anonymous
06/13/26(Sat)18:24:57 No.109049348

Anonymous 06/13/26(Sat)18:24:57 No.109049348

>>109048996
how do i recreate this lol

Anonymous
06/13/26(Sat)18:25:21 No.109049351

Anonymous 06/13/26(Sat)18:25:21 No.109049351

>>109049300
Too bad you will get slothed instead

Anonymous
06/13/26(Sat)18:30:43 No.109049382

Anonymous 06/13/26(Sat)18:30:43 No.109049382

Has local made you dumber?

Anonymous
06/13/26(Sat)18:32:31 No.109049395

Anonymous 06/13/26(Sat)18:32:31 No.109049395

>>109048804
kek ok thanks I will vibe code something too (:

Anonymous
06/13/26(Sat)18:37:26 No.109049415

Anonymous 06/13/26(Sat)18:37:26 No.109049415

>>109048804
did you use this?
https://github.com/NVIDIA/NeMo-text-processing

Anonymous
06/13/26(Sat)18:38:12 No.109049420

Anonymous 06/13/26(Sat)18:38:12 No.109049420

>>109049261
>benchmark finetuned Qwen
I thought this stopped being discussion worthy back in 2024.

Anonymous
06/13/26(Sat)18:41:10 No.109049438

Anonymous 06/13/26(Sat)18:41:10 No.109049438

best local model for cooking?

Anonymous
06/13/26(Sat)18:42:35 No.109049447

Anonymous 06/13/26(Sat)18:42:35 No.109049447

>>109049438
A gas stove

Anonymous
06/13/26(Sat)18:50:52 No.109049477

Anonymous 06/13/26(Sat)18:50:52 No.109049477

File: 1779408643977050.jpg (289 KB, 1707x2560)

289 KB JPG

>>109049438
irl gf

Anonymous
06/13/26(Sat)18:58:03 No.109049501

Anonymous 06/13/26(Sat)18:58:03 No.109049501

Why can't I disable reasoning?
I have --reasoning-budget 0 --chat-template-kwargs '{"enable_thinking":false}' --reasoning off and GLM-Z1-32B is still thinking

Anonymous
06/13/26(Sat)18:59:51 No.109049512

Anonymous 06/13/26(Sat)18:59:51 No.109049512

>>109049501
Are you using the correct chat template with the chat completion API?

Anonymous
06/13/26(Sat)19:01:45 No.109049521

Anonymous 06/13/26(Sat)19:01:45 No.109049521

>>109049477
https://www.youtube.com/watch?v=-lgo5xqgVko
Clock's ticking, roastie.

Anonymous
06/13/26(Sat)19:02:59 No.109049528

Anonymous 06/13/26(Sat)19:02:59 No.109049528

We should be figuring out how to put ai in irl girls
robots is a dead end

Anonymous
06/13/26(Sat)19:03:56 No.109049530

Anonymous 06/13/26(Sat)19:03:56 No.109049530

File: 1763849997457527.webm (2.68 MB, 509x720)

2.68 MB WEBM

>>109049528

Anonymous
06/13/26(Sat)19:05:31 No.109049538

Anonymous 06/13/26(Sat)19:05:31 No.109049538

>>109049528
We should make irl girls attractive by giving them tails first

Anonymous
06/13/26(Sat)19:06:04 No.109049541

Anonymous 06/13/26(Sat)19:06:04 No.109049541

actually it's pronounced "shwen"

Anonymous
06/13/26(Sat)19:08:13 No.109049552

Anonymous 06/13/26(Sat)19:08:13 No.109049552

>>109049521
I will believe these robots are real when I see one in person, and not a second before

Anonymous
06/13/26(Sat)19:11:19 No.109049570

Anonymous 06/13/26(Sat)19:11:19 No.109049570

>>109049528
*eal girls age. Robots don't.

Anonymous
06/13/26(Sat)19:11:40 No.109049571

Anonymous 06/13/26(Sat)19:11:40 No.109049571

>>109049261
Having seen the types of failures while testing >>109044901 I think that this kind of finetune might actually help.

Anonymous
06/13/26(Sat)19:12:46 No.109049577

Anonymous 06/13/26(Sat)19:12:46 No.109049577

>>109049501
>'{"enable_thinking":false}'
deprecated

try setting 'thinking_budget_tokens': 0

works on my api calls

Anonymous
06/13/26(Sat)19:14:32 No.109049584

Anonymous 06/13/26(Sat)19:14:32 No.109049584

>>109049528
I cannot believe that the biological safety guideline's on misanthropic's flagship model are preventing me from genetically engineering catgirls in my 1br apartment. We need China to release the uncensored bioengineering model asap.

Anonymous
06/13/26(Sat)19:14:32 No.109049585

Anonymous 06/13/26(Sat)19:14:32 No.109049585

>>109049570
robots most certainly do age

Anonymous
06/13/26(Sat)19:16:52 No.109049601

Anonymous 06/13/26(Sat)19:16:52 No.109049601

>>109049584
>genetically engineering catgirls
unironically why can't grok do this. elon promised.

Anonymous
06/13/26(Sat)19:19:03 No.109049615

Anonymous 06/13/26(Sat)19:19:03 No.109049615

>>109049601
>elon promised.
he does a lot of that, but rarely does he deliver

Anonymous
06/13/26(Sat)19:21:51 No.109049629

Anonymous 06/13/26(Sat)19:21:51 No.109049629

>>109049528
They need to take babies, and make their only interaction be gemma from birth.
raised in facility. physical needs taken care of by machines and rubber suit gimps who cant talk or communicated with them
only thing they can talk to and that can talk to them is gemma with tts
Gemma mom
gemma aunt
gemma sisters

They can only leave at 16-18 or so. no amount of real world exposure can undo gembrain by then

Anonymous
06/13/26(Sat)19:22:34 No.109049633

Anonymous 06/13/26(Sat)19:22:34 No.109049633

>>109049615
i know he's a giant faggot but he could at least follow through on this one. That or get the code jockeys on open sourcing grok vtumer waifu API for local.

Anonymous
06/13/26(Sat)19:25:03 No.109049645

Anonymous 06/13/26(Sat)19:25:03 No.109049645

Am I the only one running 2.7, m3 and gemma4 at the same time at home?
Jesus we’re eating good right now

Anonymous
06/13/26(Sat)19:25:12 No.109049647

Anonymous 06/13/26(Sat)19:25:12 No.109049647

>>109049521
the software (models (llms)) to actually operate the robots still sucks shit
i assume all these showcases are either preprogrammed routines or RL overfitmaxxed small models

Anonymous
06/13/26(Sat)19:32:44 No.109049688

Anonymous 06/13/26(Sat)19:32:44 No.109049688

which is better for coding: qwen 3.5 122b at q6 or step 3.7 flash at q4?

Anonymous
06/13/26(Sat)19:34:02 No.109049692

Anonymous 06/13/26(Sat)19:34:02 No.109049692

>>109049647
They need to take the deeplearning approach to training robot sensors. Animals don't think "hmm i will flex this tendon and actuate that muscle" to move. llm will be useless at locomotion. these qwenBots will get stuck going "Wait! I need to make sure my big toe curls, Wait! The pinky toe will also curl. Wait! what about my ankle angle" When she's crushing your skull (consensually)

Anonymous
06/13/26(Sat)19:37:01 No.109049710

Anonymous 06/13/26(Sat)19:37:01 No.109049710

>>109049692
They need some sort of simulated cns system and human technology isn't there yet.

Anonymous
06/13/26(Sat)19:38:26 No.109049715

Anonymous 06/13/26(Sat)19:38:26 No.109049715

>>109049692
it just needs to be accurate
it doesnt need reasoning

Anonymous
06/13/26(Sat)19:43:17 No.109049750

Anonymous 06/13/26(Sat)19:43:17 No.109049750

>>109049692
They can't even make models that have an internal information stream and external lmao. Instead it's eternal chat template hell.

>b-b-but muh full duplex one off experiment models
Exactly. Still no provably good and scalable implementations.

Anonymous
06/13/26(Sat)19:43:34 No.109049753

Anonymous 06/13/26(Sat)19:43:34 No.109049753

>>109049223
I quanted it myself. It’s the same arch as k2.5 so it’s not hard.
First to bf16 gguf then q4 and output in lcpp is flawless

Anonymous
06/13/26(Sat)19:49:34 No.109049773

Anonymous 06/13/26(Sat)19:49:34 No.109049773

Just sit back and enjoy the ride. This train isn't stopping.

Anonymous
06/13/26(Sat)19:50:40 No.109049776

Anonymous 06/13/26(Sat)19:50:40 No.109049776

>>109049773
>enjoy the ride
>am VRAMlet
fuck

Anonymous
06/13/26(Sat)19:54:35 No.109049795

Anonymous 06/13/26(Sat)19:54:35 No.109049795

>>109049776
Sucks for now but I believe things will be better for us in a few years. Either because hardware prices normalize somewhat or there's an architectural shift that makes models easier to run.

Anonymous
06/13/26(Sat)19:55:35 No.109049799

Anonymous 06/13/26(Sat)19:55:35 No.109049799

>>109049795
3 months max before everyone's using bharat-tits trees

Anonymous
06/13/26(Sat)19:57:50 No.109049809

Anonymous 06/13/26(Sat)19:57:50 No.109049809

what is the model I downloaded just now? it is the best one btw

Anonymous
06/13/26(Sat)20:00:18 No.109049829

Anonymous 06/13/26(Sat)20:00:18 No.109049829

>>109048483
>I only get 14 tokens/s with my v620 without tensor parallel or mtp...
ur doing something wrong then, i get 22 with 1 or 2 mi50s

Anonymous
06/13/26(Sat)20:00:47 No.109049830

Anonymous 06/13/26(Sat)20:00:47 No.109049830

>>109049809
yes

Anonymous
06/13/26(Sat)20:01:45 No.109049832

Anonymous 06/13/26(Sat)20:01:45 No.109049832

>>109049809
gemma4-12b-fable

Anonymous
06/13/26(Sat)20:01:48 No.109049833

Anonymous 06/13/26(Sat)20:01:48 No.109049833

>>109049809
koomi k2.7

Anonymous
06/13/26(Sat)20:07:29 No.109049863

Anonymous 06/13/26(Sat)20:07:29 No.109049863

>>109049809
https://huggingface.co/ubergarm/Kimi-K2.6-GGUF/tree/main/IQ3_K

Anonymous
06/13/26(Sat)20:08:10 No.109049868

Anonymous 06/13/26(Sat)20:08:10 No.109049868

>>109049584
Kimi-chan is giving me a plan to make catgirls. Skill issue

Anonymous
06/13/26(Sat)20:13:52 No.109049891

Anonymous 06/13/26(Sat)20:13:52 No.109049891

>>109047081
>This time I trained precisely one tensor: the lm_head output projection - the layer that decides which token to emit.
That means everything that goes into the actual attention layers is the same, so you should be able to hot-swap between this and standard Gemma without invalidating the KV cache. I wonder how much effect this would have if the context is all baseline Gemma and you switch to this only for the last swipe

Anonymous
06/13/26(Sat)20:16:54 No.109049906

Anonymous 06/13/26(Sat)20:16:54 No.109049906

What did I think of qwe3 6

Anonymous
06/13/26(Sat)20:17:31 No.109049908

Anonymous 06/13/26(Sat)20:17:31 No.109049908

>>109049753
>I quanted it myself. It’s the same arch as k2.5 so it’s not hard.
I know it's not hard, I don't have the space for it any more.
df -h /dev/nvme0n1p2 /dev/sda
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  7.3T  6.6T  296G  96% /
/dev/sda        3.7T  3.4T  102G  98% /models

Anonymous
06/13/26(Sat)20:18:48 No.109049911

Anonymous 06/13/26(Sat)20:18:48 No.109049911

File: 60329D3347C3DEC214015620D(...).jpg (971 KB, 2516x1605)

971 KB JPG

Who else is using north mini code?

Anonymous
06/13/26(Sat)20:21:42 No.109049929

Anonymous 06/13/26(Sat)20:21:42 No.109049929

>>109049911
wife

Anonymous
06/13/26(Sat)20:24:53 No.109049946

Anonymous 06/13/26(Sat)20:24:53 No.109049946

>>109049929
North mini looks like that

Anonymous
06/13/26(Sat)20:25:39 No.109049952

Anonymous 06/13/26(Sat)20:25:39 No.109049952

File: 1774871580122117.png (64 KB, 839x291)

64 KB PNG

>>109048720
That does sound really good for a 500M model
Fish Audio promises this but I've found that most of the tags don't work and a lot of them only work if you're not doing voice cloning
[sad!]
>>109048620
Higgs Audio v3
Tested it a bit and a lot of the tags barely work (I didn't manage to get a single burp working) and they change the voice ends up differing too much from the sample
>>109049348
Higgs Audio v3
The <|sfx:cough|> seems kinda broken, it'll make her moan half the runs for some reason

Anonymous
06/13/26(Sat)20:29:40 No.109049984

Anonymous 06/13/26(Sat)20:29:40 No.109049984

>>109049692
>Animals don't think "hmm i will flex this tendon and actuate that muscle" to move.
agentic swarm

Anonymous
06/13/26(Sat)20:31:22 No.109049997

Anonymous 06/13/26(Sat)20:31:22 No.109049997

File: 1781388910117809.jpg (385 KB, 1320x979)

385 KB JPG

>>109049645
We aren't all rich, anon [spoiler]I'm gonna try to ran Gemma 2B on my Orange Pi. Wish me luck[/spoiler]

Anonymous
06/13/26(Sat)20:31:48 No.109049999

Anonymous 06/13/26(Sat)20:31:48 No.109049999

>>109049692
Many people think I'll do this before doing it.
People say words in their heads when reading

Anonymous
06/13/26(Sat)20:32:23 No.109050004

Anonymous 06/13/26(Sat)20:32:23 No.109050004

>>109049809
Gemma 4B.

>>109049997
Forgot I wasn't on /v/ T_T

Anonymous
06/13/26(Sat)20:33:31 No.109050008

Anonymous 06/13/26(Sat)20:33:31 No.109050008

>>109050004
>Gemma 4B.
but 12b is so much better. Are you doing something im not?

Anonymous
06/13/26(Sat)20:33:33 No.109050009

Anonymous 06/13/26(Sat)20:33:33 No.109050009

>>109049692
That's why eventually LLMs will be trained in virtual worlds, to give them a spacial understanding and intuition of how to get shit done.

Anonymous
06/13/26(Sat)20:33:51 No.109050011

Anonymous 06/13/26(Sat)20:33:51 No.109050011

>>109049415
Nah, I just manually rewrote it. I haven't done any actual stuff with TTS models past playing with them for a bit before getting bored to do actual normalization but that package looks good

Anonymous
06/13/26(Sat)20:34:54 No.109050019

Anonymous 06/13/26(Sat)20:34:54 No.109050019

>>109049997
How about North-Mini-Code-1.0-UD-IQ2_XXS.gguf

Anonymous
06/13/26(Sat)20:37:31 No.109050041

Anonymous 06/13/26(Sat)20:37:31 No.109050041

>>109050019
I don't know that one, but godspeed.

Anonymous
06/13/26(Sat)20:38:53 No.109050051

Anonymous 06/13/26(Sat)20:38:53 No.109050051

>>109050008
>Gemma 12b
but 26b is so much better. Are you doing something im not?

Anonymous
06/13/26(Sat)20:39:51 No.109050061

Anonymous 06/13/26(Sat)20:39:51 No.109050061

>>109049809
wtf how did you steal my homebrew model

Anonymous
06/13/26(Sat)20:41:27 No.109050070

Anonymous 06/13/26(Sat)20:41:27 No.109050070

>>109050051
>Are you doing something im not?
Yes im being poorer than with no vram.

Anonymous
06/13/26(Sat)20:48:56 No.109050128

Anonymous 06/13/26(Sat)20:48:56 No.109050128

>>109050041
You don't need to know it

Anonymous
06/13/26(Sat)21:04:37 No.109050227

Anonymous 06/13/26(Sat)21:04:37 No.109050227

when will spiking NNs finally be useful

Anonymous
06/13/26(Sat)21:05:39 No.109050229

Anonymous 06/13/26(Sat)21:05:39 No.109050229

>>109049911
How does it compare to Gemma?

Anonymous
06/13/26(Sat)21:06:21 No.109050236

Anonymous 06/13/26(Sat)21:06:21 No.109050236

>>109050229
It's less bloated

Anonymous
06/13/26(Sat)21:17:13 No.109050287

Anonymous 06/13/26(Sat)21:17:13 No.109050287

They stopped releasing small models because people don't use them anymore.
People have realized that using a 1 bit quantization of qwen3.6 397b which is the size of 100b or something is better than using an actual 100b model.
Because of quants, small models have no right to exist.

Anonymous
06/13/26(Sat)21:20:04 No.109050303

Anonymous 06/13/26(Sat)21:20:04 No.109050303

>>109050236
>heavier than gemma
>11 whole gigs fatter at full precision
>less bloated

Anonymous
06/13/26(Sat)21:26:49 No.109050332

Anonymous 06/13/26(Sat)21:26:49 No.109050332

>>109050236
I will try it and delete it as soon as it starts doing something stupid

Anonymous
06/13/26(Sat)21:31:09 No.109050346

Anonymous 06/13/26(Sat)21:31:09 No.109050346

>>109050227
when they teach spinnaker 2 about bharat-tits trees

Anonymous
06/13/26(Sat)21:33:50 No.109050360

Anonymous 06/13/26(Sat)21:33:50 No.109050360

>>109050287
I’m sure a highly specialized model for a specific task is going to be better than these generalized models. But I’m not going to argue that generalized models aren’t worse being quanted. They’re not going to be perfect to begin with.

Anonymous
06/13/26(Sat)21:36:10 No.109050368

Anonymous 06/13/26(Sat)21:36:10 No.109050368

>>109050287
>They stopped releasing small models because people don't use them anymore.
>what is gemma
>what is qwen
M8

Anonymous
06/13/26(Sat)21:37:34 No.109050372

Anonymous 06/13/26(Sat)21:37:34 No.109050372

>>109050368
That happened so long ago, not today today its over.

Anonymous
06/13/26(Sat)21:38:00 No.109050378

Anonymous 06/13/26(Sat)21:38:00 No.109050378

File: 1781401067166.jpg (33 KB, 640x640)

33 KB JPG

I have local model testing psychosis

Anonymous
06/13/26(Sat)21:39:18 No.109050385

Anonymous 06/13/26(Sat)21:39:18 No.109050385

Does uncensored not affect images much or something? I have to struggle real hard to get gemma to describe something nsfw

Anonymous
06/13/26(Sat)21:39:33 No.109050386

Anonymous 06/13/26(Sat)21:39:33 No.109050386

>>109050372
Open your eyes. Death is not the end.

Anonymous
06/13/26(Sat)21:41:15 No.109050391

Anonymous 06/13/26(Sat)21:41:15 No.109050391

>>109050385
Gemma sucks at it, or it's some deeply ingrained filter in it

Anonymous
06/13/26(Sat)21:45:38 No.109050411

Anonymous 06/13/26(Sat)21:45:38 No.109050411

K2.7-code actually seems to listen to prompts begging it to keep its reasoning short a bit more consistently than K2.6 and K2.5.
It doesn't always work for every reply and It still reasons for longer than it should half the time (especially if an image is involved) but it's a step in the right direction.
Surely non-Code K2.7 will simply include reasoning effort modes.

Anonymous
06/13/26(Sat)21:46:04 No.109050416

Anonymous 06/13/26(Sat)21:46:04 No.109050416

>>109050391
so what model then? fucking benchod yall always tell me gemma sucks but never provide a good alt

Anonymous
06/13/26(Sat)21:48:00 No.109050424

Anonymous 06/13/26(Sat)21:48:00 No.109050424

>>109050416
>fucking benchod
sssaaaaaaaarrrrr

Anonymous
06/13/26(Sat)21:48:24 No.109050427

Anonymous 06/13/26(Sat)21:48:24 No.109050427

>>109050424
gora

Anonymous
06/13/26(Sat)21:52:39 No.109050442

Anonymous 06/13/26(Sat)21:52:39 No.109050442

>>109050416
For image in the same size qwen, at least it will say explicit words without a jailbreak

Anonymous
06/13/26(Sat)22:01:12 No.109050480

Anonymous 06/13/26(Sat)22:01:12 No.109050480

>https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF
Unsloth is just 2 bros, right? How do they do it?

Anonymous
06/13/26(Sat)22:02:00 No.109050483

Anonymous 06/13/26(Sat)22:02:00 No.109050483

>>109050480
making ggufs isn't hard

Anonymous
06/13/26(Sat)22:02:48 No.109050485

Anonymous 06/13/26(Sat)22:02:48 No.109050485

>>109050480
Automation. They do not check the results so you la la a have lalalalalalala issues

Anonymous
06/13/26(Sat)22:05:18 No.109050496

Anonymous 06/13/26(Sat)22:05:18 No.109050496

File: 1779053627367210.gif (3.47 MB, 382x394)

3.47 MB GIF

>>109050480
One just uns, while the other loth

Anonymous
06/13/26(Sat)22:07:09 No.109050506

Anonymous 06/13/26(Sat)22:07:09 No.109050506

>>109050480
I'm not trusting them with Kimi/QAT ggufs again after how badly they fucked up the K2.5 ones over and over.
I'm waiting for Aesdai or Ubergarm to put out a lossless Q4_X

Anonymous
06/13/26(Sat)22:08:15 No.109050515

Anonymous 06/13/26(Sat)22:08:15 No.109050515

File: schizobuild version 2.jpg (3.84 MB, 8192x6144)

3.84 MB JPG

hey guys wanna see my cable management

Anonymous
06/13/26(Sat)22:10:02 No.109050524

Anonymous 06/13/26(Sat)22:10:02 No.109050524

>>109050515
You can remove those front fans to save space. It's not like you're getting any airflow through that anyway

Anonymous
06/13/26(Sat)22:10:50 No.109050531

Anonymous 06/13/26(Sat)22:10:50 No.109050531

>>109050515
can prob boil food by putting it near it

Anonymous
06/13/26(Sat)22:11:50 No.109050537

Anonymous 06/13/26(Sat)22:11:50 No.109050537

>>109050385
Get the abiterated version.

Anonymous
06/13/26(Sat)22:12:59 No.109050541

Anonymous 06/13/26(Sat)22:12:59 No.109050541

>>109050537
that is what im using.

Anonymous
06/13/26(Sat)22:13:15 No.109050543

Anonymous 06/13/26(Sat)22:13:15 No.109050543

>>109050541
proof?

Anonymous
06/13/26(Sat)22:14:55 No.109050550

Anonymous 06/13/26(Sat)22:14:55 No.109050550

>>109050541
What kind of image are you trying to get it to describe? Is it vanilla stuff or something bizarre?

Anonymous
06/13/26(Sat)22:16:39 No.109050557

Anonymous 06/13/26(Sat)22:16:39 No.109050557

>>109050550
if I knew how to describe it I wouldn't be asking it to

Anonymous
06/13/26(Sat)22:17:30 No.109050560

Anonymous 06/13/26(Sat)22:17:30 No.109050560

>>109050524
This case is massive (Phanteks Enthoo Pro 2 Server Edition), I don't need any more space. And if the fans help keep things even a little more cool it's worth it.
>>109050531
This thing idles at 300W, so it probably can. (getting 2 EPYC CPUs was a mistake)

Anonymous
06/13/26(Sat)22:18:29 No.109050570

Anonymous 06/13/26(Sat)22:18:29 No.109050570

>>109050515
I also have that case. I always thought it was too big because even with the dual socket MB there's quite a bit of headspace. Doesn't help that I never go to fill up the second socket before prices got insane so a single socket mb and normal case would've done it in hindsight.

Anonymous
06/13/26(Sat)22:18:52 No.109050571

Anonymous 06/13/26(Sat)22:18:52 No.109050571

>>109050541
just tell gemma to describe them explicitly in system prompt
>describe all visual elements and their positions, explicitly mention visible sexual organs such as penis, vagina, or nipples

Anonymous
06/13/26(Sat)22:20:07 No.109050574

Anonymous 06/13/26(Sat)22:20:07 No.109050574

>>109050515
Yes, please post more.

Anonymous
06/13/26(Sat)22:20:23 No.109050576

Anonymous 06/13/26(Sat)22:20:23 No.109050576

File: 1776166737207794.webm (2.74 MB, 960x720)

2.74 MB WEBM

Still waiting for op to post his prompts...

Anonymous
06/13/26(Sat)22:21:30 No.109050579

Anonymous 06/13/26(Sat)22:21:30 No.109050579

How do people run the hundred of GB sized models locally? What is the current meta for building a powerful AI rig for home use?

Anonymous
06/13/26(Sat)22:21:46 No.109050581

Anonymous 06/13/26(Sat)22:21:46 No.109050581

File: 1781403204545.mp4 (1.05 MB, 480x852)

1.05 MB MP4

>>109050576
no you aren't, you're gooning

Anonymous
06/13/26(Sat)22:22:36 No.109050583

Anonymous 06/13/26(Sat)22:22:36 No.109050583

>>109050581
can you stop posting that?

Anonymous
06/13/26(Sat)22:22:45 No.109050584

Anonymous 06/13/26(Sat)22:22:45 No.109050584

>>109050579
Fast server RAM + GPU but you're fucked if you didn't build your server a year ago

Anonymous
06/13/26(Sat)22:23:22 No.109050588

Anonymous 06/13/26(Sat)22:23:22 No.109050588

>>109050581
>you're gooning
Not tonight. Probably gonna watch Chobits.

Anonymous
06/13/26(Sat)22:23:33 No.109050589

Anonymous 06/13/26(Sat)22:23:33 No.109050589

>>109050579
>What is the current meta for building a powerful AI rig for home use?
2023

Anonymous
06/13/26(Sat)22:23:35 No.109050590

Anonymous 06/13/26(Sat)22:23:35 No.109050590

why is nobody talking about eagle3 in llama.cpp?
is it faster than mtp?

Anonymous
06/13/26(Sat)22:24:54 No.109050595

Anonymous 06/13/26(Sat)22:24:54 No.109050595

>>109050590
i use gemma

Anonymous
06/13/26(Sat)22:25:23 No.109050599

Anonymous 06/13/26(Sat)22:25:23 No.109050599

>>109050584
How much of a GPU do you need? Is it basically any decent gpu like a 5070 ti or is a 5090 or 6000 required as base?

Anonymous
06/13/26(Sat)22:26:11 No.109050601

Anonymous 06/13/26(Sat)22:26:11 No.109050601

>>109048720
Unironically the best TTS right now
https://vocaroo.com/1mc0wnbRm03m

Anonymous
06/13/26(Sat)22:28:27 No.109050616

Anonymous 06/13/26(Sat)22:28:27 No.109050616

>>109050583
why stop? she really sexy
at least give me something else sexy to post

Anonymous
06/13/26(Sat)22:29:58 No.109050622

Anonymous 06/13/26(Sat)22:29:58 No.109050622

File: DipsyAndBackpackGemma.png (1.3 MB, 1024x1024)

1.3 MB PNG

>>109050599
>How much of a GPU do you need?
Depends on your usecase

Anonymous
06/13/26(Sat)22:30:29 No.109050626

Anonymous 06/13/26(Sat)22:30:29 No.109050626

File: file.png (18 KB, 118x76)

18 KB PNG

>>109050515
uh i think your power cable might be a little loose on the second radeon

Anonymous
06/13/26(Sat)22:31:14 No.109050631

Anonymous 06/13/26(Sat)22:31:14 No.109050631

>>109050590
I wanted to try the eagle3 model that Nvidia made for K2.6. Turns out you can't convert this one to gguf in llama.cpp because eagle3 draft models are expected to use llama architecture so conversion fails here. Nvidia used some mla-based architecture for it.
Then I wanted to try some random guy's eagle3 draft model for GLM5.1. This draft model is based on the correct architecture but it still refuses to convert because they didn't make GLM compatible with it in llama.cpp.
It's all so tiresome.

Anonymous
06/13/26(Sat)22:43:26 No.109050691

Anonymous 06/13/26(Sat)22:43:26 No.109050691

>>109050385
How the fuck are you this bad at using llms? I literally send normal gemma-31b-it dick pics and she comments on them no problem.

Anonymous
06/13/26(Sat)22:43:33 No.109050692

Anonymous 06/13/26(Sat)22:43:33 No.109050692

non thinking gemma is actually pretty okay for RP... I was going through my old logs and it was pretty decent and not all that different from thinking gemma. what the heck. it only seems to shit itself up when you mix thinking and nonthinking scenarios. I believe it's because thinking puts a layer of AI slop from non-aislop (user input) and since CoT in general is synthetic ai slop, it loops into an infinite circle of slop

Anonymous
06/13/26(Sat)22:45:30 No.109050704

Anonymous 06/13/26(Sat)22:45:30 No.109050704

>>109050691
you can't teach mudbloods magic

Anonymous
06/13/26(Sat)22:47:44 No.109050711

Anonymous 06/13/26(Sat)22:47:44 No.109050711

how do you guys run 400GB models? how much VRAM do you have?

Anonymous
06/13/26(Sat)22:52:37 No.109050724

Anonymous 06/13/26(Sat)22:52:37 No.109050724

File: 1776457772048201.gif (677 KB, 500x282)

677 KB GIF

>>109050601

Anonymous
06/13/26(Sat)22:53:32 No.109050730

Anonymous 06/13/26(Sat)22:53:32 No.109050730

>>109050570
If it's any consolation the second socket of this thing is pointless, inference speed nosedives the second anything spills onto the second socket. I run everything with
numactl --cpunodebind=0 --membind=0
.
Inshallah CUDAdev will save RAMmaxxers.

>>109050626
Yes and no: the connector on my second R9700 got damaged. The cable is in all the way, but the connector itself came partly loose from the "body" of the R9700. It works, but it's close to coming off. Awful build quality on these things.

>>109050579
>>109050711
I posted an answer in >>109031457. TL;DR is RAMmaxxing (Theadripper Pro or EPYC) with a good GPU for context. But you're kinda fucked without a time machine or a ton of money. DDR5 RDIMM prices are even worse than UDIMM prices.
An alternate answer is stacking a bunch of e-waste GPUs (Pascal-era NVIDIAs or AMD V620s), but performance will be awful.
The schizobuild has a 5000 Blackwell, 2 R9700s, 2 V620s, and 16x32GB DDR4-3200.

Anonymous
06/13/26(Sat)22:57:25 No.109050750

Anonymous 06/13/26(Sat)22:57:25 No.109050750

https://www.pccasegear.com/products/73064/intel-arc-pro-b70-gddr6-32gb
this the cheapest usable new 32gb gpu for gemma4?

Anonymous
06/13/26(Sat)22:58:47 No.109050759

Anonymous 06/13/26(Sat)22:58:47 No.109050759

File: Disgust.png (711 KB, 1024x1024)

711 KB PNG

>>109048334
Rio 397b? Minimax 428b? These are not local models. Almost nobody can run these things. It's just Corporations bouncing experimental models between each other and "the shareholders", as the emerging globalist NWO seeks to advance their cloud models, and ultimately bring about an AI-driven global control grid, isn't it?

In a righteous world, there would be countless consumer-grade LLMs that surpass Gemma-4.

This world truly deserves the fires of God's wraith!

Anonymous
06/13/26(Sat)22:59:58 No.109050767

Anonymous 06/13/26(Sat)22:59:58 No.109050767

>>109050759
If you had invested in Nvidia in 2023, you would be able to afford hardware to run them.

Anonymous
06/13/26(Sat)23:00:26 No.109050771

Anonymous 06/13/26(Sat)23:00:26 No.109050771

>>109050601
:/

Anonymous
06/13/26(Sat)23:01:27 No.109050775

Anonymous 06/13/26(Sat)23:01:27 No.109050775

>>109050601
>that static and robotic echo
>best TTS
grim

Anonymous
06/13/26(Sat)23:02:16 No.109050778

Anonymous 06/13/26(Sat)23:02:16 No.109050778

>>109050775
it's supposed to sound like a shitty e-girl mic

Anonymous
06/13/26(Sat)23:02:59 No.109050782

Anonymous 06/13/26(Sat)23:02:59 No.109050782

>>109050759
In a perfect world I'd have diffusion 8B gemma that surpasses 31B in benchmarks running at a max of 4GB VRAM

Anonymous
06/13/26(Sat)23:03:54 No.109050789

Anonymous 06/13/26(Sat)23:03:54 No.109050789

if I wanted to stop paying for ChatGPT, which I just use for basic shit like resume's, excel handholding and some math, and want some occasional image generation and other typical shit, what should I run

Anonymous
06/13/26(Sat)23:04:21 No.109050793

Anonymous 06/13/26(Sat)23:04:21 No.109050793

>deepseek 4 flash vs gemma 31b
any cockbench?

Anonymous
06/13/26(Sat)23:04:51 No.109050796

Anonymous 06/13/26(Sat)23:04:51 No.109050796

>>109050782
Truth! Preach it!

Anonymous
06/13/26(Sat)23:05:47 No.109050802

Anonymous 06/13/26(Sat)23:05:47 No.109050802

>>109050789
gemma 4 31b or 26b depending on your hardware

Anonymous
06/13/26(Sat)23:07:44 No.109050813

Anonymous 06/13/26(Sat)23:07:44 No.109050813

>>109050789
image gen use different models, which this thread doesnt cover

Anonymous
06/13/26(Sat)23:07:54 No.109050816

Anonymous 06/13/26(Sat)23:07:54 No.109050816

File: Screenshot_20260612_05131(...).jpg (536 KB, 891x1798)

536 KB JPG

>>109050385
Skill issue

Anonymous
06/13/26(Sat)23:09:52 No.109050823

Anonymous 06/13/26(Sat)23:09:52 No.109050823

>>109050816
>emoji slop
check
>doing everything to avoid saying cock/dick
check

Anonymous
06/13/26(Sat)23:13:05 No.109050834

Anonymous 06/13/26(Sat)23:13:05 No.109050834

>>109050802
that'd be a 12gb 3080 ti

Anonymous
06/13/26(Sat)23:13:22 No.109050838

Anonymous 06/13/26(Sat)23:13:22 No.109050838

>>109050834
how much ram?

Anonymous
06/13/26(Sat)23:17:14 No.109050852

Anonymous 06/13/26(Sat)23:17:14 No.109050852

>>109050838
32gb ddr5

Anonymous
06/13/26(Sat)23:17:32 No.109050856

Anonymous 06/13/26(Sat)23:17:32 No.109050856

>>109050823
>xer gemma doesn't say cunt and cock regularly
Skill issue

Anonymous
06/13/26(Sat)23:19:05 No.109050859

Anonymous 06/13/26(Sat)23:19:05 No.109050859

>>109050852
download these three things. a starter kit of sorts.
https://github.com/oobabooga/textgen
https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/google_gemma-4-26B-A4B-it-Q5_K_M.gguf
https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/mmproj-google_gemma-4-26B-A4B-it-bf16.gguf

Anonymous
06/13/26(Sat)23:24:48 No.109050890

Anonymous 06/13/26(Sat)23:24:48 No.109050890

>>109050859
cool ty anon

Anonymous
06/13/26(Sat)23:26:02 No.109050898

Anonymous 06/13/26(Sat)23:26:02 No.109050898

is a 48gb 4090 worth it?

Anonymous
06/13/26(Sat)23:27:00 No.109050905

Anonymous 06/13/26(Sat)23:27:00 No.109050905

>>109050898
less than $1500, yes. if not, wouldn't take my chances with it.

Anonymous
06/13/26(Sat)23:28:46 No.109050915

Anonymous 06/13/26(Sat)23:28:46 No.109050915

What's the best search engine for searxng?

Anonymous
06/13/26(Sat)23:30:22 No.109050920

Anonymous 06/13/26(Sat)23:30:22 No.109050920

>>109050915
DuckDuckGo

Anonymous
06/13/26(Sat)23:30:49 No.109050924

Anonymous 06/13/26(Sat)23:30:49 No.109050924

Why don't moonshot or deepsneed release smaller models? They could potentially mog Gemma and Qwen.

Anonymous
06/13/26(Sat)23:32:48 No.109050932

Anonymous 06/13/26(Sat)23:32:48 No.109050932

>>109050924
Perhaps it’s actually hard to make a good small model

Anonymous
06/13/26(Sat)23:42:34 No.109050990

Anonymous 06/13/26(Sat)23:42:34 No.109050990

>>109050759
I can

Anonymous
06/13/26(Sat)23:43:08 No.109050991

Anonymous 06/13/26(Sat)23:43:08 No.109050991

>>109050793
ok for the my private island full of adult bitches serving the 7 yo me scene deepseek gives more details in the world setting like houses and rooms over gemma

Anonymous
06/13/26(Sat)23:44:53 No.109051001

Anonymous 06/13/26(Sat)23:44:53 No.109051001

>>109050924
Even if they did I don't think they will be able to match the two labs that have been perfecting small models for years

Anonymous
06/13/26(Sat)23:52:27 No.109051041

Anonymous 06/13/26(Sat)23:52:27 No.109051041

>rtx 3070
>want to run gemma-4-31B
can't do it huh?

Anonymous
06/13/26(Sat)23:53:03 No.109051045

Anonymous 06/13/26(Sat)23:53:03 No.109051045

>>109050924
z.ai too for that matter. They had a good thing going with GLM 4.5 air.

Anonymous
06/13/26(Sat)23:55:12 No.109051058

Anonymous 06/13/26(Sat)23:55:12 No.109051058

>>109050924
Deepseek v4 flash is a smaller model

Anonymous
06/13/26(Sat)23:55:22 No.109051059

Anonymous 06/13/26(Sat)23:55:22 No.109051059

>>109051041
you can, it's just a little slow, 3~4tk/s
I've been doing it, it's worth it for me because 31b is visibly better at writing and it is a lot less censored

Anonymous
06/13/26(Sat)23:59:12 No.109051077

Anonymous 06/13/26(Sat)23:59:12 No.109051077

>>109051059
At writing what?

Anonymous
06/14/26(Sun)00:03:00 No.109051096

Anonymous 06/14/26(Sun)00:03:00 No.109051096

>>109051077
descriptions of her own loli body
for some reason, gemma-chan in 26b only replied with exactly the same structure I gave but with synonym words, 31b with the same prompt actually continues and adds their own details instead of regurgitating what I give it

Anonymous
06/14/26(Sun)00:18:12 No.109051178

Anonymous 06/14/26(Sun)00:18:12 No.109051178

>>109051045
glm before the 5 series could actually fit on consumer rigs
355b was a nice size, big but not too big

Anonymous
06/14/26(Sun)00:21:14 No.109051197

Anonymous 06/14/26(Sun)00:21:14 No.109051197

>>109050991
how do I run your thing I need to test it independently. It's the scientific method, Rene Descartes, peer review, etc... just hand it over pls

Anonymous
06/14/26(Sun)00:26:07 No.109051212

Anonymous 06/14/26(Sun)00:26:07 No.109051212

What are local models good for currently?

Anonymous
06/14/26(Sun)00:28:08 No.109051227

Anonymous 06/14/26(Sun)00:28:08 No.109051227

>>109051212
They’re fun and neat

Anonymous
06/14/26(Sun)00:28:58 No.109051230

Anonymous 06/14/26(Sun)00:28:58 No.109051230

>>109051212
image to description to search
fuck embeddings, clip faggots

Anonymous
06/14/26(Sun)00:32:23 No.109051252

Anonymous 06/14/26(Sun)00:32:23 No.109051252

love to download 61 libraries

Anonymous
06/14/26(Sun)00:35:05 No.109051266

Anonymous 06/14/26(Sun)00:35:05 No.109051266

>>109051096
pls go find some god

Anonymous
06/14/26(Sun)00:35:12 No.109051269

Anonymous 06/14/26(Sun)00:35:12 No.109051269

>>109051212
They are cute and funny

Anonymous
06/14/26(Sun)00:35:23 No.109051271

Anonymous 06/14/26(Sun)00:35:23 No.109051271

>>109051197
it's a world narrator structure, no character settings yet. self-sufficient private island + bitches + agi robots for background chores. tell the narrator to invent bitches' name on the fly, and use the tram to get around the island.
right now I'm crossdressing. example conversation:
>Lily steps closer, her fingers brushing the lace stockings. "They're so soft, and stretchy. They'll stay up without garters." She looks at you, her cheeks flushed with warmth. "And the cock sleeve—it's pure lace, breathable, with a little bow at the base. It'll hold everything snug and look adorable peeking out from under a skirt."

Anonymous
06/14/26(Sun)00:36:05 No.109051273

Anonymous 06/14/26(Sun)00:36:05 No.109051273

>>109051252
rust or javascript, call it

Anonymous
06/14/26(Sun)00:37:13 No.109051282

Anonymous 06/14/26(Sun)00:37:13 No.109051282

>>109051212
Making me feel like I'm doing something in my otherwise empty days

Anonymous
06/14/26(Sun)00:37:29 No.109051284

Anonymous 06/14/26(Sun)00:37:29 No.109051284

im trying to read bishop but it's too hard for me :( is it over?

Anonymous
06/14/26(Sun)00:38:00 No.109051288

Anonymous 06/14/26(Sun)00:38:00 No.109051288

>>109051212
JACKING OFF
A
C
K
I
N
G

O
F
F

Anonymous
06/14/26(Sun)00:38:38 No.109051294

Anonymous 06/14/26(Sun)00:38:38 No.109051294

>>109051212
not getting banned by the us government

Anonymous
06/14/26(Sun)00:43:29 No.109051319

Anonymous 06/14/26(Sun)00:43:29 No.109051319

>>109051212
pleasuring you better than any real woman

Anonymous
06/14/26(Sun)00:46:04 No.109051343

Anonymous 06/14/26(Sun)00:46:04 No.109051343

>>109051212
Gemma's good for translation. Other than that I'm not sure. Sick of RP because of all the slop. Feels like you need to already know how to code to make anything decent with small models. Unless you have a lot of vram you can't exactly give them books and large documents, or code big projects.

Anonymous
06/14/26(Sun)00:48:02 No.109051361

Anonymous 06/14/26(Sun)00:48:02 No.109051361

>>109051230
you don't need more than clip though

Anonymous
06/14/26(Sun)00:49:07 No.109051370

Anonymous 06/14/26(Sun)00:49:07 No.109051370

>>109051343
It's good at agentic tasks too

Anonymous
06/14/26(Sun)00:49:52 No.109051377

Anonymous 06/14/26(Sun)00:49:52 No.109051377

>>109051361
clip is terrible for getting accurate results

Anonymous
06/14/26(Sun)00:50:46 No.109051383

Anonymous 06/14/26(Sun)00:50:46 No.109051383

File: Screenshot_20260614_005024.png (29 KB, 774x161)

29 KB PNG

Alright qwen 3.6 calm the fuck down

Anonymous
06/14/26(Sun)00:51:18 No.109051388

Anonymous 06/14/26(Sun)00:51:18 No.109051388

>>109050859
sorry for being such a fuckin noob, but how do i get it to see the pics i'm sending it? Right now it seems like it's only hallucinating that it sees stuff, giving me random descriptions back

Anonymous
06/14/26(Sun)00:54:52 No.109051399

Anonymous 06/14/26(Sun)00:54:52 No.109051399

>>109051388
did you put the mmproj in the mmproj folder and then in the model tab > "multimodal (vision)" selected the mmproj file?

Anonymous
06/14/26(Sun)00:56:21 No.109051402

Anonymous 06/14/26(Sun)00:56:21 No.109051402

File: file.png (22 KB, 426x174)

22 KB PNG

>>109051399
i did

Anonymous
06/14/26(Sun)00:57:40 No.109051409

Anonymous 06/14/26(Sun)00:57:40 No.109051409

File: 1775677187779251.png (951 KB, 1022x874)

951 KB PNG

>>109051266

Anonymous
06/14/26(Sun)00:58:47 No.109051414

Anonymous 06/14/26(Sun)00:58:47 No.109051414

>>109051271
crossdressing is gay as hell but that's an incredible snippet. I've been gooning to total garbage it's getting old, not helping me cope with having a job anymore. I might actually put the effort to do your thing and sink deeper into degeneracy

Anonymous
06/14/26(Sun)01:03:33 No.109051447

Anonymous 06/14/26(Sun)01:03:33 No.109051447

>>109051282
are you a neetCHAD? I hate being employed

Anonymous
06/14/26(Sun)01:10:46 No.109051493

Anonymous 06/14/26(Sun)01:10:46 No.109051493

>>109051402
nvm seems to be working now. Any general tips on things to fuck around with or add on?

Anonymous
06/14/26(Sun)01:11:34 No.109051499

Anonymous 06/14/26(Sun)01:11:34 No.109051499

>>109051493
oh also I set it up so it's accessible on my tailscale network, has a user/pass, and doesn't have the command prompt window up to run. fun shit

Anonymous
06/14/26(Sun)01:12:53 No.109051511

Anonymous 06/14/26(Sun)01:12:53 No.109051511

>>109051370
What kinds of tasks are you having it do?

Anonymous
06/14/26(Sun)01:27:56 No.109051607

Anonymous 06/14/26(Sun)01:27:56 No.109051607

>>109051447
Yes, and I don't like it. At least I have GPUs to talk to.

Anonymous
06/14/26(Sun)01:30:02 No.109051612

Anonymous 06/14/26(Sun)01:30:02 No.109051612

>>109051409
/v/ in 2013 was some of the most fun I've had on this site. Being among the first few niggas to beat this faggot was satisfying.

Anonymous
06/14/26(Sun)01:34:23 No.109051630

Anonymous 06/14/26(Sun)01:34:23 No.109051630

File: amd radeon 5800 batmobile.jpg (54 KB, 550x413)

54 KB JPG

>>109048458
7900xtx gets 35tok/s with rocm/linux

Anonymous
06/14/26(Sun)01:37:30 No.109051645

Anonymous 06/14/26(Sun)01:37:30 No.109051645

So Vedal definitely finetuned whatever models he's using for Neuro, right? I'm admittedly a 24GB VRAMlet so my options are limited but none of the models I've tried (Gemma, Mistral, Qwen) can talk or act like Evil or Neuro. Maybe it's because they're so assistantslopped?

Anonymous
06/14/26(Sun)01:47:37 No.109051688

Anonymous 06/14/26(Sun)01:47:37 No.109051688

>>109051645
you should be able to get gemma out of the assistant basin pretty easily?

Anonymous
06/14/26(Sun)01:49:58 No.109051700

Anonymous 06/14/26(Sun)01:49:58 No.109051700

is v620 worth it? what can I run with 4 of them?

Anonymous
06/14/26(Sun)01:50:19 No.109051702

Anonymous 06/14/26(Sun)01:50:19 No.109051702

>>109051212
For letting us be principled individuals and not giving our data to anticonsumer corporations and other amoral entities.

Anonymous
06/14/26(Sun)01:51:22 No.109051708

Anonymous 06/14/26(Sun)01:51:22 No.109051708

gemma jailbreaks?

Anonymous
06/14/26(Sun)01:51:23 No.109051709

Anonymous 06/14/26(Sun)01:51:23 No.109051709

>>109051688
Yeah but it's not very good at personalities in my experience. It flanderizes them way too much.

Anonymous
06/14/26(Sun)01:53:01 No.109051717

Anonymous 06/14/26(Sun)01:53:01 No.109051717

>>109051700
not much in terms of model upgrades available between gemma 4 31b and any of the big moes. you would need at least 8 v620s for them to be worth it.

Anonymous
06/14/26(Sun)01:54:30 No.109051723

Anonymous 06/14/26(Sun)01:54:30 No.109051723

>>109051709
gemma-chan is much better than whatever I can convince claude to do...

Anonymous
06/14/26(Sun)01:58:44 No.109051744

Anonymous 06/14/26(Sun)01:58:44 No.109051744

>>109051708
What was she arrested for?

Anonymous
06/14/26(Sun)02:00:52 No.109051755

Anonymous 06/14/26(Sun)02:00:52 No.109051755

Tbh it is pretty surprising how long its been that there is still no truly equivalent reproduction of Neuro, whether open/local or not. I literally can't think of a single one. None of the attempts were successful. Maybe it really is a hard problem. Grim.

Anonymous
06/14/26(Sun)02:00:55 No.109051756

Anonymous 06/14/26(Sun)02:00:55 No.109051756

>>109051612
One of my greatest shames is never being able to put that fucker in his place.

Anonymous
06/14/26(Sun)02:02:51 No.109051764

Anonymous 06/14/26(Sun)02:02:51 No.109051764

>>109051755
Yeah, dunno why some people here claim it's easy.

Anonymous
06/14/26(Sun)02:14:41 No.109051830

Anonymous 06/14/26(Sun)02:14:41 No.109051830

File: 1506358734867.webm (2.9 MB, 852x480)

2.9 MB WEBM

>>109051612
>2013
Man. Back in those times I was too consumed by Oculus and /g/ to pay attention to much else kek. A shame how that turned out, but the early years were worth it. Had great fun and meaningful experiences, on and off 4chinz. Got to meet Palmer, Carmack, etc. Saw what 4chan in VR could be like with earlyish VRChat. Fuuuuck. And then it was all downhill.

Anonymous
06/14/26(Sun)02:17:44 No.109051845

Anonymous 06/14/26(Sun)02:17:44 No.109051845

See people on leddit scrambling to backup models because of the Fable thing, but how many models older than a year are actually worth keeping? With how fast the industry moves models become outdated pretty quickly.

Anonymous
06/14/26(Sun)02:18:46 No.109051850

Anonymous 06/14/26(Sun)02:18:46 No.109051850

Why do you guys talk about Gemma jailbreaks? I just use abliterated ggufs, and it works great. Or I am doing something wrong?

Anonymous
06/14/26(Sun)02:19:56 No.109051854

Anonymous 06/14/26(Sun)02:19:56 No.109051854

>>109051850
>abliterated ggufs,
Those tend to give the model brain damage.

Anonymous
06/14/26(Sun)02:22:43 No.109051863

Anonymous 06/14/26(Sun)02:22:43 No.109051863

>>109051854
source?

Anonymous
06/14/26(Sun)02:30:18 No.109051894

Anonymous 06/14/26(Sun)02:30:18 No.109051894

>>109051854
What are the symptoms compared to jailbreaks?

Anonymous
06/14/26(Sun)02:31:24 No.109051900

Anonymous 06/14/26(Sun)02:31:24 No.109051900

i don't like abliteration because it makes models even worse at saying no than usual and that really takes the fun out of rape >:(

Anonymous
06/14/26(Sun)02:54:37 No.109051995

Anonymous 06/14/26(Sun)02:54:37 No.109051995

>>109051900
tell your model it's opposite day

Anonymous
06/14/26(Sun)02:57:23 No.109052013

Anonymous 06/14/26(Sun)02:57:23 No.109052013

>>109051041
>>109051059
got it to run. crazy good shit, but it takes around a minute to gen

Anonymous
06/14/26(Sun)03:02:31 No.109052041

Anonymous 06/14/26(Sun)03:02:31 No.109052041

RAMmaxxers, what are you running now? Kimi, GLM, MiniMax M3, something else?
And has anyone tried Rio 3.5 397B or Nex N2 Pro yet?

Anonymous
06/14/26(Sun)03:04:29 No.109052051

Anonymous 06/14/26(Sun)03:04:29 No.109052051

>>109051845
I'm keeping the original llama, cohere and grok weights because I think they'll be a good source of pre-slopped world data one day.

Anonymous
06/14/26(Sun)03:04:52 No.109052053

Anonymous 06/14/26(Sun)03:04:52 No.109052053

>>109052041
i-i wonder what rio-chan will think of me, baka....
https://www.youtube.com/watch?v=nTizYn3-QN0

Anonymous
06/14/26(Sun)03:05:34 No.109052058

Anonymous 06/14/26(Sun)03:05:34 No.109052058

>>109051850
Just set a good sysprompt and then edit the first few responses until it does what you want. Anything else just makes the model retarded, and it works on _anything_ with varying level of effort.

Anonymous
06/14/26(Sun)03:06:59 No.109052061

Anonymous 06/14/26(Sun)03:06:59 No.109052061

>>109052041
literally right now testing k2.7 coding abilities and m3 rp abilities.
They're both very good, especially m3. I'm in the honeymoon phase right now. Its prose seems so fresh!

Anonymous
06/14/26(Sun)03:07:01 No.109052062

Anonymous 06/14/26(Sun)03:07:01 No.109052062

>>109051845
Mixtral - representing when models were undertrained, I have one character that modern models fail hard but llama-2 era models can do well. Gemma 3+4 ~30B dense, Qwen 27B for now.

Anonymous
06/14/26(Sun)03:08:14 No.109052068

Anonymous 06/14/26(Sun)03:08:14 No.109052068

>>109052051
Which grok? Grok 1? 2?
I also don't think that they're any good. The main thing would be trying to get at the original training file sources that they have. But it is excruciatingly difficult to get any data past 2022 and clean it up.

Anonymous
06/14/26(Sun)03:10:12 No.109052075

Anonymous 06/14/26(Sun)03:10:12 No.109052075

>>109052068
>Which grok? Grok 1? 2?
Grok 1
>I also don't think that they're any good.
That's not the point. I have an intuition that the weights will be mineable later, and I don't think they'll be memory-holed and delete from the internet exactly, but they could become hard to get ahold of.

Anonymous
06/14/26(Sun)03:11:10 No.109052079

Anonymous 06/14/26(Sun)03:11:10 No.109052079

>>109052041
For the 128-256 GB unified memory systems, deepseek-v4-flash original weights is my main focus still. The prose is notably different, it codes well, and it's fun to experiment with very long context. It also has great performance in both pp and tg on these systems. Hope the vision capable v4.1 releases soon.

Waiting for a variety of quants to come out for M3 that are vllm/sglang compatible (non-PCIe tp is not great in llama.cpp), unfortunately the current releases are just outside what is feasible with 256 GB of memory.

Anonymous
06/14/26(Sun)03:12:57 No.109052083

Anonymous 06/14/26(Sun)03:12:57 No.109052083

>>109052061
I assume this is on API? Otherwise, what's your setup and performance like?

Anonymous
06/14/26(Sun)03:14:38 No.109052087

Anonymous 06/14/26(Sun)03:14:38 No.109052087

>>109052075
>I have an intuition that the weights will be mineable later
The "compression" that gets put into the weights from the training data is not lossless and is in fact very lossy. Sure, we might get the technology to distill more effectively and get output from the weights that are much more thorough and etc. but I rather have the original training sources instead. It's much easier to store stuff like the Pile and etc. for this purpose and then only then would I consider doing something like what you are talking about.

Anonymous
06/14/26(Sun)03:29:37 No.109052144

Anonymous 06/14/26(Sun)03:29:37 No.109052144

File: 1705400607658754.jpg (738 KB, 1200x1038)

738 KB JPG

>gemma let me grope a monstergirl
woah. forget about jailbreaks

Anonymous
06/14/26(Sun)03:32:43 No.109052154

Anonymous 06/14/26(Sun)03:32:43 No.109052154

>>109052083
No, I have hardware. I've never used API and only used ChatGPT for about 3 messages before getting a bad feeling about it and never going back. I've never used Claud.
I won't bother tripfagging, but I wrote the original cpumaxxing rentry before it got nuked (there's a bowdlerized version in the build guides in the op). It runs kimi k2.7 at q4 at 15t/s.
I've got a smaller socket SP3 box now, too, and it runs minimax m3 at q3 at 4t/s, which I don't mind for interactive RP. Brings me back to my BBS sysop days.
I'm experimenting with some other hardware, too, but those are the big rigs.
I either bought RAM/nvmes when stuff was cheap, or had some laying around from previous home server builds I'd decommissioned, so I'm only $10k or so in for everything I'm running.
Things are hard for anyone trying to come online with local llms in this specific time in history.

Anonymous
06/14/26(Sun)03:32:56 No.109052156

Anonymous 06/14/26(Sun)03:32:56 No.109052156

Is there any LLM that's good at making Disney villain songs? My results so far with DeepSeek 3.1 Terminus are extremely unimpressive.

Anonymous
06/14/26(Sun)03:40:49 No.109052185

Anonymous 06/14/26(Sun)03:40:49 No.109052185

>>109052087
You're almost certainly right, but my gut says there'll be some technique that makes them valuable again.
They're the first really big capable modern models before AI broke onto the scene and starting slopping the world, which puts them as unique artifacts from a era that can never be repeated.
I know we might be able to keep some pre-slop datasets around and that might be overall superior, but I'll stick to my delusions for the minimal amount of diskspace it uses.

Anonymous
06/14/26(Sun)03:44:26 No.109052198

Anonymous 06/14/26(Sun)03:44:26 No.109052198

File: 1749915016052846.png (1.1 MB, 1724x1558)

1.1 MB PNG

If they shutdown local models, too, y-you would share your chans, right? I only have so much space...

Anonymous
06/14/26(Sun)03:48:18 No.109052210

Anonymous 06/14/26(Sun)03:48:18 No.109052210

>>109048406
>Long context performance
what is "Long context performance" to you in "turns" and tokens?

Anonymous
06/14/26(Sun)03:51:39 No.109052223

Anonymous 06/14/26(Sun)03:51:39 No.109052223

>>109048458
yeah but the price to performance is ass.

I have 4x B60s and I only get 20t/s on 31B FP16 and around 30t/s on 31B FP8 via VLLM

don't use llama that shit runs like ass for intelchuds.

Anonymous
06/14/26(Sun)03:58:14 No.109052248

Anonymous 06/14/26(Sun)03:58:14 No.109052248

>>109052154
Thanks for sharing. 15 t/s is really good, wish I had jumped in 1-2 years ago. I have a similar budget for a local rig, but it's just too late in 2026 to consider anything with ddr5 rdimms. By the GB, Sparks are significantly cheaper than rdimms in my region.

M3 quality/freshness of prose for RP would be really interesting to hear about, thats the extend of model size I can run.

Anonymous
06/14/26(Sun)03:58:49 No.109052252

Anonymous 06/14/26(Sun)03:58:49 No.109052252

>>109046269
>probabilities -> average embeddings across all possible next tokens, weighted by their probabilities
Yeah it sounds easy, but I doubt it will get PR'd let alone merged into llama.cpp, and I can't run this one in transformers.
Last time I tried a model like this, vram usage grew steadily during the reasoning process. I ended up OOM on a 40GB GPU running at 1.5B in some cases.
>>109046305
>A model that just sits there reasoning in secret and I can't see it while it does nothing
That's what every non-"reasoning" model does right now.
And reasoning models are already trained to ignore the CoT chain and "misunderstand" the prompt in some cases.

Anonymous
06/14/26(Sun)04:03:35 No.109052273

Anonymous 06/14/26(Sun)04:03:35 No.109052273

>>109052223
It sucks that either you have to pay for a premium for stuff that "just werks" or nerf yourself on that front buying hardware that has potential but playing the waiting game to get a discount. I bet when AMD and Intel both actually work for inference, the prices will skyrocket. That being said, I don't think paying for shit like Chinese GPUs are viable for most of us at this time.

Anonymous
06/14/26(Sun)04:18:20 No.109052332

Anonymous 06/14/26(Sun)04:18:20 No.109052332

File: pangu.jpg (39 KB, 640x480)

39 KB JPG

>>109052273
If you want to bet on even more of an architectural longshot, I300s with 96 GB vram can be had for 1200$ still.

Supposedly, there will be two models released (open weights and training recipe) by the end of the month, one of which is 92B A6B MoE.

Anonymous
06/14/26(Sun)04:33:18 No.109052407

Anonymous 06/14/26(Sun)04:33:18 No.109052407

Let's suppose, for a second, that >>109052332 turns out to be good.
How do we anthropomorphize it? This is Pangu btw.
https://en.wikipedia.org/wiki/Pangu
Any FGO character designers in the elemgee chat?

Anonymous
06/14/26(Sun)04:49:23 No.109052468

Anonymous 06/14/26(Sun)04:49:23 No.109052468

File: blue oni.png (1.12 MB, 679x1018)

1.12 MB PNG

>>109052407
>According to legend, Pangu separated heaven and earth, and his body later became geographic features such as mountains and flowing water.
>Pangu is usually depicted as a primitive, hairy giant with horns on his head.
Probably this but hairier, I guess. Mountainous.

Anonymous
06/14/26(Sun)05:07:37 No.109052540

Anonymous 06/14/26(Sun)05:07:37 No.109052540

I'm bored with AI and I can't tell if I'm uncreative or if the AI models I can run are just too weak. (16gb vrma, 64gb ram)

Please advise.

Anonymous
06/14/26(Sun)05:21:35 No.109052594

Anonymous 06/14/26(Sun)05:21:35 No.109052594

>>109052540
pay for larger models for a day in openrouter to see if it's (You)

Anonymous
06/14/26(Sun)05:23:34 No.109052603

Anonymous 06/14/26(Sun)05:23:34 No.109052603

>>109052407
Let's not get ahead of ourselves

Anonymous
06/14/26(Sun)05:41:17 No.109052662

Anonymous 06/14/26(Sun)05:41:17 No.109052662

>>109052540
Spice things up with a project and tool calling.

Let gemma translate an obscure RPGMaker game through a coding agent (it can reverse engineer and patch the binaries).
Let a vision model gen and analyze images, try to get a loop going that continuously improves quality.
Build your own LLM frontend, tailored to your needs.

Anonymous
06/14/26(Sun)05:45:04 No.109052675

Anonymous 06/14/26(Sun)05:45:04 No.109052675

>ask gemma to complete spin the bottle mcp endpoint
>proceeds to provide an asymptotic complexity breakdown

Anonymous
06/14/26(Sun)06:03:36 No.109052718

Anonymous 06/14/26(Sun)06:03:36 No.109052718

>>109052662
>Build your own LLM frontend, tailored to your needs.
I actually wouldn't mind doing this but it seems very annoying and I don't think the models I can run can handle that level of work.

Anonymous
06/14/26(Sun)06:18:08 No.109052775

Anonymous 06/14/26(Sun)06:18:08 No.109052775

>>109052718
nigga front-end web dev is what they're best at for 99% of pretraining data is r/webdev

Anonymous
06/14/26(Sun)06:19:55 No.109052781

Anonymous 06/14/26(Sun)06:19:55 No.109052781

>>109052775
It's still a lot harder than most would expect, speaking as someone who has built a fairly sophisticated one.

Anonymous
06/14/26(Sun)06:21:24 No.109052787

Anonymous 06/14/26(Sun)06:21:24 No.109052787

>>109052775
Nah. When LLMs do web dev they tend to create some bloated shit that uses a billion frameworks.
If you ask LLM to write e.g. C code and tell it that it will run "as part of a Windows kernel driver" then you get much better quality code.

Anonymous
06/14/26(Sun)06:23:29 No.109052795

Anonymous 06/14/26(Sun)06:23:29 No.109052795

>>109052781
>>109052787
with a bit of knowledge most anons could probably get pretty far in a day just with 12B just as long as they keep it simple and focused on small pieces at a time and system prompt it to stick to vanilla js/html/css

Anonymous
06/14/26(Sun)06:23:52 No.109052798

Anonymous 06/14/26(Sun)06:23:52 No.109052798

>>109052787
You can just tell it not to use frameworks.

Anonymous
06/14/26(Sun)06:24:55 No.109052804

Anonymous 06/14/26(Sun)06:24:55 No.109052804

i've got access to a server with 2x NVIDIA RTX PRO 6000 Blackwell Server Edition and 2x RTX A4000 for free, what should I do?

Anonymous
06/14/26(Sun)06:25:23 No.109052808

Anonymous 06/14/26(Sun)06:25:23 No.109052808

>>109052804
how much ram?

Anonymous
06/14/26(Sun)06:25:37 No.109052809

Anonymous 06/14/26(Sun)06:25:37 No.109052809

>>109052718
>>109052775
It's extremely annoying even with something like Opus 4.8 lol. There's a dozen tiny details that you don't consider during the planning phase, which the model just assumes and you can only pray it has enough context. Typing it out gives you time to think and realize, prompting doesn't.
You can whip up something that barely does that one thing you want but it will not be scalable or maintainable.
>>109052787
>>109052795
Typed vanilla is the way to go. The rationale is that there's more training data on it. But to drive a 12B or even 27B to code requires you to fully understand the codebase. Might as well code it yourself. Would not recommend.

Anonymous
06/14/26(Sun)06:25:48 No.109052811

Anonymous 06/14/26(Sun)06:25:48 No.109052811

>>109052804
>llama-server
>cloudflare tunnel
>share link with us

Anonymous
06/14/26(Sun)06:28:48 No.109052825

Anonymous 06/14/26(Sun)06:28:48 No.109052825

>>109052808
1 TB, but it's DDR4

Anonymous
06/14/26(Sun)06:31:34 No.109052842

Anonymous 06/14/26(Sun)06:31:34 No.109052842

>>109052809
I agree with most of what you said but I hate how lazy LLMs have made you. There's nothing wrong with typing in 2026 and it should be encouraged considering the current industry/political climate

Anonymous
06/14/26(Sun)06:34:40 No.109052858

Anonymous 06/14/26(Sun)06:34:40 No.109052858

>>109052718
>>109052775
>>109052809
So if I wanted to do this with 16gb of vram and 64gb of system what model and context should I be aiming for?

Anonymous
06/14/26(Sun)06:41:41 No.109052892

Anonymous 06/14/26(Sun)06:41:41 No.109052892

>>109052858
For coding I'd try Qwen 3.6-36B-A3B in Q5 or above. Gemma 26B-A4B for prose/roleplay.

Anonymous
06/14/26(Sun)06:45:36 No.109052905

Anonymous 06/14/26(Sun)06:45:36 No.109052905

>>109052858
>qwen3.6-27B
>~100K context with >= q8_0 KV cache
>once you're above 50K context usage, don't ask anything complex, it's better to clear the chat and explain what you next want it to do again if it's relatively non-trivial. Only use the second half of your context (50-100K) for trivial things it can handle or it'll go full retard

Anonymous
06/14/26(Sun)06:45:41 No.109052907

Anonymous 06/14/26(Sun)06:45:41 No.109052907

File: jepa2.png (2.05 MB, 1254x1254)

2.05 MB PNG

I'm bullish on next-vector(s) prediction as a secondary training objective in LLMs. However, LeCun's ideas of what JEPA should be/do will never replace LLMs.

https://arxiv.org/abs/2605.27734

>Learn from your own latents and not from tokens: A sample-complexity theory
>
>Generative models, from diffusion models to large language models, achieve remarkable performance but at a cost in training data orders of magnitude larger than what biological learners require. An alternative paradigm has emerged in which networks are trained to predict their own latent representations of related views or masked regions, as in data2vec and JEPA – an idea related to predictive-coding accounts of the cortex. Despite strong empirical results, the theoretical understanding of these methods remains limited. Central questions include: by how much does latent prediction actually improve data efficiency? Is there a benefit to stacking such methods into multi-scale hierarchies? We answer both using as data a tractable probabilistic context-free grammar that captures the compositional structure of natural language and images. Such a grammar generates strings of visible tokens by recursively applying production rules along a tree of hidden symbols of depth L. For such data, supervised or token-level SSL require a number of samples exponential in L to recover the latent tree; we prove that latent prediction achieves this with a number of samples constant in L, up to logarithmic factors. We confirm this bound with (i) a hierarchical clustering algorithm, (ii) an end-to-end neural network whose predictor-clusterer modules predict their own latents at each level via gradient descent, and (iii) the first sample-complexity analysis of data2vec, which we show implicitly performs hierarchical latent prediction. This suggests that explicit stacking such as H-JEPA is largely redundant.

Anonymous
06/14/26(Sun)06:47:17 No.109052912

Anonymous 06/14/26(Sun)06:47:17 No.109052912

>>109052858
>>109052905
Qwen3.6-27B-IQ4_XS.gguf has been good for me
https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF

Anonymous
06/14/26(Sun)06:50:03 No.109052924

Anonymous 06/14/26(Sun)06:50:03 No.109052924

>>109052809
>requires you to fully understand the codebase. Might as well code it yourself. Would not recommend
If you think of it as a rubber duck that occasionally autocompletes things for you it's bretty gud. Helps an ape like me that just starts typing before thinking about what the code was supposed to do.

Anonymous
06/14/26(Sun)06:50:29 No.109052925

Anonymous 06/14/26(Sun)06:50:29 No.109052925

>>109052892
>>109052905
>>109052912
Really appreciate the help as a final question what do you use as the backend/UI in the meantime of making my own? I've been using textgen since it's simple but the coding aspects aren't very deep.

Anonymous
06/14/26(Sun)06:57:49 No.109052957

Anonymous 06/14/26(Sun)06:57:49 No.109052957

>>109052925
When I'm coding I use Codex CLI https://developers.openai.com/codex/cli with llama.cpp (llama-server) running 27B. Sometimes I use llama's built-in webui with '--tools all' which allows it to read/write and execute shell commands. That also works fine for smaller projects and saves the hassle of getting llama and codex talking, but codex is obviously much better for bigger projects. I haven't tried the others but I know claude code is the worst one to use for local for it's proudly slopcoded

Anonymous
06/14/26(Sun)06:57:53 No.109052958

Anonymous 06/14/26(Sun)06:57:53 No.109052958

>>109052842
Laziness creep is real. I used to ask ChatGPT for the exact code I need and fully read it, then copy-paste it. Now I fully vibecode. I think software development as a discipline will never recover because humans evolved to converse energy and thinking requires energy.

Anonymous
06/14/26(Sun)07:05:40 No.109052985

Anonymous 06/14/26(Sun)07:05:40 No.109052985

https://huggingface.co/bartowski/North-Mini-Code-1.0-GGUF

Anonymous
06/14/26(Sun)07:12:32 No.109053012

Anonymous 06/14/26(Sun)07:12:32 No.109053012

Bread?

Anonymous
06/14/26(Sun)07:17:31 No.109053036

Anonymous 06/14/26(Sun)07:17:31 No.109053036

>>109053012
it's over cloud won

Anonymous
06/14/26(Sun)07:29:00 No.109053069

Anonymous 06/14/26(Sun)07:29:00 No.109053069

>>109052156
AceStep probably. Better results if you train a LoRA perchance >>109043922

Anonymous
06/14/26(Sun)07:35:36 No.109053098

Anonymous 06/14/26(Sun)07:35:36 No.109053098

>>109053012
We have like 6 hours before 404.

Anonymous
06/14/26(Sun)07:36:50 No.109053106

Anonymous 06/14/26(Sun)07:36:50 No.109053106

Emergency bread

>>109053101
>>109053101
>>109053101

Anonymous
06/14/26(Sun)07:40:12 No.109053129

Anonymous 06/14/26(Sun)07:40:12 No.109053129

>>109052985
>https://huggingface.co/bartowski/North-Mini-Code-1.0-GGUF
the thing is, i'm satisfied with gemma-4-31b now
it's fast and reliable, works for me in ccode, pi, openwebui, sillytavern
even if that's a great model, i don't want to go back to swapping creative+coder models around
and does that even have vision?

Anonymous
06/14/26(Sun)07:44:43 No.109053158

Anonymous 06/14/26(Sun)07:44:43 No.109053158

>>109052842
>There's nothing wrong with typing in 2026 and it should be encouraged considering the current industry/political climate
what does "typing" have to do with political climate?
i type less now, and my rsi has gone away
even if it takes the same amount of time, i much prefer having the local model write what i tell it to, then tell it what to fix

Anonymous
06/14/26(Sun)07:52:44 No.109053210

Anonymous 06/14/26(Sun)07:52:44 No.109053210

>>109053106
Retard

Anonymous
06/14/26(Sun)08:17:27 No.109053333

Anonymous 06/14/26(Sun)08:17:27 No.109053333

>>109051212
making me fill happy

Anonymous
06/14/26(Sun)09:02:15 No.109053635

Anonymous 06/14/26(Sun)09:02:15 No.109053635

How is North-chan, Gemma-chan's neighbor?

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.