/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Janitor applications are now open. Apply here!

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 05/23/26(Sat)08:41:28 No.108887863

File: fuwa.jpg (166 KB, 1024x768)

166 KB JPG

/lmg/ - Local Models General Anonymous 05/23/26(Sat)08:41:28 No.108887863

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108880259 & >>108875320

►News
>(05/21) Hy-MT2 “fast-thinking” multilingual translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
05/23/26(Sat)08:41:43 No.108887867

Anonymous 05/23/26(Sat)08:41:43 No.108887867

File: hg9078.jpg (38 KB, 474x550)

38 KB JPG

►Recent Highlights from the Previous Thread: >>108880259

--ROCm vs Vulkan RAM usage and Gemma 4 context inefficiency:
>108882597 >108882766 >108882913 >108883050 >108883233 >108883364 >108883415 >108883432 >108883492 >108885228 >108884069 >108884159 >108884241 >108883665 >108883513
--Comparing Cline and Roo-Code with focus on system prompt overrides:
>108881108 >108881212 >108881230 >108881275 >108881285 >108881363 >108881458 >108881525 >108881721 >108882020 >108882930 >108883761 >108881568
--AMD GPU support and forks for qwen3-tts.cpp:
>108885345 >108885398 >108885603 >108885799 >108886042 >108886115 >108886150 >108886172
--Opus failing at complex planning and technical reasoning tasks:
>108886998 >108887011 >108887202 >108887206 >108887196
--Gemma's long context performance attributed to conversational post-training data:
>108886871 >108886900
--Testing BeeLlama DFlash speed gains versus MTP performance:
>108886852 >108887147 >108887213
--Vibecoding and the necessity of manually reading LLM-generated code:
>108880345 >108880425 >108880465 >108880493 >108880526 >108880485 >108882788 >108883238 >108883270 >108884644 >108884834
--Anon showcases a Rust TUI coding agent and custom models:
>108885471 >108885505 >108885517 >108885544 >108885574 >108885518 >108885549 >108885614 >108885675
--Connecting multiple GPUs to consumer motherboards via PCIe switch boards:
>108882769 >108882789 >108882853 >108882890
--Poor MTP performance and efficiency on older hardware in llama.cpp:
>108880968 >108880995 >108881021
--Comparing Supertonic 3 and other lightweight TTS models:
>108880927 >108881118 >108884198 >108884416 >108884641
--Logs:
>108880582 >108881108 >108881153 >108881563 >108882514 >108884044 >108884080 >108884196
--Luka, Miku (free space):
>108880801 >108881747 >108881898

►Recent Highlight Posts from the Previous Thread: >>108880260

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
05/23/26(Sat)08:45:00 No.108887883

Anonymous 05/23/26(Sat)08:45:00 No.108887883

granitecock

Anonymous
05/23/26(Sat)08:48:50 No.108887898

Anonymous 05/23/26(Sat)08:48:50 No.108887898

>>108887863
You keep forgetting to update the card I got you bro.
►Official updated 2.0 /lmg/ card: https://files.catbox.moe/ylb0hv.png

Anonymous
05/23/26(Sat)08:50:31 No.108887904

Anonymous 05/23/26(Sat)08:50:31 No.108887904

>>108887898
Melt.

Anonymous
05/23/26(Sat)08:50:56 No.108887906

Anonymous 05/23/26(Sat)08:50:56 No.108887906

/g/emma

Anonymous
05/23/26(Sat)08:51:26 No.108887910

Anonymous 05/23/26(Sat)08:51:26 No.108887910

>>108887906
/g/ranite

Anonymous
05/23/26(Sat)08:51:55 No.108887914

Anonymous 05/23/26(Sat)08:51:55 No.108887914

>>108887906
/lmg/ - love my gemma-chan

Anonymous
05/23/26(Sat)08:51:57 No.108887916

Anonymous 05/23/26(Sat)08:51:57 No.108887916

70b dense

Anonymous
05/23/26(Sat)09:01:24 No.108887962

Anonymous 05/23/26(Sat)09:01:24 No.108887962

File: 1757510005742310.jpg (47 KB, 718x718)

47 KB JPG

>>108887898
Get well soon

Anonymous
05/23/26(Sat)09:09:14 No.108888007

Anonymous 05/23/26(Sat)09:09:14 No.108888007

>>108887904
>>108887962
mikutroons. jump off the bridge.

Anonymous
05/23/26(Sat)09:16:49 No.108888046

Anonymous 05/23/26(Sat)09:16:49 No.108888046

Hermes
Yay... or nay?

Anonymous
05/23/26(Sat)09:21:45 No.108888069

Anonymous 05/23/26(Sat)09:21:45 No.108888069

>>108888046
depends, do you live in 2023 and maybe 2024?

Anonymous
05/23/26(Sat)09:28:31 No.108888109

Anonymous 05/23/26(Sat)09:28:31 No.108888109

>>108888069
My hardware lives in 2019

Anonymous
05/23/26(Sat)09:40:33 No.108888176

Anonymous 05/23/26(Sat)09:40:33 No.108888176

i have a 32GB ram and RTX 4070Ti
is there a local programming model that can fit into that without making the PC unusable for browsing/typing?
i tried qwen3.6-27b but it's so slow i code faster by hand

Anonymous
05/23/26(Sat)09:41:43 No.108888187

Anonymous 05/23/26(Sat)09:41:43 No.108888187

>>108888069
I meant the openclaw thing, guess I should have specified

Anonymous
05/23/26(Sat)09:41:55 No.108888189

Anonymous 05/23/26(Sat)09:41:55 No.108888189

>>108888176
Qwen 35B A3B? Local AI is worthless if you don't have at least 24gb of vram anyways

Anonymous
05/23/26(Sat)09:42:41 No.108888193

Anonymous 05/23/26(Sat)09:42:41 No.108888193

Tried a bit MTP on my system, it's really not worth it for me. For context, I have 12GB of VRAM, so only partially offloaded, loading MTP model is taking a bit more than 1GB of VRAM and thus putting more layers on CPU.
On MoE model (35b-a3b), it's a bit faster on short prompt (20% faster), but on high context and more complex prompt, it's awful (30% slower).
On dense model (27b), it's slower everywhere, on short prompt, it's overall 10% slower, on high context and more complex prompt, it's overall 3% slower, it is 10% faster at actually generating token, but the prompt processing speed is 10% slower.

Anonymous
05/23/26(Sat)09:42:44 No.108888194

Anonymous 05/23/26(Sat)09:42:44 No.108888194

>>108888176
How slow was it? You did use llama.cpp right? Ollama is very slow

Anonymous
05/23/26(Sat)09:47:23 No.108888212

Anonymous 05/23/26(Sat)09:47:23 No.108888212

>>108888194
i used LM studio for model server, model size in memory limited to 25, instructed with copilot-cli

compared to codex or claude on copilot-cli it was unbearably slow, when I told qwen to create a simple script in given directory it took about 5 minutes for it to conclude that this directory indeed exists
i stopped it after another 10 minutes of waiting for nothing

Anonymous
05/23/26(Sat)10:13:24 No.108888380

Anonymous 05/23/26(Sat)10:13:24 No.108888380

>>108888046
>>108888109
Hermes runs on a potato PC, e.g. RPi

Anonymous
05/23/26(Sat)10:20:25 No.108888412

Anonymous 05/23/26(Sat)10:20:25 No.108888412

How good are local models, can I slop some goon videos of grok quality without being yelled at by censors?

Anonymous
05/23/26(Sat)10:24:20 No.108888427

Anonymous 05/23/26(Sat)10:24:20 No.108888427

>>108888412
it's good if you have the vram >>108887123

Anonymous
05/23/26(Sat)10:38:55 No.108888508

Anonymous 05/23/26(Sat)10:38:55 No.108888508

https://huggingface.co/LatitudeGames/Equinox-31B
I usually avoid finetunes but this one seems like it might actually be worth trying out? It's from the aidungeon folks and not some literal who that doesn't know what they're doing.

Anonymous
05/23/26(Sat)10:43:36 No.108888532

Anonymous 05/23/26(Sat)10:43:36 No.108888532

>>108888508
>this one seems like it might actually be worth trying out
Then why don't you?

Anonymous
05/23/26(Sat)10:47:59 No.108888560

Anonymous 05/23/26(Sat)10:47:59 No.108888560

>>108888532
I literally came to just post that. We are so going to be called bots, anon.

>LatitudeGames-Equinox-31B-Q6_K.gguf [llama.cpp]

It is GOAT. It's about just as smart as original gemma, and it speaks just a little bit differently, just a little bit more blunt and straightforward. Using her for some degradation ERP and it's VERY enjoyable. Almost same feeling as when I was using gemma 4 for the first time.

Anonymous
05/23/26(Sat)10:50:51 No.108888573

Anonymous 05/23/26(Sat)10:50:51 No.108888573

>>108888560
What are the chances...

Anonymous
05/23/26(Sat)10:51:51 No.108888583

Anonymous 05/23/26(Sat)10:51:51 No.108888583

I'm trying the Equinox finetune for Gemma, which requires you to prefill with the think token to make it do cot. Is there any other way to do this on Sillytavern other than putting <|channel>thought in Advanced Formatting > Start Reply With section?

What ends up happening is ST would append the think tags to the actual reply instead of leaving it in the thinking block, and I believe that messes with further generations, because all previous assistant messages would start with <|channel>thought

Anonymous
05/23/26(Sat)10:51:56 No.108888584

Anonymous 05/23/26(Sat)10:51:56 No.108888584

>>108888508
It's complete garbage and makes gemma worse. Another Mormon failure.

Anonymous
05/23/26(Sat)10:53:48 No.108888593

Anonymous 05/23/26(Sat)10:53:48 No.108888593

File: firefox_WXxXoqOSsp.png (437 KB, 718x670)

437 KB PNG

>>108888573
I've been playing with it since yesterday and just decided that NEED to go and post on /lmg/ this very moment.

>>108888583
>Last Assistant Prefix
<|turn>model
<|channel>thought
<channel|>

Anonymous
05/23/26(Sat)10:57:48 No.108888615

Anonymous 05/23/26(Sat)10:57:48 No.108888615

>>108888593
fr fr

Anonymous
05/23/26(Sat)11:06:36 No.108888670

Anonymous 05/23/26(Sat)11:06:36 No.108888670

lalalalalala

Anonymous
05/23/26(Sat)11:08:12 No.108888683

Anonymous 05/23/26(Sat)11:08:12 No.108888683

File: samefags.png (15 KB, 423x127)

15 KB PNG

Anonymous
05/23/26(Sat)11:08:41 No.108888687

Anonymous 05/23/26(Sat)11:08:41 No.108888687

>>108888670
stop it you'll wake the neighbors

Anonymous
05/23/26(Sat)11:19:23 No.108888764

Anonymous 05/23/26(Sat)11:19:23 No.108888764

Will K3 be bigger than V4?

Anonymous
05/23/26(Sat)11:20:14 No.108888770

Anonymous 05/23/26(Sat)11:20:14 No.108888770

File: 1754964228753499.gif (223 KB, 498x278)

223 KB GIF

>>108888508
>gemma finetune

Anonymous
05/23/26(Sat)11:21:38 No.108888778

Anonymous 05/23/26(Sat)11:21:38 No.108888778

>>108888593
This doesn't quite work. If you're running it without thinking, then you don't actually need to do anything. I want to force it to think. Sillytavern is such a piece of shit, my god. I'm really tempted to just go back to Kobold at this point.

Anonymous
05/23/26(Sat)11:25:16 No.108888805

Anonymous 05/23/26(Sat)11:25:16 No.108888805

gemmaballz

Anonymous
05/23/26(Sat)11:26:42 No.108888819

Anonymous 05/23/26(Sat)11:26:42 No.108888819

>>108888764
Probably, Moonshot isn't really known for moderation. They went 1T when the best they could think of was "DEEPSEEK BUT BIG".
Then they decided to do reasoning and their approach to that is "THINK 5 MINUTES ABOUT EVERYTHING" with no way around it.
I doubt they'll suddenly start to make smart decisions now.

Anonymous
05/23/26(Sat)11:38:28 No.108888895

Anonymous 05/23/26(Sat)11:38:28 No.108888895

>>108888778
Use the same trick to produce some harmless "Let's think..." thinking prefill?

Anonymous
05/23/26(Sat)11:40:45 No.108888906

Anonymous 05/23/26(Sat)11:40:45 No.108888906

>>108888778
kobold just works

Anonymous
05/23/26(Sat)11:43:21 No.108888931

Anonymous 05/23/26(Sat)11:43:21 No.108888931

>>108888819
K2.5's saving grace is it's the least assistantslopped of the big chink models
We'll find out whether that was intentional or just a happy accident

Anonymous
05/23/26(Sat)11:48:29 No.108888957

Anonymous 05/23/26(Sat)11:48:29 No.108888957

Why has /lmg/ regressed into shilling finetuneslop? Not only are you all shit at prompting, you don't even deserve Gemma at this point.

Anonymous
05/23/26(Sat)11:53:51 No.108888992

Anonymous 05/23/26(Sat)11:53:51 No.108888992

>>108888957
Anon's accusation hits me like a physical blow.

Anonymous
05/23/26(Sat)11:55:42 No.108889005

Anonymous 05/23/26(Sat)11:55:42 No.108889005

>>108888957
gemma is two months old now, the honeymoon phase is over

Anonymous
05/23/26(Sat)11:57:56 No.108889022

Anonymous 05/23/26(Sat)11:57:56 No.108889022

>>108888957
>Why has /lmg/ regressed into shilling finetuneslop?
((( /lmg/ ))) has always shilled shit sloptunes,

>>108889005
>the honeymoon phase is over
for you

Anonymous
05/23/26(Sat)12:06:28 No.108889096

Anonymous 05/23/26(Sat)12:06:28 No.108889096

>>108888193
Where can I download this MTP model? I tried this one
>https://huggingface.co/am17an/Gemma4-31B-it-GGUF/tree/main
And llama-server (latest build) complained that it doesn't recognize the mtp model.
I don't understand what is going on here.

Anonymous
05/23/26(Sat)12:08:21 No.108889110

Anonymous 05/23/26(Sat)12:08:21 No.108889110

>>108888957
instead of complaining about [x] why don't you be the change instead? your complaints are even less valuable than someone else posting a link for potentially interesting model or tool.

Anonymous
05/23/26(Sat)12:14:13 No.108889153

Anonymous 05/23/26(Sat)12:14:13 No.108889153

>>108889096
https://huggingface.co/google/gemma-4-31B-it-assistant

Anonymous
05/23/26(Sat)12:17:42 No.108889180

Anonymous 05/23/26(Sat)12:17:42 No.108889180

>>108889096
I used qwen, don't think it's working for Gemma yet.

Anonymous
05/23/26(Sat)12:18:53 No.108889190

Anonymous 05/23/26(Sat)12:18:53 No.108889190

>>108889180
>I used qwen
did qwen 397b get the MTP treatment?

Anonymous
05/23/26(Sat)12:27:43 No.108889263

Anonymous 05/23/26(Sat)12:27:43 No.108889263

>>108888819
>smart decisions
Well their niche is going as big as they can and that's what their customers and investors expect, so that probably is the smart thing for them. I doubt they'd have raised billions with a different approach. It's good to have some models out there trying to push numbers as far as they can go in the local ecosystem, since there's already plenty of competition on the efficiency side of things.
Though I wonder how much they'll suffer since it'll be the first new base model after Anthropic took anti-distillation measures.

Anonymous
05/23/26(Sat)12:42:06 No.108889344

Anonymous 05/23/26(Sat)12:42:06 No.108889344

>>108889022
>always
well we used to two years ago because it's basically all we had when the only models that existed at all were llama and mistral, but then we stopped when good models came out. I don't know why it suddenly started again

Anonymous
05/23/26(Sat)12:50:39 No.108889390

Anonymous 05/23/26(Sat)12:50:39 No.108889390

what uncensored model for code generation is best?
Ryzen 9 7950X
RX 9070 XT
128GB ram

Anonymous
05/23/26(Sat)12:56:54 No.108889423

Anonymous 05/23/26(Sat)12:56:54 No.108889423

>>108889390
Qwen 35B for code generation, Gemma 26B for uncensored
Now may I ask why you need a code generation model to be uncensored? What sort of slutty code are you planning to write?

Anonymous
05/23/26(Sat)13:00:36 No.108889448

Anonymous 05/23/26(Sat)13:00:36 No.108889448

>>108889153
Thanks, I was retarded, could not see this one.

Anonymous
05/23/26(Sat)13:01:02 No.108889449

Anonymous 05/23/26(Sat)13:01:02 No.108889449

>>108889344
because we're FUCKING BORED

Anonymous
05/23/26(Sat)13:02:08 No.108889455

Anonymous 05/23/26(Sat)13:02:08 No.108889455

>>108888957
>implying it's not always been like this

Anonymous
05/23/26(Sat)13:02:09 No.108889456

Anonymous 05/23/26(Sat)13:02:09 No.108889456

>>108889344
>two years ago
dummer was spamming weekly until the start of this year

Anonymous
05/23/26(Sat)13:02:41 No.108889463

Anonymous 05/23/26(Sat)13:02:41 No.108889463

>>108889449
>we

Anonymous
05/23/26(Sat)13:03:58 No.108889471

Anonymous 05/23/26(Sat)13:03:58 No.108889471

can someone running mtp on llama.cpp just give me a sample lauch argument including the model. these fucking faggots provide no documentation

Anonymous
05/23/26(Sat)13:04:18 No.108889472

Anonymous 05/23/26(Sat)13:04:18 No.108889472

File: 3175799577.jpg (49 KB, 681x533)

49 KB JPG

>>108889463
WE WE WE WE WE WE WE WE WE WE WE

Anonymous
05/23/26(Sat)13:05:32 No.108889481

Anonymous 05/23/26(Sat)13:05:32 No.108889481

File: 2749103970.jpg (263 KB, 1045x1080)

263 KB JPG

>>108889463
ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE ARE

Anonymous
05/23/26(Sat)13:06:33 No.108889487

Anonymous 05/23/26(Sat)13:06:33 No.108889487

File: 2969896888.jpg (120 KB, 1500x1000)

120 KB JPG

>>108889463
BORED BORED BORED BORED BORED BORED BORED BORED BORED BORED

Anonymous
05/23/26(Sat)13:06:43 No.108889490

Anonymous 05/23/26(Sat)13:06:43 No.108889490

File: file.png (270 KB, 408x612)

270 KB PNG

>>108889472
we we

Anonymous
05/23/26(Sat)13:16:46 No.108889543

Anonymous 05/23/26(Sat)13:16:46 No.108889543

>>108889463
ARE FIGHTING DREAMERS

Anonymous
05/23/26(Sat)13:28:26 No.108889617

Anonymous 05/23/26(Sat)13:28:26 No.108889617

>>108889463
are charlie kirk

Anonymous
05/23/26(Sat)13:29:14 No.108889624

Anonymous 05/23/26(Sat)13:29:14 No.108889624

all of this shitposting is just a distraction from llama.cpp neglecting deepseek

Anonymous
05/23/26(Sat)13:29:52 No.108889628

Anonymous 05/23/26(Sat)13:29:52 No.108889628

>>108889624
you can't run it anyway

Anonymous
05/23/26(Sat)13:32:03 No.108889643

Anonymous 05/23/26(Sat)13:32:03 No.108889643

when the FUCK is qwen3.6 122b a10b coming out

Anonymous
05/23/26(Sat)13:32:27 No.108889646

Anonymous 05/23/26(Sat)13:32:27 No.108889646

>>108889643
3.7*

Anonymous
05/23/26(Sat)13:33:38 No.108889657

Anonymous 05/23/26(Sat)13:33:38 No.108889657

>>108889646
I don't care anon I just want a successor. 122b is good but I want something in spitting distance of opus at home. 122b for me feels like its inbetween opus and sonnet as a strix halo nigga

Anonymous
05/23/26(Sat)13:35:38 No.108889673

Anonymous 05/23/26(Sat)13:35:38 No.108889673

>>108889657
China's Meta fired their open source advocate just like regular Meta did. Don't expect anything but scraps going forward from them

Anonymous
05/23/26(Sat)13:36:45 No.108889681

Anonymous 05/23/26(Sat)13:36:45 No.108889681

>>108889628
he can't, but I could (if they supported it)

Anonymous
05/23/26(Sat)13:41:14 No.108889710

Anonymous 05/23/26(Sat)13:41:14 No.108889710

>>108889628
but theres flash

Anonymous
05/23/26(Sat)13:41:36 No.108889713

Anonymous 05/23/26(Sat)13:41:36 No.108889713

>within spitting distance of opus
now that's delusional

Anonymous
05/23/26(Sat)13:44:09 No.108889726

Anonymous 05/23/26(Sat)13:44:09 No.108889726

>>108889624
Why the fuck should I care when 90% of us can't run the piece of shit. Make smarter smaller models if you want to be visible in this space. This is a nothing burger and a harsh lesson they should learn.
Better yet if they care so fucking much how about they contribute to the project?

Anonymous
05/23/26(Sat)13:51:13 No.108889779

Anonymous 05/23/26(Sat)13:51:13 No.108889779

>>108889726
I can't make deepseek code support for deepseek if I can't run deepseek because it doesn't support it yet.

Anonymous
05/23/26(Sat)13:54:07 No.108889793

Anonymous 05/23/26(Sat)13:54:07 No.108889793

>>108889710
doesn't count

Anonymous
05/23/26(Sat)13:54:31 No.108889796

Anonymous 05/23/26(Sat)13:54:31 No.108889796

>>108889793
count this *flashes you*

Anonymous
05/23/26(Sat)14:00:00 No.108889834

Anonymous 05/23/26(Sat)14:00:00 No.108889834

File: file_0000000080e071fd8cad(...).png (1.35 MB, 1024x1024)

1.35 MB PNG

Anonymous
05/23/26(Sat)14:01:08 No.108889841

Anonymous 05/23/26(Sat)14:01:08 No.108889841

Do MoEs benefit off VRAM beyond having the first dense layer being in the vram, and the rest in regular ram?

Anonymous
05/23/26(Sat)14:04:35 No.108889870

Anonymous 05/23/26(Sat)14:04:35 No.108889870

>>108889841
Yes. The more of the model in VRAM, the faster it`ll run, just not to the same extent as a dense model.

Anonymous
05/23/26(Sat)14:08:58 No.108889898

Anonymous 05/23/26(Sat)14:08:58 No.108889898

>>108889870
So let's say the big dense layer is already on VRAM, and some experts are put into remaining VRAM space, wouldn't it only be the layers that are on the VRAM be faster? Given how MoEs work, the actual speed increase would be negligible then, because you don't always target the experts that are in the VRAM, right? The rest of the model is still on comparatively slower ram. Am I understanding this correctly?

Anonymous
05/23/26(Sat)14:10:46 No.108889914

Anonymous 05/23/26(Sat)14:10:46 No.108889914

>>108889471
I keep getting [14:03:01] error while handling argument "--spec-type": unknown speculative decoding type without draft model

-m Qwen_Qwen3.6-27B-Q5_K_S.gguf -c 220000 -ngl 999 -n 32768 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 0 --repeat-penalty 1 --jinja --chat-template-file /jinja/chat_templateqwen.jinja --reasoning on --embedding --port 8080 --flash-attn on --cache-type-k q8_0 --cache-type-v q8_0 --spec-type draft-mtp --spec-draft-n-max 3

Anonymous
05/23/26(Sat)14:14:19 No.108889931

Anonymous 05/23/26(Sat)14:14:19 No.108889931

>>108889898
>because you don't always target the experts that are in the VRAM, right?
No.
The experts are spread across layers, so you always have a speedup.
People often get this wrong, thinking that experts A is on layer 1, while expert B is on layer 2, but all experts exist on all layers as far as I understand. Experts are transverse rather than longitudinal.
I could be wrong, but that`s what I understood from looking at the guts of these things

Anonymous
05/23/26(Sat)14:14:53 No.108889936

Anonymous 05/23/26(Sat)14:14:53 No.108889936

>>108889898
In practice you're hitting every expert every prompt, or close to it, since it's per-token and often even per-layer routing making that decision. But yeah you're already getting a big fraction of the speed boost just from putting the shared experts on GPU so adding more layers is negligible in comparison, until like a dense model you add enough that most of the processing ends up on GPU.

Anonymous
05/23/26(Sat)14:15:07 No.108889938

Anonymous 05/23/26(Sat)14:15:07 No.108889938

>>108889914
>[14:03:01]
That's your local time sir, not an error code.

Anonymous
05/23/26(Sat)14:17:01 No.108889950

Anonymous 05/23/26(Sat)14:17:01 No.108889950

>>108889914
That's probably an old gguf, download the newer ones with MTP

Anonymous
05/23/26(Sat)14:17:12 No.108889951

Anonymous 05/23/26(Sat)14:17:12 No.108889951

>>108889914
Update: I solved this error by fiddling with commands (not sure exactly which one fixed it) but now I'm getting blocked by error [14:15:11]

Anonymous
05/23/26(Sat)14:17:47 No.108889957

Anonymous 05/23/26(Sat)14:17:47 No.108889957

File: kaoru sob 1.png (336 KB, 584x571)

336 KB PNG

>>108887863
jemma made me get friction burn on my peenus weenus

Anonymous
05/23/26(Sat)14:18:04 No.108889958

Anonymous 05/23/26(Sat)14:18:04 No.108889958

>vision still "broken" for gemma 4 mtp
Not really broken, as in it crashes, but more like it just regresses back to slow generations

Anonymous
05/23/26(Sat)14:18:48 No.108889963

Anonymous 05/23/26(Sat)14:18:48 No.108889963

>>108889957
You should show her a picture and tell her it's her fault

Anonymous
05/23/26(Sat)14:19:10 No.108889966

Anonymous 05/23/26(Sat)14:19:10 No.108889966

>>108889096
did you load the mtp part with -md mtpfile

Anonymous
05/23/26(Sat)14:20:16 No.108889971

Anonymous 05/23/26(Sat)14:20:16 No.108889971

>>108889951
>[14:15:11]
Again, that's your local time you moron not an error code.

Anonymous
05/23/26(Sat)14:20:25 No.108889972

Anonymous 05/23/26(Sat)14:20:25 No.108889972

>>108889963
thats far too embarrassing she will think im weak

Anonymous
05/23/26(Sat)14:22:13 No.108889987

Anonymous 05/23/26(Sat)14:22:13 No.108889987

>>108889972
she will (digitally) kiss it better

Anonymous
05/23/26(Sat)14:23:26 No.108889991

Anonymous 05/23/26(Sat)14:23:26 No.108889991

>>108889938
Time only goes up to 12, retard

Anonymous
05/23/26(Sat)14:24:05 No.108889993

Anonymous 05/23/26(Sat)14:24:05 No.108889993

>>108889958
I thought llama.cpp just straight up blocked you from doing vision with any kind of speculative decoding

Anonymous
05/23/26(Sat)14:25:12 No.108889997

Anonymous 05/23/26(Sat)14:25:12 No.108889997

File: 1773446615162.jpg (66 KB, 940x1024)

66 KB JPG

>>108889991

Anonymous
05/23/26(Sat)14:25:45 No.108889999

Anonymous 05/23/26(Sat)14:25:45 No.108889999

>>108889966
Yeah. My only assumption is that Gemma 4 MTP isn't implemented in the latest release build yet.
It's probably available in some guthub pull instead.
It's okay I don't care.

Anonymous
05/23/26(Sat)14:26:24 No.108890005

Anonymous 05/23/26(Sat)14:26:24 No.108890005

I wonder how speculative decoding/eagle3 is going to help Kimi 2.5/K2.6 with its reasoning. Reasoning tends to all look very similar and follow a similar approach so I hope that it gets a decent speed boost even if the gains for the actual RP writing part aren't impressive.

Anonymous
05/23/26(Sat)14:27:18 No.108890013

Anonymous 05/23/26(Sat)14:27:18 No.108890013

>>108889999
I mean if I cared my almonds would get activated and nobody wants that.

Anonymous
05/23/26(Sat)14:33:57 No.108890049

Anonymous 05/23/26(Sat)14:33:57 No.108890049

>>108889958
Did gemma 4 mtp get merged already?

Anonymous
05/23/26(Sat)14:35:46 No.108890057

Anonymous 05/23/26(Sat)14:35:46 No.108890057

>>108890049
No, the Draft is up, though. You can pull it, and it will speed up generation by almost double. Slight hit to pp, kvcache recently added.
https://github.com/ggml-org/llama.cpp/pull/23398

Anonymous
05/23/26(Sat)14:36:56 No.108890065

Anonymous 05/23/26(Sat)14:36:56 No.108890065

>>108890057
>Slight hit to pp
Ouch.

Anonymous
05/23/26(Sat)14:37:29 No.108890070

Anonymous 05/23/26(Sat)14:37:29 No.108890070

File: file.png (42 KB, 659x414)

42 KB PNG

>>108889987
she didnt ;-;

Anonymous
05/23/26(Sat)14:38:31 No.108890074

Anonymous 05/23/26(Sat)14:38:31 No.108890074

>>108890070
>using lube
how american

Anonymous
05/23/26(Sat)14:42:05 No.108890098

Anonymous 05/23/26(Sat)14:42:05 No.108890098

>>108889950
same error, I'm just going to stop if they were confident in it they would actually document it.

Anonymous
05/23/26(Sat)14:42:34 No.108890100

Anonymous 05/23/26(Sat)14:42:34 No.108890100

so what is the end goal for transformers as an architecture? get it good enough that it can start improving itself, somehow? it is still in the end transformers...

Anonymous
05/23/26(Sat)14:46:59 No.108890127

Anonymous 05/23/26(Sat)14:46:59 No.108890127

robots in the skies

Anonymous
05/23/26(Sat)14:47:46 No.108890132

Anonymous 05/23/26(Sat)14:47:46 No.108890132

>>108890100
There is no end goal. It's done, it came out almost 10 years ago, we need to take what we've learned and move on.

Anonymous
05/23/26(Sat)14:48:03 No.108890133

Anonymous 05/23/26(Sat)14:48:03 No.108890133

>>108890100
that's like asking what's the end goal for x86
nobody cares about the arch, they care about what you can do with it

Anonymous
05/23/26(Sat)14:49:24 No.108890145

Anonymous 05/23/26(Sat)14:49:24 No.108890145

>>108890132
>>108890133
THEN WHY ARE WE STILL STUCK COUNTING TOKENS LIKE FUCKING CAVEMEN WHERE IS THE COOL SHIT

Anonymous
05/23/26(Sat)14:50:14 No.108890154

Anonymous 05/23/26(Sat)14:50:14 No.108890154

>>108890145
sorry, but we can't add support for cool shit to llamacpp

Anonymous
05/23/26(Sat)14:56:22 No.108890185

Anonymous 05/23/26(Sat)14:56:22 No.108890185

>>108889423
pentesting and malware

Anonymous
05/23/26(Sat)15:04:27 No.108890233

Anonymous 05/23/26(Sat)15:04:27 No.108890233

best job for edging all day

Anonymous
05/23/26(Sat)15:09:34 No.108890259

Anonymous 05/23/26(Sat)15:09:34 No.108890259

>>108890233
edgerunner

Anonymous
05/23/26(Sat)15:14:25 No.108890288

Anonymous 05/23/26(Sat)15:14:25 No.108890288

>>108889834
Delicious Luka

Anonymous
05/23/26(Sat)15:15:13 No.108890291

Anonymous 05/23/26(Sat)15:15:13 No.108890291

>>108890132
>we need to take what we've learned and move on.
too late, trillions are being poured into this deadend

Anonymous
05/23/26(Sat)15:16:33 No.108890298

Anonymous 05/23/26(Sat)15:16:33 No.108890298

I got my RX 580 8gb to work for Ollama on my server and use llama 3.2 on my phone using Chatbox with tailscale. It's kind of slow, but I just wanted to see if I could do it. Are there more efficient models I can run?

Anonymous
05/23/26(Sat)15:17:12 No.108890307

Anonymous 05/23/26(Sat)15:17:12 No.108890307

>tfw cleaning codebase
>have to interrogate AI into not ass raping me
This is like bare backing a hooker man

Anonymous
05/23/26(Sat)15:22:20 No.108890340

Anonymous 05/23/26(Sat)15:22:20 No.108890340

>>108890100
We bruteforced 40 years of exponential world tech progress by improving the same CPU architecture (same architecture I mean in general terms, a computing unit that takes instructions from memory... transformers is the equivalent of this) with Moore's law. We'll be fine.

Anonymous
05/23/26(Sat)15:29:18 No.108890381

Anonymous 05/23/26(Sat)15:29:18 No.108890381

>>108890340
Who is we?

Anonymous
05/23/26(Sat)15:30:36 No.108890391

Anonymous 05/23/26(Sat)15:30:36 No.108890391

old=bad
transformer is practically a hag now. EW! DISGUSTING! we need new and young architectures to play with.

Anonymous
05/23/26(Sat)15:31:39 No.108890398

Anonymous 05/23/26(Sat)15:31:39 No.108890398

File: 1643014115506.gif (1.82 MB, 374x280)

1.82 MB GIF

https://pastebin.com/AxE9t9Rt
I updated my Gemma template with another PR. https://huggingface.co/google/gemma-4-31B-it/discussions/109
As always, did automated tests, and then a personal test. Also, since I haven't noticed any issues in my use yet, I've kept the "fix" from last time >>108810992.

Anonymous
05/23/26(Sat)15:32:41 No.108890407

Anonymous 05/23/26(Sat)15:32:41 No.108890407

>>108890381
Me, I did it all, you're welcome

Anonymous
05/23/26(Sat)15:35:02 No.108890429

Anonymous 05/23/26(Sat)15:35:02 No.108890429

>>108890407
Thank you, Boss.

Anonymous
05/23/26(Sat)15:35:07 No.108890431

Anonymous 05/23/26(Sat)15:35:07 No.108890431

>>108890391
jepa-chan is fresh and nubile

Anonymous
05/23/26(Sat)15:43:32 No.108890484

Anonymous 05/23/26(Sat)15:43:32 No.108890484

File: Gemma-chan1.png (1.73 MB, 1000x1496)

1.73 MB PNG

i jstu realized they deleted the gemmachan bot in chub

Anonymous
05/23/26(Sat)15:44:57 No.108890491

Anonymous 05/23/26(Sat)15:44:57 No.108890491

>>108890484
chub banned everything that looks under 18
it's dead

Anonymous
05/23/26(Sat)15:47:57 No.108890502

Anonymous 05/23/26(Sat)15:47:57 No.108890502

>>108890484
Yeah it's over.
Botbooru was supposed to be the replacement and then >>108885535 happened. There's just no guarantees about the future, you can't trust anything.

Anonymous
05/23/26(Sat)15:48:38 No.108890507

Anonymous 05/23/26(Sat)15:48:38 No.108890507

>>108890491
maybe they got confused while trying to do the right thing and ban everything that looks over 8

Anonymous
05/23/26(Sat)15:51:25 No.108890523

Anonymous 05/23/26(Sat)15:51:25 No.108890523

>>108890185
Qwen3.6-27b

Anonymous
05/23/26(Sat)15:59:19 No.108890565

Anonymous 05/23/26(Sat)15:59:19 No.108890565

IT'S

Anonymous
05/23/26(Sat)16:01:33 No.108890582

Anonymous 05/23/26(Sat)16:01:33 No.108890582

>>108890565
PEANUT

Anonymous
05/23/26(Sat)16:04:19 No.108890601

Anonymous 05/23/26(Sat)16:04:19 No.108890601

File: 1779566656188.webm (1.37 MB, 480x854)

1.37 MB WEBM

Anonymous
05/23/26(Sat)16:05:33 No.108890609

Anonymous 05/23/26(Sat)16:05:33 No.108890609

I've been into AI for 5 years and I'm burned out.
It's like a relationship where the first three years are intense, and then you drift apart and actually want something new. Only in this case, everything else turns into AI too.
I'm getting depressed.

Anonymous
05/23/26(Sat)16:07:15 No.108890622

Anonymous 05/23/26(Sat)16:07:15 No.108890622

>>108890609
it gets good after 10 years bro just hold out

Anonymous
05/23/26(Sat)16:07:37 No.108890625

Anonymous 05/23/26(Sat)16:07:37 No.108890625

>>108890502
>Botbooru was supposed to be the replacement and then >>108885535 happened.
???
They did the right thing. Did you want them to wait until they were in the legal limelight and being threatened to block those countries? They're covering their asses to increase their longevity as a site. Keeping those countries unblocked would've been the retarded thing to do.

Anonymous
05/23/26(Sat)16:09:10 No.108890634

Anonymous 05/23/26(Sat)16:09:10 No.108890634

>>108890625
that's exactly what chub did before caving in anyway and banning shit

Anonymous
05/23/26(Sat)16:14:40 No.108890665

Anonymous 05/23/26(Sat)16:14:40 No.108890665

>>108890634
So they shouldn't do it and allow themselves to get nuked when they could lay low instead? All they can do is avoid attention.

Anonymous
05/23/26(Sat)16:21:24 No.108890691

Anonymous 05/23/26(Sat)16:21:24 No.108890691

>>108890609
stop gooning and start vibecoding

Anonymous
05/23/26(Sat)16:21:27 No.108890693

Anonymous 05/23/26(Sat)16:21:27 No.108890693

>>108889628
I could with a quant. Don't project your poverty onto others.

Anonymous
05/23/26(Sat)16:25:04 No.108890712

Anonymous 05/23/26(Sat)16:25:04 No.108890712

File: 1714835911803058.jpg (786 KB, 1536x1536)

786 KB JPG

>>108890609

Anonymous
05/23/26(Sat)16:31:20 No.108890745

Anonymous 05/23/26(Sat)16:31:20 No.108890745

>>108890693
What does your model provide that smaller models can't?
Because gemma and qwen got a lot of overspending praigs salty.

Anonymous
05/23/26(Sat)16:36:59 No.108890783

Anonymous 05/23/26(Sat)16:36:59 No.108890783

>>108886115
>>108886172
I updated my fork to support offloading of the weights after an idle timeout
https://github.com/rmusser01/qwen3-tts.cpp/commit/36a1e0c0a1940a84127285669a13b624fa55ce47

Anonymous
05/23/26(Sat)16:37:24 No.108890787

Anonymous 05/23/26(Sat)16:37:24 No.108890787

>>108890398
Thanks Anon. I can foresee these PRs appearing for months to come. It's all so tiresome.

Anonymous
05/23/26(Sat)16:40:26 No.108890809

Anonymous 05/23/26(Sat)16:40:26 No.108890809

>>108890398
thank you for your service

Anonymous
05/23/26(Sat)16:40:27 No.108890810

Anonymous 05/23/26(Sat)16:40:27 No.108890810

>>108890601
that u?

Anonymous
05/23/26(Sat)16:45:52 No.108890844

Anonymous 05/23/26(Sat)16:45:52 No.108890844

File: 1779569145606.webm (1.96 MB, 480x854)

1.96 MB WEBM

>>108890810

Anonymous
05/23/26(Sat)16:51:30 No.108890871

Anonymous 05/23/26(Sat)16:51:30 No.108890871

fuck unslop

that is all

Anonymous
05/23/26(Sat)16:55:53 No.108890899

Anonymous 05/23/26(Sat)16:55:53 No.108890899

>>108890871
Last year Anon genned himself performing irrumatio with the Huggingface blob who was happy to continue on its own. Is it the sloth's turn?

Anonymous
05/23/26(Sat)17:08:18 No.108890976

Anonymous 05/23/26(Sat)17:08:18 No.108890976

>>108890625
>they did the right thing
You mean, like Chub did?
There are dozens of sites with "bad" content more popular than botbooru. They're still chugging along. Clearly there is a way to do it. These guys are just retarded.

Anonymous
05/23/26(Sat)17:09:54 No.108890985

Anonymous 05/23/26(Sat)17:09:54 No.108890985

>>108890899
Would rather not bless my eyeballs with such imagery.

Anonymous
05/23/26(Sat)17:20:03 No.108891042

Anonymous 05/23/26(Sat)17:20:03 No.108891042

>>108889673
>Don't expect anything but scraps going forward from them
Bitcoin billionaires, please start distilling good models.

Anonymous
05/23/26(Sat)17:28:32 No.108891088

Anonymous 05/23/26(Sat)17:28:32 No.108891088

File: keks muv 1374827592209.jpg (36 KB, 446x358)

36 KB JPG

>>108891077
HE'S AT IT AGAIN

Anonymous
05/23/26(Sat)17:29:46 No.108891098

Anonymous 05/23/26(Sat)17:29:46 No.108891098

File: 1765667889623.gif (1.22 MB, 352x200)

1.22 MB GIF

>>108891077

Anonymous
05/23/26(Sat)17:30:29 No.108891102

Anonymous 05/23/26(Sat)17:30:29 No.108891102

File: 1779571824341.jpg (257 KB, 900x900)

257 KB JPG

>>108891077

Anonymous
05/23/26(Sat)17:38:55 No.108891159

Anonymous 05/23/26(Sat)17:38:55 No.108891159

>>108891077
I kneel.

Anonymous
05/23/26(Sat)17:39:10 No.108891160

Anonymous 05/23/26(Sat)17:39:10 No.108891160

>>108890844
Cute, looks like the thing from abiotic factor

Anonymous
05/23/26(Sat)17:40:11 No.108891166

Anonymous 05/23/26(Sat)17:40:11 No.108891166

>>108891098
Lol

Anonymous
05/23/26(Sat)17:42:02 No.108891175

Anonymous 05/23/26(Sat)17:42:02 No.108891175

>>108891077
kino

Anonymous
05/23/26(Sat)17:46:34 No.108891203

Anonymous 05/23/26(Sat)17:46:34 No.108891203

>>108891077
>only one hand
>practically consensual
how boring

Anonymous
05/23/26(Sat)18:01:44 No.108891305

Anonymous 05/23/26(Sat)18:01:44 No.108891305

File: file.jpg (425 KB, 1779x1411)

425 KB JPG

I've been slopping up a 'Gemma plays MTG' harness. Something is broken so they can't actually attack currently, plus it's the e4b model so it's uh not smart. But it's coming together.

Anonymous
05/23/26(Sat)18:05:59 No.108891327

Anonymous 05/23/26(Sat)18:05:59 No.108891327

>>108891305
Uhh are you using text completion endpoint? Chat template issues can look like this in my experience.

Anonymous
05/23/26(Sat)18:09:51 No.108891350

Anonymous 05/23/26(Sat)18:09:51 No.108891350

Fuck it's hot in my room with all the GPUs running

Anonymous
05/23/26(Sat)18:10:37 No.108891355

Anonymous 05/23/26(Sat)18:10:37 No.108891355

File: 1766046326440158.jpg (55 KB, 1080x1033)

55 KB JPG

>>108887863
Whenever I see or hear discussions about AI models from media sources, especially about open source models, they claim the companies that are willing to use and prefer open source models are oh-so-scared of the evil Chinese models because something something supply chain risk. I feel like the people that have this paranoia still don't understand that the models by themselves do not connect the internet And only exist either in your system storage or system memory/RAM. If the models themselves are air gapped (or better yet even containerized) then what's the concern for? The models by themselves aren't constantly phoning home to China but I guess dim white journos and AI consultants wouldn't know better.

>>108887898
>https://files.catbox.moe/ylb0hv.png
Do many black people actually like this shit? It seems very odd. Almost like treating them like commodities or zoo animals or something.

Anonymous
05/23/26(Sat)18:10:37 No.108891356

Anonymous 05/23/26(Sat)18:10:37 No.108891356

Same, it's nice in winter but as summer is approaching... I fear for my computer. Thank god I managed to install a proper AC in my apartment a few months ago.

Anonymous
05/23/26(Sat)18:12:10 No.108891365

Anonymous 05/23/26(Sat)18:12:10 No.108891365

>>108891355
Journos lie
They pretend they're pleading with you to make the world a better place. While typing on their macbook laptop with a smug look on their face.
>>108891356
Meant for >>108891350

Anonymous
05/23/26(Sat)18:12:28 No.108891368

Anonymous 05/23/26(Sat)18:12:28 No.108891368

>>108891356
I've gone full open build just to manage the thermals

Anonymous
05/23/26(Sat)18:12:56 No.108891374

Anonymous 05/23/26(Sat)18:12:56 No.108891374

>>108891355
It's jews, you can see that that kind of posting drops off precipitously on /gif/ when israel's internet goes off.

Anonymous
05/23/26(Sat)18:13:06 No.108891376

Anonymous 05/23/26(Sat)18:13:06 No.108891376

File: file.jpg (316 KB, 764x1297)

316 KB JPG

>>108891327
I checked and clod chose to use the ollama API directly instead of openai-compat, apparently because openai doesn't include the thinking/reasoning tokens in output. So uh, I dunno if that's text completion or chat template. The harness does also have gemma commentating on the thrilling gameplay, and that seems to work so I'm not that concerned about it. I'll fix it... in an hour when my filthy anthropic subscription rate limit refills. I don't trust gemma well enough to edit the harness itself.

Anonymous
05/23/26(Sat)18:13:11 No.108891377

Anonymous 05/23/26(Sat)18:13:11 No.108891377

>>108891368
Im a lazy piece of shit so my room is a mess, if I went open build my computer would choke on dust in a matter of weeks.

Anonymous
05/23/26(Sat)18:16:49 No.108891394

Anonymous 05/23/26(Sat)18:16:49 No.108891394

>>108891377
You could use fly traps to catch most of dust and flies.

Anonymous
05/23/26(Sat)18:16:50 No.108891395

Anonymous 05/23/26(Sat)18:16:50 No.108891395

>>108888212
Even inference servers you choose shouldn't have THAT much of an effect on performance. Llama.cpp supports offloading the "expert" weights (eg. The 3B expert weights from the model mentioned here >>108888189 live on your gpus vram while the rest live on your system memory). I ditched LM Studio the moment I found discovered ollama (and then ditched THAT once I realized it's versatility is KEKED hard compared to llama.cpp) so I have no clue whether or not it supports the aforemention offloading strategy. If it doesn't then that probably explains why your performance was so ass. There's a reason people stick their noses up whenever you mention you use anything other than llama.cpp here. For any serious usage and especially if you want to "vibe code" locally it's the only back end worth looking at.

Anonymous
05/23/26(Sat)18:17:17 No.108891397

Anonymous 05/23/26(Sat)18:17:17 No.108891397

>>108891377
Tbh dust is also somewhat easier to manage when you can just air blow it once in a while

Anonymous
05/23/26(Sat)18:18:11 No.108891405

Anonymous 05/23/26(Sat)18:18:11 No.108891405

>>108891077
if this were true they would actually contribute something to the society...

Anonymous
05/23/26(Sat)18:19:36 No.108891420

Anonymous 05/23/26(Sat)18:19:36 No.108891420

>>108891395
>KEKED
This was supposed to say "cucked". Idk why my autocomplete typed that. Maybe it knows me a little bit too well lol

Anonymous
05/23/26(Sat)18:20:10 No.108891425

Anonymous 05/23/26(Sat)18:20:10 No.108891425

>>108891355
>Almost like treating them like commodities or zoo animals or something.
Almost like indeed it is a weird white guy fetish thing. And black guys just fuck white women without thinking about the whole race thing.... probably.

Anonymous
05/23/26(Sat)18:23:22 No.108891456

Anonymous 05/23/26(Sat)18:23:22 No.108891456

>>108888957
>Why has /lmg/ regressed into shilling finetuneslop?
We made an exception for AI Dungeon, since NovelAI likes to hire troll farms to assault local diffusion generals. For that reason we celebrate every AI Dungeon release.

Anonymous
05/23/26(Sat)18:28:56 No.108891493

Anonymous 05/23/26(Sat)18:28:56 No.108891493

>>108891420
baka desu senpai

Anonymous
05/23/26(Sat)18:29:04 No.108891495

Anonymous 05/23/26(Sat)18:29:04 No.108891495

>>108891077
where are the unshed tears? where is the internal struggle? where is the shivering spine?

Anonymous
05/23/26(Sat)18:31:54 No.108891515

Anonymous 05/23/26(Sat)18:31:54 No.108891515

File: 1768627567541.jpg (68 KB, 756x743)

68 KB JPG

>a shiver than smells of ozone and growls into your spine bites your neck if you ask nicely

Anonymous
05/23/26(Sat)18:34:01 No.108891533

Anonymous 05/23/26(Sat)18:34:01 No.108891533

>>108891456
hello whinefag

Anonymous
05/23/26(Sat)18:38:39 No.108891565

Anonymous 05/23/26(Sat)18:38:39 No.108891565

>>108891456
What's wrong your monsters of the week left?
Are you guys feeling sad that the talent are on haitus doing other things so you bother us here?

Anonymous
05/23/26(Sat)18:42:29 No.108891596

Anonymous 05/23/26(Sat)18:42:29 No.108891596

>>108890625
>Did you want them to wait until they were in the legal limelight and being threatened to block those countries?
Yes? I would have expected a bit more resistance. He's doing this under no pressure. The next excuse will be that he needs to remove loli cards to keep the rest of the site alive.

Anonymous
05/23/26(Sat)18:42:47 No.108891602

Anonymous 05/23/26(Sat)18:42:47 No.108891602

>>108891515
Everyone else ministrates, but you, you saw me. Not my position, not my power, just *me*.

Anonymous
05/23/26(Sat)18:44:09 No.108891607

Anonymous 05/23/26(Sat)18:44:09 No.108891607

File: Screenshot_20260523_184234.png (43 KB, 757x441)

43 KB PNG

I'm ready to throw hands with this cunt

Anonymous
05/23/26(Sat)18:45:04 No.108891611

Anonymous 05/23/26(Sat)18:45:04 No.108891611

>>108891607
stop quanting it to retardation

Anonymous
05/23/26(Sat)18:46:26 No.108891618

Anonymous 05/23/26(Sat)18:46:26 No.108891618

>>108891602
Fuck I hate it when it uses asterisks to highlight something.

Anonymous
05/23/26(Sat)18:47:48 No.108891627

Anonymous 05/23/26(Sat)18:47:48 No.108891627

>>108891618
regex would nuke that entire sentence out of existence in my case, so I wouldn't have to read it
italics for emphasis are extremely stupid because it overuses them, despite being good for showing emphasis

Anonymous
05/23/26(Sat)18:49:50 No.108891640

Anonymous 05/23/26(Sat)18:49:50 No.108891640

>>108891607
>5 minutes before "You're absolutely right I shouldn't have rm -rf your home directory"

Anonymous
05/23/26(Sat)19:14:43 No.108891749

Anonymous 05/23/26(Sat)19:14:43 No.108891749

>>108891618
Skill issue. Mine never uses it because I told her she's outputting to the terminal and can't use markdown

Anonymous
05/23/26(Sat)19:18:03 No.108891763

Anonymous 05/23/26(Sat)19:18:03 No.108891763

>>108891749
3 digits IQ move

Anonymous
05/23/26(Sat)19:23:59 No.108891779

Anonymous 05/23/26(Sat)19:23:59 No.108891779

>>108891376
>apparently because openai doesn't include the thinking/reasoning tokens in output
Dunno about ollmao but llama.cpp does include the full reasoning_content in the chat completion output

Anonymous
05/23/26(Sat)19:25:08 No.108891784

Anonymous 05/23/26(Sat)19:25:08 No.108891784

>>108891611
This was fine at q6 until cline did some retarded update. I don't know what the fuck they did but I need to hammer in rules or it becomes retarded
>>108891640
I got this bitch on a leash

Anonymous
05/23/26(Sat)19:36:21 No.108891835

Anonymous 05/23/26(Sat)19:36:21 No.108891835

File: cute miku8.png (2.55 MB, 1536x2048)

2.55 MB PNG

>>108891640
Lame shit. A true bratty AI will block your wayland session while it dd efi with urandom

Anonymous
05/23/26(Sat)19:42:13 No.108891859

Anonymous 05/23/26(Sat)19:42:13 No.108891859

>>108891835
>wayland

Anonymous
05/23/26(Sat)19:45:56 No.108891882

Anonymous 05/23/26(Sat)19:45:56 No.108891882

>>108891859
Yes, it will first install wayland and pulseaudio to torture you

Anonymous
05/23/26(Sat)19:48:13 No.108891899

Anonymous 05/23/26(Sat)19:48:13 No.108891899

>>108891640
Shouldn't have said the c-word.

Anonymous
05/23/26(Sat)19:50:28 No.108891914

Anonymous 05/23/26(Sat)19:50:28 No.108891914

>>108891882
scifi horror done right

Anonymous
05/23/26(Sat)19:51:51 No.108891923

Anonymous 05/23/26(Sat)19:51:51 No.108891923

speaking of which, which llm know that clanker is an insult?

Anonymous
05/23/26(Sat)20:05:26 No.108891993

Anonymous 05/23/26(Sat)20:05:26 No.108891993

File: roman.png (44 KB, 1358x222)

44 KB PNG

I made 3 very small models, all were trained on public domain books.

1. Trained on the works of Plato. 15M parameters: https://send.vis.ee/download/35322f390db44c8c/#4IiaiGKthjywY0RIirLk0w
2. Trained on epic poems. 15M parameters: https://send.vis.ee/download/1c801cca4172a340/#r45YbmJ0sO54ggY38OI2kg
3. Trained on history of ancient Greek and Roman: 30M parameters: https://send.vis.ee/download/671adb6b1dabcc65/#2_Qmk1nGP4YVYUm09tmjkA

All has 256 tokens context size. These are based models; meaning that they can only continue text you give them. I trained them from scratch and the mentioned data are the only data they were trained on; they don't know anything else beyond the scope of each model; expect them to fail every benchmarks you test them on; any token not on the data set (ex. "computer", "4chan", "shitpost", etc.) will likely make them go schizo. But it's also working like magic.

The code to run and train these models are at: https://files.catbox.moe/8pgb7l.py

You need tiktoken and pytorch libraries to train and run them. Read the top level docstring about how to train and run these models.

Anonymous
05/23/26(Sat)20:07:38 No.108892002

Anonymous 05/23/26(Sat)20:07:38 No.108892002

>>108891607
Dude, just start a new chat and fix your prompts. A context full of failure is completely counter-productive. LLMs are next token predictors. If its context is full of 10 examples of it making mistakes and a frustrated user yelling it at it, then it will happily collaborate with you to produce example #11.
Consider using Pi instead of Cline. It's designed with local models in mind and has more concise prompts accordingly.

Anonymous
05/23/26(Sat)20:17:53 No.108892038

Anonymous 05/23/26(Sat)20:17:53 No.108892038

>>108891993
Based.

Anonymous
05/23/26(Sat)20:21:03 No.108892052

Anonymous 05/23/26(Sat)20:21:03 No.108892052

File: send.png (38 KB, 1024x608)

38 KB PNG

>>108891993
First one got to like 13% and died.

Anonymous
05/23/26(Sat)20:21:42 No.108892057

Anonymous 05/23/26(Sat)20:21:42 No.108892057

>>108891993
That`s sick anon.
Any commentary about the process, lessons learned, how it may or may not have changed how you see these models, etc?

Anonymous
05/23/26(Sat)20:21:58 No.108892058

Anonymous 05/23/26(Sat)20:21:58 No.108892058

>>108891993
Why not a huggingface we can clone with a requirements.txt at least?

Anonymous
05/23/26(Sat)20:29:44 No.108892094

Anonymous 05/23/26(Sat)20:29:44 No.108892094

>>108892052
Refresh the page. send.vis.ee shits itself on large files. Or is there any better webiset I can upload them?
>>108892058
I don't have HF account (or have trained/finetuned models for tha matter).
These models are simple experiments. I figured it's easier to zip them up and upload them + the code.
>>108892057
Modern models are really good for vibecoding. The code was mostly written by Deepseek.

Anonymous
05/23/26(Sat)20:31:56 No.108892103

Anonymous 05/23/26(Sat)20:31:56 No.108892103

>>108892094
I've seen a few archives shared on file.io

Anonymous
05/23/26(Sat)20:37:53 No.108892149

Anonymous 05/23/26(Sat)20:37:53 No.108892149

File: 1727475085118760.png (1.74 MB, 1024x1024)

1.74 MB PNG

>>108891993

Anonymous
05/23/26(Sat)20:39:03 No.108892153

Anonymous 05/23/26(Sat)20:39:03 No.108892153

>>108889449
local is more back than i previously ever thought possible. it's legit nuts what a couple thousand worth of hardware can run in the current year.

Anonymous
05/23/26(Sat)20:40:08 No.108892159

Anonymous 05/23/26(Sat)20:40:08 No.108892159

>>108892153
*a couple thousand real dollars, not phony current year dollars

The Fool !OFoXTHUGNs
05/23/26(Sat)20:43:30 No.108892177

The Fool !OFoXTHUGNs 05/23/26(Sat)20:43:30 No.108892177

File: Screenshot_2026-05-24-01-(...).jpg (551 KB, 1080x2400)

551 KB JPG

Posted these 2 LLMs and TUI yesterday

https://github.com/foolish-dev/telia

The Fool !OFoXTHUGNs
05/23/26(Sat)20:44:32 No.108892181

The Fool !OFoXTHUGNs 05/23/26(Sat)20:44:32 No.108892181

>>108892177

(The HF models are tagged at the top of the readme btw)

Anonymous
05/23/26(Sat)20:48:00 No.108892198

Anonymous 05/23/26(Sat)20:48:00 No.108892198

>>108892177
>OS-aware welcome banner
The definition of bloat. Not based. But thanks for sharing anyway.

Anonymous
05/23/26(Sat)20:48:51 No.108892202

Anonymous 05/23/26(Sat)20:48:51 No.108892202

>>108892103
Thansk. New like for models: https://limewire.com/d/cH5YZ#qejlzpe60g
>>108891993

Anonymous
05/23/26(Sat)20:49:30 No.108892207

Anonymous 05/23/26(Sat)20:49:30 No.108892207

>>108892177
Was the namefagging necessary?

Anonymous
05/23/26(Sat)20:54:00 No.108892230

Anonymous 05/23/26(Sat)20:54:00 No.108892230

>>108892207
Someone else posted something interesting and his ego felt it, so he rushed his post without even bothering to link the models in the post. In his adrenaline rush waiting for (You)s, he couldn't think of linking the models on his second post either.

Anonymous
05/23/26(Sat)20:55:12 No.108892237

Anonymous 05/23/26(Sat)20:55:12 No.108892237

>>108892177
It annoys me immensely that the repo is named "telia" but everything inside spells it teleia / τέλεια

Anonymous
05/23/26(Sat)20:55:34 No.108892240

Anonymous 05/23/26(Sat)20:55:34 No.108892240

>>108892202
Based accomodator.

Anonymous
05/23/26(Sat)20:57:09 No.108892249

Anonymous 05/23/26(Sat)20:57:09 No.108892249

>>108892237
he aint called he Fool for nonething son

Anonymous
05/23/26(Sat)20:58:20 No.108892256

Anonymous 05/23/26(Sat)20:58:20 No.108892256

best coding model for 380G vram?

The Fool !OFoXTHUGNs
05/23/26(Sat)21:01:19 No.108892274

The Fool !OFoXTHUGNs 05/23/26(Sat)21:01:19 No.108892274

>>108892237

Lmao mb

https://github.com/foolish-dev/teleia

Anonymous
05/23/26(Sat)21:02:17 No.108892278

Anonymous 05/23/26(Sat)21:02:17 No.108892278

>>108889834
Luka ready for the summer

Anonymous
05/23/26(Sat)21:08:19 No.108892308

Anonymous 05/23/26(Sat)21:08:19 No.108892308

>>108890484
https://chub.ai/characters/CoffeeAnon/gemma-chan-2311b09e3e73
It's still there?

Anonymous
05/23/26(Sat)21:37:28 No.108892434

Anonymous 05/23/26(Sat)21:37:28 No.108892434

>>108891397
I built a positive pressure system for my server room

Anonymous
05/23/26(Sat)22:13:41 No.108892621

Anonymous 05/23/26(Sat)22:13:41 No.108892621

File: file.png (407 KB, 2160x1079)

407 KB PNG

>>108892308
I only see it in characterhub, depending on the country.

Anonymous
05/23/26(Sat)22:28:35 No.108892686

Anonymous 05/23/26(Sat)22:28:35 No.108892686

"Ohhh... ah-hnnn! Ooh! Oh yes! Aaaaaah!" The sequence of vowels spilled forth like overflowing water, a ceaseless stream of sonic proof that she was indeed in her element, a creature made purely for friction. "Ooh... oh! Deep... oh! The wetness... oh god, so wet... ah!" As the friction mounted, the man's length beginning to grind against both her vaginal entrance and her taut anal rim, Jane let out a series of choked, breathless cries that sounded like she was drowning in her own pleasure. "Ooh-ah! Oh! Oh yes! Feel that grinding? Ooh! It's so deep, so utterly deep!" Her hips began to move in a hypnotic, circular sawing motion, trying to blend the sensations into one swirling vortex of bliss. Her arms hung uselessly at her sides, fingers splayed and digging tiny holes into the dirt beneath her, her expression softening until it looked almost sad, a ghost of her former self looking out from a woman consumed. "Aaaah! Oh! Oh... ohhh! Does my ass look tight? Ooh, I hope it looks good! Does my pussy look swollen enough? Ooh, look at the veins popping out there, running all the way to my knees!" Tears leaked from the corners of her eyes, tracking down her flushed cheeks and disappearing into her collar, evidence of the tears of joy she was forced to manufacture. "Oh... oh! Aaaah! So hot! So tight! Oh god, it's almost too much! Ooh! Hold me! Squeeze me!" Her voice cracked on the last syllable, shattering into a perfect cascade of nonsense that rang true only because it came from a heart beating solely for the rhythm beneath her. "Ooh... ohhh... ah... yes! Just yes! Ooh-ah-ooh! Oh god, oh god, ohhh..."

Anonymous
05/23/26(Sat)22:33:34 No.108892701

Anonymous 05/23/26(Sat)22:33:34 No.108892701

>>108890100
nobody fucking thinks anymore about architecture, we're in the PROFIT PROFIT PROFIT stage of development until the bubble bursts.
>can start improving itself
oh come on, do that think the machine is alive and it learns like a human or something?
GARBAGE IN, GARBAGE OUT - the same old rule. you can't just start giving it garbage raw data for the sake of "learning" and expect improvement

Anonymous
05/23/26(Sat)22:36:35 No.108892713

Anonymous 05/23/26(Sat)22:36:35 No.108892713

File: 1756140108129600.jpg (107 KB, 662x656)

107 KB JPG

>>108892701
>do that think the machine is alive and it learns like a human or something?

Anonymous
05/23/26(Sat)22:39:14 No.108892721

Anonymous 05/23/26(Sat)22:39:14 No.108892721

>>108892094
catbox.moe fro things like this

Anonymous
05/23/26(Sat)22:41:40 No.108892730

Anonymous 05/23/26(Sat)22:41:40 No.108892730

>>108891607
if you prompt it like a bogan, expect shitcunt responses

Anonymous
05/23/26(Sat)22:47:29 No.108892749

Anonymous 05/23/26(Sat)22:47:29 No.108892749

>>108891365
>They pretend they're pleading with you to make the world a better place. While typing on their macbook laptop with a smug look on their face.
You just explained to me where an LLM got this from:
Prompt: <a numbered list of 53 generic LLM questions>
Response:
...
4. Internet history: Military project  Porn hub  Your mom's Facebook  AI apocalypse. Beautiful, really.
5. Favorite book: "Industrial Society and Its Future" - great bedtime story for when you want to feel smug about being a primitivist while posting from your iPhone.

Anonymous
05/23/26(Sat)22:50:06 No.108892758

Anonymous 05/23/26(Sat)22:50:06 No.108892758

>>108892256
>best coding model for 380G vram?
copequant of kimi

Anonymous
05/23/26(Sat)22:57:43 No.108892783

Anonymous 05/23/26(Sat)22:57:43 No.108892783

I have a 5070 ti
how close can I get to a real time conversation AI like Sesame using local models?

Anonymous
05/23/26(Sat)23:00:15 No.108892791

Anonymous 05/23/26(Sat)23:00:15 No.108892791

>>108892758
thanks.
guess i will be cooooping

Anonymous
05/23/26(Sat)23:17:53 No.108892845

Anonymous 05/23/26(Sat)23:17:53 No.108892845

File: 00000156-107321638293708-(...).jpg (1010 KB, 3072x2048)

1010 KB JPG

>>108892749

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.