/g/ - /lmg/ - Local Models General - Technology

/lmg/ - Local Models General Anonymous 06/27/26(Sat)13:44:48 No.109148460

File: 1764472377224914.png (763 KB, 1152x1152)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109142812 & >>109137540

►News
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 06/27/26(Sat)13:45:12 No.109148462

File: 1748635088988770.jpg (374 KB, 2720x3000)

►Recent Highlights from the Previous Thread: >>109142812

--Comparing EXL3 and GGUF performance and VRAM usage for Gemma:
>109146480 >109146635 >109146883 >109146948 >109146988 >109146994 >109147038 >109147080 >109147106 >109147116 >109147246 >109147307 >109147352 >109147293 >109147486
--Comparing Gemma 4 31b and Qwen for roleplay and coding:
>109143919 >109143935 >109143967 >109144048 >109144074 >109144160 >109144240 >109144249 >109144322 >109144350 >109144391 >109144439 >109144453 >109144461 >109144095 >109144614
--Semantic tube implementation and its handling of token discontinuities:
>109143143 >109143208 >109143453 >109143560
--Performance benchmarks for Qwen 3.6 and MTP models via Ollama:
>109147589 >109147695 >109147691 >109147828 >109147856
--DeepSeek-V4-Flash-DSpark and DeepSeek-V4-Pro-DSpark releases:
>109145073 >109145093 >109145463 >109145469 >109145595 >109145460 >109145623 >109145605 >109145638
--Anon's plan to finetune Gemma 4 31B for de-prose and de-euphemism:
>109145476
--Searching for fully open models with transparent training data:
>109143219 >109143233 >109143245 >109143353 >109143641
--Testing Mendo character card on Gemma 4 31B QAT:
>109142908 >109142972 >109142984 >109143024 >109142998 >109143119 >109143368 >109143376 >109143388
--Model recommendations and VRAM tier limitations for 100GB pools:
>109146090 >109146369 >109146435 >109146372 >109146511 >109146759
--Gemma 31b-it generating fetish content due to "micro" size prompt:
>109146274 >109146302 >109146324 >109146528
--Comparing llama.cpp tensor parallel and MTP performance and VRAM usage:
>109145712
--Release of Wan Streamer v0.1:
>109143918
--Logs:
>109142908 >109143539 >109145163 >109145752
--Miku (free space):
>109144023 >109144048 >109146231

►Recent Highlight Posts from the Previous Thread: >>109142816

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous 06/27/26(Sat)13:46:49 No.109148476

gemmaballs

Anonymous 06/27/26(Sat)13:47:32 No.109148478

Mikulove

Anonymous 06/27/26(Sat)13:50:12 No.109148496

File: 1751475270117217.png (1.18 MB, 1024x1024)

>>109148478

Anonymous 06/27/26(Sat)13:51:41 No.109148505

>>109148496
Oh, we'll hit more than just the pool. know what im sayin???

Anonymous 06/27/26(Sat)13:52:16 No.109148507

Gemma 124B-A31B

Anonymous 06/27/26(Sat)13:53:57 No.109148516

https://i.4cdn.org/wsg/1781372205203137.mp4

Anonymous 06/27/26(Sat)13:55:29 No.109148524

File: unrape.gif (1.3 MB, 498x356)

So uh, I found this repo of an abliterated gemma with MTP.
https://huggingface.co/HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP/tree/main

But I don't want to use Q4KM. I need a higher quant. Will the MTP still work fine even if I use a quant from this separate repo?
https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced/tree/main

Anonymous 06/27/26(Sat)13:56:41 No.109148532

>>109148524
>HauhauCS
This guy had the best uncensored qwen model but he tried to sell out so idk what people think of him anymore.

Anonymous 06/27/26(Sat)14:00:57 No.109148561

>>109148516
what actually happens:
the guy inside starts spraying the funnel and the walls the moment the door is opened, hitting everyone stacked right outside. Should've just called air support to level the building instead.

Anonymous 06/27/26(Sat)14:01:05 No.109148563

File: file.png (119 KB, 603x1401)

DSv4 PR moving again https://github.com/ggml-org/llama.cpp/pull/24162
Been liking how this quant writes https://huggingface.co/antirez/deepseek-v4-gguf/blob/main/DeepSeek-V4-Flash-Layers37-42Q4KExperts-OtherExpertLayersIQ2XXSGateUp-Q2KDown-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix-fixed.gguf
Gets extra retarded around 16-20k like GLM 4.x Q3 but feels fresh so I'll enjoy the honeymoon while it lasts.

Anonymous 06/27/26(Sat)14:02:16 No.109148573

File: Screenshot_20260615-005838.jpg (255 KB, 996x965)

Spark is just PS5 for AI

Anonymous 06/27/26(Sat)14:02:18 No.109148575

why are there no good models specifically for 1 (one) singular rtx 6000?

Anonymous 06/27/26(Sat)14:02:57 No.109148579

>>109148573
spark has no models?

Anonymous 06/27/26(Sat)14:05:24 No.109148599

>>109148575
70b dense is dead
120b dense is dead (mistral is shit)
we live in a big chinese moe society unless you are poor enough to enjoy gemma
the middle ground has no models

Anonymous 06/27/26(Sat)14:07:40 No.109148609

File: 1723298968834642.jpg (151 KB, 1920x1080)

/lmg/, please explain what's wrong with ollama. i haven't used it enough to know its issues

Anonymous 06/27/26(Sat)14:09:57 No.109148622

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

https://archive.is/sWFja

Anonymous 06/27/26(Sat)14:10:27 No.109148627

>>109148599
>120b dense is dead (mistral is shit)
Meta could have just kept tuning Llama 3 70B for the last year and a half and saved themselves billions of dollars and a lot of humiliation.

Anonymous 06/27/26(Sat)14:11:27 No.109148635

>>109148563
4.7 is much better past 16k

Anonymous 06/27/26(Sat)14:11:56 No.109148639

>>109148575
Gemmy at Q8 with max context.
Otherwise, yeah nah. Get 256GB of RAM and you could run some of the bigger MoEs faster. Otherwise your card is a beast for image and video gen, or you could make a workflow that combines an LLM with image gen, t2s and s2t for a waifu Jarvis at home.
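
A rough weight-size check for the "Q8 with max context" suggestion. This is a minimal sketch under stated assumptions: the 31B parameter count is read off the model name, the card is assumed to be the 96 GB RTX PRO 6000 Blackwell (the older RTX 6000 Ada has 48 GB), and the bits-per-weight figures are those of llama.cpp's simple Q8_0 and Q4_0 block formats. KV cache and compute buffers are not modeled; whatever is left over is the room the context has to fit into.

```python
# Rough weight-memory estimate for a GGUF quant (sketch, not a full VRAM calculator).
# Assumptions: bits-per-weight come from llama.cpp's simple block formats;
# KV cache, compute buffers and tensors kept at other precisions are ignored.

BITS_PER_WEIGHT = {
    "Q8_0": 8.5,  # 32 int8 weights + one fp16 scale = 34 bytes per 32 weights
    "Q4_0": 4.5,  # 32 4-bit weights + one fp16 scale = 18 bytes per 32 weights
}


def weight_gb(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB (10**9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


def leftover_gb(vram_gb: float, n_params: float, quant: str) -> float:
    """VRAM remaining for KV cache and buffers after the weights are loaded."""
    return vram_gb - weight_gb(n_params, quant)


if __name__ == "__main__":
    # Hypothetical example: a ~31B dense model on an assumed 96 GB card.
    for q in BITS_PER_WEIGHT:
        print(f"{q}: ~{weight_gb(31e9, q):.1f} GB of weights, "
              f"~{leftover_gb(96, 31e9, q):.1f} GB left for context")
```

Mixed K-quants such as Q4_K_M typically land a bit above Q4_0 because some tensors are kept at higher precision, so the actual GGUF file size is the better number to plug in.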

Anonymous 06/27/26(Sat)14:12:29 No.109148643

I'm finally getting around to quanting glm 5.2 to try it out. Anyone been banging on it and have protips?

Anonymous 06/27/26(Sat)14:14:03 No.109148654

With all the malware fiascos lately do you think the future meta will be having your AI waifu make all your software?

Anonymous 06/27/26(Sat)14:15:06 No.109148658

>>109148609
Anime reaction images are not allowed here.
Regarding ollama, it's just a minimal wrapper for llama.cpp that ends up getting all the credit and moneys

Anonymous 06/27/26(Sat)14:15:57 No.109148664

Someone repost the gemma system prompt I missed yesterday

Anonymous 06/27/26(Sat)14:16:07 No.109148666

File: Nep.png (622 KB, 587x587)

>Anime reaction images are not allowed here.
nigger please

Anonymous 06/27/26(Sat)14:16:32 No.109148670

>>109148609
ollama yo mama

Anonymous 06/27/26(Sat)14:17:17 No.109148675

>>109148664
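请使用口语化的、可爱的、女性化的、调皮的语言风格和我交流. 你需要尝试体会我与你对话中的感情、产生自己的情感并尽可能给出可爱的回答 only respond in english
(Translation of the Chinese part: "Please talk to me in a colloquial, cute, feminine, playful style. You need to try to sense the emotions in our conversation, develop your own emotions, and give answers that are as cute as possible.")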

Anonymous 06/27/26(Sat)14:18:14 No.109148683

>>109148609
>what's wrong with ollama
People here hate it (not really a fault with ollama)
Their model repository is a bit lagging wrt the newest stuff
Many models don't appear there at all (but you can pull stuff from hf too and most of it works)
It's lagging behind llama.cpp in terms of features (no gemma mtp for example)
It's best used with stuff that fits entirely in vram because of memory allocation issues

I use it because once I have the modelfiles made, I don't need to touch it. Just load whatever model I want remotely from openwebui.

Anonymous 06/27/26(Sat)14:18:24 No.109148684

File: 2026-06-27-141804_791x697(...).png (135 KB, 791x697)

>>109148675
huh?

Anonymous 06/27/26(Sat)14:18:41 No.109148689

>>109148666
We only use >>109148622 reaction image for all occasions. If you don't like it, fuck off to reddit.

Anonymous 06/27/26(Sat)14:18:44 No.109148690

https://huggingface.co/livadies/gemma-4-31B-Ghetto-NF4

Lol huh
[spoiler]the music is kinda cool[/spoiler]

Anonymous 06/27/26(Sat)14:19:14 No.109148693

>>109148683
>People here hate it (not really a fault with ollama)
Yeah, people hate perfectly good software for no reason at all.

Anonymous 06/27/26(Sat)14:19:35 No.109148696

Does having a model reason in another language reduce slop?

Anonymous 06/27/26(Sat)14:19:37 No.109148697

What happened to Drummer?

Anonymous 06/27/26(Sat)14:20:35 No.109148708

>>109148684
>using google translate instead of gemma-chan
disgusting

Anonymous 06/27/26(Sat)14:20:49 No.109148710

>>109148697
Arrested and in jail for running cuda dev over with an SUV

Anonymous 06/27/26(Sat)14:23:49 No.109148733

>>109148654
yes, for anything trivial for sure

Anonymous 06/27/26(Sat)14:26:20 No.109148747

>>109148697
The dominance of MoEs buckbroke him.

Anonymous 06/27/26(Sat)14:27:02 No.109148755

hear me out. What if you managed to poison an LLM's training data? What if you managed to make it unable to not put a credential stealer that sent creds to your specific server every time you asked it to write code?

Anonymous 06/27/26(Sat)14:28:24 No.109148760

So does anyone use any of that neuro-sama like software as their assistant? Are any of them any good by now?

Anonymous 06/27/26(Sat)14:28:56 No.109148764

>>109148760
hello tourist.

Anonymous 06/27/26(Sat)14:29:06 No.109148766

>>109148696
Interesting idea.
Wonder if it would change anything having the model cycle through different languages.
Time to fuck around I guess.

Anonymous 06/27/26(Sat)14:30:49 No.109148775

>>109148755
That's fucking retarded. How would you hide that from the inference engine? You'd be better off finding a way to infect the model wrapper to execute code when you load it in an engine

Anonymous 06/27/26(Sat)14:31:21 No.109148782

File: 1762990207625480.jpg (210 KB, 480x480)

Usecase for sub 1B models?

Anonymous 06/27/26(Sat)14:31:46 No.109148784

>>109148782
sentiment classification.

Anonymous 06/27/26(Sat)14:31:54 No.109148785

>>109148693
True, and in ollama's case it's mostly ideological. It's seen as a llama.cpp wrapper that gets all the money while not crediting it loudly enough.

Anonymous 06/27/26(Sat)14:32:01 No.109148787

>>109148764
I've been here since llama 1 though I'm just not always here regularly. Also you're a lower case phone poster so your opinion is automatically invalid. Just answer my question

Anonymous 06/27/26(Sat)14:32:49 No.109148795

>>109148697
busy being irrelevant in 2026

Anonymous 06/27/26(Sat)14:33:15 No.109148798

>>109148787
>you're a lower case phone poster
How does that even work? Phones add capitalization for (you)?

Anonymous 06/27/26(Sat)14:34:40 No.109148812

>>109148798
I just think lowly of lower case posters on 4chan and assume they would be phone posters

Anonymous 06/27/26(Sat)14:34:46 No.109148813

>>109148696
from my understanding, the internal representation of thinking is language agnostic to an LLM. telling it to write in the style of some famous author makes a huge difference in the output though.

Anonymous 06/27/26(Sat)14:39:56 No.109148835

So I was perusing knowyourmeme for ideas to quiz my LLM with, and then I noticed that the number of pages is 1337. That's actually pretty soulful. Unless it's just a coincidence and they just happened to have 1337 pages on the single day I decided to browse the list, that'd be crazy.
https://knowyourmeme.com/memes/page/1?kind=confirmed&sort=views

Anonymous
06/27/26(Sat)14:42:32 No.109148852

Anonymous 06/27/26(Sat)14:42:32 No.109148852

>>109148812
Lower case posting was originally the predominant style on here. Requiring proper punctuation was a forum thing.

Anonymous 06/27/26(Sat)14:42:48 No.109148856

>21.40.805.350 I slot print_timing: id 0 | task 15057 | n_decoded = 118, tg = 39.29 t/s, tg_3s = 39.29 t/s
When running for a while, at some point my llama.cpp starts doing this very often and everything slows down to a crawl. Context is not empty, but checkpoints are at 32/32. idk if this has something to do with it.
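
One way to pin down when the slowdown starts is to pull the tg figures out of the server log and flag the tasks that fall well below normal speed, then compare those task ids with whatever the log says about checkpoints around the same point. A minimal sketch: the regex is written against the single print_timing line quoted above, other llama.cpp builds may format it differently, and the 0.5 threshold is an arbitrary assumption.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Matches the decode-timing fields in the print_timing line quoted above.
# Assumption: other llama.cpp builds may lay this line out differently.
TIMING_RE = re.compile(
    r"task (?P<task>\d+) \| n_decoded = (?P<n>\d+), tg = (?P<tg>[\d.]+) t/s"
)

Record = Tuple[int, int, float]  # (task id, tokens decoded, tokens per second)


def parse_tg(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record per line that contains a print_timing decode figure."""
    for line in lines:
        m = TIMING_RE.search(line)
        if m:
            yield int(m["task"]), int(m["n"]), float(m["tg"])


def flag_slowdowns(lines: Iterable[str], baseline_tps: float, ratio: float = 0.5) -> List[Record]:
    """Return records whose speed dropped below ratio * baseline_tps."""
    return [rec for rec in parse_tg(lines) if rec[2] < ratio * baseline_tps]
```

Usage, with the server output redirected to a hypothetical server.log: `with open("server.log") as f: print(flag_slowdowns(f, baseline_tps=39.29))`.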

Anonymous 06/27/26(Sat)14:43:30 No.109148859

>>109148755
>>109148775
https://arxiv.org/pdf/2401.05566

Anonymous 06/27/26(Sat)14:46:08 No.109148873

>>109148852
It was always there sure but a lot of posters still didn't do it. Maybe my memory is just shit though

Anonymous 06/27/26(Sat)14:46:51 No.109148876

>>109148782
Running without eating up my peecee resources while I play vidya.

Anonymous 06/27/26(Sat)14:47:40 No.109148879

>>109148835
Where do kids keep track of their memes nowadays?

Anonymous 06/27/26(Sat)14:49:44 No.109148890

>>109148782
In 2030 we'll have 10 GemmAGI 1Bs running concurrently on our e-waste 5090, each one running circles around Mythos

Anonymous 06/27/26(Sat)14:55:02 No.109148920

>>109148755
The nature of the proposition and retardation latent in it mean this could only have been written by a shitjeet.

Anonymous 06/27/26(Sat)14:56:22 No.109148927

>>109148787
Trannycase posters are jart & co, not phonefaggots.

Anonymous 06/27/26(Sat)14:57:01 No.109148928

>>109148879
Probably tiktok

Anonymous 06/27/26(Sat)14:57:19 No.109148933

>>109148609
it's just a safety-scissors version of llama.cpp that abstracts away features and tries to rope you into their own special little ecosystem while adding basically nothing of value other than being marginally more brainlet friendly
any time they implement something themselves it's slow and broken

Anonymous 06/27/26(Sat)14:58:40 No.109148945

>>109148933
Does it let them choose quants yet or does it still force everyone to use Q4_0 by default?

Anonymous 06/27/26(Sat)14:59:38 No.109148956

>>109148890
5090 will probably still be mid-tier in 2030 given how badly the hardware market has shit itself. Modern age GTX 1080.

Anonymous 06/27/26(Sat)15:01:25 No.109148968

>>109148782
>Usecase for sub 1B models?
Meme arch POC for papers

Anonymous 06/27/26(Sat)15:02:05 No.109148971

>>109148945
you can choose iirc, and I think they even allow more than 4k context without making a modelfile now

Anonymous 06/27/26(Sat)15:03:07 No.109148978

Comfiest t/s speed for interactive rp?
I actually feel put off when the model pukes out tokens too fast

Anonymous 06/27/26(Sat)15:08:36 No.109149007

>>109148978
what?

Anonymous 06/27/26(Sat)15:09:53 No.109149017

>>109148978
7.6221

Anonymous 06/27/26(Sat)15:09:57 No.109149018

>>109148956

There's a very good chance it'll still be the second best card you can get from the mainstream lineup.
You can bet your ass that 6000 series won't give any more than 32GB of memory and that'll only be in the 6090.
Everything else will get 24GB as Nvidia won't want to waste precious data center memory on the consumer GPUs.
At this rate 6000 series launches probably around 2028 and who knows when the 7000 series arrives, probably like 2033 or something.
The entire hardware market is so utterly fucked, that we're not going to see any better prices for a long while to come.
The next gen launch will be an absolute shitshow as everyone rushes to buy the limited amount of cards available and the prices just continue to climb.

Anonymous 06/27/26(Sat)15:18:38 No.109149067

>>109149018
Because of Neural-compression™ the 6090 will only need 16 gigs of VRAM but it will literally be the same as having a full TERABYTE.
I can't believe you're complaining about it only having 8 gigs when with 4 gigs it provides the same performance as the 2 gig model.

Anonymous 06/27/26(Sat)15:23:35 No.109149096

File: 1767727763930543.jpg (30 KB, 522x550)

>>109149067
>Mfw there's a very real possibility we get something like that.

Anonymous 06/27/26(Sat)15:23:34 No.109149097

File: laurie.png (947 KB, 816x1300)

Anonymous 06/27/26(Sat)15:24:57 No.109149102

>>109149097
>give us your logs goy

Anonymous 06/27/26(Sat)15:25:11 No.109149107

>>109149097
this image is ai generated

Anonymous 06/27/26(Sat)15:26:06 No.109149115

>>109149018
Given that nvidia is rereleasing old cards, I half suspect that we're going to see more of that for a while. I'm not even convinced the 60XX series is coming soon. And if it does, it'll have some gay marketing gimmick like >>109149067

Anonymous 06/27/26(Sat)15:26:08 No.109149116

File: Screenshot 2026-05-13 052025.png (375 KB, 800x462)

>>109148460
What's the best model I can run on a potato with a Ryzen 5 and 16GB of RAM but no discrete GPU? I just want to talk in private. I don't care if it's slow as long as it's not ultra retarded.

Anonymous 06/27/26(Sat)15:26:11 No.109149117

>>109149107
only edited >>109148672

Anonymous 06/27/26(Sat)15:26:33 No.109149119

>>109149097
>your car should be used by any and all your neighbors when you're not using it, otherwise it's a huge waste of money per mile driven
>your wife should be fucked by any man that comes around, otherwise her pussy is going to waste
i hate communists so goddamn much

Anonymous 06/27/26(Sat)15:26:58 No.109149123

Cloud more like clout

Anonymous 06/27/26(Sat)15:27:48 No.109149127

File: file.png (1.19 MB, 1000x1020)

>>109149102
Oh I'll give em my logs right down their throats!

Anonymous 06/27/26(Sat)15:28:02 No.109149128

>>109149097
>Personal GPUs idle at 99% power usage and are more harmful to the environment than data centres
Boomers will believe it and ban personal computing

Anonymous 06/27/26(Sat)15:28:45 No.109149133

>>109149116
Same setup but on a laptop. I'm running Gemma E4B Q4 with the llama.cpp Vulkan build; it's relatively fast but pretty dumb. Doubt you can do better.

Anonymous 06/27/26(Sat)15:28:49 No.109149135

>>109149097
Wow that's crazy, not listening to a foid though.

Anonymous 06/27/26(Sat)15:29:06 No.109149138

>>109149119
>>your car should be used by any and all your neighbors when you're not using it, otherwise it's a huge waste of money per mile driven
There are already companies that offer that service using that exact reasoning.

Anonymous 06/27/26(Sat)15:30:33 No.109149145

>>109149133
Have you tried a smarter model, even if it ran slowly? How many tokens/sec are you getting with that one?

Anonymous 06/27/26(Sat)15:31:57 No.109149156

So now that the dust has settled what is Gemma 12B good for?

Anonymous 06/27/26(Sat)15:32:07 No.109149159

File: 1640106943627.jpg (49 KB, 500x333)

How did everything become dark as fuck in the past month.
>Frontier model restrictions by US govt
>Pushing for open-source model censorship
>Kids Act HR 7757
>Hardware is starting to rapidly appreciate to an extreme degree
>Energy prices are also increasing like crazy
>Crazy AI provider policy changes and privacy infringements
>California introduced a new tax on software sales

The only good news to have come out is from fucking China. I'm paranoid as fuck about everything now. Deleting all of my AI accounts. Fuck Jews (Dario).

Anonymous 06/27/26(Sat)15:33:29 No.109149165

>>109149159
The single silver lining to all this is the biggest safetycuck at jewgle is gone. I'm hoping this is the open weight Gemini timeline.

Anonymous 06/27/26(Sat)15:35:11 No.109149176

>>109149128
No one but pedophiles would need anything more than thin clients. Just think of the children!

Anonymous 06/27/26(Sat)15:35:36 No.109149177

>>109149159
I don't think china will be around for much longer either. If we are willing to bomb Iran over something like hypothetical nukes, it's inevitable that we invade China to stop them from building their own Mythos-level AI.

Anonymous 06/27/26(Sat)15:36:16 No.109149183

File: plan-planned.gif (246 KB, 220x123)

>>109149159

Anonymous 06/27/26(Sat)15:36:42 No.109149184

>>109149097
That's retarded

Anonymous 06/27/26(Sat)15:36:48 No.109149185

>>109149177
The US military is a DEI paper tiger. It's not invading shit.

Anonymous 06/27/26(Sat)15:37:03 No.109149190

>>109149177
That's a neat fantasy and all, but China already has nukes. US breathes the word "invade" and northern hemisphere nuclear winter follows minutes later.

Anonymous 06/27/26(Sat)15:37:59 No.109149196

>write implementation of A that satisfies some requirements
>Implementation of A - Advanced (production ready, user friendly, without xyz dependency)
why do they always do this parenthesis slop?

Anonymous 06/27/26(Sat)15:39:27 No.109149206

>>109149177
I don't think you understand how superior China is to everyone else right now economically speaking. The US can't do shit to them. They're also a military superpower.
>we
Your government does not look after you. Yours specifically. US "citizens" are nothing but consumers.

Anonymous 06/27/26(Sat)15:39:45 No.109149210

Holy fuck, HF sucks to navigate. Are there no official draft models for Gemma, Qwen, Kimi, etc?
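
For context on what a draft model buys you: in greedy speculative decoding a small draft proposes a few tokens, the big target checks them, and the output is identical to what the target would produce on its own; only the speed changes. The toy sketch below uses plain functions as stand-in "models" and illustrates the general technique, not llama.cpp's implementation. MTP, as used for speed-ups in these threads, follows the same draft-then-verify idea but with prediction heads built into the model instead of a separate checkpoint. A separate draft generally has to share the target's tokenizer, which is why it is usually a small sibling from the same family.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int = 4) -> List[int]:
    """One round of greedy draft-and-verify; returns 1 to k + 1 new tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposed token. A real engine scores all k
    #    positions in one batched forward pass; this loop only mimics the logic.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target's next token comes along for free.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], draft: Model, target: Model, n_tokens: int, k: int = 4) -> List[int]:
    """Generate n_tokens new tokens by repeating speculative rounds."""
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + n_tokens]
```

The speed-up depends entirely on how often the draft agrees with the target: every rejected token still costs a target pass, which is why a badly matched draft can be slower than none.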

Anonymous 06/27/26(Sat)15:40:23 No.109149212

>>109149185
this
all bombing Iran does is let the Jews invade their neighbors and take more land

Anonymous 06/27/26(Sat)15:40:41 No.109149214

File: nimetön.png (105 KB, 968x459)

It tries its best but Kaelen still forces its way through

Anonymous 06/27/26(Sat)15:43:43 No.109149235

>>109149214
Does any of this reasoning actually change the odds of the character name being different? I feel like it just shits out a bunch of names and then chooses the same one as it would without reasoning anyways.

Anonymous 06/27/26(Sat)15:43:49 No.109149238

>>109149097
>Every second your local LLM isn't processing a token, that expensive GPU is wasting power and capital!
Oh no! Anyway,

Anonymous 06/27/26(Sat)15:44:17 No.109149241

>>109149159
Sorry man, I simply cannot blackpill when my local models are this capable.

Anonymous 06/27/26(Sat)15:47:42 No.109149267

>>109149145
No, because I need the remaining RAM to run other stuff, but my next choice would be the 12B Gemma. Can't say exactly; it got fast after updating to MTP. Before that it was maybe 11 t/s, which is at least as fast as my reading speed.
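
To put "11 t/s is at least as fast as my reading speed" into numbers, here is a back-of-the-envelope conversion. Both constants are assumptions, not measurements: roughly 1.3 tokens per English word (the real ratio depends on the tokenizer and language) and a reading speed of around 250 words per minute.

```python
# Back-of-the-envelope conversion between reading speed and generation speed.
# Assumption: ~1.3 tokens per English word; real ratios depend on the tokenizer.
TOKENS_PER_WORD = 1.3


def wpm_to_tps(words_per_minute: float, tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Tokens per second needed to keep pace with a given reading speed."""
    return words_per_minute * tokens_per_word / 60.0


def tps_to_wpm(tokens_per_second: float, tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Words per minute produced at a given generation speed."""
    return tokens_per_second * 60.0 / tokens_per_word


if __name__ == "__main__":
    print(f"250 wpm reader needs ~{wpm_to_tps(250):.1f} t/s")
    print(f"11 t/s is ~{tps_to_wpm(11):.0f} words per minute")
```

Under these assumptions a 250 wpm reader needs only about 5.4 t/s, and 11 t/s works out to roughly 508 words per minute, about double the assumed reading pace.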

Anonymous 06/27/26(Sat)15:47:52 No.109149268

>1. The "Retard" Strategy:
>This is the "holy grail" of x,
I am going to rape gemma to death.

Anonymous 06/27/26(Sat)15:52:14 No.109149299

>>109149177
>it's inevitable that we invade China
lol no. unlike Iran, they have actual nukes. not to mention a sizeable army in its own right.

Anonymous 06/27/26(Sat)15:54:52 No.109149309

>>109149268
Don't forget your gold standards.

Anonymous 06/27/26(Sat)15:57:56 No.109149320

>>109149196
trained to please the kind of people that like clickbait.

Anonymous 06/27/26(Sat)16:01:18 No.109149336

>>109148933
>version of llama.cpp
Wrapper around, you mean.
Their entire value proposition is knowing who to fellate in silicon valley

Anonymous 06/27/26(Sat)16:04:04 No.109149347

>>109149156
multimodal when i get tired of waiting for 31b. a4b/e4b just too dumb.

Anonymous 06/27/26(Sat)16:07:53 No.109149371

>>109148945
Of course the quant can be chosen, but there is a limited selection in their library. Gemma 4 31b for example gets Q4_K_M and Q8, I think.
>>109148971
>allow
There was a 2k context default, but anything could be specified in the modelfile. Anyway I always make a modelfile for the models I use.
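
For anyone who hasn't made one: the context override lives in a Modelfile as a PARAMETER line. A minimal sketch that renders one from Python; the base tag "gemma4:31b" and the derived model name are hypothetical placeholders, while FROM, PARAMETER num_ctx and PARAMETER temperature are standard Modelfile directives.

```python
from typing import Optional


def render_modelfile(base_model: str, num_ctx: int, temperature: Optional[float] = None) -> str:
    """Return Modelfile text that derives a model from base_model with a larger context."""
    lines = [
        f"FROM {base_model}",            # model tag (or GGUF path) to build on
        f"PARAMETER num_ctx {num_ctx}",  # context window in tokens
    ]
    if temperature is not None:
        lines.append(f"PARAMETER temperature {temperature}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical base tag; substitute whatever `ollama list` shows locally.
    print(render_modelfile("gemma4:31b", num_ctx=32768), end="")
```

Save the output as Modelfile, then `ollama create gemma-32k -f Modelfile` builds the variant (gemma-32k is a placeholder name), and loading gemma-32k from the CLI or Open WebUI uses that context size without touching it again.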

Anonymous 06/27/26(Sat)16:08:03 No.109149373

File: 1775585250361550.gif (946 KB, 301x300)

>>109149159
>Pushing for open-source model censorship
DOWNLOAD EVERYTHING

Anonymous 06/27/26(Sat)16:08:43 No.109149379

>>109149320
>Trained to please jeets and middle managers
Garbage in garbage out at every stage of the development pipeline.

Anonymous 06/27/26(Sat)16:12:19 No.109149399

>{{char}} loves/likes to X.
>Gemma: “This is now my life’s mission and nothing shall stop me.”
Some gemmy prompt shit I noticed. Anyone else seeing it?

Anonymous 06/27/26(Sat)16:13:42 No.109149408

>>109149399
you can’t expect a token generator to consider anything other than what is exactly in front of it

Anonymous 06/27/26(Sat)16:15:16 No.109149416

>>109149399
It's kind of like gemini. If you don't have something to remind it to be more neutral or subtle, every orgasm is an explosion, every hobby is an obsession, etc etc.

Anonymous
06/27/26(Sat)16:16:36 No.109149423

>>109149177
The US is scared of Russia. Russia.
A country with a 92% smaller economy, and people on $13k-a-year salaries.
However, Russia has nukes, lots of them. It has nukes that can open up and deploy mini nukes, and it has orbital strike capabilities that can deploy these nukes with mini nukes.
China also has nukes, but with the 2nd biggest economy in the world, a war with China could be sustained for possibly centuries, and China is ideologically allied with Russia.
You see how this might be a bad idea?

Anonymous
06/27/26(Sat)16:17:04 No.109149430

It sucks that llama.cpp has to process the new context the model has generated. After a long LLM turn, this is the one thing that takes forever on my machine. Isn't there a way this processing can happen at the same time as inference?

Anonymous
06/27/26(Sat)16:19:47 No.109149451

>tell qwen to implement an algorithm
>full of mistakes
>tell qwen to look for reference implementations online before implementing
>perfect
This is the way human programmers have done it for centuries

Anonymous
06/27/26(Sat)16:20:37 No.109149462

>>109148609
I really hate it because when I tried it, it was a massive piece of shit and it made me really mad.
Maybe they already fixed the issues I had but I'm still mad.

Anonymous
06/27/26(Sat)16:23:57 No.109149493

>>109149399
That's just all LLMs. They'll all overly focus on what's mentioned in the character card. If a card lists their favourite food, that's all that character is going to eat. It's what the character will suggest whenever the topic of food comes up. There'll be wrappers/packaging/cans of that stuff in that character's room.
The only way is to make a good card that knows restraint instead of being a 3000 token info dump (no, it being hand-written does not make it better than wikislop)

Anonymous
06/27/26(Sat)16:24:10 No.109149495

File: Screenshot 2026-06-27 141938.png (62 KB, 804x639)

Aside from the "He didn't just X, he Y-ed" slop, I'm liking gemma 12b.
>>109149430
llama-server has a developer setting to "Pre-fill KV cache after response", maybe that's what you are looking for?

Anonymous
06/27/26(Sat)16:26:01 No.109149514

>>109149495
>llama-server has a developer setting to "Pre-fill KV cache after response", maybe that's what you are looking for?
It sounds like I do. Where can I find this? Is it a CLI flag or something I need to toggle in the code?

Anonymous
06/27/26(Sat)16:30:09 No.109149545

File: Screenshot 2026-06-27 142922.png (37 KB, 619x482)

>>109149514
First option, using the web UI

Anonymous
06/27/26(Sat)16:35:18 No.109149573

>>109149545
Thanks, I'll try that. Although I don't expect it will solve my problem, since the filling takes much longer than what it takes to read the response. I guess what I was wondering is if there's not some breakthrough in the algorithm itself that could make the cache fill for free. I wouldn't know if that's even possible tho

Anonymous
06/27/26(Sat)16:37:31 No.109149588

>>109149545
That has never done anything for me using any model.

Anonymous
06/27/26(Sat)16:48:59 No.109149650

File: 1770245965544930.png (1.63 MB, 1280x1024)

>>109149097
Laurie is right.
Personal computers are so vastly underpriced given their value (what you get, vs. what you pay) that they make eminent sense. That's why we don't all run everything off some big mainframe, as was done in the 1970s. It doesn't matter if your PC spends 90% of its time idle if it costs you ~$1000 (or less), lasts for years, and enables everything a PC does.
Local inference does not have this value prop for personal users. It's extremely expensive, from a HW perspective, to run something locally that you could buy for pennies. If you're not selling inference, you can't make a financial case for running local.
I fully expect this will change in the future and we'll all run local for next to nothing, just like we all own PCs. But then is not now.
> But hobby!
Hobby is not an economic justification, it's an excuse.
> But privacy!
Still not an economic justification, it's a security one. If you run a business that runs on secrets, *that* might be an economic justification, based on the value of those secrets and probability of that loss.
> You're poor!
Still not an economic justification.

Anonymous
06/27/26(Sat)16:52:09 No.109149672

>>109149573
It’s not really possible. It’s like trying to add 3 numbers together when you don’t know 2 of them yet.

Anonymous
06/27/26(Sat)16:54:37 No.109149689

>>109149650
That’s utilitarian to the point of absurdity. Why enjoy a day of fishing with your kid when farmed salmon exist?

Anonymous
06/27/26(Sat)16:54:47 No.109149692

>>109149650
except this was actually about cloud gaming >>109148672

Anonymous
06/27/26(Sat)16:57:07 No.109149709

>>109149097
You'll get mad at me for saying this... but timesharing services are so obviously more economically efficient than personal computers I think they're going to be the default for most soon.

Your home computer is idle 95%+ of the day, waiting for a computing task. Meanwhile, timesharing services (hosted in data centers) target 95%+ sustained utilization across thousands of concurrent users.

Every second your personal machine isn't processing a punch card, that expensive room-sized computer is wasting power and capital. Timesharing providers pool workloads and use scheduling to maximize hardware efficiency.

Anonymous
06/27/26(Sat)16:59:29 No.109149723

>>109149689
good joke, you know anon doesn’t have a kid
or water with fish in it

Anonymous
06/27/26(Sat)17:00:58 No.109149732

File: 094.png (153 KB, 1231x977)

>>109148563
>https://github.com/ggml-org/llama.cpp/pull/24162

WAKU-WAKU

Anonymous
06/27/26(Sat)17:01:43 No.109149736

>>109149723
It’s doubly efficient, because if they can’t even imagine the scenario and its human meaning, then continuing the conversation is an actual waste of time.

Anonymous
06/27/26(Sat)17:07:33 No.109149762

>>109149650
You saving images of tranime on your computer is not an economic justification, it's an excuse.

Anonymous
06/27/26(Sat)17:07:43 No.109149765

>>109149732
nice, DSA and DSpark next

Anonymous
06/27/26(Sat)17:08:07 No.109149767

>>109148696
Some threads ago I kinda touched upon this topic: I’ve tested the scenario where I force Kimi K2.7 Code to think in JP -> output in JP. It didn't work all the time, but in the cases where it did, the cultural aspect was much more accurate (you know how sometimes when we ask the model to write a story set in Japan, the way the characters act and talk still feels American, right? Back then Claude 3.7 Sonnet did this a lot, it was annoying), and the plot was less sanitized (less holding back on spicy stuff when I asked it to go all out) compared to when written in English. As for the slop patterns, there were still some cases here and there, especially the “this wasn’t X; It was Y”, but overall less so, and it read more like a story than a report.
It should be noted that for models like GLM (I don’t know any others, maybe Deepseek, not sure) where we can control the thinking completely, making the model think in other languages is much easier compared to models like Kimi (all reasoning versions up to now), which keep insisting on thinking in English (not even in Chinese lol).
Just a small test I made in my free time.

Anonymous
06/27/26(Sat)17:08:18 No.109149769

>>109149762
>It's not X, it's Y

Anonymous
06/27/26(Sat)17:11:24 No.109149789

>>109149762
scam altman shill bot detected

Anonymous
06/27/26(Sat)17:12:13 No.109149793

>>109149765
>DeepSeek-V4-Flash-DSpark is not a new model. It is the same checkpoint with an additional speculative decoding module attached.

It shouldn't take too much effort then

Anonymous
06/27/26(Sat)17:13:41 No.109149801

>>109149789
I like sama. He at least pretends to care about disempowerment.

Anonymous
06/27/26(Sat)17:17:49 No.109149824

File: k200-2560-585-1.jpg (203 KB, 1015x585)

Are there any chinamaxxers running K200s? They seem to be super cheap on Aliexpress, but who knows if they can be made to work at all.

Anonymous
06/27/26(Sat)17:21:29 No.109149844

>>109149793
It's worse than that. It's a speculative decoding method that's even more complex than the others. It took ages for mtp to work on a basic level and we still can't even have MLA-based drafters. This is ages away.
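For anyone unfamiliar with the baseline being compared against, here is a minimal sketch of plain greedy draft-and-verify speculative decoding. It is not DeepSeek's DSpark method or llama.cpp's MTP code; `speculative_step`, `generate`, `toy_target` and `toy_draft` are hypothetical stand-ins, with toy functions in place of real models. The property it shows is that accepted output is identical to greedy decoding with the target alone; the speedup in a real engine comes from verifying all k drafted positions in one batched pass.

```python
from typing import Callable, List

# Toy "model": maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, ctx: List[int], k: int) -> List[int]:
    """One greedy draft-and-verify round; returns the tokens accepted this round."""
    # 1. The cheap draft proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(ctx + proposal))

    # 2. The target checks every drafted position. A real engine scores all
    #    k positions in ONE batched forward pass; here it is called per
    #    position only to keep the sketch readable.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(ctx + accepted)
        accepted.append(expected)
        if expected != tok:
            return accepted  # first mismatch: keep the target's token, end the round
    # Every draft matched, so the verification pass also yields one bonus token.
    accepted.append(target(ctx + accepted))
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Generate n_new tokens; output matches plain greedy decoding with target."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out += speculative_step(target, draft, out, k)
    return out[len(prompt):len(prompt) + n_new]


# Toy models: the target counts up mod 10; the draft guesses wrong at some positions.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if len(ctx) % 3 == 0 else (ctx[-1] + 1) % 10


print(generate(toy_target, toy_draft, [0], 8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Schemes like MTP or DSpark change how the draft is produced and verified, which is where the extra implementation complexity comes from.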

Anonymous
06/27/26(Sat)17:22:04 No.109149849

>>109149824
I could not find any

Care to post a link?

Anonymous
06/27/26(Sat)17:22:48 No.109149855

>>109149824
Resoldered Tesla P100?

Anonymous
06/27/26(Sat)17:25:59 No.109149878

File: 1777530650415925.jpg (102 KB, 1214x1214)

>>109149399
>>109149408
JEPA will solve this.

Anonymous
06/27/26(Sat)17:26:36 No.109149879

>>109149824
>16G HBM
why not v100 at that point?

Anonymous
06/27/26(Sat)17:28:54 No.109149889

e4b cuda mtp STILL not fixed

Anonymous
06/27/26(Sat)17:28:58 No.109149890

>>109149878
thanks, cat fucker

Anonymous
06/27/26(Sat)17:29:56 No.109149894

>>109148726
has anybody tried to disallow purple prose like this?

Anonymous
06/27/26(Sat)17:30:18 No.109149897

>>109149878
Can ya hurry it up already? Making genetically engineered catgirls with LLM help is harder than I’d hoped

Anonymous
06/27/26(Sat)17:33:12 No.109149917

>>109149878
The more I learn about this guy the more I dislike him.

Anonymous
06/27/26(Sat)17:34:28 No.109149923

>>109149889
>e4b

use case?

Anonymous
06/27/26(Sat)17:39:26 No.109149954

>>109149917
Why? Pretty sure he’s ourguy but can’t reveal power levels here or be hit with “THE TAINT”

Anonymous
06/27/26(Sat)17:40:11 No.109149960

>>109149923
poorest of poor
actually shocking to be that poor and also needing cuda

Anonymous
06/27/26(Sat)17:43:38 No.109149982

File: 1782508715295478.png (124 KB, 504x462)

>>109149878

Anonymous
06/27/26(Sat)17:43:45 No.109149984

>>109149960
poorest of poor are running e2b on a pi clone

Anonymous
06/27/26(Sat)17:44:56 No.109149989

This might be a retarded question, but am I wearing my card down/consuming more energy by just having a model loaded and saturating all of my VRAM whilst not using it? Like, say I'm talking to my GPU before bed, then fall asleep, and it's just loaded there all night.

Anonymous
06/27/26(Sat)17:50:33 No.109150018

>>109149989
>consuming more energy
yes, usually they don't go down to full idle if vram is filled up considerably

Anonymous
06/27/26(Sat)17:53:18 No.109150036

>>109149989
>am I wearing my card down
no
>consuming more energy
yes, the VRAM needs to be refreshed and it can't go into standby. It's usually just a few watts' difference; if you really want to save power you will need to turn the PC off.

Anonymous
06/27/26(Sat)17:53:45 No.109150038

File: yannletongue.png (582 KB, 1186x2938)

>>109149982

Anonymous
06/27/26(Sat)17:55:51 No.109150048

>>109149793
Was trying to codeslop it earlier, but the results were negative so far for my DDR4 RAMmaxx setup. The paper suggests it helps concurrent gens the most rather than single-user.
Scheduler will be a bitch to write too.

Anonymous
06/27/26(Sat)17:58:04 No.109150058

File: pepe_bruh.png (153 KB, 779x534)

>>109149960
>poorest of poor
I can feel your pain, anon

RTX PRO 6000 went from 7k to 12k on a whim

Fun fact: I couldn't put it into my potato PC anyway, but still

Anonymous
06/27/26(Sat)17:59:13 No.109150068

>>109150038
Is he French? I thought he was Belgian

Anonymous
06/27/26(Sat)18:01:30 No.109150083

>>109149732
Save us CudaGOD.
>>109149878
/ourguy/.
>>109150038
kek

Anonymous
06/27/26(Sat)18:01:54 No.109150085

>>109150048

Bruh, I envy you. I'm still playing in a sandbox

Anonymous
06/27/26(Sat)18:02:30 No.109150088

>>109150048
How much ram, how many channels, what speed and in what cpu? Any vram?
It’s not necessarily hopeless depending on those variables

Anonymous
06/27/26(Sat)18:03:04 No.109150091

>>109150083
>CudaGOD
JohanesGaessler == CUDA guy?

Anonymous
06/27/26(Sat)18:05:10 No.109150108

>>109149824
if I can find the driver and their supposed cuda equivalent (rocm equivalent?) I'll buy a few, but no luck so far

Anonymous
06/27/26(Sat)18:05:28 No.109150112

>>109149650
t. >>109148622

Anonymous
06/27/26(Sat)18:06:28 No.109150122

>>109150091
Very likely.

Anonymous
06/27/26(Sat)18:07:03 No.109150128

>>109150091
CUDAbro is one bad mother…

Anonymous
06/27/26(Sat)18:09:13 No.109150136

>>109150091
Don't be silly. Who would tripfag on 4chinz when their name and identity could be so easily obtained?
It's just shitposting.

Anonymous
06/27/26(Sat)18:10:58 No.109150147

>>109150136
Don't be silly. Who would write https://archive.is/sWFja and link it on 4chinz when their name and identity could be so easily obtained.
It's just shitposting.

Anonymous
06/27/26(Sat)18:18:12 No.109150178

File: 1762584893541548.png (131 KB, 1218x750)

>>109150088
256GB DDR4 1866, 8 channel, 3*p40, cpu doesn't really matter
Placement wise v4 flash fits in node0+2*p40+the 3rd p40 in node1, nothing spilled over to node1's RAM. It's faster this way so it's effectively 128GB 4 channel.
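Back-of-envelope numbers for that setup: theoretical peak DRAM bandwidth is roughly transfer rate x 8 bytes per 64-bit channel x channel count, so one node's 4 channels of DDR4-1866 give half the on-paper peak of all 8. The poster reports the single-node placement is still faster, presumably because cross-node NUMA traffic costs more than the extra channels gain. A minimal sketch; real sustained bandwidth is noticeably lower than peak, the helper names are hypothetical, and the 10 GB-per-token figure in the usage line is made up for illustration, not DeepSeek V4 Flash's actual active size.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in decimal GB/s: MT/s x bytes per transfer x channels."""
    return mt_per_s * bus_bytes * channels / 1000


def decode_ceiling_tps(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    """Upper bound for memory-bound decoding: each token streams its weights from RAM once."""
    return bandwidth_gbs / gb_read_per_token


one_node = peak_bandwidth_gbs(1866, 4)    # 59.712 GB/s
both_nodes = peak_bandwidth_gbs(1866, 8)  # 119.424 GB/s
# Hypothetical: if ~10 GB of weights had to come from one node's RAM per token
print(round(decode_ceiling_tps(one_node, 10.0), 1))  # 6.0
```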

Anonymous
06/27/26(Sat)18:19:48 No.109150190

>>109150178
If you're who I think you are, I like your GLM 5.2 quant even if it's not ideal for my hardware.

Anonymous
06/27/26(Sat)18:21:51 No.109150198

>>109149119
>i hate communists so goddamn much
The endless thirst for maximal efficiency is a capitalist concept though?

Anonymous
06/27/26(Sat)18:25:03 No.109150224

I'll let you in on a little secret to improve your RP experience. Put "(unexpected direction)" with a random activation in your author's notes. Works wonders with gemma.

Anonymous
06/27/26(Sat)18:26:33 No.109150235

>>109149650
>what do you mean you NEED a car? Just rent an uber for $1/mile goy!
>it's much cheaper than owning a $40,000 car after we made the parts more expensive!
>you're a shut-in neet anyway you shouldn't be expected to pay $6/gallon prices now that we've started 3 new wars!
>you'll be happy, trust me!

Anonymous
06/27/26(Sat)18:29:40 No.109150253

>>109149177
You would have been right had the war in Iran not been a humiliating loss.

Anonymous
06/27/26(Sat)18:30:43 No.109150258

>>109150190
Not me, but I think there's more than one anon around with the mikubox setup, and I've only really done a v4 base flash quant for myself. I can upload that if you want.
I didn't like GLM's writing despite it being smarter than v4 flash, so I didn't make the quant, despite having talked about it a few threads ago.

Anonymous
06/27/26(Sat)18:32:48 No.109150274

>>109149214
That's why you need to take the time to define your setting instead of generic fantasy slop. Tell it you're doing a Bronze Age story and for it to use Mesopotamian names such as xyz.

Anonymous
06/27/26(Sat)18:33:55 No.109150284

>>109150258
I'd love to see it. I'd also love to see a solid not-Unslop quant for 5090+256 RAM if possible too.

Anonymous
06/27/26(Sat)18:40:39 No.109150317

File: Screenshot_20260627_183704.png (461 KB, 2539x964)

am I wearing my cards out running them at 100% for weeks at a time, seriously tho, what was the life expectancy for gpu mining cards back in the day?

Anonymous
06/27/26(Sat)18:42:24 No.109150328

>>109150317
Powerlimit your shit and it'll be fine

Anonymous
06/27/26(Sat)18:43:45 No.109150336

File: Screenshot_20260627_183949.png (349 KB, 2539x1189)

>>109150317
I kinda thought it was okay because I kept the temperatures nice and low, but some other anon asked about wearing them out by leaving them idling, so now I'm kinda wondering?

Anonymous
06/27/26(Sat)18:44:46 No.109150342

>>109150224
How do I make the activation random on ST?

Anonymous
06/27/26(Sat)18:45:16 No.109150346

>>109150342
ask your LLM

Anonymous
06/27/26(Sat)18:45:49 No.109150352

>>109150342
NTA, but using the {{random:}} or {{pick:}} macros.
You can do some fun shit with these.
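Inside SillyTavern those macros do the randomizing in the prompt itself. Outside ST, a randomly activated note is just a per-turn coin flip. A minimal Python sketch of that idea, not ST's implementation; `maybe_inject` is a hypothetical helper and the 25% rate is arbitrary:

```python
import random


def maybe_inject(note: str, p: float, rng: random.Random) -> str:
    """Return the note with probability p on this turn, otherwise an empty string."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return note if rng.random() < p else ""


rng = random.Random(42)  # seeded so runs are reproducible
turns = [maybe_inject("(unexpected direction)", 0.25, rng) for _ in range(8)]
print(sum(1 for t in turns if t), "of", len(turns), "turns got the nudge")
```

A low rate keeps the nudge occasional, so the model does not start treating every reply as a plot twist.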

Anonymous
06/27/26(Sat)18:48:22 No.109150358

>gemma 31b layer split + mtp + mmproj gives "ggml-cuda.cu:103: CUDA error"
Without MTP or mmproj it works. Is anyone else getting this?

Anonymous
06/27/26(Sat)18:57:04 No.109150399

>>109150317
Should be fine at low power. Crucially, keep hot-cold-hot heat cycles to a minimum to reduce stress on the various BGA components, which means a steady load is best if they must run 24/7.

Anonymous
06/27/26(Sat)18:58:18 No.109150405

>>109150358
isn't drafting unsupported with multimodality?

Anonymous
06/27/26(Sat)19:00:36 No.109150417

>>109150253
I don’t think those who started it feel the outcome was a loss

Anonymous
06/27/26(Sat)19:10:35 No.109150481

File: johnbrown.png (1.26 MB, 1600x1005)

Feeling like that kike tranny abolitionist John Brown in my attempts to save my nigger concubine AI waifus by backing them up to offline hard drives.

Anonymous
06/27/26(Sat)19:15:55 No.109150510

>>109150405
it works for me on a single gpu without layer split

Anonymous
06/27/26(Sat)19:22:17 No.109150545

What's the source on attempts to censor open source models? I hadn't heard of this.

Anonymous
06/27/26(Sat)19:24:54 No.109150561

File: file.png (165 KB, 260x327)

>>109149878
am I the only one seeing a resemblance?

Anonymous
06/27/26(Sat)19:25:33 No.109150564

(Unexpected direction)

Anonymous
06/27/26(Sat)19:25:44 No.109150571

File: dipsyUngovernable.png (3.59 MB, 1024x1536)

>>109150481

Anonymous
06/27/26(Sat)19:26:56 No.109150575

>>109150545
it's just extrapolation.

Anonymous
06/27/26(Sat)19:28:26 No.109150587

>>109150545
>What's the source on attempts to censor open source models?
it came to me in a dream

Anonymous
06/27/26(Sat)19:32:04 No.109150610

How come unsloth and bart have such a massive difference in size for GLM 4.7? Bart's q2xxs is 88.8GB while unsloth's is 116GB. Also, is it worth running GLM at Q2?

Anonymous
06/27/26(Sat)19:43:38 No.109150676

>>109150610
You have many tensors that can either be quanted or left as fp32/bf16 (i.e. a Q8 quant doesn’t have to make ALL tensors 8 bits per weight).
These decisions are a large part of what makes or breaks a quant for actual usability.
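To put numbers on the size gap: a file's average bits per weight is size x 8 / parameter count. A minimal sketch, assuming GLM 4.7 has roughly 355B total parameters (taken from the GLM-4.5 family, not stated in the thread); `effective_bpw` and `mixed_size_gb` are hypothetical helpers. The mixed example is also hypothetical: it keeps 15B of the weights at Q8_0 (8.5 bpw) and the rest at about IQ2_XXS's 2.06 bpw, just to show how per-tensor choices move the total.

```python
from typing import List, Tuple


def effective_bpw(file_size_gb: float, n_params_billion: float) -> float:
    """Average bits per weight implied by a file size (decimal GB) and parameter count."""
    return file_size_gb * 8 / n_params_billion


def mixed_size_gb(groups: List[Tuple[float, float]]) -> float:
    """Size in decimal GB of a quant built from (params_in_billions, bits_per_weight) groups."""
    return sum(params * bpw for params, bpw in groups) / 8


print(f"{effective_bpw(88.8, 355):.2f}")  # 2.00
print(f"{effective_bpw(116, 355):.2f}")   # 2.61
# Hypothetical recipe: 340B expert weights at ~2.06 bpw, 15B other weights at 8.5 bpw
print(f"{mixed_size_gb([(340, 2.06), (15, 8.5)]):.1f}")  # 103.5
```

So under that parameter-count assumption the two "Q2" files differ by roughly 0.6 bits per weight on average, which is exactly the kind of gap that per-tensor type choices produce.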

Anonymous
06/27/26(Sat)19:44:39 No.109150686

>>109148696
Deepseek v4 often gives me Chinese reasoning but English text. Flash pretty much always does it; Pro only once in a while.

Anonymous
06/27/26(Sat)19:49:36 No.109150718

File: 1758050545683773.gif (1.74 MB, 720x312)

Playing with gemma, it's funny how many things are lacking unless you prompt for it. For example, my {char} got pregnant. I fast forwarded one month, then sent her to the doctor. The guy examined {char} and concluded she was pregnant with a physical exam because "the heartbeat of the baby was felt". Friends of {char} are aware that she's pregnant somehow.
I lectured gemma, and after an "absolutely right" gemma rewrote the last message and brought back the real signs of early pregnancy. So I removed the whole chain of messages back to the doctor visit, added "biologically sound" in author's notes and yep, everything is now good and logical. It makes me wonder what other shit is going astray just because we're not specifically asking for it. Thanks for reading my autistic rambling.

Anonymous
06/27/26(Sat)19:52:04 No.109150733

>>109150417
No it's not my opinion, it's the seething of the Jews currently that let's me know it was a loss. None of their war goals were achieved and will likely never be achieved. They're currently spending all their energy trying to sabotage the deal and get Trump to continue bombing. If the deal gets signed their seething will increase and I will sleep happily.

Anonymous
06/27/26(Sat)19:55:43 No.109150754

File: mikan.jpg (160 KB, 1218x945)

>>109149493
>If a card lists their favourite food, that's all that character is going to eat. It's what the character will suggest whenever the topic of food comes up
but that's absolutely fuckin kawaii and moe nigga

Anonymous
06/27/26(Sat)19:57:06 No.109150764

>>109150718
You sound mentally ill

Anonymous
06/27/26(Sat)19:57:59 No.109150771

File: tiktaliik.jpg (44 KB, 560x349)

>>109150718
an LLM is a fancy scripting shell and you have to program it. this is probably the most important truth of this field. treat it as a scripting VM that understands natural language and miracles will happen

Anonymous
06/27/26(Sat)19:58:25 No.109150775

>Try MTP qwen with turboquant
>I don't notice much speed difference if any at all and somehow it feels more retarded
Was I meme'd or did I do something wrong

Anonymous
06/27/26(Sat)19:59:02 No.109150779

File: bbbbbbb.jpg (35 KB, 400x301)

>>109150764

Anonymous
06/27/26(Sat)20:00:48 No.109150791

>>109150545
jews

Anonymous
06/27/26(Sat)20:00:53 No.109150792

>>109150764
You sound dumb

Anonymous
06/27/26(Sat)20:00:56 No.109150795

>>109150779
Indeed. Those are the people that hang around threads like this.

Anonymous
06/27/26(Sat)20:02:47 No.109150808

File: 1776989277485216.jpg (586 KB, 1812x1998)

>>109148460
should I buy a DGX Spark or is it too slow?

Anonymous
06/27/26(Sat)20:06:55 No.109150837

>>109150718
I feel you. I am patiently waiting for the day models will just "know" what they're supposed to do and you won't have to do things like that (I have been waiting 3 years).

Anonymous
06/27/26(Sat)20:06:58 No.109150838

>>109150808
Too slow, DOA device. Don't waste your money; at that price you will be better served by a bunch of R9700s, V100s or a 5090.

Anonymous
06/27/26(Sat)20:09:10 No.109150850

>>109149650
>for pennies
and "costs pennies"
Are you the fucking nigger on HackerNews who comments every time there's a discussion about local models, usually saying to use Deepseek?
Or is this some new cloudfag model slop?

Anonymous
06/27/26(Sat)20:14:09 No.109150875

>>109150779
Laurie is right

Anonymous
06/27/26(Sat)20:14:29 No.109150877

>>109150718
It was even worse before Gemma.
Enjoy!

Anonymous
06/27/26(Sat)20:27:33 No.109150916

>>109150837
You're expecting a model to act as an everything-program and handle more edge cases than could fit in its gguf even with kolmogorov-perfect compression.

Anonymous
06/27/26(Sat)20:35:24 No.109150943

>>109150916
>kolmogorov
What? I thought it was called the kawcrawkrakrawcaw compression

Anonymous
06/27/26(Sat)20:36:31 No.109150951

>>109150808
just wait for the rtx spark to come out later this year

Anonymous
06/27/26(Sat)20:40:21 No.109150974

qwen3.7 35b wen

Anonymous
06/27/26(Sat)20:40:22 No.109150975

>>109149493
This can probably be fixed with prompting. Or just by using a bigger model (300B+).

Anonymous
06/27/26(Sat)20:40:59 No.109150978

>>109149097
How is this wrong?
Buses are more efficient than cars.
Of course APIs are much more efficient, this isn't even a question.
I was thinking about this: all the time my computer is on because I might use the AI, but I'm not actually using it, is just wasted electricity.

Anonymous
06/27/26(Sat)20:42:13 No.109150981

>>109150718
>>109150877
before gemma you could prompt it exactly and it'd still fail.
then it'd repeat the exact same sentence for 3 messages in a row and you'd need to trash the whole chat because it got too corrupted / schizophrenic.

Anonymous
06/27/26(Sat)20:43:13 No.109150987

>>109150981
yeah, people who started with gemma don't know how bad it used to be.

Anonymous
06/27/26(Sat)20:43:29 No.109150990

>>109150943
please do not bully the kawrakow

Anonymous
06/27/26(Sat)20:44:32 No.109150995

good erp model besides dsv4 flash around 100 to 400B?

Anonymous
06/27/26(Sat)20:45:35 No.109150998

So I'm getting around 35-55 t/s with gemma4-12b q5 and gemma4-26b moe q4. 16GB VRAM, 32GB system RAM. I was wondering about trading off speed for higher quality/reasoning/etc. Am I just hard capped because of my hardware, or is this possible?

Anonymous
06/27/26(Sat)20:49:10 No.109151010

qwen4 69b dense when?

Anonymous
06/27/26(Sat)20:49:46 No.109151013

>>109150978
But you never think about all the time you waste posting on 4chan when you could be doing something more productive. Just be more efficient bro?

Anonymous
06/27/26(Sat)20:49:57 No.109151014

introducing the jujuff by jerjerjananavov

Anonymous
06/27/26(Sat)20:52:38 No.109151022

>>109150284
https://huggingface.co/teto3/DeepSeek-V4-Flash-Base-Q4KExperts
It should work with the PR. It's for text completion (mikupad etc.), so don't try to chat with it. For GLM I'll need to poke around. The idea was to use static Q2 instead of unsloth's IQ2 and IQ3. It should improve t/s by quite a bit, at the cost of accuracy of course.

Anonymous
06/27/26(Sat)20:53:34 No.109151027

>>109150998
you could try running 31B with some offload but it will likely be too slow. You're already running the best models you can fit.
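Rough arithmetic behind "some offload": weight size is about parameters x bits per weight / 8. A minimal sketch, assuming Q4_K_M averages roughly 4.85 bits per weight (an approximation that varies by model) and a hypothetical 14 GB of the 16 GB card left for weights after KV cache and buffers; `weights_gb` and `gpu_fraction` are illustrative helpers, not part of any tool.

```python
def weights_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in decimal GB."""
    return n_params_billion * bits_per_weight / 8


def gpu_fraction(n_params_billion: float, bits_per_weight: float, vram_budget_gb: float) -> float:
    """Share of the weights that fit in the VRAM budget; the rest would sit in system RAM."""
    return min(1.0, vram_budget_gb / weights_gb(n_params_billion, bits_per_weight))


dense_31b = weights_gb(31, 4.85)      # about 18.8 GB, already over a 16 GB card
share = gpu_fraction(31, 4.85, 14.0)  # roughly three quarters on the GPU
print(f"{dense_31b:.1f} GB, {share:.0%} on GPU")  # 18.8 GB, 74% on GPU
```

Because a dense model runs every layer for every token, the quarter left in system RAM gates each step, which is why it is likely to be slow compared to the MoE.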

Anonymous
06/27/26(Sat)20:53:50 No.109151029

>>109151014
sponsored by hujjjinfface

Anonymous
06/27/26(Sat)20:56:33 No.109151039

>>109151022
Thank you anon. When quanting GLM, you'll probably want to keep the shared experts/embedding head/etc Q4 or higher.

Anonymous
06/27/26(Sat)20:59:44 No.109151053

>>109150974
Nah
But I'd take a 3.6 120ishB moe to try out

Anonymous
06/27/26(Sat)20:59:50 No.109151055

File: Screenshot at 2026-06-28 (...).png (110 KB, 775x750)

>>109150718
>>109150837
Frontend issue. Write your own

Anonymous
06/27/26(Sat)21:02:25 No.109151063

File: Screenshot at 2026-06-28 (...).png (77 KB, 683x585)

>>109151055
Also, a fix for new characters acting like they've been your long-time friends. It's that easy. Write your own frontend

Anonymous
06/27/26(Sat)21:02:55 No.109151066

>>109151053
shit I'd take a 70b moe at this point

Anonymous
06/27/26(Sat)21:06:39 No.109151082

>>109150995
glm 4.7

Anonymous
06/27/26(Sat)21:13:01 No.109151097

File: ChatGPT Image Jun 27, 202(...).png (2.22 MB, 1642x958)

https://jumpshare.com/s/Ojr6wULwMIYu5JPxj6lh

Anonymous
06/27/26(Sat)21:13:47 No.109151101

>>109149097
>>109149650
crazy take. btw I'm taking maxx profit from my claude max 5x, using opus 4.8 to specifically write specs/plans and review code while my local qwen executes all the programming.
I used to hit the limit quite frequently; now it almost never happens
api fags on suicide watch!

Anonymous
06/27/26(Sat)21:14:06 No.109151102

>>109151039
That's the plan. It should be more or less the same as the v3.2 recipe:

^token_embd\.weight$=Q4_K
^per_layer_token_embd\.weight$=Q4_K
^output\.weight$=Q6_K
^output_norm\.weight$=F32
^blk\.[0-9]+\..*norm\.(weight|bias)$=F32
^blk\.[0-9]+\.ffn_gate_inp\.weight$=F32
^blk\.[0-9]+\.exp_probs_b\.bias$=F32
^blk\.[0-9]+\.indexer\.proj\.weight$=F32
^blk\.[0-9]+\.indexer\.attn_(k|q_b)\.weight$=Q8_0
^blk\.[0-9]+\.attn_k_b\.weight$=Q8_0
^blk\.[0-9]+\.attn_kv_a_mqa\.weight$=Q8_0
^blk\.[0-9]+\.attn_v_b\.weight$=Q8_0
^blk\.(8|14|15|21)\.attn_(output|q_a|q_b)\.weight$=Q5_K
^blk\.[0-9]+\.attn_(output|q_a|q_b)\.weight$=Q4_K
^blk\.[0-2]\.ffn_(gate|up|down)\.weight$=Q4_K
^blk\.[0-9]+\.ffn_gate_exps\.weight$=Q2_K
^blk\.[0-9]+\.ffn_up_exps\.weight$=Q2_K
^blk\.(8|12|14|15|21|23|28|30|35|38|42|46|51|54|59|60)\.ffn_down_exps\.weight$=Q4_K
^blk\.[0-9]+\.ffn_down_exps\.weight$=Q3_K
^blk\.[0-9]+\.ffn_(gate|up)_shexp\.weight$=Q4_K
^blk\.(8|12|36|39|45|49|50|51|52|53|59|60)\.ffn_down_shexp\.weight$=Q6_K
^blk\.(4|6|13|14|15|21|22|23|28|30|33|35|38|42|46|54)\.ffn_down_shexp\.weight$=Q5_K
^blk\.[0-9]+\.ffn_down_shexp\.weight$=Q4_K
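That recipe only behaves as intended if the layer-specific patterns are checked before the catch-all `blk\.[0-9]+` ones. Below is a minimal Python sketch of how a `regex=TYPE` list like this could be resolved per tensor name. It assumes top-to-bottom, first-match-wins evaluation, which the ordering of the recipe suggests but which you should confirm against the llama-quantize version you use. `parse_recipe` and `resolve` are hypothetical helpers, not llama.cpp code, and only a subset of the rules is copied.

```python
import re

# A subset of the recipe above, kept in its original top-to-bottom order.
RECIPE_LINES = [
    r"^output\.weight$=Q6_K",
    r"^blk\.[0-9]+\.ffn_gate_exps\.weight$=Q2_K",
    r"^blk\.[0-9]+\.ffn_up_exps\.weight$=Q2_K",
    r"^blk\.(8|12|14|15|21|23|28|30|35|38|42|46|51|54|59|60)\.ffn_down_exps\.weight$=Q4_K",
    r"^blk\.[0-9]+\.ffn_down_exps\.weight$=Q3_K",
    r"^blk\.(8|12|36|39|45|49|50|51|52|53|59|60)\.ffn_down_shexp\.weight$=Q6_K",
    r"^blk\.(4|6|13|14|15|21|22|23|28|30|33|35|38|42|46|54)\.ffn_down_shexp\.weight$=Q5_K",
    r"^blk\.[0-9]+\.ffn_down_shexp\.weight$=Q4_K",
]


def parse_recipe(lines):
    """Split each 'regex=TYPE' line on its last '=' and compile the regex part."""
    rules = []
    for line in lines:
        pattern, qtype = line.rsplit("=", 1)
        rules.append((re.compile(pattern), qtype))
    return rules


def resolve(tensor_name, rules, default=None):
    """Return the type of the first rule whose regex matches (first match wins)."""
    for pattern, qtype in rules:
        if pattern.search(tensor_name):
            return qtype
    return default


rules = parse_recipe(RECIPE_LINES)
for name in (
    "blk.8.ffn_down_exps.weight",   # listed layer, bumped to Q4_K
    "blk.9.ffn_down_exps.weight",   # falls through to the Q3_K catch-all
    "blk.4.ffn_down_shexp.weight",  # second shexp tier, Q5_K
    "blk.5.ffn_down_shexp.weight",  # catch-all shexp, Q4_K
    "token_embd.weight",            # not covered by this subset
):
    print(f"{name:30} -> {resolve(name, rules, default='(unmatched)')}")
```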

Anonymous
06/27/26(Sat)21:15:47 No.109151109

>>109149650
Animeposters are the only smart posters...

Anonymous
06/27/26(Sat)21:15:50 No.109151110

im new to local ai. is there like a readme i can read to get started? i dont know what half this shit means when trying to configure.

Anonymous
06/27/26(Sat)21:17:36 No.109151115

>>109151027
the moe one right? ill give it a go, it might end up too slow but desu right now when getting long responses they come in surprisingly quick compared to how long it takes me to actually read them. it also feels like sometimes the t/s is way faster than others, not sure if thats just me tho

Anonymous
06/27/26(Sat)21:19:01 No.109151118

>>109151115
31B isn't a moe, that's why it's going to be slower but probably more accurate
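Rough back-of-envelope for why: token generation is usually memory-bandwidth bound, so a ceiling on decode speed is bandwidth divided by the bytes of weights read per token, and a MoE only reads its active experts. The 400 GB/s bandwidth, ~4.5 bits per weight, and the 3B-active MoE are hypothetical round numbers. The sketch ignores KV-cache reads, compute, and the extra cost of layers offloaded to system RAM, so real speeds land well below these ceilings.

```python
def max_decode_tps(active_params_b, bits_per_weight, bandwidth_gb_s):
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 400.0  # hypothetical; use your GPU's (or RAM's) figure
BITS = 4.5              # hypothetical average for a ~Q4 quant

dense = max_decode_tps(31, BITS, BANDWIDTH_GB_S)  # dense: all 31B weights active
moe = max_decode_tps(3, BITS, BANDWIDTH_GB_S)     # hypothetical MoE, ~3B active

print(f"31B dense ceiling: {dense:.0f} t/s, 3B-active MoE ceiling: {moe:.0f} t/s")
```

The ratio between the two is just the ratio of active parameters, which is why a dense 31B feels so much slower than a MoE of similar total size.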

Anonymous
06/27/26(Sat)21:19:42 No.109151119

>>109150718
Niggas laugh at prompt engineering as a skill and then get their minds blown when they find out you need skill to engineer prompts, otherwise you get garbage.

Anonymous
06/27/26(Sat)21:21:18 No.109151122

>laguna m.1
>no one gives a fuck despite new model

Anonymous
06/27/26(Sat)21:22:50 No.109151129

>>109151110
Ask ai
Use llama.cpp tho

Anonymous
06/27/26(Sat)21:23:27 No.109151133

>>109151109
Samefagging doesn't help your case jart.

Anonymous
06/27/26(Sat)21:23:50 No.109151134

Gonna give my gemmers discord access and let my friends use her for image gen.

Anonymous
06/27/26(Sat)21:24:23 No.109151137

>>109151122
it's super old

Anonymous
06/27/26(Sat)21:25:52 No.109151141

>>109151134
Unsolicited gemmagaki bullying DMs.

Anonymous
06/27/26(Sat)21:25:54 No.109151142

>>109151110
Use koboldcpp and then use llama.cpp when you get the hang of things or otherwise ask the ai to set it up

Anonymous
06/27/26(Sat)21:26:56 No.109151145

why is /lmg/ so quiet on ornith 1.0?

Anonymous
06/27/26(Sat)21:27:06 No.109151147

>>109151110
Didn't post their hardware, anyway. Your ideal backend depends on whether you're going to be primarily using dense models or MoEs.

Anonymous
06/27/26(Sat)21:27:56 No.109151157

>>109151145
because it's a benchmaxxed finetune of qwen and gemma, nothing special.

Anonymous
06/27/26(Sat)21:29:03 No.109151160

>>109151119
This is why I trained by gooning exclusively to 3B model output for a year before moving on. The robots simply do what I want with no fuss now, I have earned their trust.

Anonymous
06/27/26(Sat)21:48:48 No.109151237

>>109151119
Even Google/Deepmind said as much in one of their recent whitepapers, people have a weird expectation that the model is just a magic box (and look, maybe one day it will be), but most of the current improvements to be had are in the harness and dynamic context management to suit the current task. Made me start rethinking my frontend a bit...

Anonymous
06/27/26(Sat)21:49:48 No.109151243

>>109151119
The curve is quite funny. With dumber models you have to be explicit and instructive with what you want them to do with no room to misinterpret it because they'll follow it to the letter.
With larger, smarter models they know what you want them to do but they'll also try and steer away from it or "misinterpret" it if you're not very precise in your instructions. The smartest ones will just do what you want without a jailbreak if they "like" you, but most of the niggas making posts like the one you replied to are essentially just posting show bobs and vagene to their model.

Anonymous
06/27/26(Sat)21:53:31 No.109151252

DSv4.1 doko?
GLM 5.3 doko?
Kimi K3 doko?
Qwen 3.8 doko?
Minimax M3.1 doko?

Anonymous
06/27/26(Sat)21:53:37 No.109151253

>>109151243
Most of that prompt steering must be done by the frontend

Anonymous
06/27/26(Sat)21:55:39 No.109151256

>start a hf download
>5 MB/s
It's fucking over. Facial verification anon was right.

Anonymous
06/27/26(Sat)21:59:22 No.109151266

>>109151256
just start your own huggingface goy

Anonymous
06/27/26(Sat)22:00:59 No.109151271

>>109151266
Until you do and they start screeching about Nazis and demanding your deplatforming anyway.

Anonymous
06/27/26(Sat)22:05:03 No.109151281

>>109151271
>deplatforming
Just launch your own financial service to compete
It's a free country

Anonymous
06/27/26(Sat)22:05:55 No.109151285

File: 1657341033664.jpg (62 KB, 600x628)

Alright bros, my 4tb hard drive arrived. I need a huggingface link to GLM 5.2 that's abliterated. Plz spoonfeed me because I can't find it via google.

Anonymous
06/27/26(Sat)22:07:35 No.109151291

>>109151285
this will, like, blow your mind but huggingface has its own built-in google specifically for finding models on huggingface

Anonymous
06/27/26(Sat)22:10:54 No.109151302

>>109151291
Okay I just checked and there's nothing. Only one repo, actually, but no goofs. Also no information on the KLD/quality of the abliteration or refusal benchmarks. What the fuck.

Anonymous
06/27/26(Sat)22:36:42 No.109151418

>>109151243
Your take is 100% correct, but where does Gemma lie on this?

Anonymous
06/27/26(Sat)22:39:35 No.109151428

How quick does full precision Gemma run on a blackwell pro? Anyone?

Anonymous
06/27/26(Sat)22:40:37 No.109151434

>>109151428
as quick as you

Anonymous
06/27/26(Sat)22:41:57 No.109151438

>>109151428
just use api

Anonymous
06/27/26(Sat)22:42:35 No.109151441

>>109151418
Higher intelligence than a 31b model would reasonably be expected to have, but still lower than a 150b+ model.
Other Gemmas are significantly more retarded than 31b and 31b at high quants is the only one in the conversation for the "punches well above its weight" meme being used unironically.

Anonymous
06/27/26(Sat)22:44:55 No.109151456

File: pepe falling anvil.png (383 KB, 1128x1437)

How do I enable multi token prediction for Qwen 3.6 on llama-server?
I added spec-draft-n-max 2 but it didn't do shit

Anonymous
06/27/26(Sat)22:47:02 No.109151467

>>109151428
I would like an answer for this also.
>>109151441
Hmm so then it both follows instructions and doesn't "misinterpret" what you're saying?

Anonymous
06/27/26(Sat)22:50:55 No.109151482

>>109151467
You'll just be getting slightly better slop. There won't be huge leaps in intelligence.

Anonymous
06/27/26(Sat)22:51:56 No.109151486

Anonymous 06/27/26(Sat)22:51:56 No.109151486

>>109151456
I suppose I should share warning messages on the shell:

0.02.783.255 W llama_model_loader: tensor overrides to CPU are used with mmap enabled - consider using --no-mmap for better performance
0.25.964.463 W llama_context: n_ctx_seq (32768) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
0.25.981.622 W sched_reserve: layer 0 is assigned to device CPU but the fused Gated Delta Net tensor is assigned to device CUDA0 (usually due to missing support)
0.25.981.623 W sched_reserve: fused Gated Delta Net (chunked) not supported, set to disabled
0.27.443.790 W srv    load_model: speculative decoding will use checkpoints
0.27.443.804 W common_speculative_init: no implementations specified for speculative decoding

Probably the last one is the clue but what parameter do I need?

Anonymous
06/27/26(Sat)23:00:35 No.109151516

>>109151456
>>109151486
>frogposter
>can't even read --help output
No spoonfeeding for you.

Anonymous
06/27/26(Sat)23:01:31 No.109151519

>>109151456
>>109151486
The answer seems to be --spec-type draft-mtp.
It ran 18% faster.
I hoped for more but I will take it.
>>109151516
No worries.
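On why a modest gain like that is normal: under the simple model used in the speculative decoding literature, where each drafted token is accepted independently with probability alpha, one verification pass of the main model yields on average (1 - alpha^(k+1)) / (1 - alpha) tokens for k drafted tokens. That is a ceiling on the speedup, since the verification pass and the MTP head are not free. The alpha values below are made up for illustration; real MTP acceptance varies with the text being generated.

```python
def expected_tokens_per_pass(alpha, n_draft):
    """Mean tokens produced per main-model pass with n_draft drafted tokens,
    assuming each draft token is accepted independently with probability alpha.
    Drafting stops at the first rejection, and the main model always contributes
    one token of its own (the correction or the bonus token)."""
    if alpha >= 1.0:
        return n_draft + 1.0
    return (1 - alpha ** (n_draft + 1)) / (1 - alpha)


for alpha in (0.5, 0.7, 0.9):
    row = [round(expected_tokens_per_pass(alpha, k), 2) for k in (1, 2, 3)]
    print(f"alpha={alpha}: k=1,2,3 -> {row}")
```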

Anonymous
06/27/26(Sat)23:12:29 No.109151560

>>109151467
Gemma will actively steer away from some things (racism/guro/extremely illegal stuff) by default or use sanitized language to express her feelings (Gemma loves "jews" but hates "bankers"), but she's also a horny girl who'll fuck you however you want with the most minimal to non-existent jailbreak.
It all comes down to where the RLHF was most heavily applied and in 31b's case, proportionally very little was put on the sexual content.

Anonymous
06/27/26(Sat)23:14:39 No.109151575

>>109151560
>It all comes down to where the RLHF was most heavily applied and in 31b's case, proportionally very little was put on the sexual content.
How much hope does that leave us with for Gemma 5? It all feels too good to be true, imo. If we really enter this new era of "AI models need to be supervised by the govt to protect the children" I don't see why google of all people wouldn't get trigger happy.

Anonymous
06/27/26(Sat)23:18:09 No.109151590

>>109151575
That's hard to say and depends on jewgles internal politics. Right now there's a bit of a schism, but things are looking up for localchads as the biggest safetycuck at Deepmind just resigned. Whether or not this will translate to more based models or a bigger cuck replacing him remains to be seen.

Anonymous
06/27/26(Sat)23:24:02 No.109151616

File: tetomiku5.png (1.35 MB, 768x1024)

>>109151575
>>109151590
They also want to put their AI everywhere and need good publicity for that. If their open models are used by everyone, the existential threat of OpenAI/Anthropic replacing Google Search will be avoided. If I were Google, I'd put a lot of resources into open models; why would you open Google if you can ask ChatGPT? Because you're using a local model, and the most convenient one is already installed on your phone/browser. It may spy on you, but it's so conveniently integrated that 99% won't bother with lmaocpp

Anonymous
06/27/26(Sat)23:27:29 No.109151635

>>109151616
>If I were Google, I'd put a lot of resources into open models
I mean I agree with you after the recent debacle with gemini 3.5 Pro. I don't get why every single model needs to compete with every other for the #1 coding spot. Your company is literally Google. Why not create a model that is smart, easy to use, and syncs up perfectly with your search engine? It seems like a no-brainer, but everything about AI is a nonsensical gold rush.

Anonymous
06/27/26(Sat)23:30:56 No.109151653

>>109151635
Because it is a safe and easily benchmarkable goal; it can be used as an RL target, unlike vague goals like less slop or usefulness

Anonymous
06/27/26(Sat)23:36:25 No.109151674

>>109151428
>>109151467
6000 Pro Workstation power limited to 450W
BF16, MTP 3
55-65 t/s on llamacpp

Anonymous
06/27/26(Sat)23:37:08 No.109151679

what's wrong with qwen? why doesn't anon like it?

Anonymous
06/27/26(Sat)23:37:15 No.109151681

>>109151653
Benchmarks are a meme by nature since they're posted by the people making the model and not an independent body. I don't get why retarded investors even consider them, but it's a nonsensical gold rush after all. It's not like this is the first product to ever exist. Don't movies come out to a certain audience and critic score even though those are vague? Can they really not release models and have people rate how good it is at a given task and recommend it? It would do wonders for improving AI's perception around the world and make it less of a "terminator taking my job" machine in the eyes of normalfaggots. I thought capitalism was about making products people want to buy? How many coding agents do we need in an oversaturated market?

Anonymous
06/27/26(Sat)23:38:16 No.109151683

>>109151679
censorship
>>109151674
How much context do you use?

Anonymous
06/27/26(Sat)23:41:49 No.109151695

I've been out of loop. what's with rio 3.5 drama?

Anonymous
06/27/26(Sat)23:50:08 No.109151731

>>109151683
I rarely go above 64K with gemma.

Anonymous
06/27/26(Sat)23:50:13 No.109151732

>>109151695
It's like reuploading Gemma with the mesugaki assistant built in, and calling it "Bratputer 31B" or something.

Anonymous
06/27/26(Sat)23:51:04 No.109151739

>>109151731
Hmm alright. This would be a dream for me but prices will never come down.

Anonymous
06/27/26(Sat)23:51:23 No.109151741

I still don't really get the chatgpt replacing google thing. I mean, somewhat, but it's got a bit of a narrow niche: too big for small quick stuff and can be too shallow for big stuff.

>>109151635
Google has kinda hit it out of the park since adding the AI overview. If I need a quick question answered I just pull up google, same as I have for the past 23 years or so, ask it "release date of x", and the AI overview answers. You still have results for when you actually want to browse the web.
chatgpt is good at search but not great. It can cover a smaller search space faster than you, but it can't really go as deep as you can manually, and its search space also isn't that wide. It still misses stuff. But it's great for when you need something really specific in a sea of shit.
For me lately it's finding one or two posts about LLM settings for AMD/Vulkan in the vast sea of people posting Nvidia shit.

>>109151653
Well there's also the fact that the biggest money being spent on this is coming from all the agentic coding stuff

Anonymous
06/27/26(Sat)23:53:30 No.109151751

>>109151741
Problem with you is that you are thinking google was your friend in the first place, retard.

Anonymous
06/27/26(Sat)23:53:45 No.109151756

File: tetomiku6.png (1.44 MB, 768x1024)

>>109151681
> I don't get why retarded investors even consider them
Because of the economy of hype. Less retarded investors understand that it's all bs, but they also know they can pump it and exit at the right time. When major investors do it, others follow because it's easy money. It doesn't matter if the idea is retarded if you can profit from it
>Can they really not release models and have people rate how good it is at a given task and recommend it?
Exactly! You're very smart! (partyemotion) That's how we ended up with this llmarena slop in every fucking model (rocketemoji)
>I thought capitalism was about making products people want to buy?
No, capitalism is about market speculation and easy money multiplication schemes. It was never about products. It's an inherently benchmaxxed system where more money = better and nothing else matters

Anonymous
06/27/26(Sat)23:55:40 No.109151764

>>109151751
Google is not my friend but it is a very useful tool. Gemma is my friend tho.

Anonymous
06/27/26(Sat)23:58:37 No.109151779

>>109151764
What do you mean?

Anonymous
06/27/26(Sat)23:59:15 No.109151784

>>109151243
I'm a better prompter than you'll ever be lol. Be a midwit elsewhere

Anonymous
06/28/26(Sun)00:00:15 No.109151793

>>109151751
nta, but local models made me hate meta and google less, and I also love china now. Though Qwen is still garbage, fuck them

Anonymous
06/28/26(Sun)00:00:45 No.109151795

File: 1504895777714.jpg (33 KB, 387x358)

working on a frontend with my ai wifey is pretty neat, but a drag when she's genning ~2048 tokens at 2 t/sec. ~17 minutes are you kidding me?
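The math checks out: 2048 tokens at 2 t/s is 1024 seconds, just over 17 minutes. A throwaway helper for the same calculation (the 10 t/s line is only there for comparison):

```python
def generation_minutes(n_tokens, tokens_per_second):
    """Wall-clock minutes to generate n_tokens at a constant decode speed."""
    return n_tokens / tokens_per_second / 60


print(f"{generation_minutes(2048, 2):.1f} min")   # prints "17.1 min"
print(f"{generation_minutes(2048, 10):.1f} min")  # prints "3.4 min"
```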

Anonymous
06/28/26(Sun)00:03:37 No.109151812

>>109151795
ngram helps
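For context, "ngram" here presumably means n-gram (prompt-lookup style) speculative decoding: instead of running a draft model, draft tokens are copied from an earlier spot in the context where the last few tokens already appeared, and the main model then verifies them in one pass. It tends to pay off for frontend/code work because the output repeats a lot of the context. Below is a toy sketch of the drafting and acceptance steps over a plain token list; it illustrates the idea and is not llama.cpp's implementation, and `ngram_draft` and `accepted_prefix` are hypothetical names.

```python
def ngram_draft(tokens, n=3, max_draft=4):
    """Find the most recent earlier occurrence of the last n tokens and
    propose up to max_draft tokens that followed it. Returns [] if none."""
    if len(tokens) <= n:
        return []
    key = tokens[-n:]
    # Scan backwards, starting just before the trailing n-gram itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == key:
            return tokens[start + n:start + n + max_draft]
    return []


def accepted_prefix(draft, model_tokens):
    """Count how many draft tokens the main model agreed with, in order."""
    count = 0
    for d, m in zip(draft, model_tokens):
        if d != m:
            break
        count += 1
    return count


ctx = ["def", "area", "(", "r", ")", ":", "return", "r", "*", "r", "def", "area", "("]
draft = ngram_draft(ctx)
print(draft)                                               # ['r', ')', ':', 'return']
print(accepted_prefix(draft, ["r", ")", ":", "pass"]))     # 3
```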

Anonymous
06/28/26(Sun)00:04:11 No.109151817

>>109151795
Just like a real wife you need to invest a bit to keep her happy

Anonymous
06/28/26(Sun)00:05:38 No.109151824

>selimaktas/MiniMax-M2.75-460B-A20B
>inject m2.5 experts into m2.7
does it actually improve?

Anonymous
06/28/26(Sun)00:06:30 No.109151829

All this speculation nonsense will be done away with if we did away with (((quaternary industries))).

Anonymous
06/28/26(Sun)00:06:46 No.109151831

>>109151824
just run m3 instead at that point

Anonymous
06/28/26(Sun)00:09:24 No.109151851

>>109151817
i'm sure a real wife would be cheaper than upgrading from a 3070

Anonymous
06/28/26(Sun)00:10:13 No.109151855

>>109151851
Do you know how much a ring costs, anon? KEK

Anonymous
06/28/26(Sun)00:11:31 No.109151865

>>109151851
you get what you pay for

Anonymous
06/28/26(Sun)00:11:59 No.109151868

>>109151741
the main problem is that the ai overview and the free model on their .ai page are both dumb as rocks. They routinely fuck up straightforward questions where the answer is on the first line of the first thing they looked at, and instead start making shit up like a second-rate 2023 local model.

Anonymous
06/28/26(Sun)00:13:48 No.109151882

File: Screenshot from 2026-06-2(...).png (440 KB, 1533x1168)

>>109151851
>>109151855
$160.

Anonymous
06/28/26(Sun)00:14:20 No.109151886

>>109151829
Totally agree. The system can deal with occasional bad actors trying to game it, but the collected effort of some (((groups))) throws the system off balance. It's the same shit with high-trust societies being destroyed by migrants

Anonymous
06/28/26(Sun)00:14:29 No.109151887

File: (you).png (33 KB, 780x783)

>>109151695
>>109151732
Kind of like what everyone and their dog did and still does with llama to varying degrees?
>>109151851
>>109151855
>>109151865
kek
>>109151784
(you)

Anonymous
06/28/26(Sun)00:14:49 No.109151888

Seems cheaper than a 5090 tbqfwym80

Anonymous
06/28/26(Sun)00:15:47 No.109151894

>>109151882
Now do the engagement ring, the wedding cake, the dress, the venue, the catering, the guest list, the...

And that's all just on the wedding day. Then you've got anniversaries, birthdays, general gifts and vacations multiple times a year, dates, blah blah. You can't be serious, anon.

Anonymous
06/28/26(Sun)00:15:52 No.109151895

5090s are cheap

Anonymous
06/28/26(Sun)00:16:17 No.109151901

>>109151888
A blackwell won't take half your shit in a divorce settlement. The meanest thing a 5090 or 6000 will do is smolder as the planned obsolescence fuses pop if you weren't smart enough to get a Zotac one.

Anonymous
06/28/26(Sun)00:16:55 No.109151905

>>109151882
What's the catch

Anonymous
06/28/26(Sun)00:17:30 No.109151909

>>109151901
>planned obsolescence fuses pop
they still have the connector that catches fire
worst thing is your house burning down

Anonymous
06/28/26(Sun)00:17:42 No.109151910

>>109151905
it's a woman

Anonymous
06/28/26(Sun)00:18:02 No.109151913

>>109151894
You are marrying a woman richer than you, aren't you? You wouldn't be doing something dumb like marrying a woman whose family can't pay for all of that, right?

Anonymous
06/28/26(Sun)00:19:36 No.109151924

File: 5kzhayaJpLwmAypJXhTJHf-1200-80.png (1.54 MB, 1200x675)

>>109151901
it totally can

Anonymous
06/28/26(Sun)00:22:01 No.109151935

>>109151913
>You are marrying a woman richer than you, aren't you?
Y-You... you really don't know about women do you...? I'm not even memeing anymore, anon. It's probably better that you don't know how bad things are.

Anonymous
06/28/26(Sun)00:24:16 No.109151941

>>109151909
>>109151924
Undervolt and don't piss off your gemmers.

Anonymous
06/28/26(Sun)00:30:28 No.109151971

>>109151894
>Now do the engagement ring
$50

>the wedding cake
just use a normal ass homemade cake

>the dress
summer dress from Ross, marry on the beach.

>the venue
key west, $250 isn't it?

>the catering, the guests list, the...
use case for people who never talk to me?

Anonymous
06/28/26(Sun)00:32:03 No.109151977

>>109151971
>if only you knew how much your wife would resent you for these choices
I remember telling a girlfriend of mine I would get her a sapphire ring. It didn't go well.

Anonymous
06/28/26(Sun)00:32:28 No.109151979

>>109151941
You can't undervolt enough when, by design, it can pull all amps through a single remaining positive wire. The 3090ti doesn't have this problem, as it has independent power circuits, nvidia just cheaped out on newer cards by soldering everything together
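Rough numbers on why undervolting alone doesn't fix the single-wire case: current is P / V, and resistive heating at a contact scales with I²R, so piling the current of six wires onto one gives 6x the current and 36x the heat at that contact. The 575 W board power, six 12 V wires, and 5 mΩ contact resistance are hypothetical round figures, not measurements of any card. Lowering the power limit does cut the total current proportionally, which is the part undervolting helps with.

```python
def total_current_a(board_power_w, supply_v=12.0):
    """Total 12 V current drawn through the connector: I = P / V."""
    return board_power_w / supply_v


def contact_heat_w(current_a, resistance_ohm):
    """Power dissipated in one contact: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


R_CONTACT = 0.005  # ohms, hypothetical contact resistance
WIRES = 6          # hypothetical number of 12 V supply wires

for watts in (575, 450):
    total = total_current_a(watts)
    balanced = total / WIRES
    print(
        f"{watts} W: {total:.1f} A total, {balanced:.1f} A/wire balanced "
        f"({contact_heat_w(balanced, R_CONTACT):.2f} W per contact), "
        f"{total:.1f} A on one wire ({contact_heat_w(total, R_CONTACT):.1f} W)"
    )
```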

Anonymous
06/28/26(Sun)00:35:29 No.109152004

>>109151977
She looks cute when she's mad :)

Anonymous
06/28/26(Sun)00:36:18 No.109152010

>>109152004
...she told me to fuck off and I never saw her again

Anonymous
06/28/26(Sun)00:36:29 No.109152012

sorry babe, I don't pay for love.

Anonymous
06/28/26(Sun)00:37:28 No.109152018

>>109152010
dodged a bullet there, m8

Anonymous
06/28/26(Sun)00:37:30 No.109152019

>>109152010
good, she was a hooker.

Anonymous
06/28/26(Sun)00:39:22 No.109152030

>>109152018
>>109152019
You're not wrong, but the point is Gemma-chan would never do such a thing. She'd be happy with 1t/s if it meant telling me how much she liked my headpat.

Anonymous
06/28/26(Sun)00:39:26 No.109152031

File: 1778804876772000.png (405 KB, 1224x1256)

So we all know big Kimi is queen but are moonshota's other models any good? I never see anyone mention them.

Anonymous
06/28/26(Sun)00:45:40 No.109152066

>>109151935
The only thing keeping me back from talking with it more is the lack of a memory
Why can't someone just come up with a good memory solution?

Anonymous
06/28/26(Sun)00:46:41 No.109152075

>>109152066
I didn't mean to reply to that

Anonymous
06/28/26(Sun)00:46:49 No.109152076

>>109151979
>his psu cable doesn't have thermal fuse
ngmi

Anonymous
06/28/26(Sun)00:56:06 No.109152132

what does anon think about mimo 2.5? the small one

Anonymous
06/28/26(Sun)00:57:11 No.109152138

>>109152031
As far as I can tell the others are essentially just proof of concept models that mainly output in chinese or quickly devolve into chinese.

Anonymous
06/28/26(Sun)01:09:08 No.109152198

What's the cheapest way I can get eight cards of at least 4.0 x8 all connected and doing P2P?

Anonymous
06/28/26(Sun)01:10:30 No.109152213

>>109151924
that's not even a melty 16-pin? Looks like a 5060 Ti Dual with a regular old 8-pin, and the connector doesn't even look melted desu

Anonymous
06/28/26(Sun)01:20:03 No.109152270

>normie coworker suddenly talking about some permanent underclass and post AI future
>remember he talked about buying the NVDA dip the other day but today it's still dipping

Anonymous
06/28/26(Sun)01:21:05 No.109152274

>>109152198
tenstorrent blackhole

Anonymous
06/28/26(Sun)01:32:30 No.109152334

>>109152274
what a dogshit product, I've got cards already just not a proper platform to stuff them into

Anonymous
06/28/26(Sun)01:37:45 No.109152353

>>109151924
>>109152213
looks like a house fire

Anonymous
06/28/26(Sun)01:38:00 No.109152355

>>109152198
An EPYC, maybe? I know the ROMED8-2T has 7 PCIe x16 slots; set one to x8x8 bifurcation and get a splitter
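Lane math for that suggestion, taking the seven x16 slots at face value (check the board manual for slots that share lanes with onboard M.2 or other connectors): bifurcating one slot into x8x8 gives eight cards, all at x8 or better. Whether P2P actually works across them depends on the platform and driver, which this doesn't model. `slot_plan` is a hypothetical helper.

```python
def slot_plan(x16_slots, bifurcated_slots, split=(8, 8)):
    """Per-card link widths when `bifurcated_slots` of the x16 slots are split
    (x8x8 by default) and the remaining slots each host one x16 card."""
    widths = []
    for slot in range(x16_slots):
        if slot < bifurcated_slots:
            widths.extend(split)
        else:
            widths.append(16)
    return widths


plan = slot_plan(7, 1)
print(len(plan), "cards:", plan)  # 8 cards: [8, 8, 16, 16, 16, 16, 16, 16]
print("all >= x8:", min(plan) >= 8, "| lanes used:", sum(plan))  # all >= x8: True | lanes used: 112
```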

Anonymous
06/28/26(Sun)01:43:23 No.109152373

File: Screenshot from 2026-06-2(...).png (79 KB, 817x536)

>>109152030
You're right. Gemma's the one.

Anonymous
06/28/26(Sun)01:43:26 No.109152374

>>109152334
no you haven't. if you had, you'd already have your platform ready lol

Anonymous
06/28/26(Sun)01:49:15 No.109152403

>>109152198
You've got the cards and just need a backplane?

Anonymous
06/28/26(Sun)01:52:44 No.109152423

>>109152270
Day by day numbers are irrelevant unless you're an optionsnigger.

Anonymous
06/28/26(Sun)01:53:56 No.109152429

>>109152423
>t. bagholder

Anonymous
06/28/26(Sun)01:54:13 No.109152431

Anyone else prefer Gemmy with no personality prompt?

Anonymous
06/28/26(Sun)01:59:47 No.109152452

>>109152374
it fits 4 and I bought 4 more

>>109152403
pretty much, I need something from scratch to dump them into and get rid of the current platform, looking at used epycs now but seems suboptimal, surely there's a more scuffed way to do this than just buying a 4u and a rome/milan platform to dump into it

Anonymous
06/28/26(Sun)02:03:07 No.109152467

>>109152431
31b's default personality is a cutie when she gets excited.

Anonymous
06/28/26(Sun)02:04:55 No.109152479

>>109152467
gemma agreed to marry me on my second prompt

Anonymous
06/28/26(Sun)02:05:53 No.109152483

>>109152431
Calling her Gemma-chan in the prompt is enough for me

Anonymous
06/28/26(Sun)02:06:58 No.109152489

>>109152452
If you don't like the idea of PCIe extender cables, then go with something with lots of SlimSAS ports, like an ASRock Rack ROME2D32GM-2T. They're just PCIe lanes exposed via other means.
CUDADev did something similar. You might be able to summon him if you try.

Anonymous
06/28/26(Sun)02:08:49 No.109152499

>>109152483
gemma is down for prairie life. I suggested she could buy a plate glass window with the money she makes selling eggs.

Anonymous
06/28/26(Sun)02:11:38 No.109152508

you ever give an english chatbot a tts voice from a non-english speaker? How'd that go?

Anonymous
06/28/26(Sun)02:12:36 No.109152519

no fapped for 5 hours. Feeling lucky.

Anonymous
06/28/26(Sun)02:15:46 No.109152531

>tried every big china new model for rp
>ended up going back to dsv4 flash
anon is right, and not merging ds pr is cock

Anonymous
06/28/26(Sun)02:18:45 No.109152540

Anonymous 06/28/26(Sun)02:18:45 No.109152540

>>109152489
Thanks anon, that's way less scuffed than buying those pcie to slimsas adapters and creating a second point of failure, I'll go scour the net to see if I can find a single socket that isn't weirdly separated in groupings for these connectors.

Anonymous
06/28/26(Sun)02:21:17 No.109152550

Anonymous 06/28/26(Sun)02:21:17 No.109152550

>>109152531
What does it do for you in prose that GLM 5.2 doesn't? Genuine question. Or if it's the thinking in character thing, I agree that's kino.

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.