/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108602881 & >>108599532

►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Support for attention rotation for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108602881

--Discussing ways to disable reasoning tokens via llama.cpp API:
>108603929 >108603976 >108604011 >108604043 >108604065 >108604262 >108604284 >108604295 >108604363 >108605355 >108604137 >108604947 >108605018 >108605030 >108605046 >108605068 >108605084 >108605116 >108605297 >108604024 >108604029
--Reducing model sycophancy through prompting and technical modifications:
>108602961 >108602997 >108603002 >108603009 >108603028 >108603084 >108603011 >108603034 >108603069 >108603162 >108603213 >108603098
--Token compression techniques and RoPE for Gemma's context limits:
>108603781 >108603799 >108603831 >108603854
--Testing Gemma-4's reasoning on thread analysis and discussing control-vectors:
>108603400 >108603703 >108603723 >108603785 >108603892 >108604323 >108604005 >108604019 >108604057 >108604070 >108604096 >108604080 >108604327 >108604336 >108604090
--I-DLM lossless conversion claims and speed benchmarks for Gemma 4:
>108603796 >108603823 >108603841 >108603862 >108603882 >108603900 >108604338
--Applying decensoring techniques to remove repetitive model patterns:
>108604440 >108604490 >108604509 >108604567 >108604583 >108604594 >108604633 >108604688
--Discussion of llama.cpp PR regarding Gemma 4 parsing edge cases:
>108605331 >108605344
--llama.cpp Vulkan builds now require spirv-headers installation:
>108605607
--Logs:
>108603534 >108603672 >108603703 >108603723 >108603785 >108603790 >108603906 >108603912 >108603926 >108603929 >108603940 >108604011 >108604142 >108604374 >108604501 >108604541 >108604639 >108604857 >108604890 >108604944 >108604995 >108605211 >108605590 >108605603
--Gemma:
>108603584 >108603900 >108604627 >108604696 >108604730 >108605597 >108605648
--Miku, Teto (free space):
>108603296 >108603360 >108603457 >108603480 >108604418 >108604430 >108604457 >108604626

►Recent Highlight Posts from the Previous Thread: >>108602885

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
me when i run an 8b model on my t480 so it can generate 5 words a second
is the honeymoon over?
>>108605942Yeah, sadly. It seems trans normalization will just never work out.
>>108605921
not into erp or gooning really, but tested a heretic model of gemma q4km as a benchmark and it started talking about the smell of ozone?
>>108605942no she is agi + saved local
gemmaballz
reminder that if you can't run the 31b your opinion on gemma is invalid
>>108605942
It's just that it takes fucking 2min to get a captcha today. Gemma is still the queen of local. no reason to run any other model unless you can run DS or kimi
>>108605957what do you use for long scrolling image capture like that?
>>108604090can you share the dataset?
>>108605957how do you use the internet with gemma, I have no idea how to use those tools things, seems useful
>>108605981NTA, Firefox built-in screenshot tool lets you do that.
gemma
>>108605966
wish i could run 31b with 200k context, have to swap to moe for web scraping stuff. even at 200k you cant fit an entire /g/ thread, thats like 400+ posts
>>108605981
its some slop script i had claude make + firefox's full page screenshot. it adds a camera button to llama's chat box next to the + button which loads all of the chat on screen, then you just save with the ff screenshot tool. its janky: you gotta hit the button, then scroll from top to bottom of the chat, then save. it also has no mutation observers or anything to reload if you change chat, so it requires a page refresh if its a new one
https://pastebin.com/M3Mzbpfa
>>108605957
What's your prompt? Sometimes she talks cute like that for me but not always.
>>108605998
Doesn't work with any of the frontends I've tried (silly, llama, open webui)
>>108606007
>Doesn't work with any of the frontends I've tried (silly, llama, open webui)
ah, I see what you mean. my bad.
>>108606001Don't worry, Gemmaposter, Gemma 5 will have native 1M+ context and by that time we'll be able to compress it into a GB of VRAM.
>>108606001>that userscriptbruh
>>108606001You have filled your life with AI generated slop. Very impressive.
>not using turbo ngmi
>>108606033>he says in the AIslop general
Can the leaked claude code run local models or is it hardcoded to their cloudshit?
>>108606043Not in kobold yet. Still waiting for dflash too
>>108606047
I mean, it's not even in llama.cpp yet. or does kobold run its own fork of llama.cpp with some stuff the main repo doesn't have?
>>108606047
Just download the latest release of claude code and point the envs to your llama.cpp. That has always worked.
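for anyone confused about "point the envs", a rough sketch. the variable names are claude code's documented base-URL/token overrides; whether your llama-server build actually speaks an Anthropic-compatible endpoint at that address is an assumption, check your version:

```shell
# hypothetical sketch: port 8080 is llama-server's default, the rest depends
# on your build exposing an Anthropic-compatible messages endpoint
export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"  # your llama-server address
export ANTHROPIC_AUTH_TOKEN="dummy"                # local server ignores auth
echo "claude code will target: $ANTHROPIC_BASE_URL"
```

then launch claude code from the same shell so it inherits the variables.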
an article about chain of thought being made here was published today but it's too hard to post
I wanna let Gemma control my browser and tell me which porn I need to look at while calling me a pervert.
>>108606043
>Qwo
what's this??
https://www.youtube.com/watch?v=7mBqm8uO4Cg
>>108606024Time to compress text to images. Gemma 4 can seemingly compress (with a bit of loss) 1600+ tokens of text into 280-token images (default size).
>>108606046
Tell me about the mcp server you are using? I'm still pondering about this. Of course I have already consulted my local AI about it. I'm using text completion with my client and I'm actually going to implement the tool calls on my own; it's not rocket science, but it obviously needs some parsing.
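the parsing part really is the whole trick with text completion. a minimal sketch of what "implement the tool calls on my own" looks like; the `<tool_call>` wrapper here is a made-up convention, substitute whatever delimiters your chat template actually emits:

```python
import json
import re

# extract JSON tool calls wrapped in <tool_call>...</tool_call> from a raw
# text completion; the wrapper tag is hypothetical, not a standard
TOOL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(completion: str) -> list[dict]:
    calls = []
    for blob in TOOL_RE.findall(completion):
        try:
            call = json.loads(blob)
        except json.JSONDecodeError:
            continue  # model emitted malformed JSON; skip (or re-prompt)
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls

out = 'Let me check.<tool_call>{"name": "web_search", "arguments": {"q": "ozone smell"}}</tool_call>'
print(extract_tool_calls(out))
# → [{'name': 'web_search', 'arguments': {'q': 'ozone smell'}}]
```

you then dispatch on `call["name"]`, run the tool, and feed the result back into the context as a new turn.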
>>108606024if turbo quant gets implemented at some point you could get pretty close to 1M on 24GB vram at like q3.5
>>108606047Why bother. But that's been a thing for a while before the leak anyway.
>>108606073
It would be interesting if they made a model that's meant to do that natively (with all pretraining done that way as well). There are some papers out there but no large-scale production model yet...
>>108606070
i gotchu
https://www.theatlantic.com/technology/2026/04/4chan-ai-dungeon-thinking-reasoning/686794/
Does anyone have experience with these models for programming:
>MiniMax M2.7 Q4
>Gemma 4 31B
>Qwen 3.5 122B
>Qwen3 Coder Next
I can run all these locally (minimax quant is IQ4_XS) but am unsure which to pick
it's funny how every llm hallucinates about the jews all the time. AI just can't stop thinking about ((them))
>>108606104
absolutely minimax 2.7. you can also just go to those models' respective pages, copy the benchmark values and throw them at an llm to compare for you, but I'm pretty sure minimax is the best by far
Asked my Gemma for 4chanX rules to filter out the retarded gemmaposter. Just works. What a model!
>>108606104I'm new to local models and honestly just assume benchmarks are bullshit, is that not the case?
>>108606089
The fork rewrite is the stupidest thing I have ever seen. Last I checked it didn't even have feature parity. As if some rando buying Claude credits is going to be able to keep up development pace with Anthropic itself. The leak was interesting to learn what's inside and, for a while, you can tweak it and use it in place of the original, but it would get out of date and/or blocked eventually. Not like there's a shortage of javashit TUI harnesses.
>>108606092
thanks. is it paywalled for everyone? seems interesting
>>108606070
>>108606092
>>108606128
https://archive.is/Oum6z
>>108606092Screencap it or buy an ad
I'm running from mac, are mlx models noticeably better than gguf versions?
>>108606113
>is that not the case?
yes and no. benchmarks are bullshit insofar as they don't tell the whole story. most people here use models for child rape/RP stories, so benchmarks don't reflect how good the model will be for them, and by hearing their feedback you may get the impression that the models aren't capable or that the benchmarks are meaningless. they are actually a very good indicator, especially if you look at good benchmarks. coding is easy because benchmarks for it tend to be a good representation of the use case itself; there will be some variability because of the coding language you may be using, but thats about it for coding
>>108606138thanks, I'm already a career programmer so I'm curious what model would be the best just as an assistant. maybe I'll ask vcg since they're more in line with my use case. have a nice day anon
>>108606113
for coding, benchmarks track well, but ymmv and i recommend you test for your usecase
>>108606131
>no mention of miku, slop, cunny or big nigga
come on now
What the fuck is happening.
>>108605921BBC slut
>>108606094
MiniMax quantizes poorly and Qwen3.5-397B quantizes well, according to https://kaitchup.substack.com/p/lessons-from-gguf-evaluations-ternary
Dunno whether that would apply as much to Qwen3.5-122B, though, since larger models are usually better at lower quants than smaller models. Probably better to just give them both a shot and see which one works better for your use case.
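before testing quality, it's worth sanity-checking what even fits: weight size is roughly params × bits-per-weight / 8. a quick sketch; the bpw values and the MiniMax parameter count below are placeholder assumptions, not official numbers, and the estimate ignores KV cache and runtime overhead:

```python
# rough GGUF weight-size estimate: parameters * bits-per-weight / 8 bytes,
# converted to GiB; bpw values are approximate per quant family
def quant_gib(params_b: float, bpw: float) -> float:
    """params_b: parameter count in billions; bpw: effective bits per weight."""
    return params_b * 1e9 * bpw / 8 / 2**30

# parameter counts and bpw here are illustrative assumptions
for name, params, bpw in [("Qwen3.5-122B", 122, 4.5), ("hypothetical 230B MoE", 230, 4.25)]:
    print(f"{name}: ~{quant_gib(params, bpw):.0f} GiB at {bpw} bpw")
```

so a 122B at ~4.5 bpw is around 64 GiB of weights before you budget anything for context.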
It's teto shoes day
>>108606189
too low kv precision probably. 특정 means "specific" in korean, which kinda makes sense in that context i'd presume
>>108606189We will never recover from losing day 0 gemma.
>>108606189are you using supergemma or what?
>>108606189prolly using supergemmakek
how do I give my gemma-chan access to tools?
>>108606240The same way you give tool access to any llm
>>108606240ask her
I'm following my ai psychosis and now claude has me melting my LLMs in order to restructure it. how is your research going, fellow schizobros
>>108606240ask her to look at the internet for the answer
>>108606189
>use <30 logit softcap
>wonder why it shit out moonrunes
>>108606255are you grafting or merging models
>>108606240>https://developers.openai.com/api/docs/guides/function-calling
>>108605921mikulove
>>108606189crazy how day 0 gemma just didn't do this
it's fucking 83°F in my apartment. i knew these machines put out a lot of heat, but i didn't realize just HOW much
Is there a good AI based solo TTRPG harness yet? Or better yet, has anybody tested any?
>>108606240https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
>>108606316the most expensive thing about running your llms at home is the cost for the AC
>>108606318I believe there was a chink one where you can track individual pieces of clothing/armor on characters and map etc. Basically a MUD game on steroids. This was even before people used "harness" to refer to a framework that handles LLM input/output
>>108606322but wat if you run them during winter :O
why the FUCK is GLM outputting "Searching online for [thing]..." in its thinking when i have not set it up with any tooling whatsoever
>>108606316power limit your gpus, performance loss is non-linear
>>108606316
Mine sits in the entryway for a reason. 2kW is a heater-grade appliance.
>>108606318
I'm making one for myself, and I'm about to rewrite it from scratch for the 4th time. This time because Gemma gets it and you can do things previous models can't; at the same time it needs more randomized data as input.
>>108606331Your preset? It didn't do that for me...
>>108606326
the hack that heating companies don't want you to know about. the second coming of the heater that mines bitcoins
>>108606331works on my machine
>>108606337
no preset, it just decided to schizo out. it even hallucinated a git commit hash and date
>>108606334what's the best way to do this?
>>108606352nvidia-smi
>>108606352MSI Afterburner
>>108606354
thanks. any recs as to targets? i'm not much of a hardware person
>>108606358
lol
>>108606364let me ask gemma-chan what your fucking gpus are
>>108606364nvidia-smi -lgc 210,1500 is the best way to do it
>>108606316
>F
use C like the rest of the world nigga
@_@ i meant ratios/percentages/temperatures, not like actual wattage or whatever,,,,,, i have a 5090...
>>108606335
>I'm making one for myself, and I'm about to rewrite it from scratch for the 4th time.
That's the way to go, really. Iteration is a great learning and refining tool. Is the game a fixed affair in that you have some baseline world and lore and whatnot, or is everything AI generated? Do you have a sort of setup step where you prepare the world, maybe based on some user-provided info?
>>108606352
nvidia-smi -lgc 0,1600
nvidia-smi -pl 270
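since anon asked for ratios rather than raw watts: a sketch that derives a 75% cap from the card's stock TDP and prints the matching commands. the 575W figure is an assumed stock value for a 5090, and the commands are only echoed here; drop the echo prefix (and run as root) to actually apply them:

```shell
# assumed stock TDP in watts; check yours with: nvidia-smi -q -d POWER
TDP=575
# integer 75% of TDP
LIMIT=$(( TDP * 75 / 100 ))
echo "nvidia-smi -pl $LIMIT"     # set the power limit in watts
echo "nvidia-smi -lgc 0,1600"    # optionally lock the core clock range too
```

75-80% is the usual starting point because the loss in inference throughput is much smaller than the drop in heat and noise.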
>>108606268
It's more of a damage experiment, I guess? I found that if you shake a model in random directions while tracing the steps, and also multiply it together at the same time (like the game 2048), you can find which rows are the most energetic, although every row is important. You basically shake it into specialists and generalists. So I'm trying a few things:
1. Placing the most energetic rows in vram (most likely to be used, in terms of latency). You can also store the condensed rows in vram, run the matmul on them, and send the much smaller result through pcie instead of swapping layers, so you can do the rest of the work on other GPUs/CPUs. Theoretically.
2. Determining and mapping the activations for each model to see how they correlate. Got a slight perplexity improvement smashing gemma-4 into qwen3.5-9b by determining the knowledge gemma has that qwen doesn't, but who knows if it's just the base model doing its thing or just overtraining.
3. I downloaded the flywire model, which is a model of a fly's brain, and tried to map the same shake logic onto it to see how brains work in comparison to neural networks. Interestingly enough, it has the equivalent of rank 1 instead of rank 32 for its less energetic storage (the idea is that since the 98% least energetic rows are specialized classifiers in LLMs, the same might apply to the fly brain). So I'm trying to "melt" the model to simulate that: treat the model's least energetic rows as rank 1. It didn't work, although claude seemed to make a big deal out of finding that the fly's brain follows a power law: "all five tested brain regions have singular value spectra following a power law S[i] ∝ i^(-α) with mean α = 0.527 ± 0.065. F = Energy - Temperature × Entropy." To be honest I don't really know what it means by this. It's saying that the architecture LLMs are trained on is flawed since it treats everything like a crystal (crystal phase, α ≈ 0) instead of sitting at the critical point (α ≈ 0.5).
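for anyone wanting to reproduce the power-law part: the exponent α in S[i] ∝ i^(-α) is just the negated slope of a least-squares line fit to log S versus log i. a sketch, with a synthetic matrix built to have a known α = 0.5 spectrum standing in for a real weight matrix or connectome block:

```python
import numpy as np

# fit alpha in S[i] ∝ i^(-alpha) from a matrix's singular value spectrum
# via least squares on log S vs log i
def spectrum_alpha(mat: np.ndarray) -> float:
    s = np.linalg.svd(mat, compute_uv=False)  # sorted descending
    i = np.arange(1, len(s) + 1)
    slope, _ = np.polyfit(np.log(i), np.log(s), 1)
    return -slope

rng = np.random.default_rng(0)
n = 64
target = np.arange(1, n + 1) ** -0.5          # spectrum with alpha = 0.5
q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal basis
mat = (q * target) @ q.T                      # q @ diag(target) @ q.T
print(round(spectrum_alpha(mat), 2))  # → 0.5
```

run the same fit on a real weight matrix and an α near 0 would correspond to the flat "crystal" spectrum, α near 0.5 to the critical point the post describes.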
>>108606382
>i'm not a hardware guy
>pays $3,000 for a gpu
The state... Jesus fucking Christ.
>>108606364
>lol
Not him, but MSI Afterburner does work well and you can set the power limit alongside the profile for voltage/frequency. I've had my 5090 running at 75% power since I got it and it runs a lot cooler and doesn't have any coil whine.
>>108606406
if you thought the GPU was the most expensive part of my purchase, you're sorely mistaken. i dumped >$15k into this
>>108606387
It's generated on the fly. Every time there is a new character, location, or quest, it generates multiple variants and lets the llm choose which fits best, then the llm fills in the blanks. Worldinfo works based on context and proximity: major areas and npcs in the city, all npcs in the building, etc.
>>108606406Sucks to be poor
>>108606418>is generous with money>positive trait
>>108606418
https://peps.python.org/pep-0008/#function-and-method-arguments
>If a function argument’s name clashes with a reserved keyword, it is generally better to append a single trailing underscore rather than use an abbreviation or spelling corruption. Thus class_ is better than clss.
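a toy illustration of the quoted PEP 8 advice, in case it isn't obvious what the convention looks like in practice (the function and names here are made up):

```python
# hypothetical example: `class` is a reserved word, so the parameter gets a
# trailing underscore instead of a mangled spelling like `clss`
def describe(name, class_):
    return f"{name} is a {class_}"

print(describe("Gemma", "language model"))  # → Gemma is a language model
```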
>>108606419Sucks to be an underage low intelligence dipshit who misses the point entirely.>>108606414Would you buy a guitar and not know its hardware? This isn't about money per se but it still is. You fucking retards, I feel sorry for you. I really do.
>>108606431Not him but it seems like even the naming conventions are retarded in python. Jesus christ.
>>108606444Poorfag cope
>>108606450
What do you mean? The only issue is that they suggest keeping abbreviations uppercase, so you get names like HTTPConnection. It's even worse if you have two abbreviations next to each other. It's impossible to tell where a word begins and ends unless you're familiar with the abbreviations.
>>108606444>Would you buy a guitar and not know its hardware?There is a whole brand for that
Do I get an AMD R9700 32GB card or two Intel B60 24GB cards for inference only? The Intel cards would have more memory but higher TDP and questionable support. I wanna run gemma4:31b if that helps.
>>108606467more vram is always better
It's over
>>108606473He could get a lot of MI50s for that price
>>108606467
Two used 3090s. That's 48 GB of VRAM, almost double the bandwidth of either of the options and roughly the same TDP as the Intel. All of that for potentially very cheap! You will ideally be limiting them to 270W anyway.
>>108606464>>108606419>>108606456Sub 80 IQ samefaggot.
>>108606479generals were a mistake
>>108606418Your code is a cognitohazard jesus christ.
>>108606488Oh the irony....
>>108606418
>Every time there is a new character, location, or quest, it generates multiple variants and lets llm choose which fits the most, then llm fills the blanks
>Worldinfo works based on context and proximity: major areas and npcs in the city, all npcs in the building, etc
Interesting. Kind of like a game that uses procedural generation to progressively create things as the game is played.
>>108606503
Sucks to be a meatbag. LLMs parse that code with no problem
>>108606488>/lmg/ - local models general
We're never getting Gemma 124B, are we? It was too good so Google had to kill its release because it threatened usage of Gemini 3 Flash.
>>108606503I code while I rp with the model, and because I want to get back to rp as soon as possible, it turns into a chaotic collection of hotpatches and quickhacks. It's never going to be a solid project, but I have a lot of fun in those brief moments when it works as intended
>>108606525
124B is gemini 3.2
>>108606503
>I'm about to rewrite it from scratch for the 4th time
what did you expect