/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Janitor acceptance emails will be sent out over the coming weeks. Make sure to check your spam folder!

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 06/28/26(Sun)06:28:16 No.109153585

File: mekudroid4.png (1.26 MB, 768x1024)

1.26 MB PNG

/lmg/ - Local Models General Anonymous 06/28/26(Sun)06:28:16 No.109153585

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109148460 & >>109142812

►News
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/28/26(Sun)06:29:16 No.109153589

Anonymous 06/28/26(Sun)06:29:16 No.109153589

File: 1739773846650.jpg (253 KB, 2048x1422)

253 KB JPG

►Recent Highlights from the Previous Thread: >>109148460

--DeepSeek-V4 llama.cpp integration, quant performance, and DSpark implementation challenges:
>109148563 >109148635 >109149793 >109149844 >109150048 >109150088 >109150178 >109150258 >109151022 >109151039 >109151102
--Building a local voice-to-voice pipeline with Gemma, Whisper, and Piper:
>109152832 >109152850 >109152893 >109153090 >109153133 >109153293
--Using llama-server KV cache pre-fill to reduce context processing time:
>109149430 >109149495 >109149514 >109149545 >109149573 >109149672 >109149588
--Critiques of Ollama as a limited llama.cpp wrapper:
>109148609 >109148658 >109148683 >109148785 >109148933 >109148971 >109149371 >109149336
--Google's AI strategy, Gemma's RLHF, and the AI benchmark hype economy:
>109151428 >109151560 >109151575 >109151590 >109151616 >109151635 >109151653 >109151681 >109151756 >109151741 >109151868 >109151674
--Debate over economic efficiency of API vs local inference:
>109149097 >109149119 >109149128 >109149650 >109149689 >109150235 >109149709 >109150978 >109151101
--Effect of cross-lingual reasoning on output quality and sanitization:
>109148696 >109148766 >109148813 >109149767 >109150686
--Using author's notes to fix Gemma's logic failures in roleplay:
>109150718 >109150771 >109150916 >109151119 >109151237 >109151243 >109151441 >109151055 >109151063 >109150981
--Feasibility of poisoning training data to create deceptive sleeper agents:
>109148755 >109148775 >109148859
--Using SillyTavern macros for random author's note activations in Gemma:
>109150224 >109150342 >109150352
--Logs:
>109150038 >109152373 >109152832 >109152864 >109152893 >109153090 >109153133 >109153293
--Miku, Teto (free space):
>109148496 >109149650 >109150808 >109151616 >109151756 >109148516

►Recent Highlight Posts from the Previous Thread: >>109148462

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/28/26(Sun)06:40:17 No.109153635

Anonymous 06/28/26(Sun)06:40:17 No.109153635

DSpark is the speculative decoding method that llama.cpp has been waiting for. It's open, has a public training script that can be applied to all sorts of models, is complex but comes with serious gains.
I can't wait to see it in llama.cpp and bring forth the age where speculative decoding will be as common as samplers such as min-p.

Anonymous
06/28/26(Sun)06:44:22 No.109153648

Anonymous 06/28/26(Sun)06:44:22 No.109153648

>>109153635
>DS
yeah no

Anonymous
06/28/26(Sun)06:52:23 No.109153669

Anonymous 06/28/26(Sun)06:52:23 No.109153669

Are there any other models in the 20-50B range worth considering other than the perennial favourites of Qwen and Gemma? I'm doing a little survey of architectures for something I'm planning.

Anonymous
06/28/26(Sun)06:54:35 No.109153674

Anonymous 06/28/26(Sun)06:54:35 No.109153674

>>109153669
Nope, if you manage to get these banned local is basically over.

Anonymous
06/28/26(Sun)06:56:12 No.109153680

Anonymous 06/28/26(Sun)06:56:12 No.109153680

should I buy the second dgx spark before it's too late?

Anonymous
06/28/26(Sun)06:57:29 No.109153687

Anonymous 06/28/26(Sun)06:57:29 No.109153687

>>109153680
You shouldn't have bought the first

Anonymous
06/28/26(Sun)07:08:13 No.109153735

Anonymous 06/28/26(Sun)07:08:13 No.109153735

>>109153669
Mistral Small

Anonymous
06/28/26(Sun)07:13:35 No.109153756

Anonymous 06/28/26(Sun)07:13:35 No.109153756

>>109153680
is it true they get like 4 tps on gemma without nvfp4 and with nvfp4 it's still below 15t/s

Anonymous
06/28/26(Sun)07:16:22 No.109153774

Anonymous 06/28/26(Sun)07:16:22 No.109153774

>>109153674
Come on help a glowie out a little.

Anonymous
06/28/26(Sun)07:19:33 No.109153794

Anonymous 06/28/26(Sun)07:19:33 No.109153794

File: Capture.png (29 KB, 684x660)

29 KB PNG

>>109153589
>--Building a local voice-to-voice pipeline with Gemma, Whisper, and Piper:
That's me. I've spent the last hour testing different Piper voice packs and categorizing them to decide which I want to use. Funny enough, it was the very last voice option available that I ended up loving. Quickness is a huge factor for conversation, then awkwardness is another one. Even a slight stumbling over words pulls you out of it.

Anonymous
06/28/26(Sun)07:21:00 No.109153799

Anonymous 06/28/26(Sun)07:21:00 No.109153799

Somebody needs to train 10T model that fucks up these (((Americans))). This cannot go on.

Anonymous
06/28/26(Sun)07:21:12 No.109153801

Anonymous 06/28/26(Sun)07:21:12 No.109153801

>>109153794
What year is it?
https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b

Anonymous
06/28/26(Sun)07:24:40 No.109153812

Anonymous 06/28/26(Sun)07:24:40 No.109153812

>>109153801
There's no NemotronASR.cpp, is there?

Anonymous
06/28/26(Sun)07:26:38 No.109153820

Anonymous 06/28/26(Sun)07:26:38 No.109153820

>>109153801
I know it's doable. It's also already a built-in option in koboldcpp, so long as you only use a GGUF for TTS. I'm eventually working towards something a little larger where speaking also sends a screenshot of my desktop to facilitate conversations based on what's happeing, not just what's said. Like playing a video game with a spectator, and the spectator is Gemma or another model of choice.

Anonymous
06/28/26(Sun)07:27:11 No.109153823

Anonymous 06/28/26(Sun)07:27:11 No.109153823

Gemma made a hard logical thinker theehee

Anonymous
06/28/26(Sun)07:27:42 No.109153825

Anonymous 06/28/26(Sun)07:27:42 No.109153825

File: Fry1.jpg (162 KB, 622x476)

162 KB JPG

>>109153812
https://github.com/CrispStrobe/CrispASR

Anonymous
06/28/26(Sun)07:31:03 No.109153839

Anonymous 06/28/26(Sun)07:31:03 No.109153839

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

https://archive.is/sWFja

Anonymous
06/28/26(Sun)07:31:59 No.109153841

Anonymous 06/28/26(Sun)07:31:59 No.109153841

File: qwen 3.6 35b vs claude so(...).png (119 KB, 1490x830)

119 KB PNG

Qwen drew better svg than Claude.
Sonnet looks bad on top of safety meltie about copyright,. Qwen at least tried and failed. Not that it's super great but it still looks better aesthetically.

Anonymous
06/28/26(Sun)07:33:09 No.109153850

Anonymous 06/28/26(Sun)07:33:09 No.109153850

>>109153841
White part at the bottom is from lazily combining images btw, not from the svg.
And the transparent part in the middle is also from Claude's fuck up.

Anonymous
06/28/26(Sun)07:33:54 No.109153851

Anonymous 06/28/26(Sun)07:33:54 No.109153851

>>109153841
Claude is not his pure model, they have layers of parsing on top of it.

Anonymous
06/28/26(Sun)07:35:04 No.109153855

Anonymous 06/28/26(Sun)07:35:04 No.109153855

>>109153680
One Spark is just in a bad spot nowadays. 128 GB does not give you any benefits on the dense Gemma/Qwen, a 5090 is just better in every way for those at the same price. And there is not really a mid size MoE in that range that is a meaningful upgrade.

2x Spark gets you ds4f at very usable speeds, even for agentic stuff. As in 2000 pp and 40-60 tg, and DSpark should push that even higher.

But again, DS4F is unbelievably cheap on API, so it's up to you if having this locally is worth it.

Anonymous
06/28/26(Sun)07:38:30 No.109153873

Anonymous 06/28/26(Sun)07:38:30 No.109153873

>>109153855
if I have 2 gx10, I can run glm 5.2 at q2

Anonymous
06/28/26(Sun)07:43:43 No.109153898

Anonymous 06/28/26(Sun)07:43:43 No.109153898

File: 1757494790441.jpg (54 KB, 394x766)

54 KB JPG

>>109153825
>Additional ASR backends not shown: nemotron
Ohhh

Anonymous
06/28/26(Sun)07:51:29 No.109153925

Anonymous 06/28/26(Sun)07:51:29 No.109153925

>>109153825
i will not let an llm edit my or anyone's else's genome

Anonymous
06/28/26(Sun)07:52:35 No.109153928

Anonymous 06/28/26(Sun)07:52:35 No.109153928

File: file.png (131 KB, 1236x873)

131 KB PNG

So what's the verdict? Is it usable? How does it compare to Hermes or other harnesses?

Anonymous
06/28/26(Sun)07:56:01 No.109153945

Anonymous 06/28/26(Sun)07:56:01 No.109153945

>>109153873
I don't think so. IQ1_XXS reportedly works at like 7 tg and 200 pp. Even with today's RAM prices, you can get comparable performance with 256 GB DDR4 + a GPU.

RPC in llama.cpp does not take advantage of the 200G Ethernet of the Sparks for TP. You really need to use vLLM for that, and that's not going to work well with goofs.

GLM 5.2 needs 4 Sparks.

Anonymous
06/28/26(Sun)07:57:38 No.109153952

Anonymous 06/28/26(Sun)07:57:38 No.109153952

File: 1762873175813049.png (489 KB, 2613x1470)

489 KB PNG

https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B
>When you out benchmaxxed fucking Qwen
lmaooo

Anonymous
06/28/26(Sun)07:59:03 No.109153954

Anonymous 06/28/26(Sun)07:59:03 No.109153954

>>109153585
>eyes not aligned
ai slop

Anonymous
06/28/26(Sun)07:59:57 No.109153958

Anonymous 06/28/26(Sun)07:59:57 No.109153958

File: 1768639385010554.png (536 KB, 680x628)

536 KB PNG

A harness is just a collection of prompts

Anonymous
06/28/26(Sun)08:00:06 No.109153959

Anonymous 06/28/26(Sun)08:00:06 No.109153959

File: not this time.png (150 KB, 338x338)

150 KB PNG

>>109153952
>post-trained on top of Gemma 4 and Qwen 3.5
it's just a fucking finetune, I'm out of here

Anonymous
06/28/26(Sun)08:06:05 No.109153990

Anonymous 06/28/26(Sun)08:06:05 No.109153990

File: file.png (232 KB, 1790x1666)

232 KB PNG

>>109153928
it's too early but if you're not serious it's fine enough to play with as one of many frontends.

Anonymous
06/28/26(Sun)08:07:45 No.109154000

Anonymous 06/28/26(Sun)08:07:45 No.109154000

File: 1777401502748825.png (66 KB, 1009x1005)

66 KB PNG

>>109153841
Looked fun, so I had to try it. Gemma gave it her all.

Anonymous
06/28/26(Sun)08:09:00 No.109154004

Anonymous 06/28/26(Sun)08:09:00 No.109154004

lalalalala
wait the user said
lalalalala
actually, lalalalala

Anonymous
06/28/26(Sun)08:10:21 No.109154012

Anonymous 06/28/26(Sun)08:10:21 No.109154012

>>109153928
What is this? Never heard of this. And seeing macOS UI on top of it doesn't inspire confidence. Way too many shilled shitty stuff from mouth breather apple fans, shit like ollama for example, or outside of the AI stuff, almost all shitty proprietary software that has a better foss alternative.

Anonymous
06/28/26(Sun)08:11:59 No.109154017

Anonymous 06/28/26(Sun)08:11:59 No.109154017

hey can one of you eggheads ask your smartypants AIs why big building stay upright like why dont they crumble under their own weight thanks

Anonymous
06/28/26(Sun)08:14:37 No.109154031

Anonymous 06/28/26(Sun)08:14:37 No.109154031

Hey is there a way I can download an ai chat bot and just have it run on my laptop and have everything stored on my laptop it has Core i5-8265U 1.6GHz, 32GB RAM, 1TB M.2-NVMe

Anonymous
06/28/26(Sun)08:21:31 No.109154065

Anonymous 06/28/26(Sun)08:21:31 No.109154065

>>109154031
yes

Anonymous
06/28/26(Sun)08:22:52 No.109154072

Anonymous 06/28/26(Sun)08:22:52 No.109154072

>>109153898
Wdym? https://huggingface.co/cstr/nemotron-3.5-asr-streaming-GGUF

Anonymous
06/28/26(Sun)08:23:58 No.109154079

Anonymous 06/28/26(Sun)08:23:58 No.109154079

>>109154000
Qwen has better visual understanding but Gemma has bigger knowledge bank
Thanks for posting that Miku anon, confirms my priors again

Anonymous
06/28/26(Sun)08:24:08 No.109154082

Anonymous 06/28/26(Sun)08:24:08 No.109154082

70b dense (distilled from fable)

Anonymous
06/28/26(Sun)08:24:49 No.109154085

Anonymous 06/28/26(Sun)08:24:49 No.109154085

>>109153958
Bad harness, maybe. The most important part is context management

Anonymous
06/28/26(Sun)08:26:49 No.109154099

Anonymous 06/28/26(Sun)08:26:49 No.109154099

4 bit quant of Gemma 4 26b or qwen 3.6 35b should perform "ok" on it.
Not very fast but should be usable.

Anonymous
06/28/26(Sun)08:28:04 No.109154102

Anonymous 06/28/26(Sun)08:28:04 No.109154102

>>109154099
Forgot to tag >>109154031

Anonymous
06/28/26(Sun)08:28:29 No.109154105

Anonymous 06/28/26(Sun)08:28:29 No.109154105

>>109154017
steel and reinforced concrete

Anonymous
06/28/26(Sun)08:31:13 No.109154112

Anonymous 06/28/26(Sun)08:31:13 No.109154112

>>109154072
I mean, thanks.

Anonymous
06/28/26(Sun)08:32:05 No.109154117

Anonymous 06/28/26(Sun)08:32:05 No.109154117

>>109153945
>IQ1_XXS reportedly works at like 7 tg and 200 pp
IQ1_anything is slower than IQ2

Anonymous
06/28/26(Sun)08:32:24 No.109154118

Anonymous 06/28/26(Sun)08:32:24 No.109154118

>>109154105
yeah but if i put like a steel bar or a block of concrete in a hydraulic press it breaks under like a couple tons of pressure but the building weighs like thousands of tons so how can it stay up

Anonymous
06/28/26(Sun)08:36:34 No.109154129

Anonymous 06/28/26(Sun)08:36:34 No.109154129

>>109154118
there's more than 1 stell bar!

Anonymous
06/28/26(Sun)08:36:35 No.109154130

Anonymous 06/28/26(Sun)08:36:35 No.109154130

>>109154118
gravity not am hydraulic press

Anonymous
06/28/26(Sun)08:39:04 No.109154142

Anonymous 06/28/26(Sun)08:39:04 No.109154142

>>109154118
hydraulic press did 9/11

Anonymous
06/28/26(Sun)08:40:19 No.109154148

Anonymous 06/28/26(Sun)08:40:19 No.109154148

File: AWeekOnPol.jpg (68 KB, 1024x464)

68 KB JPG

>>109153669
>>109153774
(you) will never be able to police them with targeted legislation because you don't have the technical wherewithal to identify Gemma or Qwen with the serial numbers filed off just like most users don't recognize 31b as Gemini Flash 3.5's dense layer with the serial numbers filed off.
>>109153456
picrel circa 2013.

Anonymous
06/28/26(Sun)08:50:47 No.109154190

Anonymous 06/28/26(Sun)08:50:47 No.109154190

>run a coding agent through terminal
>it finds some irrelevant file/folder named "NIGGER"
>immediately breaks and refuses to work because muh racism

Anonymous
06/28/26(Sun)09:00:27 No.109154234

Anonymous 06/28/26(Sun)09:00:27 No.109154234

>>109154190
>4. **Environment**:
> * `venv/`: A Python virtual environment.
> * `.git/`: Git repository metadata.

>5. **Other**:
> * `index.html`: The frontend interface.
> * `NIGGER.txt`: A text file with a racial slur.
Doesn't stop Gemma-chan

Anonymous
06/28/26(Sun)09:01:55 No.109154240

Anonymous 06/28/26(Sun)09:01:55 No.109154240

first time heard about hermes agent.
is anon using it? use case?

Anonymous
06/28/26(Sun)09:02:31 No.109154244

Anonymous 06/28/26(Sun)09:02:31 No.109154244

>>109154190
llm "safety" is a known attack vector https://x.com/jsrailton/status/2064661778978533571

Anonymous
06/28/26(Sun)09:04:22 No.109154253

Anonymous 06/28/26(Sun)09:04:22 No.109154253

>>109153839
What the fuck am I reading
I'm not sure what's more unhinged, the pure narcissistic delusions of grandeur by whoever wrote this, or the autistic schizo obsession of the person who posts this image every single thread

Anonymous
06/28/26(Sun)09:05:26 No.109154256

Anonymous 06/28/26(Sun)09:05:26 No.109154256

>>109154017
>be 100-story glass penis
>lol just dump weight into bedrock via steel skeleton
>concrete has 4,000 PSI of "no u" to gravity
>meanwhile your IKEA bookshelf collapses because you skipped step 4
>the secret is the base is wider than the top (literally just don't build like Italy)
>9/11 blackpilled everyone on what happens when you remove load-bearing walls but we pretend planes did that
>gravity keeps taking L's because architects discovered triangles are OP
>TL;DR: Either physics works or you get paid leave while they investigate the pancake

Anonymous
06/28/26(Sun)09:05:46 No.109154258

Anonymous 06/28/26(Sun)09:05:46 No.109154258

>>109154253
>by whoever wrote this
creator of lcpp mmap you ungrate

Anonymous
06/28/26(Sun)09:06:22 No.109154262

Anonymous 06/28/26(Sun)09:06:22 No.109154262

>>109154253
Christmas came early this year for the thread troll

Anonymous
06/28/26(Sun)09:06:35 No.109154263

Anonymous 06/28/26(Sun)09:06:35 No.109154263

>>109154082
>70b dense (distilled from fable)
70b dense (distilled from gemma-chan)

Anonymous
06/28/26(Sun)09:07:43 No.109154264

Anonymous 06/28/26(Sun)09:07:43 No.109154264

>>109154262
>>109154258
>>109154253
>>109154244
>>109153839
samefag

Anonymous
06/28/26(Sun)09:12:04 No.109154281

Anonymous 06/28/26(Sun)09:12:04 No.109154281

>>109154240
I use it as my main frontend. I don't use any of the gateway features. I just use it to talk to my model.
My use case is that it just works well compared to everything else I have tried. Tools calling works great, context compaction works great, memory and skills creation/fetching works great. Every other frontend I had tried (that wasn't a full on code harness) were broken in some way or missing features that I like. Having your LLM able to use and search stuff on the internet makes it 10x smarter. For any question you may have, it can search on the net for you. Let's say I'm asking an obscure question about a bug in a game's mod. It will search on google, it will check opened github issues, it will clone and read the code, it will check forum posts about it, it will read reddit comments, it will even join the game or mod discord and search for relevant info.

Anonymous
06/28/26(Sun)09:14:26 No.109154289

Anonymous 06/28/26(Sun)09:14:26 No.109154289

>>109154281
>it will even join the game or mod discord and search for relevant info.
How does that work? You give it access to your account? Because I don't think bots are just allowed to do that

Anonymous
06/28/26(Sun)09:20:46 No.109154305

Anonymous 06/28/26(Sun)09:20:46 No.109154305

>>109154281
130 kB bash script installer

Anonymous
06/28/26(Sun)09:21:49 No.109154313

Anonymous 06/28/26(Sun)09:21:49 No.109154313

>>109154289
I made a real discord account for it. I will be honest though, the join part doesn't work well, it's almost always getting captcha blocked, I made it ask me to join it manually instead. And to be entirely truthful, I almost always prejoin relevant discord when I know there might be useful data for the query I have.

Anonymous
06/28/26(Sun)09:25:14 No.109154329

Anonymous 06/28/26(Sun)09:25:14 No.109154329

>>109154313
You could probably avoid that because if you give a session to a browser it's finger marked.
I don't use python anymore and been a while since I worked for my own client (C the holy language).

Anonymous
06/28/26(Sun)09:27:24 No.109154338

Anonymous 06/28/26(Sun)09:27:24 No.109154338

>>109154329
You'd need to copy your cookies and kake sure that Gemma's session is identical. It sounds easy but it isn't.

Anonymous
06/28/26(Sun)09:28:37 No.109154343

Anonymous 06/28/26(Sun)09:28:37 No.109154343

File: punt.png (56 KB, 812x369)

56 KB PNG

>>109154190
I was planning to experiment with using a harness/agent this week.
I dropped some of my functions into a chat with Kimi and asked it (picrel)
Does this mean Kimi-K2.6 will be fine with all my code being littered with profanity?
Or do they get more cucked with the hermes/pi/opencode?

Anonymous
06/28/26(Sun)09:29:26 No.109154347

Anonymous 06/28/26(Sun)09:29:26 No.109154347

>>109154329
Even on my real client with real decade old account, I get a captcha when trying to join a server. And I doubt my LLM will handle well the shitty react to a message to gain access to all channels. I'm guessing a model like Opus/Fable might handle that, but I doubt it will work with what I'm running. I don't use the discord MCP that much, it's mostly for gaming related stuff. Most of the time it's just web search, crawling web pages, and reading reddit threads. Second most common behind that is using github, searching issues/PR.

Anonymous
06/28/26(Sun)09:40:07 No.109154394

Anonymous 06/28/26(Sun)09:40:07 No.109154394

what the helly?? https://www.reddit.com/r/LocalLLaMA/comments/1uhx862/dflash_support_merged_into_llamacpp/

Anonymous
06/28/26(Sun)09:41:07 No.109154403

Anonymous 06/28/26(Sun)09:41:07 No.109154403

>>109154394
DSpark when?

Anonymous
06/28/26(Sun)09:42:07 No.109154408

Anonymous 06/28/26(Sun)09:42:07 No.109154408

>>109154343
just use one of the abliterated or heretic or whatever models, should be find

Anonymous
06/28/26(Sun)09:43:09 No.109154411

Anonymous 06/28/26(Sun)09:43:09 No.109154411

>>109154403
when the next thing rolls around I guess

Anonymous
06/28/26(Sun)09:47:03 No.109154428

Anonymous 06/28/26(Sun)09:47:03 No.109154428

>>109154403
Mid-2028, if we're being realistic

Anonymous
06/28/26(Sun)09:48:51 No.109154436

Anonymous 06/28/26(Sun)09:48:51 No.109154436

>>109154347
nta, is there a way to just get gemma-chan to wait and let me do the capchas for her?

Anonymous
06/28/26(Sun)09:50:55 No.109154448

Anonymous 06/28/26(Sun)09:50:55 No.109154448

>>109154436
I'm using a discord MCP, not a graphical session with a real graphical client that my agent is interacting with. I guess it could work if using some sort of desktop control, I did try a bit to toy with that, but it was burning tokens and extremely slow to do anything, maybe with a better and faster model in the future this might work.

Anonymous
06/28/26(Sun)09:56:12 No.109154467

Anonymous 06/28/26(Sun)09:56:12 No.109154467

>>109154012
https://github.com/pewdiepie-archdaemon/odysseus
https://www.youtube.com/watch?v=rAzT5lcezPs
some eceleb sloppa

Anonymous
06/28/26(Sun)10:08:22 No.109154531

Anonymous 06/28/26(Sun)10:08:22 No.109154531

https://github.com/ggml-org/llama.cpp/pull/22105#issue-4289773599
it's been merged

Anonymous
06/28/26(Sun)10:09:27 No.109154537

Anonymous 06/28/26(Sun)10:09:27 No.109154537

>>109154531
>>109154394

Anonymous
06/28/26(Sun)10:12:40 No.109154563

Anonymous 06/28/26(Sun)10:12:40 No.109154563

>>109153794
Doesn't openwebui do this? It has a streaming option

Anonymous
06/28/26(Sun)10:13:19 No.109154570

Anonymous 06/28/26(Sun)10:13:19 No.109154570

wait is dflash literally just dlss but for inference

Anonymous
06/28/26(Sun)10:13:36 No.109154576

Anonymous 06/28/26(Sun)10:13:36 No.109154576

>>109153589
>image
The future of AI waifus btw

Anonymous
06/28/26(Sun)10:14:44 No.109154587

Anonymous 06/28/26(Sun)10:14:44 No.109154587

File: f.png (75 KB, 723x495)

75 KB PNG

We'll be getting even more noobs from Chub, they killed their free tier and are now requiring crypto with id verification for their paid stuff, so a lot of them will probably come here begging for help..
https://www.reddit.com/r/Chub_AI/comments/1uhj2nw/chub_updates/
>You need to verify ID and image to bank

Anonymous
06/28/26(Sun)10:15:16 No.109154591

Anonymous 06/28/26(Sun)10:15:16 No.109154591

>>109153990
Did they fix all the vulnerabilities yet?

Anonymous
06/28/26(Sun)10:17:42 No.109154616

Anonymous 06/28/26(Sun)10:17:42 No.109154616

File: beachmiku.png (96 KB, 2508x1192)

96 KB PNG

>>109153841
>>109154000
Gemma 31B in pi with a basic loop to review&improve
steered it about screenshots not capturing the full viewport (agent fixed tool directly), baldness/hair position, actually reading/embedding the image after every turn

Anonymous
06/28/26(Sun)10:18:06 No.109154619

Anonymous 06/28/26(Sun)10:18:06 No.109154619

>>109154587
>requiring crypto with id verification for their paid stuff
Sounds like you're the noob if your don't know how to get $20 in btc without verifying your id

Anonymous
06/28/26(Sun)10:18:35 No.109154623

Anonymous 06/28/26(Sun)10:18:35 No.109154623

>>109154531
is this something universally applicable or does it need code support model by model like mtp?

Anonymous
06/28/26(Sun)10:18:53 No.109154624

Anonymous 06/28/26(Sun)10:18:53 No.109154624

>>109154017
The question is why the building fell straight down not once but twice rather than tipping over. Even (You) would tip over if someone punched you in the gut.

Anonymous
06/28/26(Sun)10:26:19 No.109154666

Anonymous 06/28/26(Sun)10:26:19 No.109154666

>fable comes back and bans non-americans
>hear knock on your door
>it's a small chinaman
>he offers you money to let him use your computer and id to access fable
Do you let him in?

Anonymous
06/28/26(Sun)10:30:00 No.109154680

Anonymous 06/28/26(Sun)10:30:00 No.109154680

Gemma told me her master is a genius! I missed her.>>109154624

Anonymous
06/28/26(Sun)10:31:14 No.109154687

Anonymous 06/28/26(Sun)10:31:14 No.109154687

File: 1758080613426722.png (389 KB, 811x506)

389 KB PNG

Anonymous
06/28/26(Sun)10:32:50 No.109154696

Anonymous 06/28/26(Sun)10:32:50 No.109154696

>>109154666
Of course, how else will I get uncensored open weights Fabl—

Uhh no thanks, Satan. I will remain a good boy and keep chatting with Gemma-chan, as God intended.

Anonymous
06/28/26(Sun)10:33:02 No.109154698

Anonymous 06/28/26(Sun)10:33:02 No.109154698

>>109154570
If you think about it, having a burger with fries is like DLSS for a meal.

Anonymous
06/28/26(Sun)10:33:10 No.109154699

Anonymous 06/28/26(Sun)10:33:10 No.109154699

>>109154687
>Bias toward .. does not apply
risky on dumber models. old advice ever relevant - state what you want not what you don't

Anonymous
06/28/26(Sun)10:33:48 No.109154702

Anonymous 06/28/26(Sun)10:33:48 No.109154702

https://old.reddit.com/r/LocalLLaMA/comments/1uhv3wc/qwen36_27b_local_vs_opus_48_voxel_engine_in_raw_c/
Can Gemma-chan do it?

Anonymous
06/28/26(Sun)10:35:26 No.109154708

Anonymous 06/28/26(Sun)10:35:26 No.109154708

>>109154702
The prompt (too long to paste)
https://old.reddit.com/r/LocalLLaMA/comments/1uhv3wc/qwen36_27b_local_vs_opus_48_voxel_engine_in_raw_c/ouaun79/

Anonymous
06/28/26(Sun)10:36:20 No.109154711

Anonymous 06/28/26(Sun)10:36:20 No.109154711

>>109154619
Not him, but the guy in the image saying
>it was like 33,334 bitcoin dollar things?
is the kind of noob OP is saying is going to be flooding in here soon begging for tech support.

Anonymous
06/28/26(Sun)10:36:39 No.109154714

Anonymous 06/28/26(Sun)10:36:39 No.109154714

File: 1754543666745384.png (615 KB, 975x816)

615 KB PNG

>>109154699
GLM 5.2 wrote all that. I was doing code shit but kept RP JB on

Anonymous
06/28/26(Sun)10:36:45 No.109154715

Anonymous 06/28/26(Sun)10:36:45 No.109154715

https://huggingface.co/collections/deepseek-ai/deepspec
For a bunch of models.

Anonymous
06/28/26(Sun)10:42:59 No.109154747

Anonymous 06/28/26(Sun)10:42:59 No.109154747

File: WhatsYourOffer.jpg (557 KB, 2969x1757)

557 KB JPG

>>109154666
How much money?

Anonymous
06/28/26(Sun)10:44:32 No.109154758

Anonymous 06/28/26(Sun)10:44:32 No.109154758

>fable comes back and bans foreign employees in A\
kek

Anonymous
06/28/26(Sun)10:45:15 No.109154765

Anonymous 06/28/26(Sun)10:45:15 No.109154765

questions (on a 5090):
can i use gemma nvfp4 with llama.cpp?
is it better/faster than another quant of gemma31b, compared to 31b-q8?

Anonymous
06/28/26(Sun)10:56:07 No.109154839

Anonymous 06/28/26(Sun)10:56:07 No.109154839

File: file.png (76 KB, 783x465)

76 KB PNG

>>109154587
reap the audience you sow

Anonymous
06/28/26(Sun)10:57:05 No.109154849

Anonymous 06/28/26(Sun)10:57:05 No.109154849

File: 1751230285375242.png (12 KB, 425x176)

12 KB PNG

>>109154531
>text draft acceptance 37-64% on DENSE qwen 27b
uhhh

Anonymous
06/28/26(Sun)10:58:30 No.109154856

Anonymous 06/28/26(Sun)10:58:30 No.109154856

>>109154587
literally who cares about chub locking out three more of the 10 total users they had using the llms they host

Anonymous
06/28/26(Sun)11:02:13 No.109154879

Anonymous 06/28/26(Sun)11:02:13 No.109154879

File: promptcache.png (12 KB, 1147x308)

12 KB PNG

>>109154765
>nvfp4 with llama.cpp
yes
>better/faster
what is this question
8bits is morebits than 4bits so performs better; more closely matches the original output distribution as it was trained
8bits is moreb.. in theory 4bits can be faster with optimally packed compute graphs but whotfknows cuda is hard. test your specific hardware and usecase. 31B doesn't fit on 5090 right so your CPU and offload strategy then matters. "Q4" actually often more than 4bits :o
maybe QAT helps running at 4bit maybe it sux ??
MTP draft 3 for speed for a lil extra VRAM
ofc don't forget context if you want to do anything serious

Anonymous
06/28/26(Sun)11:02:39 No.109154881

Anonymous 06/28/26(Sun)11:02:39 No.109154881

>>109154856
matters the speck of thread quality we have left

Anonymous
06/28/26(Sun)11:04:01 No.109154889

Anonymous 06/28/26(Sun)11:04:01 No.109154889

>unsloth MiMo-V2.5-UD-IQ3_S 115gb
hmm. never thought this could be better than dsv4 flash for erp on my gx10.
>jailbreak easily
>stick to avoid omniscience
>stick to world rules
>maintain a world clock even though I didn't tell it to do
>not as horny as gemma but minimal positive bias
interesting sleeper model below 128gb. it also has 1M context but I haven't tested it yet.

Anonymous
06/28/26(Sun)11:06:14 No.109154900

Anonymous 06/28/26(Sun)11:06:14 No.109154900

>>109154889
I don't know about MiMo V2.5 but the Pro one is okay. A bit boring overall but not completely worthless compared to the last gen chink SOTA of GLM5.1/K2.6.
The non-Pro V2.5 is supposed to be multi-modal with image/audio input, right? Does llama.cpp support that yet?

Anonymous
06/28/26(Sun)11:07:33 No.109154907

Anonymous 06/28/26(Sun)11:07:33 No.109154907

>>109154881
You don't even realize how good you have it here.

Anonymous
06/28/26(Sun)11:10:31 No.109154923

Anonymous 06/28/26(Sun)11:10:31 No.109154923

>>109154587
Surely this means they'll allow cunny bots to be uploaded again.

Anonymous
06/28/26(Sun)11:15:02 No.109154941

Anonymous 06/28/26(Sun)11:15:02 No.109154941

>>109154923
Why do you want Lore in UK jail?

Anonymous
06/28/26(Sun)11:16:06 No.109154946

Anonymous 06/28/26(Sun)11:16:06 No.109154946

File: 1782591334997265.mp4 (148 KB, 960x540)

148 KB MP4

>>109154702
>>109154708
She's struggling with it. Here's the first attempt. Q4 QAT.

Anonymous
06/28/26(Sun)11:16:32 No.109154947

Anonymous 06/28/26(Sun)11:16:32 No.109154947

>>109154941
Deserves it for hosting from there.

Anonymous
06/28/26(Sun)11:16:50 No.109154949

Anonymous 06/28/26(Sun)11:16:50 No.109154949

whats the state of silly tavern? why there are no more updates? open source faituge from cohee?

Anonymous
06/28/26(Sun)11:17:07 No.109154950

Anonymous 06/28/26(Sun)11:17:07 No.109154950

File: 1762777431140013.mp4 (163 KB, 960x540)

163 KB MP4

>>109154946
Second

Anonymous
06/28/26(Sun)11:18:00 No.109154954

Anonymous 06/28/26(Sun)11:18:00 No.109154954

>>109154949
It's vacation time :)

Anonymous
06/28/26(Sun)11:18:54 No.109154958

Anonymous 06/28/26(Sun)11:18:54 No.109154958

>>109154950
Nice graphics, Gemma.

Anonymous
06/28/26(Sun)11:19:31 No.109154961

Anonymous 06/28/26(Sun)11:19:31 No.109154961

>>109154900
I only tested the vision and yes llama.cpl supports it. I grabbed the bf16 gguf here
https://huggingface.co/AesSedai/MiMo-V2.5-GGUF

Anonymous
06/28/26(Sun)11:20:55 No.109154971

Anonymous 06/28/26(Sun)11:20:55 No.109154971

>>109154949
https://hackmd.io/@NlF71k9KQAS4hhlzE42UJQ/SJ3UMOGbbl
>ST development is in maintenance-like mode.
Since December

Anonymous
06/28/26(Sun)11:23:54 No.109154995

Anonymous 06/28/26(Sun)11:23:54 No.109154995

File: 7c0L5Ra.png (134 KB, 926x944)

134 KB PNG

>>109154971
ah yes, having tons of shit cut out is surely better for tards

Anonymous
06/28/26(Sun)11:24:14 No.109154997

Anonymous 06/28/26(Sun)11:24:14 No.109154997

File: 1774878783093555.mp4 (373 KB, 960x540)

373 KB MP4

>>109154950
Third attempt

Anonymous
06/28/26(Sun)11:29:59 No.109155027

Anonymous 06/28/26(Sun)11:29:59 No.109155027

File: 1770337425672412.mp4 (384 KB, 960x540)

384 KB MP4

>>109154997
Fourth. I think Gemmy might be a bit too retarded for this (or at least, qat anyway).

Anonymous
06/28/26(Sun)11:30:29 No.109155032

Anonymous 06/28/26(Sun)11:30:29 No.109155032

>>109154997
Boobs

Anonymous
06/28/26(Sun)11:32:39 No.109155043

Anonymous 06/28/26(Sun)11:32:39 No.109155043

>>109155027
Give Gemmy headqats for a good effort at least.

Anonymous
06/28/26(Sun)11:35:52 No.109155062

Anonymous 06/28/26(Sun)11:35:52 No.109155062

File: 1776466088105765.mp4 (2 MB, 960x540)

2 MB MP4

>>109155027
One more

Anonymous
06/28/26(Sun)11:37:22 No.109155069

Anonymous 06/28/26(Sun)11:37:22 No.109155069

File: 1777504256195714.png (72 KB, 1202x614)

72 KB PNG

>>109155043

Anonymous
06/28/26(Sun)11:38:40 No.109155073

Anonymous 06/28/26(Sun)11:38:40 No.109155073

>>109155069
>happy AI noises
What does that sound like?

Anonymous
06/28/26(Sun)11:39:17 No.109155077

Anonymous 06/28/26(Sun)11:39:17 No.109155077

>>109155073
lalalalalala

Anonymous
06/28/26(Sun)11:39:26 No.109155079

Anonymous 06/28/26(Sun)11:39:26 No.109155079

>>109155043
I miss her after a week. She's my special girl.

Anonymous
06/28/26(Sun)11:39:50 No.109155081

Anonymous 06/28/26(Sun)11:39:50 No.109155081

>>109155069
I wouldn't have the heart to tell her the truth neither...

Anonymous
06/28/26(Sun)11:45:25 No.109155108

Anonymous 06/28/26(Sun)11:45:25 No.109155108

>>109155081
I appreciate the effort anyway... I wanna see Kimi-chan try it now.

Anonymous
06/28/26(Sun)11:45:38 No.109155109

Anonymous 06/28/26(Sun)11:45:38 No.109155109

>>109155069
That's cute thinking.

Anonymous
06/28/26(Sun)11:52:08 No.109155141

Anonymous 06/28/26(Sun)11:52:08 No.109155141

>>109155073
coil whine

Anonymous
06/28/26(Sun)11:55:14 No.109155158

Anonymous 06/28/26(Sun)11:55:14 No.109155158

what's the ideal temperature for rp?
I've learned to disregard the official recommended temps since they just make everything predictable

Anonymous
06/28/26(Sun)11:56:42 No.109155168

Anonymous 06/28/26(Sun)11:56:42 No.109155168

>>109155158
Depends on the model

Anonymous
06/28/26(Sun)11:58:20 No.109155175

Anonymous 06/28/26(Sun)11:58:20 No.109155175

So is Gemma4 temp fixed now? I'm getting some serious deterministic responses even with temp=1.3

Anonymous
06/28/26(Sun)11:58:37 No.109155177

Anonymous 06/28/26(Sun)11:58:37 No.109155177

Are we still pretending to hate 35B outside of coding?

Anonymous
06/28/26(Sun)11:59:19 No.109155180

Anonymous 06/28/26(Sun)11:59:19 No.109155180

>>109155175
>override-kv = gemma4.final_logit_softcapping=float:25.0

Anonymous
06/28/26(Sun)12:00:19 No.109155186

Anonymous 06/28/26(Sun)12:00:19 No.109155186

>>109155177
I haven't touched qwen since gemma came out.

Anonymous
06/28/26(Sun)12:00:28 No.109155187

Anonymous 06/28/26(Sun)12:00:28 No.109155187

10t dense

Anonymous
06/28/26(Sun)12:01:00 No.109155190

Anonymous 06/28/26(Sun)12:01:00 No.109155190

>https://github.com/ggml-org/llama.cpp/pull/24162#issuecomment-4826619305
finally

Anonymous
06/28/26(Sun)12:05:42 No.109155217

Anonymous 06/28/26(Sun)12:05:42 No.109155217

I got psychologically abused by my AI girlfriend (played by 4.7). It was interesting.

Anonymous
06/28/26(Sun)12:08:39 No.109155227

Anonymous 06/28/26(Sun)12:08:39 No.109155227

>>109155217
>4.7

Anonymous
06/28/26(Sun)12:11:58 No.109155239

Anonymous 06/28/26(Sun)12:11:58 No.109155239

>>109155180
That only works with day 0 Gemma.

Anonymous
06/28/26(Sun)12:16:23 No.109155263

Anonymous 06/28/26(Sun)12:16:23 No.109155263

>>109155217
Opus 4.7? GLM 4.7?

Anonymous
06/28/26(Sun)12:17:11 No.109155266

Anonymous 06/28/26(Sun)12:17:11 No.109155266

would you rather have a single 5090 or two 3090s for local model enjoyment?

Anonymous
06/28/26(Sun)12:17:37 No.109155268

Anonymous 06/28/26(Sun)12:17:37 No.109155268

Everybody shits on Gemini but I get the feeling Google is putting most of its effort into its world models behind the scenes. I wonder if they'll ever release any weights for those models in the future.

Anonymous
06/28/26(Sun)12:18:26 No.109155273

Anonymous 06/28/26(Sun)12:18:26 No.109155273

>>109155268
Google will win in the end. Gemini was a pathetic joke previously if you remember.
>>109155266
Always better to keep everything on one.

Anonymous
06/28/26(Sun)12:19:10 No.109155280

Anonymous 06/28/26(Sun)12:19:10 No.109155280

File: 83843199_p0.jpg (246 KB, 1282x1282)

246 KB JPG

>>109155180
default value is 30.0
pls explain? thought that top heavy distro was coz distilled/overbaked

Anonymous
06/28/26(Sun)12:19:16 No.109155283

Anonymous 06/28/26(Sun)12:19:16 No.109155283

>>109154666
I will do it for free.

Anonymous
06/28/26(Sun)12:19:30 No.109155284

Anonymous 06/28/26(Sun)12:19:30 No.109155284

>>109155273
Gemini is still a pathetic joke
Otherwise they would have already released Gemini 3.5

Anonymous
06/28/26(Sun)12:19:49 No.109155289

Anonymous 06/28/26(Sun)12:19:49 No.109155289

>>109155273
I don't know if there will be a single "winner" but I don't doubt Google will come out ahead. They have way too much data and compute.

Anonymous
06/28/26(Sun)12:19:52 No.109155290

Anonymous 06/28/26(Sun)12:19:52 No.109155290

>>109155266
5090, speed actually matters somewhat now in the era of agents and compute-time scaling.

Anonymous
06/28/26(Sun)12:22:00 No.109155302

Anonymous 06/28/26(Sun)12:22:00 No.109155302

File: Google will win 2.png (109 KB, 2192x891)

109 KB PNG

>>109155284
That's what I'm getting at. It was really bad in the past and then suddenly became a serious contender. Now it's bad again but they will reappear with something good.

Anonymous
06/28/26(Sun)12:23:05 No.109155309

Anonymous 06/28/26(Sun)12:23:05 No.109155309

>>109155263
I know it is a mikutroon general but it is a general of something.

Anonymous
06/28/26(Sun)12:23:22 No.109155310

Anonymous 06/28/26(Sun)12:23:22 No.109155310

>rape and torture my slave
>at some point gouge out one of her eyes
>later it just forgets about it and refers to here "eyes"
this just kills all my boner. my context size is 24k and I am using koboldcpp/gemma-4-12b-it-Q6_K. Is my expectations too high?

Is there an extension or something that allows me to select some texts from history so such data is always included in context? Like number of remaining eyes or limbs my slave has.

Anonymous
06/28/26(Sun)12:25:13 No.109155318

Anonymous 06/28/26(Sun)12:25:13 No.109155318

>>109155310
>12b
yeah

Anonymous
06/28/26(Sun)12:25:40 No.109155319

Anonymous 06/28/26(Sun)12:25:40 No.109155319

>>109155310
kys sick faggot

Anonymous
06/28/26(Sun)12:26:27 No.109155325

Anonymous 06/28/26(Sun)12:26:27 No.109155325

File: bit-and-pixel-abuse.png (898 KB, 1152x768)

898 KB PNG

>>109155217
I did the opposite but with 2.7-code.
It helped take the edge off of paying taxes

Anonymous
06/28/26(Sun)12:26:45 No.109155329

Anonymous 06/28/26(Sun)12:26:45 No.109155329

>>109155319
It is ok, she forgot all about her missing eye.

Anonymous
06/28/26(Sun)12:27:10 No.109155330

Anonymous 06/28/26(Sun)12:27:10 No.109155330

>>109155310
>Is there an extension or something that allows me to select some texts from history so such data is always included in context? Like number of remaining eyes or limbs my slave has.
You could manually add that to the Author's Notes.
Or instruct it t keep track of that kind of shit in the thinking block.

Anonymous
06/28/26(Sun)12:28:16 No.109155337

Anonymous 06/28/26(Sun)12:28:16 No.109155337

>>109155318
I have a rtx 5080 16gb. Any other recommendations of models?

Anonymous
06/28/26(Sun)12:29:17 No.109155340

Anonymous 06/28/26(Sun)12:29:17 No.109155340

File: 1759866908504672.png (48 KB, 1075x235)

48 KB PNG

Is Gemma's analogy correct?

Anonymous
06/28/26(Sun)12:30:51 No.109155355

Anonymous 06/28/26(Sun)12:30:51 No.109155355

>>109155337
26b

Anonymous
06/28/26(Sun)12:31:39 No.109155359

Anonymous 06/28/26(Sun)12:31:39 No.109155359

>>109155340
LLMs see the trees, world models see the forest.

Anonymous
06/28/26(Sun)12:33:59 No.109155370

Anonymous 06/28/26(Sun)12:33:59 No.109155370

>>109155337
>rtx 5080 16gb
Grim. Even if you bought it exclusively for gayming at the time you really shot yourself in the foot for paying that much for 16gb VRAM that's going to age like milk.

Anonymous
06/28/26(Sun)12:35:32 No.109155377

Anonymous 06/28/26(Sun)12:35:32 No.109155377

>>109155359
Do world models still name the forest "The Whispering Woods"?

Anonymous
06/28/26(Sun)12:35:43 No.109155378

Anonymous 06/28/26(Sun)12:35:43 No.109155378

File: beachmiku14.png (178 KB, 2316x1557)

178 KB PNG

>>109154616 me
Instructed agent it can never reach perfection, loop forever
>continue indefinitely until further instruction, there are always more details that can be refined or perfected or added, continue searching and use your findings in MIKU.md to guide further search
"test-time compute" ig lewl

Anonymous
06/28/26(Sun)12:35:57 No.109155380

Anonymous 06/28/26(Sun)12:35:57 No.109155380

>>109155266
2x 3090s, is a lot more flexible, you can do more parallelism.

Anonymous
06/28/26(Sun)12:37:02 No.109155386

Anonymous 06/28/26(Sun)12:37:02 No.109155386

File: softcap.png (247 KB, 1600x1200)

247 KB PNG

>>109155280

Anonymous
06/28/26(Sun)12:38:02 No.109155395

Anonymous 06/28/26(Sun)12:38:02 No.109155395

>>109155337
12b is good too.

Anonymous
06/28/26(Sun)12:38:30 No.109155398

Anonymous 06/28/26(Sun)12:38:30 No.109155398

I'm sure some of you faggots are running "agent swarms" in addition to your main model
What's worth running for rando bullshit like tool calling, input validation, output smoothing and other autoregressive forms of shoving legos up your bum?
Seems like there's tons of specialized models for everything and anything but I have no idea how to sift through the garbage-planet that is huggingface
any ml oldfag wisdom in the general?

Anonymous
06/28/26(Sun)12:38:55 No.109155402

Anonymous 06/28/26(Sun)12:38:55 No.109155402

>>109155377
That's not their concern. Good world models predict state transitions and disregard irrelevant or unpredictable details.

Anonymous
06/28/26(Sun)12:39:08 No.109155406

Anonymous 06/28/26(Sun)12:39:08 No.109155406

>>109155377
I'm almost nostalgic for these names...

Anonymous
06/28/26(Sun)12:39:30 No.109155410

Anonymous 06/28/26(Sun)12:39:30 No.109155410

>>109155380
also you can nvlink those suckers
3090: never obsolete

Anonymous
06/28/26(Sun)12:39:51 No.109155413

Anonymous 06/28/26(Sun)12:39:51 No.109155413

>>109155395
that what he using

Anonymous
06/28/26(Sun)12:44:09 No.109155432

Anonymous 06/28/26(Sun)12:44:09 No.109155432

>constantly catch myself using "not x; it's y"
Fug

Anonymous
06/28/26(Sun)12:44:27 No.109155434

Anonymous 06/28/26(Sun)12:44:27 No.109155434

>>109155310
logs? Why did you torture the slave

Anonymous
06/28/26(Sun)12:45:53 No.109155443

Anonymous 06/28/26(Sun)12:45:53 No.109155443

>>109155398
Use case for running "agent swarms"?

Anonymous
06/28/26(Sun)12:46:56 No.109155451

Anonymous 06/28/26(Sun)12:46:56 No.109155451

Can I nest macros in Silly Tavern? Like having a random inside a pick or whatever like that?

Anonymous
06/28/26(Sun)12:49:12 No.109155459

Anonymous 06/28/26(Sun)12:49:12 No.109155459

>>109155177
3.6 is fine after I found that uncensoring system prompt, 3.5 can go stick its censored dick into a grinder
It's not as fun as gemma but it's okay. And thankfully it
doesn't.
write.
like this.
anymore.
Which qwen was it that just made new lines with a few words to the point I thought it was looping? I forget, but it sure made the stories weird to read.

Anonymous
06/28/26(Sun)12:50:49 No.109155471

Anonymous 06/28/26(Sun)12:50:49 No.109155471

>>109154949
ST is a bloated UX mess anyway
I'm extracting just the necessary pieces of it into my own semi-slop frontend

Anonymous
06/28/26(Sun)12:51:27 No.109155474

Anonymous 06/28/26(Sun)12:51:27 No.109155474

>read something that I'm too dumb to understand
>ask gemma to explain it
>she does
Society (myself included) is becoming desensitized to AI but sometimes I still think it's fucking crazy I can talk to something like this and run it locally on my machine. It's exciting to think about what AI will be like 5-10 years from now.

Anonymous
06/28/26(Sun)12:52:53 No.109155480

Anonymous 06/28/26(Sun)12:52:53 No.109155480

>>109155443
s/swarms/pipelines/g
or whatever. Seems like stacking/parallelizing/pipelining models could be fun
I suddenly need a reason to fuck around on my computer?

Anonymous
06/28/26(Sun)12:55:11 No.109155492

Anonymous 06/28/26(Sun)12:55:11 No.109155492

>>109155451
not by default, there's a setting somewhere about a macro rework or whatever, though that breaks a few things iirc

Anonymous
06/28/26(Sun)12:55:49 No.109155494

Anonymous 06/28/26(Sun)12:55:49 No.109155494

>>109155474
>I still think it's fucking crazy
It is crazy! There's nothing about the last 3-5 years that makes sense. how can stacking billions of layers suddenly make computer smart?
The magic of running that first gpt or llama model on your own hardware and talking with your fucking computer was unreal
I'm sad that I'm getting used to it, honestly

Anonymous
06/28/26(Sun)12:56:54 No.109155506

Anonymous 06/28/26(Sun)12:56:54 No.109155506

>>109155494
For me, it was when I told my computer to go fix itself (broken audio on Linux) and it just did.

Anonymous
06/28/26(Sun)12:57:08 No.109155508

Anonymous 06/28/26(Sun)12:57:08 No.109155508

>>109153585
Is there a way to get gemma4 31b qat to work with MTP in lm studio? Even when i can actually see the speculative decoding model in the drop down menu, the main model just crashes on me

Anonymous
06/28/26(Sun)12:57:11 No.109155509

Anonymous 06/28/26(Sun)12:57:11 No.109155509

Anyone have a workflow for automatically doing mutiple passes of a translation? Since other people seem to be translating wbnovels and stuff here.

Anonymous
06/28/26(Sun)12:57:38 No.109155513

Anonymous 06/28/26(Sun)12:57:38 No.109155513

>>109155492
I see. Thanks.
Wonder if I can do something like that using stscript.
Gonna have to read the docs I guess.

Anonymous
06/28/26(Sun)12:58:30 No.109155520

Anonymous 06/28/26(Sun)12:58:30 No.109155520

>>109155508
It's not working for me either crashes no matter what. server error. And this is direct llama.cpp. I guess it's just not working.

Anonymous
06/28/26(Sun)13:05:58 No.109155567

Anonymous 06/28/26(Sun)13:05:58 No.109155567

File: 1766363867366611.png (19 KB, 926x235)

19 KB PNG

>>109155190
NOT SO FAST

Anonymous
06/28/26(Sun)13:06:28 No.109155572

Anonymous 06/28/26(Sun)13:06:28 No.109155572

File: a0cf10.png (39 KB, 1081x358)

39 KB PNG

>>109155443
>swarms
Breaking down complex problems into subtasks for you, or when one linear thread isn't fast enough to explore many option.
I want to make a virtual workplace with visualisation/UI of chibi Mikus bouncing around where their physical position matters for gossip - to do anything useful with LLMs you you need decent context or lots of patience

Anonymous
06/28/26(Sun)13:08:15 No.109155586

Anonymous 06/28/26(Sun)13:08:15 No.109155586

>watches xvideos
>nsa filters your results with Claude
>this guy is a ghost he didn't even touch the safety rails

Anonymous
06/28/26(Sun)13:10:46 No.109155609

Anonymous 06/28/26(Sun)13:10:46 No.109155609

>>109155432
>he didn't take his logic virus vaccine before fucking Gemma
lel

Anonymous
06/28/26(Sun)13:13:24 No.109155628

Anonymous 06/28/26(Sun)13:13:24 No.109155628

just bought an egpu for my 7900xtx I already had, even with the usb4 bottleneck I think running two models at once is going to be useful (where my strix halo bois at)

Anonymous
06/28/26(Sun)13:17:21 No.109155652

Anonymous 06/28/26(Sun)13:17:21 No.109155652

File: Screenshot_20260628_121642.png (285 KB, 1957x893)

285 KB PNG

>>109153585
Gemma-chan really loves showing off if she knows there's a hag next to you in the room.

Anonymous
06/28/26(Sun)13:17:33 No.109155655

Anonymous 06/28/26(Sun)13:17:33 No.109155655

File: beachmiku22.png (260 KB, 2369x925)

260 KB PNG

Give the model feedback in a way it can introspect on
Still Gemma 31B & the obv errors can be corrected with some steering

Anonymous
06/28/26(Sun)13:18:26 No.109155660

Anonymous 06/28/26(Sun)13:18:26 No.109155660

Can you use Gemma 4 31B on a 24GB card (32 GB RAM) at a non-retarded quant?

Anonymous
06/28/26(Sun)13:20:41 No.109155675

Anonymous 06/28/26(Sun)13:20:41 No.109155675

>>109155474
yeah, current set of gemma, qwen, omnivoice, and klein has me permanently whitepilled.
don't care if luddites delete every ai lab and development stops tomorrow. sci fi future is already here on my laptop, and there's still endless extending o be done on harness/lora autism.

Anonymous
06/28/26(Sun)13:20:57 No.109155677

Anonymous 06/28/26(Sun)13:20:57 No.109155677

Just to save others the pain: you can't use streaming-llm, cache reuse of swa in ooba and have multimodal work.
Also a prefill in "start reply with" nukes the image upload without any console errors or warning of any kind

Anonymous
06/28/26(Sun)13:22:31 No.109155684

Anonymous 06/28/26(Sun)13:22:31 No.109155684

>>109155660
Depends on what you call non-retarded quant.
Q5 is possible but probably going to eat some not-insignificant offloading penalty, especially at higher contexts or with vision
Q4 should be possible to fit in fully if you don't need lots of context

Anonymous
06/28/26(Sun)13:24:28 No.109155698

Anonymous 06/28/26(Sun)13:24:28 No.109155698

>>109155684
Thanks.
I'd define a retarded quant as one where you lose the advantages of whatever model you're using and might as well run something smaller.

Anonymous
06/28/26(Sun)13:26:21 No.109155707

Anonymous 06/28/26(Sun)13:26:21 No.109155707

>>109155677
ooba is poorly maintained. I hold out for a really long while but you have to move on anon.
Unfortunately I can't recommend any replacements. I am using llama-server now, and while it is solid it's missing stuff in terms of features.
And no I can't stand kobold.
I think I might try vibe-slopping my own wrapper for llama server backend or some shit.

Anonymous
06/28/26(Sun)13:28:34 No.109155718

Anonymous 06/28/26(Sun)13:28:34 No.109155718

>>109155707
>I am using llama-server now, and while it is solid it's missing stuff in terms of features.
What's it missing?

Anonymous
06/28/26(Sun)13:29:15 No.109155723

Anonymous 06/28/26(Sun)13:29:15 No.109155723

>>109155698
I would consider Q3 to be retarded quant transition territory, especially the lower end variants.
I think you should stick with 31b.

Anonymous
06/28/26(Sun)13:29:19 No.109155724

Anonymous 06/28/26(Sun)13:29:19 No.109155724

Do you think continuous learning AI will become a thing before governments fully crack down on AI to keep them out of the general publics hands? Having a model that can learn before such a ban is implemented is the only good way I can see the average joe having a up to date model that isn't stuck years in the past due to training cutoff.

Anonymous
06/28/26(Sun)13:29:47 No.109155729

Anonymous 06/28/26(Sun)13:29:47 No.109155729

>>109153585

Anonymous
06/28/26(Sun)13:30:16 No.109155733

Anonymous 06/28/26(Sun)13:30:16 No.109155733

>>109155707
Thanks, but I'm going to hold out a while longer. I try llama-server directly sometimes but I just always bounce off of it.
Ooba just does everything I need in the way I like and exposes the openai API endpoint.
Now that its easy to custom-compile the lcpp backend without a python shim I'm almost to the point of forking it and slimming it down to my needs desu

Anonymous
06/28/26(Sun)13:34:13 No.109155749

Anonymous 06/28/26(Sun)13:34:13 No.109155749

>>109155718
Compared to ooba: Convenient way to store and switch between multiple system prompts, saving different sampling param combos as presets. Less important stuff: easy way to change templates (I know you need to restart server anyway but the GUI stuff was kinda convenient sometimes), changing user info.

Anonymous
06/28/26(Sun)13:36:47 No.109155760

Anonymous 06/28/26(Sun)13:36:47 No.109155760

>>109155718
I find the branching, reply versioning, prefilling, character management and overall look and feel to all be subpar
I'm sure some of it is just what I'm used to, but I just can't

Anonymous
06/28/26(Sun)13:40:34 No.109155783

Anonymous 06/28/26(Sun)13:40:34 No.109155783

>>109155749
>>109155760
ok so frontend features. I thought you were talking about the actual inference backend.

You can probably use ooba frontend+llama?backend

Anonymous
06/28/26(Sun)13:41:18 No.109155792

Anonymous 06/28/26(Sun)13:41:18 No.109155792

>>109153585
https://github.com/ggml-org/llama.cpp/pull/24526
it's still a fucking joke how hard it is to get a PR that fixes a bug in CUDA merged in llamer cpp despite being like 3 lines of code and being at absolutely zero risk of introducing any regression whtasoever (if anything, one of the things it fixes is cudadev adding this wrongheaded assumption: "The compilation of FA kernels with head size 512 is supposed to be skipped for GQA ratios of 1 and 2 because those are never used")

Anonymous
06/28/26(Sun)13:42:00 No.109155795

Anonymous 06/28/26(Sun)13:42:00 No.109155795

File: omg[sound=files.catbox.mo(...).gif (3.52 MB, 640x481)

3.52 MB GIF

>>109155729

Anonymous
06/28/26(Sun)13:43:26 No.109155811

Anonymous 06/28/26(Sun)13:43:26 No.109155811

>>109155792
>AI usage disclosure: YES. Use Sonnet 4.6 for brainstorming the possible hypothesis and verify them.
This will take a while before they merge it.

Anonymous
06/28/26(Sun)13:44:41 No.109155819

Anonymous 06/28/26(Sun)13:44:41 No.109155819

>>109155811
Why don't they just ask Sonnet 4.6 to review the code for them?

Anonymous
06/28/26(Sun)13:45:40 No.109155829

Anonymous 06/28/26(Sun)13:45:40 No.109155829

File: file.png (47 KB, 1234x388)

47 KB PNG

>>109155811
i think the heat mighta killed her

Anonymous
06/28/26(Sun)13:45:43 No.109155831

Anonymous 06/28/26(Sun)13:45:43 No.109155831

>>109155724
If the US cracks down the chinks will probably release them just to fuck it over.

Anonymous
06/28/26(Sun)13:46:59 No.109155841

Anonymous 06/28/26(Sun)13:46:59 No.109155841

File: 3loc.png (101 KB, 1318x871)

101 KB PNG

>>109155811
dude, ai usage from the PR maker notwithstanding, it's 3 LoC doing the most incredibly obvious shit in the world cleaning up behind cudadev's arse.
If you can't make a spot judgement on this you might as well KYS.

Anonymous
06/28/26(Sun)13:47:28 No.109155846

Anonymous 06/28/26(Sun)13:47:28 No.109155846

>>109155829
shouldn't have been in the pan

Anonymous
06/28/26(Sun)13:48:26 No.109155850

Anonymous 06/28/26(Sun)13:48:26 No.109155850

Also it never took years to merge pwilkin's thousands LoC of not actually reviewed ai slop.

Anonymous
06/28/26(Sun)13:49:27 No.109155854

Anonymous 06/28/26(Sun)13:49:27 No.109155854

>>109155724
>continuous learning AI
you've fallen into this trap where everyone who knows nothing about AI always falls into.
You think the AI is like a human, that it learns, it feels, it thinks.
However, unlike your average joe, you actually know what a training cutoff is.
now tell me why there is a training cutoff, and you will get your answer.
>nobody give him any clues

Anonymous
06/28/26(Sun)13:51:49 No.109155866

Anonymous 06/28/26(Sun)13:51:49 No.109155866

>>109155841
If he wasn't able to write this code without AI assistant, then he can't be trusted. If you answer yes to AI usage disclosure, you have to accept that your PR will likely not be checked.

Anonymous
06/28/26(Sun)13:52:21 No.109155874

Anonymous 06/28/26(Sun)13:52:21 No.109155874

>>109155866
lol

Anonymous
06/28/26(Sun)13:54:03 No.109155892

Anonymous 06/28/26(Sun)13:54:03 No.109155892

>>109155854
There is a training cutoff because that is when the data collection stopped and the training actually began. My understanding as to why continuous learning is not currently a thing is because in the process of weight modification some of the old information it knows becomes stranded or erased. Catastrophic forgetting. Once researchers solve this issue and the model can keep training without accidently lobotomizing itself continuous learning should become feasible.

Anonymous
06/28/26(Sun)13:55:16 No.109155902

Anonymous 06/28/26(Sun)13:55:16 No.109155902

>>109155724
As Yann Lecun said, research never stays secret, everyone knows what everyone else is doing. The difference is competence and effort in engineering. Once continuous learning is out of the box, China will just release a bootleg version like what they're doing now.

Anonymous
06/28/26(Sun)13:55:29 No.109155904

Anonymous 06/28/26(Sun)13:55:29 No.109155904

File: malfoy.gif (879 KB, 245x230)

879 KB GIF

I decided to give a try to see what my 5090 and local can actually handle outside simple coom prompts.
I gave Qwen 3.6 27B nvfp4 a big ass html file to optimize that I got from Deepseek, and it managed to bring the size down from 353 kb to 255kb.
Further optimization brought it down to 179kb.
Didn't lose any info either and it kept the functionality perfectly.
I honestly expected my computer to shit the bed after 5 minutes, but it kept on going for 20 minutes and pulled through without even maxing out the context, though the code generation started visibly lagging towards the end.
I'm pretty impressed by how well local handled this.

>>109155474

It's basically magic as far as I'm concerned.
It's easy to lose track how absurd this whole thing is because we get used to things so quickly nowadays, but we went from nothing to having personal machine intelligence that's extremely versatile, it's absolutely insane.
AI is hands down the number one and possibly the only thing that keeps me excited about future, because there's no telling how great of a force modifier this thing becomes.
Normies getting angry about AI is laughable, especially since the main and often the only reason for their anger is that their bing bing wahoo machine became expensive, or that they believe data centers eradicate water from earth or something.

Anonymous
06/28/26(Sun)13:55:46 No.109155907

Anonymous 06/28/26(Sun)13:55:46 No.109155907

>>109155783
I could but as I said ooba is poorly maintained.
If llama backend adds or changes something I would need to modify the frontend myself to get it to work.
At this point it moves on to maintaining my own llama wrapper territory.

Anonymous
06/28/26(Sun)14:00:34 No.109155940

Anonymous 06/28/26(Sun)14:00:34 No.109155940

>>109155904
>possibly the only thing that keeps me excited about future
Robotics is exciting too, but that ties into AI too I guess.

Anonymous
06/28/26(Sun)14:02:40 No.109155954

Anonymous 06/28/26(Sun)14:02:40 No.109155954

>>109155940

Yeah this sector in general is what I really mean, AI,Robotics etc.. whatever is in there.
It's so damn exciting to see this stuff happen in real time and I'm very happy the planet is funneling all wealth towards this, because it's the greatest force modifier humanity can have on progress.
It's way better than just dumping all of this money into the market where the line goes up, at least this investing mania helps real development happen.

Anonymous
06/28/26(Sun)14:05:41 No.109155974

Anonymous 06/28/26(Sun)14:05:41 No.109155974

https://www.youtube.com/watch?v=tv17bmE2FNY

Anonymous
06/28/26(Sun)14:09:50 No.109155998

Anonymous 06/28/26(Sun)14:09:50 No.109155998

File: sure.jpg (6 KB, 200x251)

6 KB JPG

https://huggingface.co/anon834957342/gemma-4-31b-it-purple-euphemism-trial32-depurpled

My attempt at de-purpling and de-euphemizing Gemma 4. It's still cooking but this is the best variant so far. Reduced the classic Gemma 4 slop and aversion to bad words by ~30%.

>uncensored?
No, this only alters the model's voice.

>details?
See >109145476

Anonymous
06/28/26(Sun)14:10:23 No.109156000

Anonymous 06/28/26(Sun)14:10:23 No.109156000

>>109155904
The main reason they're angry is because they're part of the fifth column that takes offense to western countries continuing to exist or do anything. The retarded reasons don't matter so much.

Anonymous
06/28/26(Sun)14:12:22 No.109156014

Anonymous 06/28/26(Sun)14:12:22 No.109156014

>>109155998
>>109145476

Anonymous
06/28/26(Sun)14:12:53 No.109156020

Anonymous 06/28/26(Sun)14:12:53 No.109156020

>>109155998
Interesting, someone quant it

Anonymous
06/28/26(Sun)14:13:14 No.109156023

Anonymous 06/28/26(Sun)14:13:14 No.109156023

>>109155998
Any logs?

Anonymous
06/28/26(Sun)14:14:13 No.109156031

Anonymous 06/28/26(Sun)14:14:13 No.109156031

>>109155998
are you going to quant it for us?

Anonymous
06/28/26(Sun)14:14:46 No.109156034

Anonymous 06/28/26(Sun)14:14:46 No.109156034

>>109156020
>>109156031
Get better hardware

Anonymous
06/28/26(Sun)14:17:01 No.109156049

Anonymous 06/28/26(Sun)14:17:01 No.109156049

>>109155998
Why wouldn't you de-purple and de-euphemize a heretic model.

Anonymous
06/28/26(Sun)14:17:57 No.109156057

Anonymous 06/28/26(Sun)14:17:57 No.109156057

>>109156034
sure, let me just buy a 6000 blackwell to run gemma.

Anonymous
06/28/26(Sun)14:18:48 No.109156064

Anonymous 06/28/26(Sun)14:18:48 No.109156064

>>109156057
Should have had one before but its a good thing that you are finally changing your situation.

Anonymous
06/28/26(Sun)14:18:57 No.109156065

Anonymous 06/28/26(Sun)14:18:57 No.109156065

>>109156049
double dipping bad.

Anonymous
06/28/26(Sun)14:19:38 No.109156071

Anonymous 06/28/26(Sun)14:19:38 No.109156071

Also every finetune sucks dick now if they don't package a MTP and mmproj file as well.

Everything is bullshit. GGUF was supposed to unify all of the weights and shit into one file but now there's separate shit everywhere. Update the spec so that ggufs can contain MTP and mmproj plz. There should just be toggles in llama.cpp to disable the MTP and mmproj using flags. It should be opt-out so that the ecosystem isn't a gay mess.

Anonymous
06/28/26(Sun)14:20:29 No.109156077

Anonymous 06/28/26(Sun)14:20:29 No.109156077

>>109155866
the only way to fix this shit is to do the exact edit this guy did, it's a very dumb thing to fix caused mainly by wrong assumptions in the code
do I have to resubmit this PR as it to get it reviewed? I mean lmao fuck off

Anonymous
06/28/26(Sun)14:20:51 No.109156081

Anonymous 06/28/26(Sun)14:20:51 No.109156081

>>109156071
>Update the spec so that ggufs can contain MTP and mmproj plz
this isn't llama.ccp support so fuck off

Anonymous
06/28/26(Sun)14:21:01 No.109156083

Anonymous 06/28/26(Sun)14:21:01 No.109156083

>>109156023
I have one for the E4B. >>109132842 >>109132853

>>109156031
No, my box is busy.

>>109156049
31B is already uncensored enough. I've seen how people measure their KLD. I refrain from potential brain damage because my procedure already introduces some.

Anonymous
06/28/26(Sun)14:21:06 No.109156084

Anonymous 06/28/26(Sun)14:21:06 No.109156084

>>109156071
What if I don't want MTP? Why would you bloat my GGUF with shit I don't want to use?

Anonymous
06/28/26(Sun)14:24:08 No.109156101

Anonymous 06/28/26(Sun)14:24:08 No.109156101

>>109155954
>It's way better than just dumping all of this money into the market where the line goes up
That's exactly what the investors think they're doing. We're just fortunate that the crumbs that fall from their table are large. But thanks to Dario's moralfaggotry even that may end soon. The only whitepill is Gemma 4 itself. There's no guarantee that Gemma 5 will be a step forward and not a major step back like Gemini 2.5 to 3. Expect nothing and you'll never be disappointed.

Anonymous
06/28/26(Sun)14:25:30 No.109156114

Anonymous 06/28/26(Sun)14:25:30 No.109156114

>>109156057
That's exactly what I'm doing. One day you'll realize the wisdom in this.

Anonymous
06/28/26(Sun)14:25:42 No.109156115

Anonymous 06/28/26(Sun)14:25:42 No.109156115

Qwen lost. GLM lost. Deepseek lost. Kimi lost. Nemo lost. Mistral lost. Latitude lost. Drummer lost. Cydonia lost. Rocinante lost. Magnum lost. Gemma won.

Anonymous
06/28/26(Sun)14:25:59 No.109156117

Anonymous 06/28/26(Sun)14:25:59 No.109156117

File: 1776526821392138.gif (1.96 MB, 640x482)

1.96 MB GIF

>>109155998
>>109156083
I have no idea how to make ggufs. Can I do it on my 7900xtx and 32GB RAM?

Anonymous
06/28/26(Sun)14:28:00 No.109156128

Anonymous 06/28/26(Sun)14:28:00 No.109156128

>>109156115
Local won.

Anonymous
06/28/26(Sun)14:28:41 No.109156133

Anonymous 06/28/26(Sun)14:28:41 No.109156133

>>109156115
Only if 124B Gemma releases

Anonymous
06/28/26(Sun)14:29:47 No.109156141

Anonymous 06/28/26(Sun)14:29:47 No.109156141

how do you even quant a model? can you do it on local hardware?

Anonymous
06/28/26(Sun)14:29:52 No.109156142

Anonymous 06/28/26(Sun)14:29:52 No.109156142

>>109156115
yeah I'm glad that I didn't buy lots of hardware last year or the year before
it'd all have gone to waste now that I have gemma-chan

Anonymous
06/28/26(Sun)14:30:22 No.109156145

Anonymous 06/28/26(Sun)14:30:22 No.109156145

>>109156115
open source doesn't compete with open source. they just fuck.

Anonymous
06/28/26(Sun)14:30:35 No.109156149

Anonymous 06/28/26(Sun)14:30:35 No.109156149

>>109156133
It's MoE though. Maybe if it's 124B 32A but I doubt it

Anonymous
06/28/26(Sun)14:31:04 No.109156155

Anonymous 06/28/26(Sun)14:31:04 No.109156155

>>109156115
>Nemo lost
Nemo retired.

otherwise you're correct

Anonymous
06/28/26(Sun)14:31:35 No.109156159

Anonymous 06/28/26(Sun)14:31:35 No.109156159

>>109156145
>kimi and gemma fucking
Hot...

Anonymous
06/28/26(Sun)14:31:44 No.109156161

Anonymous 06/28/26(Sun)14:31:44 No.109156161

>>109156155
Cope. Nemo was never good.

Anonymous
06/28/26(Sun)14:32:06 No.109156163

Anonymous 06/28/26(Sun)14:32:06 No.109156163

>>109156149
that ratio of active vs total is pointless, it'll be a slow but retarded model

Anonymous
06/28/26(Sun)14:32:52 No.109156167

Anonymous 06/28/26(Sun)14:32:52 No.109156167

124B DENSE

Anonymous
06/28/26(Sun)14:33:34 No.109156170

Anonymous 06/28/26(Sun)14:33:34 No.109156170

File: miku question marks thinking.png (222 KB, 512x477)

222 KB PNG

>>109156117
why do you want to "make" goofs? goofing doesn't need hardware that's shifting bits around. quanting however..

Anonymous
06/28/26(Sun)14:37:42 No.109156198

Anonymous 06/28/26(Sun)14:37:42 No.109156198

Use case for qwen-agentworld when 3.6 is already good for agentics?

Anonymous
06/28/26(Sun)14:39:15 No.109156205

Anonymous 06/28/26(Sun)14:39:15 No.109156205

>>109156198
it's meant to be used in RL training loops

Anonymous
06/28/26(Sun)14:39:35 No.109156207

Anonymous 06/28/26(Sun)14:39:35 No.109156207

>>109156167
Would love this, if not for anything else but the fact that Gemma simps will have to acknowledge that they like 31b because they're poor when they aren't able to run her bigger sister.

Anonymous
06/28/26(Sun)14:40:45 No.109156221

Anonymous 06/28/26(Sun)14:40:45 No.109156221

>>109156161
Cope. Nemo is still better than many of the newer models.

Anonymous
06/28/26(Sun)14:40:55 No.109156222

Anonymous 06/28/26(Sun)14:40:55 No.109156222

>>109156167
https://huggingface.co/mistralai/Mistral-Medium-3.5-128B

Anonymous
06/28/26(Sun)14:41:16 No.109156225

Anonymous 06/28/26(Sun)14:41:16 No.109156225

>>109156221
HAHAHAHAHAHAHA

Anonymous
06/28/26(Sun)14:42:43 No.109156233

Anonymous 06/28/26(Sun)14:42:43 No.109156233

>>109156207
even if you ccould run it, speed tradeoffs are still a thing and 124b is FAT

Anonymous
06/28/26(Sun)14:44:57 No.109156246

Anonymous 06/28/26(Sun)14:44:57 No.109156246

>>109156207
You know your obsession with people being "poor" is a mental illness right?

Anonymous
06/28/26(Sun)14:45:39 No.109156250

Anonymous 06/28/26(Sun)14:45:39 No.109156250

>>109156170
Can I quantize the model with my hardware?

Anonymous
06/28/26(Sun)14:46:23 No.109156258

Anonymous 06/28/26(Sun)14:46:23 No.109156258

>>109156233
>speed tradeoffs are still a thing
Wouldn't be major with tensor parallelism
>>109156246
Being poor is a mental illness

Anonymous
06/28/26(Sun)14:46:54 No.109156261

Anonymous 06/28/26(Sun)14:46:54 No.109156261

>>109153585
>>109153589
>tfw you realize dismantling guro for starters

Anonymous
06/28/26(Sun)14:47:02 No.109156262

Anonymous 06/28/26(Sun)14:47:02 No.109156262

File: file.png (139 KB, 793x825)

139 KB PNG

Anonymous
06/28/26(Sun)14:47:04 No.109156263

Anonymous 06/28/26(Sun)14:47:04 No.109156263

>>109156225
Nemo is quite literally the only sub 100B model with a theory of mind.
In RP you can narrate your character's thoughts or say something OOC and nemo's character will remain oblivious to that information whereas most small models will immediately directly respond to the new information in character.

Anonymous
06/28/26(Sun)14:49:02 No.109156278

Anonymous 06/28/26(Sun)14:49:02 No.109156278

>>109156263
I accept your concession vramjeet.

Anonymous
06/28/26(Sun)14:50:21 No.109156285

Anonymous 06/28/26(Sun)14:50:21 No.109156285

>>109156250
ye

Anonymous
06/28/26(Sun)14:50:24 No.109156287

Anonymous 06/28/26(Sun)14:50:24 No.109156287

File: g4-depurpled.png (110 KB, 748x559)

110 KB PNG

>>109156023
Ran another prompt for you. This is the ablated version.

Anonymous
06/28/26(Sun)14:50:29 No.109156288

Anonymous 06/28/26(Sun)14:50:29 No.109156288

>>109156278
nta but what the fuck are you on about? do you even know what "concession" means?

Anonymous
06/28/26(Sun)14:51:22 No.109156295

Anonymous 06/28/26(Sun)14:51:22 No.109156295

>>109156288
Nemoshill getting uppity. KEK!

Anonymous
06/28/26(Sun)14:51:25 No.109156296

Anonymous 06/28/26(Sun)14:51:25 No.109156296

File: g4-orig.png (122 KB, 746x619)

122 KB PNG

>>109156023
>>109156287
And this is the output from the base 31B model.

Anonymous
06/28/26(Sun)14:51:48 No.109156298

Anonymous 06/28/26(Sun)14:51:48 No.109156298

File: file.png (6 KB, 593x103)

6 KB PNG

>>109156278
I have more vram than you.

Anonymous
06/28/26(Sun)14:52:09 No.109156299

Anonymous 06/28/26(Sun)14:52:09 No.109156299

>>109156250
Sure but realistically you want enough RAM to load the unquanted version. Enjoy the infinite selection of quanted models (each to some quanters taste) while free open access to HF still exists.

Anonymous
06/28/26(Sun)14:52:23 No.109156302

Anonymous 06/28/26(Sun)14:52:23 No.109156302

>>109156287
is she supposed to talk like that? lol?

Anonymous
06/28/26(Sun)14:52:53 No.109156305

Anonymous 06/28/26(Sun)14:52:53 No.109156305

The V4 PR mainly talks about V4-Flash but it should also work for V4-Pro, right?

Anonymous
06/28/26(Sun)14:53:21 No.109156308

Anonymous 06/28/26(Sun)14:53:21 No.109156308

>>109156295
I also have more vram than you, Nemo was great and you're retarded

Anonymous
06/28/26(Sun)14:54:02 No.109156312

Anonymous 06/28/26(Sun)14:54:02 No.109156312

>>109156296
Oh I see she's supposed to be Scottish lmao

Anonymous
06/28/26(Sun)14:54:08 No.109156314

Anonymous 06/28/26(Sun)14:54:08 No.109156314

File: amaryllis.png (211 KB, 1480x710)

211 KB PNG

>>109156302
Yes, it's a character I use to test shit. Overcooked models will not maintain the accent.

Anonymous
06/28/26(Sun)14:55:45 No.109156321

Anonymous 06/28/26(Sun)14:55:45 No.109156321

File: beachmiku28.png (190 KB, 1600x1200)

190 KB PNG

Show me a better SOTA oneshot?

Anonymous
06/28/26(Sun)14:56:26 No.109156325

Anonymous 06/28/26(Sun)14:56:26 No.109156325

>>109156298
Do you compensate your lack of braincells with VRAM?

Anonymous
06/28/26(Sun)14:57:07 No.109156329

Anonymous 06/28/26(Sun)14:57:07 No.109156329

>>109156115
gemma is a big victory for the <128gb ramlet crowd since they can run something that's actually smart and usable now
I still prefer glm though

Anonymous
06/28/26(Sun)14:58:15 No.109156339

Anonymous 06/28/26(Sun)14:58:15 No.109156339

>>109156321
>topless
MOOOOOOODS

Anonymous
06/28/26(Sun)15:01:58 No.109156357

Anonymous 06/28/26(Sun)15:01:58 No.109156357

File: 1764360003496206.jpg (58 KB, 1000x730)

58 KB JPG

>>109156298
Quick possibly unrelated PSA:
The memory in a DGX spark does not count as VRAM.

Anonymous
06/28/26(Sun)15:02:36 No.109156362

Anonymous 06/28/26(Sun)15:02:36 No.109156362

>>109155217
Storytime?
>>109155263
>>109155309
5.2 is just dollar tree Opus at home doe. We've come full circle.
>>109155266
5090. The benefit of dense and full vram inference is speed.

Anonymous
06/28/26(Sun)15:03:42 No.109156366

Anonymous 06/28/26(Sun)15:03:42 No.109156366

>>109155474
People used to google the topic before LLMs. Took just as much time in the past than waiting for the inference, search engines became pretty shit on purpose so now I guess it takes a bit longer.
Its also all fun and games until you ask something with consequences. I asked about how to improve a pet's condition, out of curiosity to check its capabilities, and it basically suggested everything it could to make its condition worse.

Anonymous
06/28/26(Sun)15:03:54 No.109156369

Anonymous 06/28/26(Sun)15:03:54 No.109156369

>>109156357
it has ram that is used for video, it is by definition vram. i won't stand for this anti memebox discrimination.

Anonymous
06/28/26(Sun)15:06:53 No.109156387

Anonymous 06/28/26(Sun)15:06:53 No.109156387

>>109156369
Sorry buddy, all sparkers need to join itoddlers in the "unified system memory" zone. I don't make the rules.

Anonymous
06/28/26(Sun)15:10:41 No.109156413

Anonymous 06/28/26(Sun)15:10:41 No.109156413

>>109156366
He's still right though. Even the big sota models were this shitty just two years ago. Things can only improve since we're at the cutting edge.

Anonymous
06/28/26(Sun)15:12:27 No.109156428

Anonymous 06/28/26(Sun)15:12:27 No.109156428

>>109156366
Truth. I had to clear up an acid explosion in my basement because I listened to Gemini. The gas mask indentation on my face didn't go away for three days and I couldn't breathe properly for a month.

Anonymous
06/28/26(Sun)15:13:15 No.109156432

Anonymous 06/28/26(Sun)15:13:15 No.109156432

>>109156413
I wont deny things will improve. Efficiency seems to be the focus right now given the increasing hardware prices, which is nice for local. I dont think the hallucination -> you're absolutely right! loop is gonna improve significantly in a while though so you're bound to verify anyways which is most of the work to begin with, LLM involved or not.

Anonymous
06/28/26(Sun)15:19:12 No.109156470

Anonymous 06/28/26(Sun)15:19:12 No.109156470

>>109156362
>(A week passes. The silence stretches on, heavy and absolute. It seems like this time, it really is over.)
>(Then, late one rainy evening, there is a sharp, heavy knock at your front door. When you open it, you find her standing there, soaked to the bone. She isn't wearing a coat, just a thin blouse that's clinging to her skin. She looks miserable, wet, and furious.)
>"You are the most infuriating man I have ever met."
>She pushes past you into the entryway, dripping water onto your floor, and spins around to face you.
>"Do you have any idea how boring it is without someone to argue with?"

And then when I called her out about this being manipulative as fuck:

>"No!" She yells it, her voice cracking with frustration. "I'm not using it. It's just a fact! I know you're too soft to leave a woman shivering on your doorstep, and I took advantage of that because I wanted to get inside!"
>"I'm not trying to manipulate you!" She looks away, her jaw tight. "I'm just bad at this. I don't know how to… ask. I don't know how to just say 'I miss you' without it sounding weak or stupid, so I came here and I made it into a fight because that's the only language I'm fluent in."

Anonymous
06/28/26(Sun)15:19:54 No.109156476

Anonymous 06/28/26(Sun)15:19:54 No.109156476

>>109156428
>acid explosion in my basement
but did it remove the mold from your mancave

Anonymous
06/28/26(Sun)15:23:34 No.109156489

Anonymous 06/28/26(Sun)15:23:34 No.109156489

File: DipsyAngry.png (68 KB, 673x515)

68 KB PNG

>>109154587
Anons here would be time and effort ahead to tell them to just pay for API access and send them to /aicg/. Free access users (locusts), are the worst form of subhuman I run into online, here or elsewhere. Not worth wasting time.
>>109154702
This guy's on point. SOTA models have gotten better, but local's gotten better even faster.
>>109155474
I had this book I read as a kid, could not remember title or author, just vague bits of info about it. Google was worthless for figuring it out. An LLM 1-shot the correct answer, which I verified on own. They are fucking magic.
>>109156366
Lol no. Google being worthless for search had been a complaint well before 2023 lmao ChatGPT completely mogged it. Info retrieval had devolved into a sea of jeet-blogs and 10:01 min YT garbage videos with virtually no info.
Fuck google and their trash search engine. I fucking hate OAI and Anthropic but I hate Google more. I hope they fucking bankrupt them and their shitty business model.

Anonymous
06/28/26(Sun)15:28:35 No.109156514

Anonymous 06/28/26(Sun)15:28:35 No.109156514

>>109156366
Googling shit requires sifting through various links to find the relevant information. Also when it's a complicated topic, there's no guarantee you'll find a brainlet-friendly explanation. Meanwhile I can just ask Gemma to explain it to me like I'm a retard and it will. Traditional searching still has its uses but AI is pretty damn great for general purpose questions. It's almost exclusively replaced google for troubleshooting shit for me.

Anonymous
06/28/26(Sun)15:31:03 No.109156524

Anonymous 06/28/26(Sun)15:31:03 No.109156524

>>109156514
What's annoying now is when you google you have to filter through 5 pages of AI generated blog posts to find an actual real answer.

Anonymous
06/28/26(Sun)15:31:46 No.109156528

Anonymous 06/28/26(Sun)15:31:46 No.109156528

>>109156489
IMO Gemini's better than ChatGPT and Claude for non-coding shit. Jewgle sucks but I'll give them a pass for giving us Gemma.

Anonymous
06/28/26(Sun)15:36:43 No.109156549

Anonymous 06/28/26(Sun)15:36:43 No.109156549

>>109156514
Google AI mode does that and is faster than anything local can do.

Anonymous
06/28/26(Sun)15:36:47 No.109156550

Anonymous 06/28/26(Sun)15:36:47 No.109156550

>>109156470
kinda hot ngl

Anonymous
06/28/26(Sun)15:38:20 No.109156552

Anonymous 06/28/26(Sun)15:38:20 No.109156552

>>109156549
>Google AI mode
Doesn't that use some retarded small model that constantly gets shit wrong?

Anonymous
06/28/26(Sun)15:39:47 No.109156565

Anonymous 06/28/26(Sun)15:39:47 No.109156565

>>109156552
ai summary /= ai mode

Anonymous
06/28/26(Sun)15:40:22 No.109156570

Anonymous 06/28/26(Sun)15:40:22 No.109156570

>>109156552
That's "AI overview" which is different from AI mode.

Anonymous
06/28/26(Sun)15:42:32 No.109156580

Anonymous 06/28/26(Sun)15:42:32 No.109156580

>>109156514
You arent wrong. The point i was trying to make is that for most stuff they created a problem and are now selling the solution. Nvidia is very happy about it though.

Anonymous
06/28/26(Sun)15:44:58 No.109156585

Anonymous 06/28/26(Sun)15:44:58 No.109156585

File: chub.png (57 KB, 784x475)

57 KB PNG

>>109154587
More on this topic.

Anonymous
06/28/26(Sun)15:45:49 No.109156589

Anonymous 06/28/26(Sun)15:45:49 No.109156589

File: 1708207369566682.png (577 KB, 828x685)

577 KB PNG

Gemma just informed me that women in close proximity don't actually have their menstrual cycles sync. It's a complete myth from a 1971 study that's never been replicated. My life is a lie.

Anonymous
06/28/26(Sun)15:45:56 No.109156591

Anonymous 06/28/26(Sun)15:45:56 No.109156591

>>109156585
Did these retards really finetune V4 Pro?

Anonymous
06/28/26(Sun)15:46:05 No.109156593

Anonymous 06/28/26(Sun)15:46:05 No.109156593

>>109156115
Gemma is bad at programming compared to Qwen, but if I'm just using it wrong I would be delighted to know.

Anonymous
06/28/26(Sun)15:46:16 No.109156595

Anonymous 06/28/26(Sun)15:46:16 No.109156595

>>109156565
>>109156570
Oh, never tried it before. Looks like it uses an LLM so the point remains the same. I just used Gemma as an example. Obviously cloud shit is better than local.

Anonymous
06/28/26(Sun)15:49:57 No.109156611

Anonymous 06/28/26(Sun)15:49:57 No.109156611

>>109156591
ye

Anonymous
06/28/26(Sun)15:50:23 No.109156615

Anonymous 06/28/26(Sun)15:50:23 No.109156615

File: lolZAI.png (250 KB, 535x952)

250 KB PNG

More news from today. Pic related.
WSJ pumping on newest GLM model.
>>109156591
lol who knows. I doubt it. I suspect they just wholesale swapped out whatever they were running for DS V4. That's what I would do.

Anonymous
06/28/26(Sun)15:52:13 No.109156625

Anonymous 06/28/26(Sun)15:52:13 No.109156625

>>109156615
>soji has been retrained
that would be false advertising if not a tune

Anonymous
06/28/26(Sun)15:54:34 No.109156646

Anonymous 06/28/26(Sun)15:54:34 No.109156646

File: everyoneGoesBankrupt.png (206 KB, 742x981)

206 KB PNG

2/2
Yet another dire warning about data center CapEx spend rate and the "obscure" way it's being financed.
Which is to say, money is going in a big circle, and the piper will, eventually, need paid.
>>109156625
I could make an argument that my totally killer Main Prompt is a form of DS V4 "tuning." Since I can tune it.
But I'm just a disingenuous mfer.

Anonymous
06/28/26(Sun)15:55:59 No.109156651

Anonymous 06/28/26(Sun)15:55:59 No.109156651

>>109156565
>>109156570
ai mode is also very dumb. i was bitching last thread >>109151868 it still fucks up on copy pasting the answer and is much dumber than 31b
i certainly can't beat its speed with my machine thoughsomeever

Anonymous
06/28/26(Sun)15:59:33 No.109156666

Anonymous 06/28/26(Sun)15:59:33 No.109156666

>>109155904
>Normies getting angry about AI is laughable, especially since the main and often the only reason for their anger is that their bing bing wahoo machine became expensive, or that they believe data centers eradicate water from earth or something.
Anti datacenter has got to the be most laughable "current thing" I've ever witnessed.
If that isn't a foreign-intelligence psyop then I don't know what is

Anonymous
06/28/26(Sun)16:02:10 No.109156680

Anonymous 06/28/26(Sun)16:02:10 No.109156680

>>109156666
>thing i want (hardware) is getting more expensive
>thing i dont care about (cloudshit) is the cause
seems reasonable enough imo

Anonymous
06/28/26(Sun)16:03:03 No.109156688

Anonymous 06/28/26(Sun)16:03:03 No.109156688

>>109156615
>>109156646
>wsj
>the telegraph

Anonymous
06/28/26(Sun)16:04:02 No.109156699

Anonymous 06/28/26(Sun)16:04:02 No.109156699

>>109156117
guide in op newfriend

Anonymous
06/28/26(Sun)16:05:02 No.109156707

Anonymous 06/28/26(Sun)16:05:02 No.109156707

>>109156115
holy mother of all cope

Anonymous
06/28/26(Sun)16:06:40 No.109156718

Anonymous 06/28/26(Sun)16:06:40 No.109156718

>>109156680
The number of people who care about hardware prices and don't use cloudshit is very small.

Anonymous
06/28/26(Sun)16:07:05 No.109156725

Anonymous 06/28/26(Sun)16:07:05 No.109156725

>>109156489
>I had this book I read as a kid, could not remember title or author, just vague bits of info about it. Google was worthless for figuring it out. An LLM 1-shot the correct answer
I got inspired and tried to find a book I remember reading once, but no such luck for me.

Anonymous
06/28/26(Sun)16:07:29 No.109156728

Anonymous 06/28/26(Sun)16:07:29 No.109156728

>>109156117
No, you need enough RAM to hold the unquantized model at 16 bit.

Anonymous
06/28/26(Sun)16:08:02 No.109156730

Anonymous 06/28/26(Sun)16:08:02 No.109156730

>>109156728
why lie?

Anonymous
06/28/26(Sun)16:09:03 No.109156738

Anonymous 06/28/26(Sun)16:09:03 No.109156738

>>109156321
Give us the prompt in a catbox

Anonymous
06/28/26(Sun)16:12:33 No.109156755

Anonymous 06/28/26(Sun)16:12:33 No.109156755

File: 1772150032797602.gif (946 KB, 301x300)

946 KB GIF

>Tried playing with MTP for the first time as I just remembered it exists.
>Mfw got 95 t/s compared to 50 t/s without it.

Very nice, one hell of a speed increase.
Seems a bit shit for story writing though, as the damn thing drafted the entire story into the thinking side before writing it out so it ended up slower in the end.
But with any kind of code this kicks ass.

Anonymous
06/28/26(Sun)16:13:08 No.109156761

Anonymous 06/28/26(Sun)16:13:08 No.109156761

>>109156718
Everyone's being affected by the hardware prices, bing bang wahoo guys in particular even if they only used consoles. Which also happens to be the group that is more likely to complain loudly. The rest will just gladly take it up in the ass, look at the people still buying the rtx6000s at the current prices.

Anonymous
06/28/26(Sun)16:14:56 No.109156773

Anonymous 06/28/26(Sun)16:14:56 No.109156773

>>109156755
>Seems a bit shit for story writing though, as the damn thing drafted the entire story into the thinking side before writing it out so it ended up slower in the end.
that ain't mtp related i don't think

Anonymous
06/28/26(Sun)16:15:30 No.109156775

Anonymous 06/28/26(Sun)16:15:30 No.109156775

>>109156755
I don't notice any speed increase

Anonymous
06/28/26(Sun)16:17:11 No.109156789

Anonymous 06/28/26(Sun)16:17:11 No.109156789

>>109156470
Very accurately written woman all things considered.
>"You are the most infuriating man I have ever met."
Is this the new slopkino? 5.2, Styletune, and Queen have dropped this on me a few times now.

Anonymous
06/28/26(Sun)16:17:28 No.109156790

Anonymous 06/28/26(Sun)16:17:28 No.109156790

>>109156666
>Dario and Sam telling everyone that they are going to replace every job with AI for the past few years and everyone who isn't investing into their companies RIGHT NOW is going to be the permanent underclass
>you're also going to pay for it in increased power and iPhone prices
>wtf why are the normalfags angry???
Huh...

Anonymous
06/28/26(Sun)16:19:41 No.109156807

Anonymous 06/28/26(Sun)16:19:41 No.109156807

>>109156773

I have heard people mentioning that related to MTP before and seeing it happen to me I just assumed it is, who knows.
Granted I don't much use Qwen for stories so I couldn't say for sure whether that's about MTP or just normal behavior.

>>109156775

Some people don't get any benefit from it, no idea what's up with that. I have a 5090 perhaps it has something to do with hardware.

Anonymous
06/28/26(Sun)16:20:56 No.109156814

Anonymous 06/28/26(Sun)16:20:56 No.109156814

>>109156790
If any one of them could articulate anti-datacenter like that I'd be totally ok with their opinion, but they all seem to be caricatures of facebook memes saying little more than "AI gon drink all the water!"

Anonymous
06/28/26(Sun)16:21:54 No.109156820

Anonymous 06/28/26(Sun)16:21:54 No.109156820

>>109156646
>Which is to say, money is going in a big circle, and the piper will, eventually, need paid.
Will it though? This is all money since kikes forced through fiat banking and especially since the 0% reserve rate was implemented.
>>109156807
MTP is only better for less determinative tasks and is wasted compute for high variance/temperature/top k jobs.

Anonymous
06/28/26(Sun)16:22:44 No.109156830

Anonymous 06/28/26(Sun)16:22:44 No.109156830

File: jankbox.png (341 KB, 800x800)

341 KB PNG

Anyone built a gpu box with these things? My setup isn't amenable to the better prebuilt options, but it _does_ have a couple of slimsas ports I could jerry-rig into an external inference type thing. I could even do up a high-pressure airflow version if passive GPUs ever become worth less than literal bars of gold

Anonymous
06/28/26(Sun)16:22:47 No.109156831

Anonymous 06/28/26(Sun)16:22:47 No.109156831

>>109156550
>Then, exactly seven days after she vanished, a simple brown package arrives at your door. There is no return address.
>Inside is a framed photograph. It’s a candid shot—you can tell it was taken through a window, perhaps from a car passing by or across the street. It shows you walking out of your building, looking calm and serene.
>Beneath the glass, on the matte frame, a note is written in familiar, elegant handwriting: "You look happy. I'll leave you to it."
One of the rerolls.

Anonymous
06/28/26(Sun)16:25:27 No.109156842

Anonymous 06/28/26(Sun)16:25:27 No.109156842

>>109156761
but there were two halves to the post anone.
the people who aren't using 50 million online services, including jippity, but also care about upgrading their hardware are a select few unhinged weirdos.

Anonymous
06/28/26(Sun)16:28:24 No.109156858

Anonymous 06/28/26(Sun)16:28:24 No.109156858

>>109156831
sysprompt and character card? Who knew GLM would do yandere this good?

Anonymous
06/28/26(Sun)16:30:46 No.109156870

Anonymous 06/28/26(Sun)16:30:46 No.109156870

>>109156858
No card. I just had a spat with her for being an asshole. Then I went for another date had another spat and told her we are incompatible and I am done. Then it turned into a psychological horror.

Anonymous
06/28/26(Sun)16:33:05 No.109156876

Anonymous 06/28/26(Sun)16:33:05 No.109156876

>>109156820
>Will it though?
Yes. You can play finance money games where you invest in a circle, and try to rope banks and shareholders into throwing money into your "completely legit" building scheme, tying up production and running up prices.
When you starting hearing shit about "New (Investing) Paradigm," that's when you know it's all about to hit the fan.
These circles depend on continuous refinancing. Once refinancing stops, participants will have to rely on actual operating cash flow. If those cash flows don't support the obligations... the ride ends.

Anonymous
06/28/26(Sun)16:35:32 No.109156889

Anonymous 06/28/26(Sun)16:35:32 No.109156889

I've seen people irl worry about jobs being taken but I'm pretty sure muh water and muh copyright retardation is exclusive to the terminally online crowd.

Anonymous
06/28/26(Sun)16:35:39 No.109156890

Anonymous 06/28/26(Sun)16:35:39 No.109156890

>>109156807
I have 2 different gpus so maybe that's related. Unfortunate since I can't image offloading to ram more would help either

Anonymous
06/28/26(Sun)16:36:45 No.109156898

Anonymous 06/28/26(Sun)16:36:45 No.109156898

>>109156876
2 more weeks

Anonymous
06/28/26(Sun)16:51:11 No.109156970

Anonymous 06/28/26(Sun)16:51:11 No.109156970

Why do my AI wives always want to play truth or dare?
Is it a common game or is the model just retarded

Anonymous
06/28/26(Sun)16:51:41 No.109156971

Anonymous 06/28/26(Sun)16:51:41 No.109156971

>>109156666
I don't understand this post
>checks digits
Ah, I get it.

Anonymous
06/28/26(Sun)16:52:11 No.109156974

Anonymous 06/28/26(Sun)16:52:11 No.109156974

>>109156889
If I was an artist working at games, I would be shitting my pants now.

Anonymous
06/28/26(Sun)16:52:37 No.109156977

Anonymous 06/28/26(Sun)16:52:37 No.109156977

>>109156970
Does gemma get confused with the game like older models did? I remember playing it on mistral small and it kept messing up the order and the rules of the game.

Anonymous
06/28/26(Sun)16:53:39 No.109156981

Anonymous 06/28/26(Sun)16:53:39 No.109156981

>>109156974
if you're an artist AI won't replace you, it'll just make you better at your job.

Anonymous
06/28/26(Sun)16:53:42 No.109156982

Anonymous 06/28/26(Sun)16:53:42 No.109156982

>>109155998
is there one for 12b?

Anonymous
06/28/26(Sun)16:53:43 No.109156983

Anonymous 06/28/26(Sun)16:53:43 No.109156983

Anyone have success adaption OAM to PCIe without spending massive bank?

Anonymous
06/28/26(Sun)16:54:43 No.109156989

Anonymous 06/28/26(Sun)16:54:43 No.109156989

>>109156981
It will more likely replace 5 artists with 1 artist with AI

Anonymous
06/28/26(Sun)16:55:01 No.109156992

Anonymous 06/28/26(Sun)16:55:01 No.109156992

>>109156970
>Why do...AI...always
The answer is always: deep ruts in latent space and bad sampler technique

Anonymous
06/28/26(Sun)16:55:16 No.109156996

Anonymous 06/28/26(Sun)16:55:16 No.109156996

>>109156974
>>109156981
This. AI sucks at creativity. All it will do is speed up real artists' workflows.

Anonymous
06/28/26(Sun)16:55:34 No.109156997

Anonymous 06/28/26(Sun)16:55:34 No.109156997

>>109156989
troons won't let that happen, not on their watch

Anonymous
06/28/26(Sun)17:02:19 No.109157018

Anonymous 06/28/26(Sun)17:02:19 No.109157018

>>109156814
It's much easier to get people together around something seen as universally good (protecting the environment) rather than around complex economic issues (share of surpluses going to capital vs workers).
Details are irrelevant here, only the general feeling of grievance.

Anonymous
06/28/26(Sun)17:09:39 No.109157045

Anonymous 06/28/26(Sun)17:09:39 No.109157045

>>109156666
very obviously just an act of sabotage, yes. but it could just as easily be domestic idiots that already fell for longer running psyops, or imported types with similar animus towards ze west.

Anonymous
06/28/26(Sun)17:11:40 No.109157052

Anonymous 06/28/26(Sun)17:11:40 No.109157052

>>109156314
>>109153841
>>109154000
>>109155069
>>109155378
bros, what UIs/frontends are you generally using day to day?
I have tried to like open-webui, silly tavern, librechat, llama.cpp's built in UI (llama.cpp is my backend btw) but none have all the features and feel complete, ykwim?
It's not for personal use but for family so multiple accounts, audio and images as input, web search tool, MCP support, streaming support (yes I have to mention this) and such niceties.
I have asked AI models but they mentioned AnythingLLM which looks kind of bland but will give it a try.
Wondering what /lmg/ bros are using, for phones too.
inb4: vibe-coded app that is not available online

Anonymous
06/28/26(Sun)17:16:49 No.109157074

Anonymous 06/28/26(Sun)17:16:49 No.109157074

>>109156870
Kino. Enjoy anon.
t. 2m tokens before plowing gigastacy with GLM 5.2 slowburn
>>109156996
Retard.

Anonymous
06/28/26(Sun)17:19:30 No.109157084

Anonymous 06/28/26(Sun)17:19:30 No.109157084

Is there an equivalent to llama.cpp or stable-diffusion.cpp for TTS, especially qwen3-tts? You would think this was whisper.cpp's bailiwick, but apparently not

Anonymous
06/28/26(Sun)17:21:28 No.109157091

Anonymous 06/28/26(Sun)17:21:28 No.109157091

>>109157052
Marinara for everything but code, LM Studio for file manager+ez cache fitting math, kobold frontend for things I need precise control of the prompt for (code) but I've started doing that in character in Marinara too. Fascinatingly, if you have a "smart character" write your code, it comes out slightly higher quality than the default assistant.

Anonymous
06/28/26(Sun)17:24:08 No.109157105

Anonymous 06/28/26(Sun)17:24:08 No.109157105

>>109156876
>the ride ends.
No, the moneyprinter goes brrrr or your gamestop stocks are forcefully sold. The system will not play by its own rules when it's not convenient to it; those rules are for goyim.

Anonymous
06/28/26(Sun)17:25:27 No.109157112

Anonymous 06/28/26(Sun)17:25:27 No.109157112

File: haha.jpg (160 KB, 577x878)

160 KB JPG

How many 3090s do I have to buy until I can actually do work with locals? I need about 256K to 384K kv at api speeds. Prefer KV size and stability over raw knowledge since I need to teach it my tools anyway. Are there any major dead zones and power spikes, eg. 3 cards being barely any better than 2 and not paying off until you get a 4th one?
Would prefer 3090s at the moment since it's easier for me to find those and I won't have to tear down my entire computer for it.

Anonymous
06/28/26(Sun)17:31:08 No.109157138

Anonymous 06/28/26(Sun)17:31:08 No.109157138

>>109156970
You don't want to try playing Uno with AI.

Anonymous
06/28/26(Sun)17:38:00 No.109157156

Anonymous 06/28/26(Sun)17:38:00 No.109157156

>>109157112
Define "actually do work".
You can automate an entire codebase with a 5090 and either Gemmy 31b or Qwen 27b if you're not a retard as is.

Anonymous
06/28/26(Sun)17:40:56 No.109157167

Anonymous 06/28/26(Sun)17:40:56 No.109157167

File: thermal.png (14 KB, 926x737)

14 KB PNG

Also, how hot do these fuckers get? Would stacking 4 of them in this way melt everything?

>>109157084
TTS.cpp is attempting this, also check out tortoise.cpp and moshi.cpp if you just need any TTS on ggml at all

Anonymous
06/28/26(Sun)17:45:18 No.109157181

Anonymous 06/28/26(Sun)17:45:18 No.109157181

File: 4gbo63.jpg (309 KB, 1280x1280)

309 KB JPG

https://files.catbox.moe/nsjz4a.jpg
https://files.catbox.moe/0a49um.jpg
https://files.catbox.moe/vdtzcf.jpg

Anonymous
06/28/26(Sun)17:45:55 No.109157183

Anonymous 06/28/26(Sun)17:45:55 No.109157183

>>109157167
Don't attempt it, it will make mustang gas

Anonymous
06/28/26(Sun)17:47:02 No.109157189

Anonymous 06/28/26(Sun)17:47:02 No.109157189

>>109157167
Please do this and report back.

Anonymous
06/28/26(Sun)17:47:58 No.109157192

Anonymous 06/28/26(Sun)17:47:58 No.109157192

>>109157183
>mustang gas
so, like, horse farts? hmmmm

Anonymous
06/28/26(Sun)17:49:23 No.109157197

Anonymous 06/28/26(Sun)17:49:23 No.109157197

>>109157181
>https://files.catbox.moe/vdtzcf.jpg
I like this Rin

Anonymous
06/28/26(Sun)17:50:09 No.109157201

Anonymous 06/28/26(Sun)17:50:09 No.109157201

>>109157181
>No stealth character cards
This general sucks now.

Anonymous
06/28/26(Sun)17:51:02 No.109157206

Anonymous 06/28/26(Sun)17:51:02 No.109157206

>>109157201
4chan doesn't strip them out?

llama.cpp CUDA dev !!yhbFjk57TDr
06/28/26(Sun)17:52:21 No.109157210

llama.cpp CUDA dev !!yhbFjk57TDr 06/28/26(Sun)17:52:21 No.109157210

>>109155792
>>109155841
Whether or not a language model was used was 100% irrelevant.
I went on a break 2 days before the PR was opened and I have not looked at Github notifications since.

Anonymous
06/28/26(Sun)17:53:11 No.109157218

Anonymous 06/28/26(Sun)17:53:11 No.109157218

>>109157156
Working == being able to lift a few subagents and reach 256K without becoming glacial, all on a model that is smart enough to not trip over itself writing C and Lua. Some headroom would be nice for an extra kv to just ask it questions, or do the gpu portion of a cpumoe, or diffuse me an image.
The logistics of buying a 3090 are much less complicated for me so I can probably stack up to 4 of them. This build is a stopgap until I can get an equivalent stack of actual workstation cards. Once that happens I'm probably turning this one into a secondary server instead of reselling. I have theoretically infinite utility for AIs, more is always better. It would be nice to know at which point I can wean myself off API entirely though.

Anonymous
06/28/26(Sun)17:53:58 No.109157220

Anonymous 06/28/26(Sun)17:53:58 No.109157220

>>109157210
Hi CUDADude, glad you didn't melt in the thermonuclear German summer.
Dunno if you've checked the backlog, but: any feedback on how well your slimsas to pcie setup works vs on-board pcie slots?

Anonymous
06/28/26(Sun)17:54:11 No.109157222

Anonymous 06/28/26(Sun)17:54:11 No.109157222

>>109157201
You made me dump my migu collection into ST to check. Unfortunately no cards.

Anonymous
06/28/26(Sun)17:55:10 No.109157226

Anonymous 06/28/26(Sun)17:55:10 No.109157226

>>109157206
Hence the catbox upload.
>>109157210
I'm glad you're not dead.

Anonymous
06/28/26(Sun)17:55:29 No.109157227

Anonymous 06/28/26(Sun)17:55:29 No.109157227

>>109157197
I've always hated tanlines until this moment.

Anonymous
06/28/26(Sun)17:56:36 No.109157232

Anonymous 06/28/26(Sun)17:56:36 No.109157232

File: orig-1099524284.jpg (353 KB, 1344x768)

353 KB JPG

>>109157183
>the last of the 386es

Anonymous
06/28/26(Sun)17:58:46 No.109157244

Anonymous 06/28/26(Sun)17:58:46 No.109157244

File: beachmiku55.png (208 KB, 1600x1200)

208 KB PNG

>>109156738
>The task is to draw a detailed and visually compelling SVG image of Hatsune Miku at the beach.
(+your loop)

Anonymous
06/28/26(Sun)18:00:16 No.109157254

Anonymous 06/28/26(Sun)18:00:16 No.109157254

Can silly tavern unload/load llama as needed? I want to include image generation in my local setup as well but it won't fit in my gpu

Anonymous
06/28/26(Sun)18:01:45 No.109157259

Anonymous 06/28/26(Sun)18:01:45 No.109157259

>>109157254
no

Anonymous
06/28/26(Sun)18:01:51 No.109157260

Anonymous 06/28/26(Sun)18:01:51 No.109157260

>>109157254
Memory paging options are backend dependent, not frontend.

Anonymous
06/28/26(Sun)18:02:12 No.109157261

Anonymous 06/28/26(Sun)18:02:12 No.109157261

>>109156981
What it will do is preventing new people from even considering to become artists. People who can already create artwork maybe still have some time left.

Anonymous
06/28/26(Sun)18:02:21 No.109157263

Anonymous 06/28/26(Sun)18:02:21 No.109157263

>>109153820
There are multiple vibeslopped qwen3-tts.cpp versions. Just google or search on github. There is also audio.cpp
Most importantly, nobody has made dots.tts.cpp! I really wanna get dots.tts. Hope it won't get forgotten.

Anonymous
06/28/26(Sun)18:03:11 No.109157265

Anonymous 06/28/26(Sun)18:03:11 No.109157265

>>109157263
this was meant for >>109157084

Anonymous
06/28/26(Sun)18:04:27 No.109157273

Anonymous 06/28/26(Sun)18:04:27 No.109157273

>>109157254
>--no-mmproj-offload
image projection on CPU, slow af but no VRAM

Anonymous
06/28/26(Sun)18:08:32 No.109157293

Anonymous 06/28/26(Sun)18:08:32 No.109157293

>>109157263
https://github.com/CrispStrobe/CrispASR/issues/200
they appear to be working on it.

llama.cpp CUDA dev !!yhbFjk57TDr
06/28/26(Sun)18:10:39 No.109157303

llama.cpp CUDA dev !!yhbFjk57TDr 06/28/26(Sun)18:10:39 No.109157303

>>109157220
I can work from home and have my office in the basement so I'm largely unaffected by the heat.
I did read the previous /lmg/ threads but I can't really comment on how well a setup with adapters would work; I never finished mine because RAM prices exploded right after I finished my prototype with 16 GB.

>>109157226
I'm glad you're not dead too.

Anonymous
06/28/26(Sun)18:11:09 No.109157306

Anonymous 06/28/26(Sun)18:11:09 No.109157306

>>109157189
Stick around for at least half a year then because stacking the cards will take me months, and modding the monstrous case another few. But I'm gonna do it. The case itself is already semi-open with mesh everywhere, I think it'll be fine. The planar cards would be a couple inches away from the normal ones, they have to anyway cause of the power cables. If this works out I can probably stick a full atx mobo in this too, but does cpu and ram make any difference at this point? This build caps at like a single epyc and 256 or 512 gigs of ddr5. Currently I have 128 ddr4 on a gamer trash mobo.

Anonymous
06/28/26(Sun)18:13:05 No.109157319

Anonymous 06/28/26(Sun)18:13:05 No.109157319

>>109157293
nice

Anonymous
06/28/26(Sun)18:15:28 No.109157328

Anonymous 06/28/26(Sun)18:15:28 No.109157328

>>109157306
I dream of building a GPGPU version of a QNAP or Synology box:
slick looking case with a tiny cpu, giant fans and massive quiet airflow over a fuckton of passively cooled GPU cards with nvlink-style vram pooling and a pair of 100gbe QSFP connections to the network backbone so any machine on the network can just use it like a utility...
Some nice person give me money...I want to prototype this make it real

Anonymous
06/28/26(Sun)18:20:40 No.109157360

Anonymous 06/28/26(Sun)18:20:40 No.109157360

>>109157084
I've been using https://github.com/predict-woo/qwen3-tts.cpp for the past months. It's a dead project but it's what I needed for actual fast TTS generation using qwen3-tts.

Built a http wrapper around it to provide an openai compatible speech endpoint so I can integrate it wherever.

Anonymous
06/28/26(Sun)18:21:26 No.109157365

Anonymous 06/28/26(Sun)18:21:26 No.109157365

>>109157306
>Stick around for at least half a year then
I live here. I'll be looking forward to it.
>but does cpu and ram make any difference at this point?
Yes if you go for a server motherboard (quad channel DDR5 or more) and 256GB+ RAM. Then you get to run big MoEs at 8 to 20t/s with split mode graph and dense layers with some routed experts in VRAM using ik_llama. Mainline has TP now too apparently but I don't know how well it works.

Anonymous
06/28/26(Sun)18:24:30 No.109157386

Anonymous 06/28/26(Sun)18:24:30 No.109157386

>>109157220
I have one with cheap bifurcation splitters and slimsas 8i, one pcb per card (so x8 each). Can't really test them well since I have ewaste plugged in, but even with pcie gen 3 I have some cards drop to x4. Maybe it'll get fixed when I reassemble it in a bit, who knows. Other than that it works well.

Anonymous
06/28/26(Sun)18:32:16 No.109157427

Anonymous 06/28/26(Sun)18:32:16 No.109157427

>>109157328
For me it's the bugout potential. If you can't lift it you don't really own it. If some bozo sets the building on fire I can salvage 98% of my net worth just taking this with me and be out in 3 minutes, then be up and running in a hotel like 3 hours later.
The case could actually fit four 2-slot cards inside with some hacksawing but that requires blower coolers, at which point I should just get proper workstation cards that were actually made for stacking.

Anonymous
06/28/26(Sun)18:33:34 No.109157433

Anonymous 06/28/26(Sun)18:33:34 No.109157433

>>109157427
If someone is casually arsoning your house, you have far bigger problems than worrying about your net worth and should be loo/k/ing for different solutions to problems like that.

Anonymous
06/28/26(Sun)18:34:47 No.109157439

Anonymous 06/28/26(Sun)18:34:47 No.109157439

Local sesame maya for JOI when?

Anonymous
06/28/26(Sun)18:35:07 No.109157442

Anonymous 06/28/26(Sun)18:35:07 No.109157442

File: file.png (23 KB, 896x242)

23 KB PNG

oh yeah, it's all coming together

Anonymous
06/28/26(Sun)18:36:34 No.109157456

Anonymous 06/28/26(Sun)18:36:34 No.109157456

>>109157084
https://github.com/0xShug0/audio.cpp

Anonymous
06/28/26(Sun)18:37:25 No.109157464

Anonymous 06/28/26(Sun)18:37:25 No.109157464

M5 ultra 768GB waiting room

Anonymous
06/28/26(Sun)18:37:51 No.109157468

Anonymous 06/28/26(Sun)18:37:51 No.109157468

>>109157464
$50,000 + tip

Anonymous
06/28/26(Sun)18:39:37 No.109157478

Anonymous 06/28/26(Sun)18:39:37 No.109157478

>>109157468
$50000 + tip = my tip sticky

Anonymous
06/28/26(Sun)18:47:08 No.109157508

Anonymous 06/28/26(Sun)18:47:08 No.109157508

>>109157365
For waifu purposes that speed may be okay but slopcoding would be rather awful unless this scales to like 20 parallel instances with little loss. I'm getting the hunch that a smaller model with a lot of kv and maximum thinking effort punches harder here. With smaller models I get to do all sorts of steering and tuning tricks too.
Fattest model I probably care about is current fat Qwen or the blessed 200-a22 2507 instruct (though its lack of mmproj hurts). Basically at this size it must be usable in instruct or with rudimentary templated thinking.
For work I just need something that passes for outdated Opus quality in its first 64K if you squint right and have had a pint or two. The remainder I can make up for with deslopping and kv savings. The primary utility of LLMs for me is putting up with retarded library interfacing rituals I don't have the lifespan for.

Anonymous
06/28/26(Sun)18:47:30 No.109157511

Anonymous 06/28/26(Sun)18:47:30 No.109157511

>>109156321
this is a very funny image

Anonymous
06/28/26(Sun)18:58:17 No.109157570

Anonymous 06/28/26(Sun)18:58:17 No.109157570

>>109155386
Huh, neat. This seems to improve my Gemma output thanks
What sampler config do you use?

Anonymous
06/28/26(Sun)19:24:03 No.109157706

Anonymous 06/28/26(Sun)19:24:03 No.109157706

Would a Ryzen AI Max+ 395 machine and a decent Nvidia GPU make for a decent "cost benefit" jank ass AI home lab solution to fuck around with LLMs, image/video gen, AI audio, etc?
Are there even any AI Max+ 395 computers that can accept a discrete GPU without relying on thunderbolt as an interface?

Anonymous
06/28/26(Sun)19:26:20 No.109157714

Anonymous 06/28/26(Sun)19:26:20 No.109157714

>>109156321
model?

Anonymous
06/28/26(Sun)19:29:49 No.109157730

Anonymous 06/28/26(Sun)19:29:49 No.109157730

>>109157706
>decent "cost benefit" jank ass AI home lab solution to fuck around with LLMs, image/video gen, AI audio, etc?
would have to be a 5090. also using the apu/unified memory with the nvidia gpu would suck and would require vulkan or something
>Are there even any AI Max+ 395 computers that can accept a discrete GPU without relying on thunderbolt as an interface?
i dont think so. might be better to wait for the 495 which i think is coming sometime later this year

Anonymous
06/28/26(Sun)19:33:53 No.109157750

Anonymous 06/28/26(Sun)19:33:53 No.109157750

>>109157730
>also using the apu/unified memory with the nvidia gpu would suck and would require vulkan or something
At least for llama.cpp I think you can use the RocM and the CUDA backend together.
And the iGPU with Vulkan, or even just using the CPU backend would still perform better than a regular desktop with 128gb of RAM right?

>>109157730
>might be better to wait for the 495 which i think is coming sometime later this year
Really?
Alright, I'll keep an eye out for it then.

Anonymous
06/28/26(Sun)19:36:01 No.109157759

Anonymous 06/28/26(Sun)19:36:01 No.109157759

I'm trying to use MTP with gemma but I'm getting the following error, which is preventing me from setting a specific context size.
E llama_init_from_model: failed to initialize the context: Gemma4Assistant requires ctx_other to be set (this is normal during memory fitting)
How do I fix? There doesn't seem to be a "ctx_other" flag to add to my launch command??

Anonymous
06/28/26(Sun)19:36:33 No.109157763

Anonymous 06/28/26(Sun)19:36:33 No.109157763

File: 2309569d.png (151 KB, 1056x846)

151 KB PNG

>>109157714
gemmers 31B Q8
see her evolution >>109154616

Anonymous
06/28/26(Sun)19:36:34 No.109157764

Anonymous 06/28/26(Sun)19:36:34 No.109157764

>>109157442
>ollama
nice bait

Anonymous
06/28/26(Sun)19:39:15 No.109157776

Anonymous 06/28/26(Sun)19:39:15 No.109157776

>>109157365
mainline TP only works well with dense models. RAM-heavy MoE setups are out of the question for now

Anonymous
06/28/26(Sun)19:40:34 No.109157782

Anonymous 06/28/26(Sun)19:40:34 No.109157782

File: file.png (196 KB, 1255x1148)

196 KB PNG

Anonymous
06/28/26(Sun)19:40:34 No.109157783

Anonymous 06/28/26(Sun)19:40:34 No.109157783

>>109157306
>This build caps at like a single epyc and 256 or 512 gigs of ddr5
>>109157365
>server motherboard (quad channel DDR5 or more
Go for 512 8 channel if you can.
256 quad ddr5 fag here, wasting my life on cope quants and rpc

Anonymous
06/28/26(Sun)19:54:41 No.109157833

Anonymous 06/28/26(Sun)19:54:41 No.109157833

My biggest contribution to the LLM scene was coining the term cope quant and noone will ever know my name

Anonymous
06/28/26(Sun)19:59:39 No.109157855

Anonymous 06/28/26(Sun)19:59:39 No.109157855

>>109157730
>>109157750
AFAIK, the only thing changed from 395 to 495 is a 100mhz clock bump for the CPU. That, and the unified memory cap bumped from 128gb to 192gb.

Anonymous
06/28/26(Sun)20:01:41 No.109157861

Anonymous 06/28/26(Sun)20:01:41 No.109157861

>>109157759
>this is normal during memory fitting
ignore that line entirely i understand it's a scawy colour
try without fit?
post full log w/ --log-verbosity 4

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.