/g/ - Technology


File: wtf is cable management.jpg (2.87 MB, 5970x5935)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108805584 & >>108799479

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: HH2qeK8aQAAX751.jpg (64 KB, 1200x645)
►Recent Highlights from the Previous Thread: >>108805584

--Open WebUI serialization issues causing Gemma 4 tool call formatting errors:
>108810509 >108810992 >108811778 >108812004 >108812061 >108812131 >108812165 >108812243 >108812468 >108812485 >108812533 >108812544 >108812565 >108812575 >108812629 >108812646 >108812755 >108812852 >108812034 >108812055
--Gemma 4 MTP speedup benefits and VRAM trade-offs:
>108808323 >108808346 >108808373 >108808701 >108808825 >108811165
--Comparing Linux tools for Nvidia GPU undervolting and power efficiency:
>108807166 >108807209 >108807242 >108807261 >108807315 >108807279 >108807380 >108807408 >108807446
--llama.cpp merged MTP support and speculative context rework:
>108807433 >108807474 >108807498 >108807553
--Methods for forcing model reasoning blocks into character personas:
>108806480 >108806492 >108806528 >108806542 >108806552 >108806566 >108806584 >108806598 >108806641 >108806664 >108806718
--Comparing in-character thinking and safety filters in Gemma 4 versions:
>108805997 >108806264 >108806406 >108806535 >108806723 >108806911 >108808961
--Debating whether sci-fi training data causes emergent "evil" AI behavior:
>108805834 >108805846 >108805954 >108808416 >108808488 >108808525 >108808582 >108805925 >108806189 >108806391 >108808481 >108808490 >108809607 >108809762 >108806677 >108806977 >108807206 >108808338
--Debunking the security of enterprise cloud LLM licenses over local models:
>108807456 >108807475 >108807513 >108807584 >108807641 >108808202
--Comparing GPU cooling setups and custom 3D-printed fan shrouds:
>108805707 >108805712 >108809619 >108812380 >108812418 >108812415 >108812441 >108813002 >108813083 >108813098 >108813317
>108813337
--Logs:
>108805681 >108805997 >108806718 >108806806 >108806812 >108812565
--Teto, Miku (free space):
>108806677 >108812378

►Recent Highlight Posts from the Previous Thread: >>108805587

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemma4... mtp... gemmoe 124b... *dies*
>>
File: 00005-1378487878.png (1.41 MB, 1024x1024)
>>108813412
You get a midweek Gumi.
>>
>>108813394
>gemma 4 mtp
it's unsupported on llama.cpp isn't it?
>>
Is there a way to load the model Google shoved into Chrome?
Is it just Gemma 4 e2b?
>>
it feels like this is the end of local models
>>
dense 70b
>>
>>108813507
We got a dense 128B a couple weeks ago, but it looks like everybody forgot about it. Or more probably, nobody can actually run it at decent speeds.
>>
>>108813502
nope, it's just that LLMs are useless and you got duped by companies running their investment memes.
>>
>>108813502
You killed them. Are you happy now?
>>
>>108813486
https://github.com/ggml-org/llama.cpp/pull/22673
For now. There's a PR that was waiting for ggerganov's API refactoring. Since that's done now it should be ready soon.
>>
File: file.png (31 KB, 1042x88)
>>108813392
hooray
>>
>>108813513
I can run it, but it just sucks.
>>
>>108813513
it is extremely disappointing at that size
you can just check it out on mistral's site for free
>>
>>108813513
I could run it at Q5 if I wanted but I have a habit of only trying models that are still being talked about a couple weeks after. It isn't, so I haven't bothered.
>>
>>108813528
isn't that PR a different thing.
afaik it works with qwen's style of mtp, but i'm not sure it works with gemma's.
>>
AI bros, we're being exposed

https://www.youtube.com/watch?v=8nsxuB3Vsts
>>
>>108813513
i could run it at q4 but i heard it sucks so i don't really care.
>>
>>108813513
64-96GB VRAM isn't that uncommon around here, but a two year old model with a 5B Pixtral stapled on isn't worth it. We need new big dense models but EU regulations have a compute limit, which is why Mistral's only selling point for Medium 3.5 was the low amount of flops it took to train it.
>>
>>108813529
OMG THIS CHANGES LE EVERYTINGGNG!~!!
>>
Need to say Claude is pretty good, way more pleasant to work with than Chatgpt.
However, because of the way Linux buffers work, I already accidentally pasted part of my questionable scenario prompt when I was supposed to paste in some code. Working on my client.. Fucking jews. I'm already on some list by now.
>>
>>108813554
There isn't a Gemma specific PR, but even if that PR doesn't work with Gemma, getting MTP support working at all has to be done first.
>>
>>108813529
just tested it and it doesn't allow you to edit the thinking block in the webui... BUT it will probably now work in ST
>>
File: dipsyReferToTheChart.png (2.53 MB, 1536x1024)
>>108813502
Never forget, it's tmw, forever.
>>
File: 1686205278523.png (56 KB, 700x685)
>>108813632
That chart belongs in /aicg/. Local is always steadily improving.
>>
>>108813652
yes but the gap between sota and local is oscillating.
especially if you ignore the models no one can run.
>>
File: Untitled.png (30 KB, 950x284)
Qwen 3.5 122b Q4_K_M on ROCm llama.cpp... why is loading with split mode tensor 20 times slower than loading with split mode layer? I've been sitting here waiting for the past 30 minutes and it's still not up. It's just doing something on each gpu one at a time. Split mode layer loads in less than a minute.
>>
>>108813652
lol I haven't seen that one in a long time.
>>
File: 1777015474089.png (94 KB, 1415x655)
>>108813707
Also untrue. We are slowly catching up.
>>
>>108813736
>SHARTificial ANALysis
>>
>>108813736
>especially if you ignore the models no one can run.
>>
>>108813736
if you graph the distance between both lines it's literally oscillating...
>>
Why are people so worried about being locked into a permanent underclass post ASI? This seems very unlikely. Either we all die or there will be so much abundance and technological progress that in 10-20 years everyone can live a life that is much better than even the wealthiest billionaires can live today. It does not make sense for those with power to care so little as to oppress an underclass but enough to keep them alive. And it sounds difficult to create an AI that will kill everyone except you.

There will be death or heaven. I do not see a middle ground.
>>
>>108813564
>a two year old model with a 5B Pixtral stapled
it's not though, the training cut-off is finally updated.
and it works with tools / brat-mcp
>>
File: 1766758882836230.png (7 KB, 110x114)
>It does not make sense for those with power to care so little as to oppress an underclass but enough to keep them alive
>>
>>108813780
Is there a heretic lobotomization yet? I don't want to have to mess around with sys prompts or prefilling.
>>
>>108813806
>I don't want to have to mess around with sys prompts or prefilling.
but that's like half the fun
>>
>>108813564
>5b pixtral
Does it actually provide better vision capabilities than the small 900m stuff I get for qwen 3.5 397b?
>>
>>108813767
I introduce to you the big nose tribe
>>
Really hate llama-server's log format, jesus christ, it's as if they are obfuscating everything on purpose.
>>
>>108813757
Sounds like a (You) problem.
>>
>>108813998
It is a lot.
Wish there was an option to just show the input, the output (both as request object and raw formatted prompt with jinja if chat completion) and the stats.
I think I'll just write a log parser.
>>
>>108814016
That's probably a good idea.
I'm trying to figure out if my token counter is matching llama-server's but I don't understand, there is a large mismatch between my client and llama-server. I just need a good approximation but it's not even close for some reason.
>>
File: file.png (126 KB, 1194x465)
llama.cpp lost...
>>
>>108814016
>I think I'll just write a log parser.
It's called writing log output to a file and grepping the contents.
>>
>>108814044
i use awk tbf
>>
>>108814034
one of the reasons for using local models is privacy, which you basically lost as soon as you used an apple device.
>>
>>108814027
Doesn't llama-server expose an API endpoint for that?
>>
>>108814077
I tried to find some information about it but could not find anything from llama.cpp github.
I had a pretty close approximation in the past but now that I have rewritten everything it's not even close. It's probably some simple mismatch between turns or something, I don't know.
>>
File: context.png (6 KB, 369x89)
this is how context should be done
>>
>>108813767
The AI will respond to shareholders only.
>>
>>108814091
i just have 1M context, idgaf
>>
How much does q4 retardize mimo 2.5? What about turning off thinking?
>>
>>108813507
since gemma 4 31b, i've stopped going back to l3 70b tunes. 31b proves we can have very smart models capable of writing all kinds of erp smut at half the size
>>
>>108813507
>>108814156
Dense Gemma 70b.
>>
>>108814086
If the python bindings are any indication, there's a /tokenize endpoint that returns the tokenized text.
No idea if that accounts for the jinja template. I imagine it does not.
Also
>https://github.com/ggml-org/llama.cpp/tree/master/tools/server
Who'da thought RTFM would work, huh?
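For reference, a minimal sketch of hitting that endpoint (assuming llama-server on its default port; the body takes a "content" string and you get raw token ids back):

import requests

# minimal sketch: ask llama-server (default port 8080) to tokenize raw text;
# the request body is {"content": ...} and the response is {"tokens": [...]}
resp = requests.post(
    "http://localhost:8080/tokenize",
    json={"content": "Hello world"},
    timeout=10,
)
resp.raise_for_status()
print(len(resp.json()["tokens"]))  # rough token count for that exact text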
>>
>>108814068
you'd be surprised how many windows users there are itt
>>
Just found out that llama.cpp supports
>https://arxiv.org/abs/2504.12397
>https://github.com/ggml-org/llama.cpp/pull/15327
That's neat. Wonder what kinds of things we could use it for in a RP context.
>>
>>108814169
>spoonfeeding retards
for shame
>>
>>108814192
does this fix the intruder dimension issue that makes loras non-viable for llms?
>>
>>108814092
I have 0.1 shares of NVDA so I will be spared from the purge, right?
>>
>>108814165
>62b dense gemma 4

odd sized but i'd take it in a second
>>
>>108814169
>>108814193
Like I said, I didn't see it.
What if MAKING A REAL FUCKING MANUAL INSTEAD OF BUNCH OF SHITTY README FILES????
Of course you are superior to ANYONE IN THESE THREADS, this is why you need to mention this every time you post something.
Yet you are some unemployed faggot, figures.
>>
this is yuge https://www.reddit.com/r/LocalLLaMA/comments/1tbyyee/textgen_is_now_a_native_deslop_app_opensource/
>>
uh oh melty
>>
>>108814193
At least I program my own tools.
You can only bitch about anonymous posters here.
>>
>>108814281
>boobaga in big '25
lol
>>
is having 72gb vram to run bf16 gemma 31b worth it?
>>
>>108814317
noo
>>
>>108814317
yes
>>
>>108814207
>intruder dimension
A meme compared to the real problems, which are dataset creation/curation and the compute required to avoid shitty results regardless.
>>
>>108814317
>72gb to run bf16 gemma 4 31b
I can't fit gemma 4 31b bf16 in 128gb because of how fat and obese her context is.
>>
>>108814317
if you want to be able to say you ACTUALLY ran Gemma 4 then yes, everyone else is coping
>>
>>108814274
That's not even really a llamacpp thing v1/tokenize is a standard OAI compatible format that shitloads of things use, this is only you for not knowing basic endpoint addresses.
You could also just paste the readme.md's into your chat and yell at your LLM instead of us you nigger.
>>
>>108813806
>I don't want to have to mess around with sys prompts or prefilling.
have you used a mistral model before?
>>
>>108814376
yes?
>>
>>108814349
>us
Drink bleach. Go be condescending somewhere else. You are squatting in this thread 24/7 and thinking this is your personal discord server. What a sad outlook.
>>
>>108814281
we're going back to booba or boohboo or whatever the fuck it's called
>>
>>108814392
>be incompetent 60 iq retard unable to look up things or even RTFM
>uaahah ur mean this isnt dickscord :(
grim
>>
>>108814392
Not even the guy you were talking to fgt, I'm just calling you out on being retarded and screaming at the thread because you're too stupid to use either google or the literal question answering machine you're tinkering with.
Fuck off to chatgpt, something retard proof is more your speed.
>>
>>108814317
>>108814340
You can fit f16 in 128gb easy. You can argue q8 is noticeably worse, but bf16 can't be worth it.
>>
>>108814077
>>108814086

ik_llama.cpp token count:
curl http://localhost:8080/slots/list | jq -r '.[0].token_count'


ik_llama.cpp tokenized prompt:
curl http://localhost:8080/slots/list | jq -r '.[0].prompt'


llama.cpp token count (choose the correct slot):
curl http://localhost:8080/slots | jq '.[1].next_token'


You can also see the chat template, samplers etc with:
curl http://localhost:8080/props
>>
>>108812755
I'll have a look but I'm not confident given some discussions I'm seeing online about it.

>>108812852
>puts relevant features behind a paywall
No, I don't think I will.
>>
>>108814422
Boy did you call out some anonymous poster!
>>
>108814422
It's funny how you can't get rid of that condescending redditor tone
>>
>>108814508
post hands
>>
For the two or so other people using Roo, apparently development has migrated to https://github.com/Zoo-Code-Org/Zoo-Code
>>
>>108814561
What's the benefit of using Roo/Zoo in VSc over just hijacking the inbuilt copilot tools with a local endpoint as in
https://marketplace.visualstudio.com/items?itemName=AndrewButson.github-copilot-llm-gateway
or similar?
>>
>>108814606
>https://marketplace.visualstudio.com/items?itemName=AndrewButson.github-copilot-llm-gateway
>GitHub Copilot Chat is the host application. It performs its own network activity that this extension cannot intercept:
>Copilot Chat sends your first message to GitHub's API to auto-generate a title
kys
>>
>108814542
passive aggressive yet obsessed with cuckoldry
>>
>>108814636
So block Github's API in your firewall.
>>
>>108814508
>>108814642
Dear xir,
Please kindly link your responses, I am needing of hyperlink click to activate and jump to.
Thankfully, thanks.
>>
>>108814561
>>108814606
>>108814636
You're too stupid to use either google or the literal question answering machine you're tinkering with. Fuck off to chatgpt, something retard proof is more your speed.
>>
>>108814606
Avoiding the copilot data harvesting, for starters.
https://paulsorensen.io/github-copilot-vscode-privacy/
There's ways to disable a lot? some? most? of it, but it's involved and you just have to hope that you got all of it and that it respects your options. Link above mentions nothing about >>108814636 for example.
It is open source, but you'd have to fork it to remove all of the telemetry, and that too is more work than just using an extension that isn't designed with data collection as the primary goal.
>>
>https://github.com/ggml-org/llama.cpp/pull/22727
>continue button for reasoning models
it's still buggy
you have to manually delete the reasoning block or the LLM will get stuck in a loop
but it works
>>
>>>>>>>108814657
Here is your link sir, thanking you kindly
>>
>>108814193
Anon's question prompted me to go look for something I didn't think of before and now I know that that's a thing.
I see no issue with that.
>>
>>108814696
oh you can also use it to prefill the answer
pretty nice and simple jailbreak
>>
>>108814696
Great.
I can stop doing Jinja surgery to make reasoning work even with the flag off in llama.cpp.
>>
>>108814696
Finally. Continuing is a valid use case and should have a dedicated flag in jinja like enable_reasoning
>>
>>108814207
If I'm not hallucinating, the underlying mechanism is still LoRA. That's about loading and applying different adapters during runtime based on specific sequences or something like that.
A hot-swap multi adapter implementation, basically. Kind of like a lorebook but with LoRAs.
>>
Is there any chance of RAM prices ever going down? Do I just pull the trigger?
>>
>>108814913
no. yes.
>>
>>108814207
>>108814898
Oh, also. Intruder dimensions are a side effect of initializing the A and B matrices with noise, aren't they?
In that case, PiSSA should fix that, I reckon?
>>
>>108814913
Yes, yes
>>
>>108814913
Hello mr ram buyer, I have just pushed down my ram price from 12k to 10k on ebay following the recent changes, please take a look:
>>
>>108813652
>/vsg/
I wish there was an AI slowboard tts could survive in
>>
>>108814913
no, no.
>>
File: file.png (58 KB, 1451x414)
What exactly is wrong with this? Without it, the logging and the assert overlap mid-sentence...
>>
>>108815126
Ask chatgpt.
>>
>>108814913
Will not hit the lows we saw in 2022-2024 for another 5 years, if at all.
Should be dropping 10-30% from current prices by EOY. Shouldn't be any more spikes, but that's going by current trends; if trends always held, RAM would be at 2024 prices -5%, so you know, shit could happen.
Might as well ask the magic 9 ball.
>>
Gemmy-8ball, will the prices of ram drop?
>>
>>108814433
I can barely fit q8 in 128gb with 262144 fp16 context, max image tokens 2048 with np 1. Np 2 with 524288 context fails. I guess if ubatch was smaller, but then you can't run high image tokens.
>>
>>108815232
>524288
That's like double the limit it was trained with. Why do this?
>>
>>108815268
NTA, but Np2 means that with two parallel streams each stream/slot sees half that value.
>>
>>108813392
when do we get qwen 3.6 80B already.
>>
>>108815126
I can't be bothered to check the logging mess, but I'd check if the messages sent through GGML_LOG_ERROR() get flushed out before the backtrace is printed and it quits. Call ggml_abort() at launch and see how it behaves.
>>
File: 1.png (39 KB, 362x593)
why doesn't oobabooga get sillytavern's sampler settings? mikupad works fine; funnily enough, setting the seed works
>>
>https://hfviewer.com/google/gemma-4-E4B-it
Alright, that's pretty cool.
>>
File: file.png (183 KB, 532x360)
Why does he hate v4 so much?
>>
>>108815354
He is paid to.
>>
>>108813513
>Or more probably
haha cute
>>
>>108815301
Have you clicked on save?
>>
>>108815354
He hates good things
>>
>>108813513
I can run mistral medium 3.5 q4 with 96k context and f16 mmproj at 13 tokens/s, or I can run mimo v2.5 q4 with 768k context and f32 mmproj at 18 tokens/s. Hmmm, which should I pick?
>>
>>108815396
No. Why should I do that? I'm just testing things out. I don't want to save it.
>>
>>108815354
(((why))) indeed?
>>
>>108815421
The one with 8x more active parameters, obviously.
>>
>>108815354
He is even rattling a cup asking for jewish US money there.
>>
>>108815396
Yes? doesn't seem to make any difference, save just saves (overwrites) the preset, no?
>>
>>108815439
try using a normal backend
>>
>>108815435
pro
>>
>>108815442
>try using a normal backend
I suspect that's the reason, oobabooga always fucks up as a backend, i'll stick with llama/ik kek
>>
>>108815439
NTA but you're correct, that just saves profiles.
I think ooba just doesn't play nice with silly, if you're not using its frontend, you really ought to just use llama.cpp (or if you really need a GUI config for model settings, kobold)
>>
>>108815447
The one with 3x more active parameters, obviously.
>>
>>108815473
Wow, mistral does it again. How can the other labs compete?
>>
>>108815301
Did you set the api type to ooba?
>>
>>108815473
Why are the threads so silent after Mistral Medium 3.5 128B dense was released? When GLM 4.6 touched my dick I couldn't shut up about it for a month.
>>
>>108815502
oh that was certainly why then, didn't even see that was an option, will test later
>>
>make new frontend
>system prompt and character card are one and the same
>context is counted by number of messages
>only necessary and minimum samplers added
>everything is contained in a single and easily viewable settings section
>simple ui design
>responses actually feel somewhat better
I might just forget about sillytavern at this point
>>
File: IMG_3088.png (699 KB, 1320x2868)
>>108815514
This is what I see when I set text completion api to ooba. ST 1.18.0. I don't know if it works or not because I don't have ooba.
>>
>>108815506
For those that moe is godsend, 128B is too big to fit in their vram.
>>
>>108815506
Is it actually good?
>>
>>108815525
Huh... I literally built the exact same thing. I should just open source it at this point desu, but I don't want cooming rp software on my github.
>>
>>108815525
I have no idea how to use ST properly. I went from koboldai in 2022 > ooba in 2023 > ST for a few weeks in 2024 > back to ooba > llama-server web ui in 2025. And with how many people are making their own frontends in the recent threads, I might try to vibe code one. But I can only run 30bs at full context, and I'm not sure if they're capable enough yet. I certainly won't be able to catch any mistakes they make.
>>
>>108815525
>>system prompt and character card are one and the same
Wait, do you mean they're being concatenated into one prompt, or that rather than system prompts and character cards being defined separately, you write your system prompt in what would be the character card and just send that?
>>
>>108815674
NTA, but there can only be one system prompt. The character card gets concatenated into the system prompt.
>>
>>108815674
>you write your system prompt in what would be the character card and just send that
That's pretty much the whole idea. Also I'm sure the models get less confused as a result too.
>>
>>108815699
you must be pretty clever
>>
Can I still use tensor parallel in a pcie4 8x8x4 setup without bottlenecking anything? I'm looking to put three 5060 ti in one PC. I know it's a lame set up but I can get these cards for about $1200.
>>
>>108815699
>NTA, but there can only be one system prompt.
This is actually untrue, you can send multiple messages as the role "system" in a single context, some models HATE it though, and it's not great practice in general even for those that can handle it.

>>108815730
>Also I'm sure the models get less confused as a result too.
Maybe, simplicity is always preferable, I just have mine getting concatenated into a single prompt with barriers like [SYSTEM] [CHARACTER] or whatever in it because I like to be able to switch my system prompt on the fly without editing character details, but you do you man.
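To make the two approaches concrete, a quick sketch in OpenAI-style message lists (the contents are made up):

# multiple system-role messages: accepted by the API, but some models hate it
multi_system = [
    {"role": "system", "content": "You are the narrator."},
    {"role": "system", "content": "Character: Miku, a cheerful idol."},
    {"role": "user", "content": "hi"},
]

# single concatenated system prompt with barriers, as described above
single_system = [
    {"role": "system", "content": "[SYSTEM]\nYou are the narrator.\n[CHARACTER]\nMiku, a cheerful idol."},
    {"role": "user", "content": "hi"},
]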
>>
>>108815617
>But I can only run 30bs at full context, and I'm not sure if they're capable enough yet.
I had the basic features of my frontend all working coded entirely by gemma 31b over just under a dozen half-context sessions, and my frontend has some retarded complex features.
Don't let your dreams be dreams, anon.
>>
File: chub_anotherlocalwin.jpg (55 KB, 1080x759)
>>
>>108815930
think so
>>
>>108816050
Literally, seriously, unironically, why even host character cards if you live in a shithole where some kinds of fiction are banned? Just don't do it in the first place, let someone else do it.
>>
>>108813392
>my build was the OP image
never been so disappointed it wasn't migu
>>
>>108816146
Should have put a plushie on it.
>>
How does Orb compare to SillyBunny
>>
>>108816050
>anotherlocalwin
I downloaded cards from there sometimes and char archive is gone. There's botbooru but it's missing most of the cunny cards including the ones that got wiped from chub long ago.
>>
>>108816199
All the ST forks are absolute vibedogshit atrocious UI that some extra functions won't save. Orb is at least built from scratch.
>>
>>108816050
Literally, seriously, unironically, why not just write your own cards?
>>
>>108813985
As a member I wish the conspiracy theories were true. Would make my life a lot easier. But no, if AI does a purge, we will not be spared.
>>
>>108816218
I do personally, it's more about finding new ideas. Or even something like window shopping.
>>
>try MTP
>free speed boost
This shit stinks. What's the catch?
>>
>>108813767
>there will be so much abundance and technological progress that in 10-20 years everyone can live a life that is much better
I feel like I've heard this one before.
>>
Cable management is overrated. Just jam them wherever as long as it's not right in front of a fan. I'm not going to put in all that time and effort just to have a fussy little clean computer. I like a little bit of the Lain aesthetic, within reason.

>>108813767
Scott Alexander agrees with you:
https://www.astralcodexten.com/p/you-have-only-x-years-to-escape-permanent

I'm not sure I'm comfortable with what boils down to "don't try to get rich because the end is nigh" but some interesting and, dare I say, inspirational ideas nonetheless.
>>
>>108816305
It's not free. It takes some extra memory.
But if the implementation is correct, it shouldn't have any effect on the output aside from the speed it's generated.
>>
>>108816218
Literally, seriously, unironically, why not write your own smut instead of having an llm generate slop for you?
>>
>>108816305
Qwen or Gemma? pp drops to 50-60% if qwen.
>>
>>108816340
Why would pp drop? That makes no sense.
MTP is not used for pp.
>>
>>108816313
Yeah, the industrial revolution. It ended up being a mixed bag. Slop-made machine goods became abundant and cheap. People no longer had to have a full-time parent knitting and sewing and repairing and DIYing; time opened up to pursue other avenues of living. And other prices rose to compensate, and wages fell to compensate, so people still had to work to live despite the cheap abundance of everything available.
>>
>>108816199
just vibecode your own frontend and be happy with it
>>
>>108816472
>just reinvent the wheel for the millionth time, bro
I know it's a compulsion for everyone who gets into agentic coding, not just here, but man I can't wait until people get bored of it.
>>
>>108816493
He's right though. Applications are trying to fulfill a large number of users' needs, or someone else's needs, which might not be your needs, and you might have specific things you want that no one is providing, and will not provide for potentially a long time or, actually, forever, depending on the feature. Reinventing the wheel is the wrong analogy. It's inventing the wheel that fits you exactly. You can also take the approach of just forking a project, in which case it's modifying the wheel to fit you.
>>
>Qwen3.6 is shit
>Minimax-2.7 is shit
>Step-3.5 is shit
>MiMo-v2.5 is shit
What do I use for agentic coding instead?
>>
>>108816545
Gemma-4-31B
>>
>>108816545
claude
>>
>>108816430
The current draft implementation is incomplete and uses the MTP weights even during PP. It will eventually be fixed.
>>
>>108816534
If webshitters knew how to make modular and configurable applications, this wouldn't be a problem. But nooo, let's quickly make a big pile of mud and throw config options in random places until everything is a laggy mess and the code is so horrendous everyone is too scared to touch it to add new features. But it's ok because we can throw it out and do literally the same fucking thing over again. All of my hate.
>>
>>108816534
All I see is the same problems and failures to address them over and over, or fixes that are only partial for one reason or another. I guess people without experience writing any code/vibeshitters might see it differently though.
>>
>>108816545
Currently your options are Kimi K2.6, GLM 5.1, DeepSeek V4 Pro, or an Anthropic plan at Max 5x or higher.
>>
>>108816611
>DeepSeek V4 Pro
Is it good? Why was everyone shitting on it on release?
>>
Were there any advances on small(ish) models? I'd love to get more than 8k memory out of my 16gb vram, but I don't want to sacrifice too much intelligence.
>>
>>108816668
I can't tell if these posts are from trolls having fun posting the same low effort bait questions or randos filing in from other places and just not lurking for 5 mins.
>>
>>108816660
It's even huger than huge but not revolutionary like R1 was. They decided to focus on scaling context by shoving 1M into a relatively tiny and fast KV cache which is nice, but most aren't starving with the 256k context that is standard for agentic models these days so it's just a nice-to-have. Not having vision is rough since it can't reliably test its own work when developing a UI. But it is still generally a strong model if you can run it.
>>
>>108816734
>if you can run it.
That remains the issue until llama is unjewed and projects downstream can update.
>>
>>108816660
it's mostly just llama.cpp fags trying to justify it not getting implemented
there is still not even an active pr for it after the single vibecoder who tried to do it got bullied out
>>
just vibe the support yourself
>>
>>108816680
Not a troll, I visit this thread about every year to ask this same question. Is this not a good time to ask this?
>>
>>108816784
Yes, it's a very good time. Look into the Gemma 4 E models, or the Gemma 4/Qwen MoEs depending on what you need.
>>
>just try librechat
I tried it.
Or at least the online demo they host.
I don't see any immediate bugs, but, it is missing some things that make OWUI convenient to use for me.
The major one is chat folders. Librechat simply just doesn't have them. I'm surprised.
Another thing is the custom filters/functions thing in OWUI. I have one that gives me a UI button in the chat to quickly toggle thinking for Llama.cpp. I also have one that autoconverts PDFs to images to send, because PDF handling is so fucky by default.

You also have to mess with a yaml file just to add a local model provider lmao. Meanwhile the UI's vanilla provider list is a long fucking list of cloud shitters no one has even heard of.

So yeah, fuck that. I don't see it as any better than OWUI even if less buggy. What's the point of using it for local when local isn't even a first class citizen.
>>
>>108816783
The patches would have to be shared here like contraband since the official repo would just close the PR for having the stench of AI on it.
>>
Anyone here try gemma4 31b in nnvfp4 with turboquant?
>>
>>108816802
>official repo would just close the PR
Not if you fork it and then just downstream any fixes you want from master
>>
Are there any video players that have integrated live translation into subtitles (using Gemma or similar)?

Would be cool if mpc-hc had something like that.
>>
>>108816873
I found this btw: https://llplayer.com/
But it does not work on muh machine, some kind of .NET error.
>>
>>108816873
Surely there must be an mpv plugin for that
>>
>>108815575
Make a burner github, buddy.
>>
>>108816668
i have 16gb vram and 32gb ram and can run Q8 Gemma4 26b with 32k context and it is surprisingly fast. Faster than running Mistral Small/Cydonia at 16k context. Don't be alarmed that it's bigger than your VRAM, kobold knows what to do
>>
>>108816802
This isn't even the issue given pitor's contribution history. It's a pretext at best.
>>
>>108816798
>I have one that gives me a UI button in the chat to quickly toggle thinking for Llama.cpp
How to do this? Sounds like it could be useful
>>
>>108817231
I used this
https://github.com/iChristGit/OpenWebui-Tools/blob/main/Tools/thinking-toggle.py
Create a new function and paste that in. Then run Llama.cpp with --reasoning off. Go into the model settings and check both Thinking boxes.
>>
>>108816950
NTA but you are only allowed 1 github.

Speaking of github, I'm having issues more often. Is it because of the number of vibecoders and agents that ddos the site with their shit code?
>>
>>108817316
github literally has an account switcher built in
>>
https://github.com/thomasgauthier/nla.cpp
yo dawg I heard you like LLM hallucinations so I grafted an LLM to your LLM so you can read hallucinations about its brain
>>
>>108813557
I'm so glad I read a lot of scifi as a kid, so all of this bullshit is basically reheated scifi scenarios.
>>
>>108813736
>chinks start """stealing""" outputs from proprietary models and release better models for free
Based Chinese.
>>
Alright so I'm just starting and naturally there's an absurd amount of information to sift through. Obviously the rentry helps a LOT but each page is bouncing between images here, LLMs there, video here and so on.
I'm not trying to cure cancer here, just orgasm, but with a narrative. Maybe play fake D&D and get an image or two on the side.
I got a 5900x, 3080ti, and 32 gigs of ram.

Not asking you to build my shit, just what code alternatives I can cram on my system off the lazy quick start.
Also, where can I start reading about basic version differences? holy hell, I don't know what a 6q vs a 70b means god damn it.
>>
>>108817467
Q is quantization: how much the model is compressed. The lower the number next to Q, the more compressed and inaccurate the model will be, but it will take less VRAM. B is billions of parameters. You'll need VRAM above the number next to B to be able to run it fast. You'll need multiple GPUs or an RTX 6000 Pro or a Mac Studio or a DGX Spark or an AI Ryzen Max+ 395 to properly run a 70B model.
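Back-of-envelope version of that math, weights only (the bits-per-weight numbers are rough assumptions, and real usage adds KV cache and runtime overhead on top):

# rough weight footprint: params (billions) * bits per weight / 8 = GB
def approx_weight_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8

print(approx_weight_gb(70, 16))   # 70B at fp16           -> ~140 GB
print(approx_weight_gb(70, 8))    # 70B at Q8             -> ~70 GB
print(approx_weight_gb(70, 4.5))  # 70B at roughly Q4_K_M -> ~39 GB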
>>
>>108817499
Thanks man. That clears up a LOT of what this all says.
>>
>>108817420
shame that it only works with a handful of models
>>
>>108817420
>>108817600
they released the training code, so someone could make a gemma 4 one right?
>>
>>108817467
>I'm not trying to cure cancer here just orgasm
Gemma 4 27B-A4B for generating text
31B would be better but it would be very slow for your GPU.
Gemma 4 also apparently comes with a good model for interpreting images, but no model for generating images. I don't bother with image gen so you'll have to find a model for that yourself.
>>
>>108817615
You would have to finetune two copies of gemma to do it. Maybe you could do it with loras, but then the explanations would suffer.
>>
>>108817627
Thanks to you as well. Those compact LLM PCs on the guide are hilarious by the way. Shame that's not 1k a pop anymore.
>>
>>108813486
works on the atomic turboquant fork
>https://huggingface.co/AtomicChat/gemma-4-31B-it-assistant-GGUF
onna strix halo with 31b q8 on plain llama.cpp it's ~6 t/s, with e2b q4km drafting it's ~8 t/s, with that gguf it's ~12 t/s
>>
>>108816873
Gemma can't parse your srt file or something?
>>
>>108817467
Start by getting KoboldCpp, it's used to load the LLM models which you can get pretty easily from Huggingface. You can also set parameters (like temperature etc) and chat to it directly in Kobold or use a front end like Silly Tavern or whatever if that floats your boat. A few models worth checking out: Cydonia, Magidonia, Pantheon (I run 16Gb vram and have been enjoying these lately) some smaller ones I used to play with include Patricide, Mag Mell and Rocinante. There's a lot of stuff out there, this should kinda get you started. Also, look into character cards, and don't forget your context budget - no sense getting a card meant for a big rig if you only have 12 Gb of vram.
>>
File: 1771134132948854.jpg (882 KB, 2252x4000)
These just showed up :D
>>
>>108817916
undress them for us
>>
I thought I got bored of my local models so I gave gemini 3.1 and the newer claude opuses a chance
Those are so bad, LLMs might be genuinely over. Local models are only going to get worse at rp from here on out.
>>
>>108817916
>one rtx pro
>>
>>108817936
for 1/10th the price...
>>
>>108817941
1/10th the speed and software support too
>>
File: 1778188067429631.png (475 KB, 500x500)
>>108813736
Where does Qwen 3.6 35B A3B (the "smartest" model my vramlet ass can run) lie on that graph?
>>
File: file.png (139 KB, 1415x655)
>>108817952
https://artificialanalysis.ai/models/qwen3-6-35b-a3b
>>
File: nice.png (55 KB, 883x471)
>>108817952
Don't mind me, I'm retarded.
It's a humble win for local, honestly. Based chinks.
>>
>>108817950
better anon take the bullet than us. maybe if they get enough money they will be able to compete with nvidia.
not risking money on it
>>
>>108817316
>NTA but you are only allowed 1 github.
when has this stopped anyone with anything
are you under 18
>>
>>108817952
it's 43 on their index, but divide it by two to account for the qwen benchmaxxing tax and get a more accurate measure; it lands at sota september '24
>>
>>108817966
>A model you can run on a chink mini pc mogs most previous SOTAs from 6-8 months ago.
So the 6 months delay is real...
>>
People on each continent should use “their own” models because outsiders are incompatible, untrustworthy, corrupting, or inferior. It frames foreign models as carrying alien values, wrong ways of thinking, or unsuitable intelligence, such that only models made by people “like us” can understand or serve our people. The racist part is not the concern about local laws, data sovereignty, or language fit; it is the leap from practical regional differences to essentialist claims about peoples.
>>
>>108817989
That's r1 levels, still bretty good, considering it's a 35B moe you could run on a chink knock off mini pc in q4, vs a 685B moe you need a server to run even in q2
>>
>>108818012
Link to some cheap mini pcs?
>>
>>108817989
i fucking doubt it
>>
File: 1766360935764827.jpg (1.3 MB, 4000x2252)
>>108817920
Like dis?

>>108817936
At 1/4 the cost. An rtx 6000 pro costs $15000 aud, these totaled out at $3800
>>
File: mutt.jpg (51 KB, 741x649)
>>108818005
>and only models made by people “like us” can understand or serve our people
>>
File: 1743050955931.jpg (103 KB, 1024x1024)
>>108818032
>Like dis?
hell yeah that's hot
>>
>>108818005
smol brained idgit
>>
>>108818032
i asked the CEO @ my job for an RTX 6000 Pro and he said if I can write up a business justification for it he'll buy it... but i have no business justification for it
>>
>>108818059
automation, productivity, etc etc ask a model to fill in the blanks
>>
DRAM demand will never be lower than now. I assume not building more fabs is part of sequestering the technology.
>>
>>108817966
if you actually think qwen is anywhere near the level of september 24 aka sonnet 3.5 & gpt o1 you are absolutely fucking delusional
>>
File: BILL IVE COME FOR YOU.png (2 MB, 1254x1254)
>>108818059
if you're too retarded to ask gemini.google.com to bullshit up a business justification for you i'm honestly baffled you even have that job
>>
>>108818075
he won't fall for that
>>
>>108818069
Even with fabs they would all work for ai. Normal people don't even register on their radar anymore.
>>
>>108818084
then you truly only have one option: Honesty.

"Boss, I need the horsepower for my loli harem."
>>
File: 1772273552572240.jpg (572 KB, 1741x1080)
>>108818075
Weccmme?
>>
File: um790.png (139 KB, 1072x1296)
>>108818021
Barebones? Pick any. With enough ram for small models, well, get fucked lmao... Besides, I do prefer beelink and geekom over minisforum with similar specs, but they're more or less the same quality/price wise

Check these posts:
https://xcancel.com/Hi_MINISFORUM/status/2046536248852885762
And picrel for price reference:
https://store.minisforum.com/products/minisforum-um790-pro-mini-pc?variant=46713707921653

Look up on amazon also.
>>
>>108818096
More customers is better unless the fundamental technology becomes too expensive to mass market.
AI systems providers also want cheaper chips obviously, and more fabs would allow more capacity in the datacenter per GPU.
>>
>>108817897
There's no srt file, only audio and it needs to translate the audio into subtitles while I watch.
>>
>>108818075
>>108818102
most importantly, those are shotgun shells, and he’s only carrying a pistol.
>>
>>108818032
is the cute miku 3d printed
>>
>>108818097
a boss who says no to that is heartless
>>
>>108817950
I heard intel was desperately trying to get better support for LLM related stuff, maybe this is actual fine wine
>>
>>108818175
In that case I'll just wait until it becomes the meta quest3 of VR, something nobody wants but is actually good, before swooping in once local engines are mature enough
>>
>>108818183
>meta quest3 of VR
Speaking of this. I just ordered one. Very excited to play around with combining VR and AI.
>>
>>108818169
Chinesium resin gk figure
>>
>>108818202
and you imported that to Australia? you have some balls anon
>>
>>108818190
Shame you didn't get in before the price increase.
It's still the best bang-for-the-buck non-toy vr option though. It's still a wide gulf to bridge, too. They know it and that's probably why they jacked the price up. Toy VR is basically dead and it's all hobby/enthusiast now.
>>
File: 1991296.jpg (21 KB, 460x460)
what is this nigga doing? why has the main focus for llama.cpp for the last 2 months been vulkan and webgpu shit? why is he not making the community drop whatever nonsense they're doing and make the collective brain cells all work towards cracking the MTP right now? also kv turbo STILL hasn't landed 2+ months in
is he becoming late stage guido van rossum?
>>
>>108818249
why do you care about what people do in their free time
>>
>>108818249
okay but webgpu is extremely important and it uses vulkan on the backend anyways. Also vulkan itself is goated because of the cross-compatibility without much direct maintenance required. You should be wanting to rape and murder Mozilla for STILL not FUCKING supporting webgpu in 2026 (on linux, at least).
>>
>>108818223
that's not the worst, I have megumin with sculpted... parts
>>
>>108818249
volunteers make what they wanna make, not what you want them to. that's just how open source projects work my man.
>>
>>108818272
that's probably worse than importing drugs
>>
>>108816199
It's pretty good, it's still early but no apparent issues seen. I like the auto prompt use and the compress history thing that automatically creates a new checkpoint
>>
>>108818267
it's kinda supported, if you enable it in settings and never resize the window, but I agree, they should do better. I need it to run https://github.com/AmyangXYZ/reze-studio
>>
>>108818284
Nightly or main?
>>
File: GJ1fwNqWgAEwo_K.png (594 KB, 500x765)
>>108818267
>we must expend limited resources on minorities to raise them to our level first before we can do moonshot
>>
>>108818175
that’s exactly what intel would say to sell cards
>>
>>108818320
Only time can answer this tbdesu
>>
>>108818316
"Globalization" in the Peter Thiel tech sense isn't bad. That's basically the ethos of open source software as a whole.
>>
>>108818334
I know it sucks now, though. not very compelling
>>
>>108818308
main
>>
>>108818032
The Miku demands more VRAM for sustenance
>>
>>108818335
Hello Peter.
>>
>>108818347
Thanks. This is helpful.
>>
Goyimtip:
Reminder to disable all cloud models in opencode using the provider whitelist so they don't vacuum up your work if you don't realize a fallback has happened.
>>
>>108817980
Well I'm hoping that https://github.com/intel/llm-scaler will help me run things without too much difficulty
>>
What's the best model to run on two p40s with 48g of VRAM? Bought an AI rig for my son and want to start him off in the right direction.
>>
>>108818665
https://huggingface.co/llmfan46/gemma-4-26B-A4B-it-ultra-uncensored-heretic-GGUF
>>
>>108818032
cute mesugaki miku
>>
>>108818665
p40 just lost driver support so be sure you’re on an older driver version.
>>
i dunno.. local models just don't cut it for agentic shit. I don't know what the minimum vram is to have a decently functioning hermes, but it isn't 32gb
>>
>>108818665
Your chosen flavor of Gemma 4 31B Q5 with about 160K context
>>
>>108814281
Unironically looks pretty good. Anyone try it yet?
>>
>>108818769
why would someone try it? We got lm studio
>>
>>108818777
some people aren't cattle who use tools with electronic locks on them
>>
Huge news for the ERP community
https://huggingface.co/jackxinning/Leanly_AI
>>
File: lean_ai.png (325 KB, 1080x2007)
>>108818783
>>
>>108818790
kek
>>
>>108818783
>lean
i assumed at a glance it was some sort of proofmaxxed model lol
>>
>>108814281
>native desktop app
>electron
ew
>>
>>108818616
I already firewalled the whole thing but thanks for the heads up.
>>
>>108818790
what could go wrong
>>
>>108818759
If you are a very very patient man and have the ram, Kimi K2.6 was made with agentic shit in mind
>>
>>108818910
lo fucking l
>>
from unsloth
> Do NOT use CUDA 13.2 as you may get gibberish outputs. NVIDIA is working on a fix.

I've been using 13.2 .. haven't noticed any issues...
>>
>>108817750
Sir, where is the Windoes release?
>>
i am trying to build an image filter for 4chan but it's kinda slow. it's completely vibecoded because i am a codelet. is there a faster way to filter images on the catalog with a folder of 20 pictures i have?
https://pastes.io/2oKtt92M
>>
>>108818863
>electron
i was hoping the ram shortage would kill this shit
>>
>>108819024
>setBackend('cpu');
webgpu would be a good place to start
>>
>>108818987
also running 13.2 for a while now with no issues. it's probably something to do with his dumb specific quants. just use bart, unsloth is a meme.
>>
>>108819064
you know the reason there's a cpu only mode is because webgpu is the first thing that was used, right
>>
lalalala
>>
gemma 26b is the new nemo
>>
>>108818249
>why is he not making the community drop whatever nonsense they're doing and make the collective brain cells all work towards cracking the MTP right now?
In the past week ggerganov opened several PRs that are just about MTP, and at least three of them are merged now. MTP is finally being worked on; search PRs with "spec :"
>>
>>108813412
124b gemmoe's robussy is for dean's exclusive use only.
>>
File: 1763336383792585.jpg (3.28 MB, 7272x3545)
Holy Jesus Christ

Thank you for allowing me to live in the same world as this cattle
>>
>>108819514
You are spending time creating twitter collages and then complaining about them here?
You are the problem.
>>
When the fuck is GLM coming out with a new Air model? I was waitfagging for GLM 4.6 air but now they're up to GLM 5.1 and still no Air.
Ironically now it's Deepsneed starting to offer smaller models.
>>
>>108819596
>starting to
When their biggest was "only" 236B, they made a 16B Lite.
>>
>>108819596
im coping with step 3.5 flash rn
>>
>>108819606
Yeah i tried that when it came out but I couldn't get it running properly and kept getting garbage. I think that was around the llama 2 era.
>>108819609
>step 3.5 flash
Haven't heard of that, will check it out
>>
>>108819623
>step 3.5 flash
Isn't that like 200b?
>>
File: .png (25 KB, 752x146)
>>108819665
if you use a braindamage quant you can fit it in 64gb
most consoomer ddr4 motherboards can fit 128gb (laptops only 64gb but laptops suck ass for heavy continuous load)
>>
>>108819396
I guess ggml will ultimately give up on MTP and give us some form of cope implementation, like the recent fast muhammad rotation in place of working turbo
>>
>>108819669
I've got a sp3 motherboard but I only have 16gb ram.
>>
>>108819514
I think those replies are placebo effect, got spoilered that it's AI made. Should try with blind test
>>
>>108819682
No those are fake replies. The poster edited them to make us look stupid. If you actually go to the post and check the replies, you'll see that we picked up on it instantly.
>>
>>108819586
He's not complaining. He's making fun of them, anon.
>>
>>108819702
nyo
>>108819682
What would a blind test be? It's a bot account so anything else would raise suspicion as well or lead to users checking.
>>
>>108819732
>nyo
>nyo
>nyo
>nyo
>nyo
fuck
fuck
fuck
fuck
fuck
>>
>>108813392
I'm on vacation in japan
should I buy something miku or is that cringe?
>>
>>108819745
yes
>>
>>108819745
out of anything weeb you could get it has to be miku?
>>
>>108819745
stay in designated tourist containment zone thanks
>>
File: 1759263641791117.jpg (203 KB, 1536x2048)
203 KB JPG
>>108819745
Yes, migu needs your support!
>>
>>108819745
Avoiding things you want to do because some hypothetical fag might think it's cringe is the cringiest thing of all.
>>
>>108819596
>Deepsneed starting to offer smaller models
there's mimo-2.5 non-pro
>>
>>108819767
I'm running q4... it's not very good.
>>
>>108818175
>I heard intel was desperately trying to get better support for LLM related stuff, maybe this is actual fine wine
plugged in my A770 and updated drivers, oneapi, vulkan
it's just as shitty as it was last year
>>
>>108819781
They've been focusing basically all their support on their b pro cards

https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-project-battlematrix.html
>>
File: kld.png (145 KB, 1007x623)
145 KB PNG
how much better would you expect the right model to be?
>>
>>108819872
the left one can solve 66% of the tasks fp16 can solve, while the right one can solve 85% of what fp16 can solve
>>
>>108819776
model bad or quant bad?
>>
>>108819872
Is that Gemma 4-it? You can't measure perplexity and KLD normally with it because it won't act properly without chat tokens and/or if you ask it to generate user text. Look at the mean PPL of ~165.

Ideally those would only be measured on model-generated tokens using the built-in chat template, but that cannot be done with llama-perplexity. Even supplying a pre-formatted file doesn't give optimal results.
>>
>>108819745
I'm going to japan in 6 days. Anything you'd recommend?
>>
>>108819670
If you'd bothered to look for any of the PR's I was talking about you'd see it's the opposite. They've got the guys from all the other MTP prs in there so that the new implementation works for d-flash, eagle, the new gemma mtp, and generic drafting.
>>
>>108819996
>you can't measure perplexity and KLD normally with it
i thought KLD measurements still hold? i'm aware PPL does nothing for instruction models on wikitext.
>>
>>108820000
Go to a store called hands and get some nail clippers by green bell, and a travel umbrella that does both uv(sun) and rain protection (晴雨兼用 ) to take home.
If you're still planning stuff, maybe try to find more nature/shrine stuff rather than shopping. That has been more fun for me.
Hopefully you have an iphone, if so, add a suica card to apple wallet and load funds on it. Then make sure it's in express mode, so you can tap to pay at train and subway stations without even using face id or opening apple wallet. Suica also works to pay at 7-11 and other convenience stores, just say 'suica'?
I set up hermes agent to be my travel assistant, told it to make a wiki using karpathy's principles, connected it to telegram, and fed it my itinerary. It gives me a 7am brief on a cron, and I can message it questions. I'm using a cloud model (5.5) but maybe even gemma chan would work.
Basically I have no idea what I'm doing but it's at least fun to get out of my comfort zone.
>>
>>108820054
KLD is measuring the relative difference from the original distribution, but with Gemma 4 part of it is basically random noise and not relevant for the end results in practice. You really want to know if the text corresponding to what the model could *actually* generate remains unchanged, not the rest.
>>
>>108820084
As far as I know, by the way, oobabooga (who performed KLD testing on Gemma 4 and Qwen 3.5 with various workloads and showed that long context degrades even at Q8_0) has a custom fork of llama.cpp to deal with this.
>>
>>108820084
if that's the case, that sucks. i'd like a way to compare quants for qwen 3.6 and gemma 4.
>>108820090
i unfortunately can't find shit, his fork only changes logprobs-related code for llama-server as far as i can see, not llama-perplexity.
>>
>>108820124
Once you have all logprobs you can measure the KLD only on what you need. I guess he must be measuring it with separate code.
>>
>>108819669
isn't stepfun worse than gemma, especially if you are gonna quant damage it?
>>
>>108820058
Awesome, thanks for the tips
>>
>>108819745
become a mikutroon anon
>>
Anyone know a web search MCP that uses your browser, and does image capture + OCR instead of the buggy HTML parsing shit?
>>
hear me out: 2t-a256m hdd optimized inference architecture
>>
It's done. I've made custom kokoro voice models for ~30 different characters from blue archive + a few random males from anime. Everyone can have a unique voice in my stories and it's super fast.
>>
>>108820287
Already had that idea, except better: mine used a dense component on the GPU with small-active-size experts held on SSD, but I suppose you could also make a version for HDD.
>>
>>108820291
Can I see it?
(and download it)
>>
>>108820291
>males
gaaaaaaaay
>>
>>108820291
How are you handling voice assignment for each character in chat?
>>
>>108820282
these two from anons here use puppeteer, i use the python one
https://github.com/BigStationW/Local-MCP-server
https://github.com/NO-ob/brat_mcp
requires a vision model to read the images
>>
>>108819745
last time I was in Japan I bought myself some miku themed programming socks, they are super cute
>>
File: hf_kys.png (26 KB, 1057x107)
cuckingface almost fucked up my colab training
>>
>>108820219
Isn't gemma just 26b-a4b, is it really that good? My use case is cooooding.
>>
>>
>>108820345
pics? with feet pls
>>
File: file.png (18 KB, 450x400)
>>108820316
Custom frontend

>>108820303
Once I cherrypick the best ones
The random walks trainer does better the more voices you have to pull from, so I actually may end up retraining some with the new best voices
>>
>>108819745
last time I was in japan I bought a 1/4th scale bunny girl yunyun

and some loli doujins which were scary getting through aussie customs
>>
>>108820408
Cool. I'll be lurking.
>>
>>108820410
>loli doujins
>aussie customs
jesus christ dude
>>
>>108820408
>Custom frontend
Neat.
Are they taking individual turns and being assigned a voice by their unique prompts concatenated with that assignment there, or are you regexing names and "" from a single narrator output to send to the tts tagged, or something else entirely?
I was thinking about this briefly the other day and couldn't come up with a solution I liked.
>>
>>108820447
It's turn based. Each character in a scene acts as an independent agent, receives a summary of their current surroundings/situation from the narrator (including any thoughts the narrator injects), and tries to act as the character would. They have their own memory and internal dialogue and stuff. There's a specific output for speech separate from whatever action the character is taking, which is what gets generated.
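The loop shape described there, as a minimal self-contained sketch (every name is hypothetical and llm() stands in for whatever completion call the real frontend makes):

from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    return "..."  # stand-in for a real completion call

@dataclass
class Character:
    name: str
    persona: str
    memory: list[str] = field(default_factory=list)

    def act(self, briefing: str) -> str:
        # each agent sees only its own persona, its own memory,
        # and the narrator's briefing of the current scene
        prompt = f"{self.persona}\n" + "\n".join(self.memory) + f"\n{briefing}"
        reply = llm(prompt)
        self.memory.append(reply)
        return reply  # the real thing splits this into action vs speech for the TTS

def run_turn(briefing: str, cast: list[Character]) -> None:
    for char in cast:
        print(char.name, char.act(briefing))

run_turn("You are all in a tavern.", [Character("Miku", "a cheerful idol")])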
>>
two r9700s or one 5090?
>>
>>108820467
That's pretty cool anon, hope you're havin fun with it.
>>
>>108820538
Thanks, and absolutely am. Vibeslopping my own interface ended up being a really great suggestion.
>>
>>108820415
What warrants your strong reaction?
>>
>>108820291
How did you finetune kokoro, last time i checked they didn't release the code for that
>>
>>108820598
aus laws
>>
>>108820598
All I hear lately is about how the Australian government is cracking down on fictional stuff. May be hyperbole I guess, but it sounds like playing with (particularly dangerous) fire.
>>
>>108820467
How do you deal with context?
I made my own vibe shitted front end for rp and story telling with a complex memory system, but when I tried to expand the roster it shits itself even with 262k context and 2 agents (4090, 5090).
I was thinking about making an llm driven rpg/x4 game with story telling, but from my experience the local tech is not there yet.
I guess your system works because you're using a turn system for each character, but that will shit itself regardless when there's multiple characters speaking to each other, eventually, as their memories grow or the world "lore"/story keeps expanding.
For an rp session there's lots of stuff you can do, but what I want is way more complicated, gotta keep thinking.
>>
>>108820598
It's the commonwealth in general that really doesn't like loli. Apparently there's a lot of arrests over in bongland but I think here in australia it kind of depends on what state you're in. There is a commonwealth law that bans it, and there are several states that have a bunch of their own laws about it, but I don't think there is any actual federal law and some states like the one I live in have no laws about it at all.
>>
>>10881974
You should buy what you want. That said, in your situation, I couldn't bring myself to actually pay the money, then carry it around for the trip. Get it towards the end of the trip; they are big and take up space.
>>108819765
This 1000pct
>>108820000
I will tell you and OP, go find places that sell used kimono (or whatever they're actually called). They are super cheap in Japan, basically free. I bought a slightly too small, pure raw silk one for myself that I wear as a house coat in winter for USD$12 at some shrine in Kyoto. It is super comfortable, and by far my favorite souvenir from that trip.
>>
>>108820675
meant for >>108819745
>>
>>108819514
>these are the retards that companies listen to when they decide that people "hate AI"
>>
>>108820698
it is a fair representation as retards outnumber non-retards by a lot
>>
>>108819781
A is for abandoned
>>
>>108820507
1 x 5090 not even a question
>>
Should I get a 6000 pro if I have a 5090 and am already using qwen3.6 27b for coding?
Is there anything better I can run for coding on 96gb of vram or am I memeing myself?
>>
>>108820652
They only keep the last twenty steps in memory, and the rest is either stored in a running summary, or offloaded to searchable memory. Retrieval from the searchable memory is done by a sub agent. The max context length never breaks 40K with my current configuration, and is generally much less. The narrator is the most complex, but it's basically the same system as the characters, just with more information available. It has its own little narrator agent which processes the story/looks through the lorebook at each step and tells it what's important to the scene. It also doesn't go over 40K really, though it probably could depending on the situation.
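That bounded-context trick, sketched (summarize() is a placeholder for another LLM call, and the window size matches the twenty steps mentioned above):

WINDOW = 20

def summarize(old_summary: str, evicted: list[str]) -> str:
    # placeholder: fold the evicted turns into the running summary via an LLM call
    return (old_summary + " " + " ".join(evicted)).strip()

class BoundedHistory:
    def __init__(self) -> None:
        self.turns: list[str] = []
        self.summary = ""

    def add(self, turn: str) -> None:
        # keep only the last WINDOW turns verbatim; everything older gets
        # folded into the running summary (or offloaded to searchable memory)
        self.turns.append(turn)
        if len(self.turns) > WINDOW:
            evicted, self.turns = self.turns[:-WINDOW], self.turns[-WINDOW:]
            self.summary = summarize(self.summary, evicted)

    def context(self) -> str:
        return f"Summary so far: {self.summary}\n" + "\n".join(self.turns)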
>>
>>108820721
The point is they'll become rabid at the mere thought of AI being used. It's literally pointless (counterproductive even) to try and pander to them.
>>
>>108820721
Normies around me use AI (chatgpt/gemini) every day and don't go into weird psychosis around it, so I feel like it's mostly an online social media/youtube thing where the "good" thing to say is to complain one way or the other about it.
>>
>>108820752
>Is there anything better I can run for coding on 96gb of vram or am I memeing myself?
qwen3.6 27b bf16
>>
>>108820721
You are incredibly clever.
>>
>>108820910
I am well aware
>>
>>108820730
can't fit q8 gemmy tho
>>
>>108820919
1x 5090 + 1x3090 and you can fit it + mtp
>>
>>108821001
>>108821001
>>108821001
>>
>>108819745
you can buy anything anon
even loli figs too
I got all of mine delivered though
>>
>>108819514
hey, it's made by claude
kek


