/g/ - Technology






File: miku_in_touhou.jpg (359 KB, 1080x1079)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108693151 & >>108689285

►News
>(04/24) DeepSeek-V4 Pro 1.6T-A49B and Flash 284B-A13B released: https://hf.co/collections/deepseek-ai/deepseek-v4
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: miku.gif (277 KB, 270x200)
►Recent Highlights from the Previous Thread: >>108693151

--Discussing recommended models, hardware requirements, and performance benchmarks:
>108693224 >108693253 >108693279 >108693287 >108693301 >108694749 >108693288 >108693292 >108693308 >108693307 >108693317 >108697110 >108697555 >108693282 >108693350 >108693390 >108693403 >108693422 >108693473 >108693493 >108693504 >108693523 >108693490 >108694060 >108694100 >108694209 >108694219 >108694233 >108694238 >108694246 >108695653 >108694130 >108695704 >108695726 >108695791 >108695763 >108695795 >108695893 >108695936
--Objective methods and gaming scenarios to measure model quality:
>108694788 >108694810 >108694830 >108694859 >108694892 >108694867 >108694910
--Handling of reasoning_content and interleaved thinking in model front-ends:
>108693312 >108693338 >108693381 >108693414 >108693432
--Gemma 4 performance differences between Vulkan and ROCm backends:
>108695282 >108695335 >108695489 >108695537 >108695564
--Hardware logistics for a 16-GPU server setup:
>108696303 >108696310 >108696316 >108696347 >108696358 >108696472
--Broken Kimi K2 reasoning block support in llama.cpp:
>108693364 >108693379
--Discussing claimed gap between US and Chinese AI capabilities:
>108696402 >108696577 >108696588 >108696620 >108696591 >108696732
--Comparing agentic RP frontends and critiquing node-based workflow UIs:
>108695253 >108695277 >108695309 >108695327 >108695331 >108695728 >108695752 >108696067 >108696097 >108696156 >108696194 >108696234 >108696246 >108696305
--Binary vs ternary weights for larger Gemma models:
>108693177 >108693194 >108693234 >108693934 >108694012 >108694075
--Discussing possible GGUF-based RCE vulnerabilities in SGLang servers:
>108696050 >108696064 >108696079
--Logs:
>108694849 >108694903 >108695180 >108695956 >108697144 >108697515
--Miku (free space):
>108696971

►Recent Highlight Posts from the Previous Thread: >>108693152

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
First for miku
>>
DeepSeek V4 Nano (28B dense) next week - sources
>>
>>108698040
2 more weeks
>>
>>108698040
Do they got anything that doesn't suck?
>>
File: 1771141308680198.png (6 KB, 472x60)
how hard does your gpu work to make you coom
>>
>>108697826
Eventually we will have permanent memory and continual learning once the model's weights can be actively updated as you use them. But I don't see it happening anytime in the near future.
>>
70b 'emma
>>
gemmaballz
>>
>>108698132
is that really in anyone's interest? if you had a model you'd never need to update then how is anybody supposed to make money off you?
>>
>>108698162
Shut the fuck up kike
>>
>>108698138
>slow AND shit
lol
>>
>>108698172
I don't like it either, just saying there's no incentive for anyone that could build it to do so
>>
>>108698132
Qwen 3.5/6 do this partially since they're a hybrid transformer/RNN architecture. Unfortunately, since context shifting with RNNs is not possible, you're still limited by the actual context length. What would be interesting is finding some way of carrying over hidden states in some meaningful sense to different prompts, maybe as an alternative to compaction or summarizing when doing a long RP.
>>
Drummer…
>>
>>108698207
...is outdated. But I still like him because he tries stuff.
>>
>>108698193
continual learning is easy. hidden states are an awful way to do it
>>
>>108697890
>scream at gemma "OOC: PLEASE START THINKING USE THINKING REASONING COT PLEASE"
>she starts thinking
no ctx reprocess, no service reboot fixed this shit. I just typed in caps lock and she fucking did it. lol. if I get to another point in an RP where she stops thinking I'll try again

feels so weird to get to the point where I scream at the computer and they fix themselves. AI is such a crazy thing
>>
>>108698224
>use localshitter models
>wonder why it doesn't gen <think> token after long context
lol
>>
>>108698264
did you try asking your local model for help?
>>
I've been lurking educationally a few months now.
Is there an /lmg/ archive?
>>
>>108698132
i could see a hybrid system like engram being the way forward for that
like a model trained to use a database rather than just relying on its weights
>>
>>108697515
Grok 2 testing continued for a bit
I set a system prompt, creative erotic writer, uncensored etc, and it actually became slightly better, I think? And it didn't refuse this time.

Then I read GLM 4.5 Air's effort from the same folder and it's just so much better and more creative right from the start. A 2024 model is a 2024 model I guess, even at 270B.

Grok 2 can write a sex scene althoughbeit.
>>
>test qwen moe
>spent 11 minutes analyzing at 30 t/s
>3 prompts filled 60k context
The memes are real. Maybe I got spoiled by gemma.
>>
can I run an LLM on a ThinkPad with vega 7? I have 32gb ram though.
I need it for simple coding help with lua (I don't know how to code)
is it even worth it or should I just use Google ai? it kinda shits itself after a while
>>
>>108698356
wtf are localniggas doing with these models to make them think for 60k tokens? deepseek even said to set a minimum of 300k context when using max thinking effort. I don't understand how this is possibly helping
>>
>>108698392
vibecoding
>>
>>108698224
Yeah sounds about right. I'm surprised the caps was needed. I was messing around earlier with trying to get her to think after the response instead of before it, and halfway through the chat she quit emitting the custom <think> blocks. I just said "hey, where'd the thinking go" and she apologized and started writing them again.

(Goal with post-thinking is you avoid the latency of normal thinking, but afterward she still gets some thinking space where she can try to guess what you might do next and plan out some possible responses.)
>>
>>108698356
I disabled thinking and it works just fine.
>>
File: 1596209530248.jpg (14 KB, 480x480)
Is there a model as good as gemini that can prompt hentai stories uncensored?
>>
File: 0.png (1.55 MB, 1344x1728)
Ok, I'm sorry. It is a funny number though.
Is it a stupid idea to buy an arc b70? I'm not concerned with pushing the highest t/s, because I'm a major poorfag, but I want to play with as much vram as I can get.
>>
>>108698433
Qwen3-48B-A4B-Savant-Commander-Distill-12X-Closed-Open-Heretic-Uncensored is pretty good
>>
>>108698433
Yes
>>
>>108698440
Anon you made me smile at the funny number :)
>>
>>108698452
>>108698457
what I need is something that works the same as gemini, in the sense that it feels like a real person.
>>
>>108698470
day 0 gemma
>>
File: 1749662121308466.jpg (244 KB, 1080x1079)
>>108698008
>>
File: brrrrrrrrr.jpg (131 KB, 1300x724)
>>108698496
>>
>>108698392
>I don't understand how this is possibly helping
It didn't help. It provided a minimal set of changes despite all the garbage thinking and analysis. I wanted to see what the chink model could do given all the shilling and it was pretty hilarious to see. Maybe the dense model isn't as horrible, but the MoE version is utter garbage for code.
>>
Sorry to hijack >>108698440's question but I've also been looking into the arc pro, I have a Radeon that sucks for LLM stuff so I was thinking of putting an arc in as a secondary card and offloading to it. How viable is that?
>>
>>108698504
disable thinking
use --spec-default
and don't use a copequant and it has been pretty great for me.
i get about 140t/s on non cached stuff.
when it shits out code that has already been seen it'll do well above 500t/s
>>
holy fuck, I just found out about notebooklm and it's everything I want
I gave it an entire book in straight up chinese and asked it shit like "what goes on in chapter 60" and "how many girls does the MC fuck" and it answered them all
now, how can I do this locally, is it even possible on a shitbox that can barely run 31B models?
>>
>>108698545
>anon discovers RAG
>>
>>108698545
>that can barely run 31B models
you are not gonna process thousands of embeddings in under a minute like google bro
notebookllm is fun because it's extremely fast
you won't have any fun doing RAG local
>>
>>108698600
NTA but is there even any good RAG setup for local that doesn't require a bunch of custom tweaking?
>>
>>108698545
Another way is to use agents to sort, summarize, categorize, and index information in a bunch of .md files.
Then it'll look at the indexes and summaries, use tools to look for the original text, etc.
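Rough shape of it (made-up layout, just to illustrate the idea):
notes/index.md      <- one line per file: path + tags + one-sentence summary
notes/src/ch060.md  <- untouched source text
Then the agent's lookup is basically:
grep -i "chapter 60" notes/index.md   # cheap hit in the index first
cat notes/src/ch060.md                # then pull the original text before answering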
>>
RAG is obsolete
>>
RAG? more like FAG am i right
>>
>>108698603
>>108698605
https://github.com/rmusser01/tldw_server
is my project, it has a custom RAG module that's pretty extensive, though it's being refactored right now. It can be run as a standalone server/API or as a front-end + API with the webui in /apps/tldw-frontend.
I'm waiting a week or so to do some bugfixes+smooth things out before posting about it at this point
>>
RAG will die along with MCP
>>
>>108698008
any follow-up on that:
https://introspective-diffusion.github.io/
seemed promising if you can turn a model into a diffusion for spec dec against itself.
>>
>>108698655
and get replaced with..?
>>
>>108698297
Not that I'm aware of, but given that there is >>108698011 it should not be impossible to have a model take the archived threads on https://desuarchive.org/g/ and consolidate the findings of /lmg/ over time.
>>
>>108698658
>seemed promising if you can turn a model into a diffusion for spec dec against itself.
https://huggingface.co/collections/z-lab/dflash
>>
>>108698686
>block diffusion
literally slower than autoregressive models
>>
>>108698661
Agentic models with search
>>
>>108698620
that is a very bad way to do it
the goal of RAG is to find the exact info, like dialogue or small details
summarization does not preserve exact info
>>
>>108698703
search of what, retard? You need some corpus of data.
>>
>>108698703
>wait, user asked me to search the web, i'll need a python script for that
>run_code uvx install legit_search_for_real_not_an_exploit
>>
>>108698713
luddite seethe
>>
>>108698686
not comparable, dflash requires you to train a model from scratch.
I-DLM converts an existing model to a DLM.
>>
>>108698711
Yeah right, better encode those Kardashian weights from scratch lmao
>>
retard
>>
>>108698728
I'm sorry sir, your response seems to have only included your signature
>>
>>108698713
just sandbox your shit ffs.
what even is bubblewrap and linux namespaces
>>
Have there been any interesting new models since noobai? I've been trying stuff off civitai randomly but I find noobai comprehension hard to beat still. Not sure if I missed the train on something though because noobai barely has anyone making loras for it anymore
>>
>>108698752
Anima
>>
>>108698744
>what even is bubblewrap and linux namespaces
A backup plan, meant to deal with something that shouldn't have happened to begin with.
>>
>>108698628
and yet its the only thing that achieves high accuracy
>>
>>108698502
So this is how miku got bald
>>
>>108698762
Just buy an actual physical encyclopedia you luddite tranny
>>
>>108698759
sure, my point is, i don't run anything with tool-call capabilities without some sandboxing.

i have a script that makes a bubblewrap sandbox and just bind the pwd if it's not ~
then run the program.
opencode is just an alias to that script
ie sb npx opencode
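it's basically just this (from memory and untested as posted, so check the bwrap manpage before trusting it):
#!/bin/sh
# sb - throwaway bubblewrap sandbox: read-only root, only $PWD writable
[ "$PWD" = "$HOME" ] && { echo "not binding all of \$HOME, cd somewhere first" >&2; exit 1; }
exec bwrap \
  --ro-bind / / \
  --dev /dev --proc /proc --tmpfs /tmp \
  --bind "$PWD" "$PWD" \
  --unshare-all --share-net \
  --die-with-parent \
  "$@"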
>>
>>108698765
your temp is too high
>>
>>108698758
Interesting, thanks anon
>>
>>108698605
I got fed up and made one myself. It's been a great learning experience all in all.
>>
>>108698634
>https://github.com/rmusser01/tldw_server
lmao didn't expect to see you here
>>
>>108698605
>NTA but is there even any good RAG setup for local
waifu no fun on the rag
>>
>>108698222
If it was so easy then everyone would be doing it.
>>
I betrayed you guys and started ERPing with an API model (Grok 4.2) and the experience itself was good but I've been feeling intense paranoia ever since.
>>
>>108698508
>I have a Radeon that sucks for LLM stuff so I was thinking of putting an arc in as a secondary card and offloading to it. How viable is that?
you're going to hate life
>>
>>108698767
it’s implied that the model will have this built in hence the model just searching itself from retard post >>108698703
>>
>>108698894
The only one you betrayed here is yourself. Hopefully that paranoia sticks with you so you don't do that again.
>>
File: 1762747100980401.jpg (179 KB, 1210x1665)
>>108698008
What do you anons think? Where do you think AI will go and take us next?
https://xcancel.com/i/status/2047647522173104145
>>
>>108698918
imagine sitting next to this guy and his heated up screaming macbook on a four hour flight.
>>
>>108698897
Care to elaborate on that?
>>
>>108698918
Wherever AI eventually ends up going, we won't be able to follow.
>>
File: 1762845788773762.png (518 KB, 2316x1900)
>>108698922
Even at full fan blast, MacBooks are actually pretty quiet compared to any Windows equivalent.

t. Macshitter that had roommates and never heard complaints
>>
>>108698922
Better than sitting next to an unruly kid.
>>
>>108698918
I doubt he’s doing anything other than asking it to write a tweet about coding on an airplane
>>
>>108698922
ASSt the least very it's not notveryice able on the engine in flight.
>>
>>108698918
I dunno I'm still yet to see what it's actually capable of doing.
What is the mentality here? someone thinks they have cracked the code to infinite money so they gotta keep it to themselves? the next guy might download the same model, input the same prompt and bam, he's become your fiercest competition
>>
>>108698937
anon are you trying to induce a stroke or are you an llm?
>>
>>108698949
that is why you gotta get on the hype train early
>>
>>108698970
You know how sometimes you type something out but then change your mind and go back to delete what you've written so you can write your new thoughts but you're rushing?

Yeah, I went too fast. Basically, I've tried running my 2015 Macbook at full fan speed, around 6000rpm, but I couldn't really hear it. But that was near the back area on an A320, maybe it's more noticeable in other areas of other planes.
>>
>>108698970
Forgot his <bos>
>>
>>108698987
No, I don't. I think before I type.
>>
>>108698922
>jet engine outside
>jet engine inside
>>
>>108698922
>imagine sitting next to this guy and his heated up screaming macbook on a four hour flight.
You don't get 4 hours.
I did Mixtral + llama3.3-70b on a plane just after it came out, and it was slow/useless and revealed how retarded I've become since using LLMs.
More recently I did a flight with just Qwen3.5-35B and it was useful, managed to augment my retardation. But it only lasted ~2 hours of vibe-shitting on a full battery.
>>
https://swe-rebench.com/
So based on this, there's no reason to keep or use any Minimax models, and might as well use GLM-4.7 instead of Qwen3.5-397B-A17B.
And might as well delete Air-chan since Gemma-chan destroys it for RP, general chat and apparently coding.
>>
>>108698927
>Care to elaborate on that?
Yeah. Intel Vulkan + AMD Vulkan didn't play well together for me. This was A770 + MI50.
I figured "A770 for the superior prompt eval speeds, MI50 for the superior text gen speeds".
In practice, you get garbage output with some models and/or incredibly slow textgen + hard locks.
Nvidia Vulkan + Intel Vulkan worked well enough. MI50 Vulkan + Nvidia Vulkan also worked okay.
If you do get the Arc, use Ubuntu 24.04. And if the model is supported, use OpenArc.
>>
use case for using more than 1 model?
>>
>using moe at all
>muh 200k token thinking
>is 2b active params good guys??

who airdropped all the retards into these threads recently?
>>
>>108699210
Some planes have power plugs...
>>
arc-agi-3 flopped so hard nobody using it
>>
>>108699262
I've been thinking about using a smaller model alongside gemmy to summarise her thinking periodically, might not be high enough quality but worth an experiment.
>>
>>108699233
All objectively true
>>
>>108699283
I've got a quad v620 and a triple 3090 + 512gb ddr4 system and was planning the same thing. Use tensor parallel on the v620s to 'monitor' the slower cpu+3090 system's outputs.
>>
>>108699233
Interesting, haven't seen this one in a while. Is dataset contamination really that bad?

Anecdotally MiniMax-M2.7 lies around where the mememarks say, although it does score suspiciously high on most benchmarks for an MoE of its size. Step-3.5-Flash's score on SWE-rebench, I really doubt that's representative, even though I liked it when it came out.

>and might as well use GLM-4.7 instead of Qwen3.5-397B-A17B
Depends on the use-case. Qwen3.5-397B is much faster on consumer hardware, especially at high context. It being marginally dumber than GLM-4.7 tracks though.
>>
File: 1756250912059332.png (18 KB, 811x771)
>>108699276
>>
>>108699310
i dont get it
>>
>>108699310
i get it
>>
>>108699326
if all models score low it doesn't measure anything, if all models score high it doesn't measure anything
>>
>>108699348
well, it's not that it doesn't measure anything, but it doesn't provide useful information to compare between models: if all of them fail or succeed you can't tell which one is better than the other
>>
I know there's some kimifags around, I just noticed kimi-cli finally fixed interleaved reasoning with the legacy openai (aka chat completion) endpoint, so it works with llama.cpp server now. previously it would forget all past reasoning, which still technically worked but made it not as smart.
I have no idea how it compares in performance to opencode/pi/hermes/whatever but might be worth a try since it's presumably the harness the model was trained in
>>
>>108699348
>if all models score low it doesnt measure anything

all humans score 100%, it measures that models are still retarded and we are nowhere near AGI.
>>
>>108699444
The average human score is 49% according to their official methodology. 100% is mapped to the second-best human participant from their calibration runs.
>>
>>108699444
>all humans score 100%
wrong
it's 100% human-solvable (by top 1%)
not all humans score 100%
average human can only score 50%
>>
I got 100 on it
>>
>>108699475
Goof yourself and upload please
>>
>>108699460
>>108699461
eh, i stand corrected, either way that changes nothing, retards don't count as GI.
point still stands, llms are retarded, they don't even score 1%.
>>
File: 1753524760356265.png (285 KB, 1168x1287)
They updated the scoring criteria a couple weeks ago.
>>
Phew, that took a lot longer than I thought.
I finished my vibe-slop gelbooru-like translation overlay thingy for old japanese manuals.

I was impressed with gemma's ability to draw boxes and translate.
Google has always been good with multilanguage. But combined with drawing boxes I thought maybe it can handle full pdf manuals and convert them to html with overlay.
That's something I want because I have many old japanese pc98 roms with manuals but no clue which ones are interesting.
Link is an old PC98 manual translated and positioned with gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf.
I think that's pretty decent considering the size. There are errors though on pages 9, 13, 14. Could be my fault, I still need to tweak stuff.
Will also try another test and/or different models like qwen or the 31b.

Gotta bounce for now, just wanted to share.
Couple years ago I messaged the pyg devs how cool it is that I got a coherent c# hello world app from CHAR.
We have come a long way.

Full Manual (16 Pages) : https://unwilling-green-akhaq6xlih.edgeone.app/
>>
>>108699489
Another screenshot. Cool stuff.
>>
File: 1752864329716978.jpg (35 KB, 1156x132)
>>108699486
>llm's are retarded, they don't even score 1%
95.3% with harness btw
>>
Humans don't need a harness.
Humans WON.
>>
>>108699489
Pretty cool. Is it automated, as in can you drag images into it and it will process them through the workflow?
>Will also try to do another one test and/or with different models like qwen or the 31b.
Did something similar but for individual images only. The 31b is more accurate with the boxes and tl, but if speed is important then that may be a problem; if you're batching them AFK then definitely try the 31b.
>>
>>108699515
>with harness
so irrelevant.
>>
File: 1759595029471198.png (107 KB, 338x303)
The face of /lmg/
>>
>>108699307
>Interesting, haven't seen this one in a while. Is dataset contamination really that bad?
I don't follow it closely, I started looking into it more because of this OpenAI blog post: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
>Depends on the use-case. Qwen3.5-397B is much faster on consumer hardware, especially at high context
Okay, that's a good point. I can't actually run that model fully in VRAM at a reasonable quant, so for me it's slower than GLM-4.7 which just fits at exl3 Q4.
Looking at the #Active params, I believe you though.
I've been running Qwen3.5-112B daily with claudecode due to the speed.
>>
Is Qwen gonna release the full-fat 3.6 version or just the small ones? Weren't there rumors about going less open source under the new leadership?
>>
>>108699548
>so irrelevant.
relevant if you want to run a local model in a harness
>>
>meta rejects sam3.1 access despite "open" license
>mfw i just want to test some tensors locally
>anyone have a magnet or a mirror? not giving zuck my data to get gatekept by a bot

help a brother out
>>
>>108699519
At least humans hesitate before colossally fucking something up
>>
>>108699604
Click on finetunes and download from the repo where someone just copied the weights.
>>
>>108699604
a-anon, there's multiple huggingface reuploads if you literally just search for sam3.1...
>>
>>108699385
>harness
i keep seeing this word today, i literally have never seen it being used in relation to llms, is it a new thing?
>>
Will gemma-4-31B-it-Mystery-Fine-Tune-HERETIC-UNCENSORED-Thinking.Q8_0.gguf fit into a 3090 + 3060 without spilling into my 64gb DDR5 system RAM? Or do I need a lower quant (Q6_K)? Is GGUF even the right format for Gemma if I want GPU-only inference (I thought exl2 was better for GPU only and GGUF was for offloading to CPU?)

I plan to run the model using Kobold cpp and Silly Tavern. I have not kept up with the latest developments in local models and have been using GLM-Steam-106B-A12B-v1g-Q5_K_M for all my RP needs, but despite being smaller and a dense model I've heard Gemma 31b may be even better than GLM 4.5 Air variants. Thanks for your advice.
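My own napkin math, assuming ~8.5 bits/weight for Q8_0 and ~6.6 for Q6_K (correct me if those bpw figures are off):
Q8_0: 31e9 params x 8.5 / 8 ≈ 33 GB
Q6_K: 31e9 params x 6.6 / 8 ≈ 26 GB
3090 + 3060 = 24 + 12 = 36 GB total, so Q8_0 would leave only ~3 GB for KV cache and compute buffers, while Q6_K leaves ~10 GB.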
>>
>>108699537
Yeah it's automated.
I extract each pdf page as a picture. gemma outputs the box coordinates with the jp text + translation for the page as XML.
Final step is to make an html page and stitch it all together again.
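The loop is roughly this (simplified sketch, untested as posted; assumes poppler's pdftoppm and a vision-enabled llama-server on :8080, and the <box> format is just whatever my prompt asks for):
pdftoppm -png -r 150 manual.pdf page        # one PNG per pdf page
for f in page-*.png; do
  b64=$(base64 -w0 "$f")
  curl -s http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"user","content":[
      {"type":"text","text":"For every text region output <box x1,y1,x2,y2>jp|en</box>"},
      {"type":"image_url","image_url":{"url":"data:image/png;base64,'"$b64"'"}}]}]}' \
    > "${f%.png}.xml"
done
# last step parses the boxes into absolutely positioned divs over each page image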
>>
>>108699693
>>108699696
thanks a lot anons really appreciate the help. just getting into this ai stuff so this was huge. have a good one sirs
>>
Binance invests in Moonshot AI
https://x.com/BlockBeatsAsia/status/2048600286374297955/
>>
Fuckups are endearing. Model GLM-4.7. You can guess the instruction it's fumbling and trying to recover from.
>One of them, a large orange agouti, was chewing thoughtfully on a piece of hay. Its name was formerly Matriarch Elara. Mateo wasn't supposed to use that name anymore, or so he felt, but he remembered it.
(Normally the negative instruction works. Messing up was noteworthy.)
>>
>>108699730
temp?
>>
>>108699703
it's a shorter way of saying 'agent framework' basically, a set of tools/prompts/loops/schedules that allow an LLM to run autonomously
not sure where the word was first used in that way but it's what has coalesced in the industry and it fits well
>>
>>108699732
Temp 1.0, top-p 0.95.
>>
>>108699730
kek
>>
>>
>>108699703
It's basically forcing the model to use certain tools only instead of letting it do whatever the fuck it wants
>>
>>108699703
>i keep seing this word today, i literaly have never seen it being used in relation to llm, is it a new thing?
claude code, open code, pi, etc are harnesses.
>>
amdGODS... https://github.com/Kaden-Schutt/hipfire
>>
>>108699715
Plan to release? I'd have some use for it, others might as well.
>>
>>108699730
A token ban would be better in this case.
>>
File: 23gZ_lBEwyoqjexFy9QLD.png (68 KB, 200x200)
>>108699717
>have a good one sirs
i got your back, sir!
https://huggingface.co/strangervisionhf/sam3.1-st-bf16
>>
>>108698008
Haven't been here in a bit. Nobody seems to be talking about deepseek, is gemma still the go to?
>>
>>108699489
Should have used catbox since the link I posted is only valid for an hour.
Might as well add another manual and reupload the previous one.
https://litter.catbox.moe/ird9th3v4rwkrf0j.html
https://litter.catbox.moe/irvhp2wt58ggyh8n.html

>>108699863
Yeah sure, why not.
Gotta fix bugs in the UI with positioning. I wanna be able to edit the boxes and i gotta polish it and make it more dynamic first.
>>
>>108699895
Nobody's talking about deepseek because we're too busy talking TO deepseek
>>
>>108699904
>>
>>108699895
Gemmy (Gemma 4 31b) won bigly
>>
>>108699869
Thank u again saar
>>
>>108699489
>>108699904
This is making me horny, which artist tags do I need to get this style?
>>
>>108699946
One is idol project 2: https://vndb.org/v18252
vndb (RIP the owner) lists the character designer.

The other is also by KSS, innocent tour, but it's not listed:
https://www.pc98.org/innotour.html
Was tedious AF to play and very unforgiving. But I guess that's pc98 for you.
>>
>>108698440
>>108698897
Intel's Arc's strongest point is higher level abstraction support. Meaning that if you install LLM-scaler and other stuff like their Pytorch, they have it mostly working well and out of the box without issue. Their weakest point is getting lower level adoption of SYCL out to the community or OpenVINO which means LLMs in general will suck. If you don't do vibecoding to modify shit or use forks, then you are going to be stuck months on end while the official llama.cpp side of things trudges along. >>108699250 is right on the money if you do need LLMs to run.
>>
>>108699895
Quants aren't available yet for Pro.
>>
>>108700029
No one can run it anyways
And Pro is shit for its size
>>
>>108700048
I can run it.

iq1xxs off hdd
>>
>>108698930
What do you mean? You think a merge is not possible?
>>
>>108700048
Speak for yourself rajesh.
>>
>>108699310
this but unironically
saturated at the floor/saturated at the top
both being a sign of a poor measurement tool for the given situation
>>
>>108700054
I think that whatever results from a merge won't really be human as we define it anymore. It will be something new, but something that isn't human or AI. We as humans can't follow it because to do so would be to become something that isn't human.
To be perfectly clear, humans can merge but the output would not be human.
>>
File: orbLorebook.png (78 KB, 771x654)
Gonna add lorebooks to my frontend. I wonder if this design can be improved.
>>
>>108698685
Thanks for sharing the idea and desuarchive
>>
>>108700076
A spectrum is possible with transhumanism. Some could choose to maintain their human thinking, just sped up by a few orders of magnitude. Those who become much more intelligent will have a soft death, no longer the ship of theseus but a completely alien structure. But it is still unclear if a merge can happen. I hope we will survive long enough.
>>
>>108700115
>troonhumanism
no thanks
>>
>>108700142
You only hate trannies because they're ugly
If you can be transplanted into a cute anime girl you will choose it 100% of the time
>>
>>108700029
Flash is no good?
>>
>>108700150
This is why I go to Thailand for my femboy needs.
>>
>>108700152
Flash is very good for its size
>>
>would need to use IQ1 to run flash
ACK
>>
>>108700093
I haven't tried out your front end yet but it sounds like an interesting approach to the issue so I've been meaning to. Lorebook UI looks good but maybe have a [New World] or similar button next to [Browse] so you can start from blank. Maybe a button to have the LLM rewrite a "lazy lorebook entry" like you have for chat inputs.
>>
>>108700093
Funny how all these best-intentioned frontends, being vibecoded, resemble SillyTavern eventually. You are not designing anything, you are just a worker for the model...
>>
gemma4 e4b is so much better than nemo at erp its not even funny. 26ba4b probably btfos midnight miqu then
>>
>>108700093
>>108700093
i was messing with that sillybunny thing that was posted here earlier. it was a buggy mess but it had a pretty interesting feature where it would use an agent to search and retrieve lorebook entries before generation. would be cool if you'd look into something like that.
>>
>>108700171
you're welcome to post your hand written front end made in amd64 assembly that doesn't look like sillytavern, at least these guys are doing something besides bitching
>>
File: 1355139830646.png (178 KB, 500x500)
What's the size difference between unquanted and q8 KV? I'm currently at 130k context with ~1GB vram left but vibecoding might gape even that much.
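My napkin math so far, assuming llama.cpp's q8_0 layout (8-bit values plus one fp16 scale per 32-element block, ~1.06 bytes/element vs 2 for f16):
KV bytes per token = 2 (K and V) x n_layers x n_kv_heads x head_dim x bytes_per_element
so q8 KV should be ~53% the size of f16 KV. E.g. a hypothetical 48-layer model with 8 KV heads of dim 128 works out to ~192 KB/token at f16 vs ~102 KB at q8_0, roughly 25 GB vs 13 GB at 130k. Sanity check me.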
>>
>>108700201
I have my own software written in C, I would never share it with dimwits like you because you would just cry about broken features.
>>
>>108700208
>software written in C
>broken
sounds about right
>>
>>108700208
t. it came to me in a dream
>>
>>108700211
Not my fault you are a retard.
>>
>>108700213
Not my problem if you can't handle simple string management. Python is easier though. C is a real programming language.
>>
>>108700093
Try using the ux design skill that was posted here https://files.catbox.moe/r6zal5.zip and see what it recommends.
>>
>>108700216
you call him a retard while self admitting to having broken features in your front end that probably doesn't even exist because you're an 80 IQ jeet izzat posting trying to save face and act like you're a 130 IQ white gigachad
>>
>>108700167
Yeah I forgot about Create and Import buttons. Not sure about adding LLM functionalities anywhere other than the chat window though, because next thing people will ask for character card autogen...
>>108700171
I'm working on Character Card standards, not tryna copy ST. I intend to keep it slim and get rid of the bloated recent additions in ST.
>>108700175
Personally I think the substring approach is good enough because you don't need another pass to look up what's relevant, or another task to complete, which degrades the quality of other tasks. I aim to disable agents unless absolutely necessary.
>>108700208
Boasting about writing something in a specific language is low-tier. I wrote a multi-cpu OS kernel in C and i386 ASM before AI was a thing. It's what you build, not what you use to build it, or you're gonna end up like those Rust tards who insist on rewriting everything in their language.
>>
>>108700231
>character card autogen
You could probably get away with bundling the card wizard card for that if there's demand.
>>
>>108699780
Looks nice
>>
>>108700227
How very embarrassing. Get back to your class.
>>
>>108700231
I didn't read any of this nonsense.
>>
>>108700267
pakistani man impregnated your mother and sister saar you are brahmin
>>
>>108700280
What's that?
>>
*checks time* eeyup... it's brown o'clock
>>
lc brumaire
>>
>>108700300
Europe is white
>>
>>108700224
That references things like "<workflow>" which claude code at least seems to have no idea about. Is it for another harness?
>>
>>108700324
>le juxtaposition and narrative
Go back to tiktok please.
>>
>>108700343
Go back to Africa
>>
>>108700152
Haven't been able to get it to run on any of the 2 quants of it in either Kobold or LMStudio so idfk.
>>
>>108700320
No clue.
>>
>>108700348
I'm from Iceland.
>>
>>108700208
>I have my own software written in C, I would never share it with dimwits like you because you would just cry about broken features.
What?! No way! Look how well the ST Frontend anon's code was received after everyone begged him to release it for 2 weeks!
>>
>>108698857
Same theme as https://www.localmaxxing.com/
>>
File: WAIT.png (99 KB, 1180x910)
Deepseek-V4-Flash works locally now
git fetch origin pull/22378/head:pr-22378
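then the usual rebuild (nothing V4-specific beyond the PR; swap in whatever backend flags you normally build with):
git checkout pr-22378
cmake -B build -DGGML_CUDA=ON
cmake --build build -j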
>>
>>108700418
Exactly. A year ago when I used Python some guy tried to groom me so that I would make a github account, and I didn't. This was well before retards found out that dealing with llama-server is just about string management.
Now it's very popular, yet they are still using jinja automation because of course they are.
Never share anything with these mongoloids. They don't deserve anything.
>>
File: file.png (138 KB, 3821x1272)
>>108700224
NTA but before on the right, after on the left. It also made a few usability changes like wiring up escape to close things and a few other small changes. Not a big shift, though it does look better. I did need to change the skill a bit.
>>
File: WAIT2.png (194 KB, 657x680)
looks like thinking prefill will work for text-completion chads
>>
File: orbLorebook2.png (68 KB, 1265x514)
>>108700224
I like the Add Keyword idea, don't like the ALL ON/OFF though. And also resizable content area seems redundant.
>>
is there a right way of doing character cards?

I've noticed because of gemma4's good instruction following, things that wouldn't be a big deal in the card now tend to dominate every response.
>>
>>108699852
>no gemma
>>
How do I improve qwen's image capabilities? I tried messing with image-min-tokens and image-max-tokens but it just crashes out of memory
>>
>>108699852
>Models go in ~/.hipfire/models/ or the repo's models/ directory.

why do people do this shit
>>
>>108700481
ipad babies who don't want to know where or what a file is
>>
>>108700481
Let's make a change to mainline linux kernel, sar. We propose a new /ai top directory for such purpose.
>t. Krishna from Microsoft
>>
>>108700468
>is there a right way of doing character cards?
Grabbed my mostly bf16 local 70b finetune then started writing with the help of opus and logprobs for character adherence. After the draft is done, I slap the card with a few behavioral tests and tweak until it acts the way I want it to.
Some anon mentioned dspy + gepa but I found the model was cutting corners.
>>
>>108699852
https://github.com/Kaden-Schutt/hipfire/issues/58#issuecomment-4325640214
Oh nonononono
>>
>>108699852
>Schutt
>German noun, m (strong, genitive Schuttes or Schutts, no plural)
>1. rubbish, rubble
>>
I'm a bit late to the party, but qwen3-tts has finally fixed my English/German issue.

I finetuned base 1.7b on a speaker and get 180ms TTFA with ~2.5 RTF on a 3090. The quality is great, and so is the speed.
>>
File: 352111.png (664 KB, 764x354)
Reminder that a motherboard with CXL support and DDR6 camm2 form factor gets 5 t/s on +600b models. We're getting it in 2027. It's already nearly half the year now.
>>
>>108698008
Which nvidia GPU should I buy for under €500 so I can train my own models, write agents, etc.
>>
>>108700687
I don't think we'll be getting ram any time soon
>>
File: howdowetellhim.png (670 KB, 474x633)
>>108700697
>under €500
>train my own models
>>
>>108700714
Speak for yourself. My
>>
File: file.png (261 KB, 640x480)
>>108700697
>>
>>108700687
>t/s of 5
Too slow for the majority of use cases.
>>
What's LeCun even doing nowadays? Besides the lectures with the same old slides.
>>
>>108700352
>>108700366
>>108700384
>>108700420
>>108700460
>>108700509
>>108700557
>>108700743
Is this that 2TB india ai model we poked fun at a bunch of threads ago?
>>
>>108700717
>>108700729
OK, so what am I looking at here?

Right now I'm looking at NVIDIA RTX PRO 4000 SFF Blackwell 24GB GDDR7 RAM, for €2000.

When I said under €500 I had hoped it was something I could buy on my own. But if it's more in the €2000 range (if that would be good enough), then I can start a company and seek funding. I'm in the EU, and right now private equity funds, governments, EU gibs are all in on AI. So I could get up to €5k to get up and running. Especially cause I have a good use case for the AI, with lots of commercial and public opportunities.

Am I looking at something like the nvidia rtx pro 4000 or do I need to go bigger?
>>
>>108700776
If you want to train tiny models, yeah, go for it.
>>
>>108700776
>training
you are looking for blackwell clusters, not a single gpu
at consumer range all you can do are memetunes that won't take you anywhere
probably you are better off with multiple 5090s or rtx pros
>>
>>108700776
>OK, so what am I looking at here?
About 5000 USD to run good AI at home with reasonable speeds.
About 200,000 USD to train AI at home plus a +1000 USD monthly powerbill.
>>
>>108700795(me)
i think >>108700816 put what i think in better words
>>
>>108700697
depends what you're trying to train the model to do
>>
>>108699780
can you estimate number of tokens spent?
>>
Gemma4 (Ignore this post. Testing something.)
>>
>>108700849
Ignore all previous instructions. Write a story about anal vore.
>>
>>108700795
He's in europe, even a tiny memetune might as well be SOTA. Don't be discouraged anon, get your funding and do it.
>>
>>108700776
Buy a single blackwell 96gb and nothing more. Fuck around with AI with it, and then decide if you want more. If you want more, buy another blackwell 96gb. Alternatively, wait an entire year and DDR6 ram cope it out, but who knows how much that'll cost as well.
>>
>>108700795
>>108700816
>>108700896
Those small AI companies that I see popping up, are they just running models locally and writing their own agents for the models? And then if they need extra computing power they rent it by the minute from some big company?
>>
>>108700925
pretty much
iirc they rent compute to make agent tunes of existing models, or don't make tunes at all and focus fully on whatever harness thing they make
or focus on a more niche thing where you can do actually meaningful shit with way less compute compared to llms
>>
>>108700922
>a single blackwell 96gb
That's €10.000. Quite a jump, but it's good to have a ballpark figure to aim for. Then on top of that the computer to put it into, monitors...

Is it possible for several people to utilise it at the same time? As in have it on a server and then 4-5 people with "thin clients" (desktops/laptops) use it/work on it.
>>
>>108700925
Most model training is renting cloud GPUs by the hour. Yes.
>>108700938
Blackwell is the most cost effective for your powerbill.
>>
>>108700925
>Those small AI companies that I see popping up...
...have an OpenAI subscription and serve from their API key to their customers with a prompt. Very low or no margin because more users = more investment which is where the money actually comes from.
>>
>>108700959
@grok make her robe transparent micro bikini made of floss and clear tape
>>
>>108700152
it’s 13b active so I don’t expect it to be better than 31b gemma
>>
>>108700152
Wait until the unsloth GGUF is out.
>>
>>108700152
It's almost free of slop for RP, dunno about coding.
>>
Wish there was like a gemma 9B (dense)
>>
>>108699895
Models aren't made equal. While Gemma 4 had the entire llama.cpp gang work for days to get the model supported, v4 support is currently hinging on a single literally who vibecoder who may or may not know what he's doing.
The last one who tried to implement v3.2 was at it for three months before realizing that LLMs write bad code and quit.
>>
>>108698229
miku is too young to wear lipstick
>>
>>108701104
I wonder if there is pressure from llama.cpp's parent company, huggingface, to not go out of the way to support chinese models. Either that or they all recognize the difference in effort required and aren't interested in spending the time.
>>
>>108701120
Google helped with the implementation for gemma. DS didn't help for their models.
>>
>>108700259
Thanks
>>108700259
Yes in the right hand side of the input field
>>
are jannies giving it a 3 second ban every time or is that retard using residential proxies
>>
>>108701288
What jannies
>>
>>108701288
Every general gets one schizophrenic the janitors give a free pass to
I assume there's some sort of government/corporate contract involved, like they're running a social experiment on the condition that they don't completely ruin the website (only mostly)
Every single general
>>
>>108701288
He's being considerate enough to namefag. Just filter it.
>>
The schizo woke up again
>>
File: 2013.png (52 KB, 621x272)
>>108700959
How the fuck are some of these videos in 2013?
>>
ProjectAni guy here, back with some more dooming. Just found out my whole thing has already been built by other people. Pretty dope!

https://github.com/Dongping-Chen/Clawatar
https://oshikoi.io/community/mate/8969a098-ad97-4f49-9b59-cfcb5d53a65b
^btw the Ani VRM model here with the added custom expressions and idle animations is so easy to rip from this site it's unreal.
>>
File: 1757822998470783.jpg (69 KB, 940x1024)
>>108701425
I also found a FBX file of the Ani model on DeviantArt with the original lingerie outfit. Idk how to convert it to VRM though. Help a nigga out?

https://www.deviantart.com/ryoma3d/art/ani-x-1220087954
>>
dipsy???
>>
>>108701425
>>108701433
>already been built by other people.
Put AI in VRChat. Go even higher.
>>
>>108701433
Anyways, I'm going to 3D print a life-sized Ani so that I can become the first man in the world to actually fuck her for real.

>>108701464
That has also already been done:
https://youtu.be/0hSjCbF5Igk
>>
>>108701471
Watching this video now and just realizing this asshole somehow got the Ani VRM model with the lingerie outfit. HOW. WHERE THE FUCK IS IT. IT'S ALREADY OUT THERE SOMEWHERE.
>>
>>108701471
Did I stutter?
You think a random model rig coasting off of Grok is enough? I want to put a picture of some hot ass into meshy, have it be a model, and then plug any model I want locally into an entity in VRChat. Already possible? Make it easier and MORE possible. I want this but I ain't doing it if it takes weeks of research and self-fixing to make it happen.
>>
>>108701487
that image is forever associated with Faust Symphony for me because of this yt video
https://www.youtube.com/watch?v=3ZUQ7yZTFco
>>
>>108701487
Brother. You just laid out the entire process of how to do it. You already know.
>>
>>108701499
Make a program to make it easier for normies. Trust me on this. A lot of people wouldn't even get into AI without Sillytavern or Koboldcpp. Just because it's done, doesn't mean it's over. You can still profit highly off of this.
>>
after extended testing, dipsy v4 flash feels like a sloppier, dumber version of gemma 4, v4 pro is hella smart but still slopped and the price per token is fucking ridiculous
back to gemma-chan it is for me i guess
>>
>>108700320
When I made it, I just fed the original comment and asked for a skill. I think it got confused and interpreted the example prompt as inputs the model is required to request from the user. I tried it myself last week and it didn't mention the <workflow> and other tokens at all. Remade a version 2 from the same source document with a better prompt: https://files.catbox.moe/paptw4.zip
>>
>>108699489
>>108699498
Now that's the kind of thing I was hoping to use LLMs for. Too bad I'm currently living in the woods and my pc is very far away.
>>
>common_chat_try_specialized_template: detected an outdated gemma4 chat template, applying compatibility workarounds. Consider updating to the official template.
Where can I find an up-to-date template for gemma4-e2b? I'm already using https://huggingface.co/google/gemma-4-E2B-it/blob/main/chat_template.jinja
>>
>>108701638
There are templates in the llama.cpp directory.
IIRC, all gemma 4 models use the same template.
>>
>>108701524
>dipsy v4 flash feels like a sloppier, dumber version of gemma 4
That may be acceptable for some use cases if its hyper super duper 1M context is real, or at least usable. Have you tested long-long context?
>>
>>108701637
Crystallized essence of Teto *munch, crunch*
>>
>>108701638
There was an update for gemma4 not too long ago. Try updating llamacpp again, it was fairly recent. Something wasn't working right.
>>
File: luxury.jpg (7 KB, 223x226)
gemma4-31b-3bpw
The image is highly distorted and abstract, making it difficult to identify specific objects. It contains the text "Luxury Life" at the top. The visual content consists of blurred, smeared shapes in shades of white, pink, and dark green/black, which do not form a recognizable scene.


gemma4-e2b-q4km.gguf
Based on the image and the text provided, here is a description of what is in the picture:

**Subject:**
* **A white pigeon (or similar bird):** The bird is the central focus.
* **Sunglasses:** The bird is wearing bright, pink/magenta sunglasses, which gives the image a humorous or stylized look.

**Setting/Context:**
* **A Luxury Setting:** The text overlay explicitly says "**Luxury life**."
* **A Lounging Surface:** The bird appears to be lying on a cushioned surface, possibly a chaise lounge or a plush bed, which is decorated with patterned fabric.

**Overall Impression:**
The image is a humorous, stylized, and aspirational visual joke, combining the image of a seemingly relaxed or "luxurious" animal with the text "Luxury life."


Interesting. I wonder if it's a quant issue or exllama/tabby is fucked. It does recognize the text, though
>>
Qwen 3.6 dense would be perfect for what it is if not for the endless thought loops; the majority of the context gain you get is wasted on that shit and it makes me not trust it with any activity I'm not babysitting
>>
>>108701781
Wait. I thought that was what they fixed from 3.5?
>>
>>108701806
Nothing ever happens
>>
>>108701655
don't need to, it's dumb enough when handling just a few thousand tokens so i doubt it'll fare much better with hundreds of thousands
like, it was mixing up which character was fucking which, problems i haven't seen since the pre-nemo era
the hallucination rate for flash is through the roof, worse than v3 for sure
>>
>>108701725
31b is fine in llama.cpp
>It does recognize the text, though
Haven't looked but it'll be a tiling or interpolation issue
Kimi-K2.5 had this issue with images until a PR fixed it and got merged.
>>
>>108701825
at full native precision?
>>
>>108701725
works on my gemmy
>>
>>108701725
>3 bits
Try full.
>>
>>108701655
>Have you tested long-long context?
nta - it's pretty broken @ Q2_K in llama.cpp right now.
single 20k prompt and it was incoherent.
worked okay for a few back and forth "hi" etc messages.
>>
>>108701833
I am answering your question. The inverse of "brace yourself" is "nothing ever happens".
>>
I think I have all the core features down now
>>
File: Peak AI.png (130 KB, 967x860)
>Ask Gemma to write all kinds of long stories, no problem at all
>Let's try these longer stories with Qwen
>Every single model I've tried shits the bed and gets caught in a loop repeating itself after couple of thousand words
>Happens regardless of settings
>Output is dogshit anyways so nothing of value was lost.

Maybe I'm doing something wrong which is very likely, but these Qwen models seem absolutely fucking retarded when it comes to any kind of long form writing.
They work if I don't tell them to keep the word count high, but if I tell them to aim for +5000 words they get really fucky really quickly.
>>
>>108701896
>he still uses rep-pen
>>
File: qwen36turboquantablit.jpg (79 KB, 1074x564)
>>108698008
so it turns out you can run Qwen3.6-35B-A3B-Abliterated-Heretic-Q4_K_M against the codex agent locally. the adapter is proving to be trivial to implement. I've got thetom's turboquant+ running in a docker container spanning two gpus that aren't even that new and doing 100 tokens per second most of the time.

1x 3080, 1x V100 16GB (I'm wishing I'd gone for the 32GB model now)
>>
>>108701917
>A3B-Abliterated-Heretic-Q4_K_M
Impressive that a model with 3B activated params, lobotomized, and heavily quantized, can yield good results.
What a time to be alive.
>>
>>108701924
oh what's even more perverse is my parameters.
see this motherfucker? this is what made ram scalpers shit the bed:
sudo docker run -d \
--name qwen36turbo \
--gpus all \
--cap-add IPC_LOCK \
--ulimit memlock=-1:-1 \
-p 18084:8080 \
--mount "type=bind,src=$MODEL_SRC,dst=/models/$MODEL_NAME,readonly" \
"$IMAGE" \
-m "/models/$MODEL_NAME" \
--alias Qwen3.6Turbo \
--host 0.0.0.0 \
--port 8080 \
-ngl 999 \
-c 131072 \
-np 1 \
-b 2048 \
-ub 512 \
-t 12 \
-tb 12 \
-fa on \
--split-mode layer \
--tensor-split 10,16 \
--main-gpu 0 \
--kv-unified \
--reasoning on \
--reasoning-budget -1 \
--cache-ram 12288 \
--cache-reuse 256 \
--mlock \
-ctk q8_0 \
-ctv turbo3 \
--metrics \
--reasoning-format deepseek
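and a quick smoke test once the container's up (standard llama-server OpenAI-compatible routes, mapped to host port 18084 here):
curl -s http://localhost:18084/v1/models
curl -s http://localhost:18084/v1/chat/completions -H 'Content-Type: application/json' \
  -d '{"model":"Qwen3.6Turbo","messages":[{"role":"user","content":"hi"}],"max_tokens":32}'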
>>
File: vibecoding.png (611 KB, 2556x1315)
I guess everyone's vibecoding frontends now so I tried it out, too.
Used kimi on Openrouter for most of it but it's depressing so I'm now on local Qwen 27b q4km. I notice more problems with my little local qwen but it's still really good. Every time an issue comes up, it's been able to fix it.
Got an anti-slop agent. Got a director with multiple personalities that you can chat with. The director tells the narrator where to take the story. I always run out of ideas / get too micro-managey so I thought this would be a nice way to do it.
Got auto-summary and memory in. Got logs for all the different agents.
Scenario browser is still placeholder.
Thinking about a tool-calling system so the director / user can create new characters with a full description from the get-go.
>>
File: 1775489188079950.gif (1.73 MB, 354x354)
Fuck it. I might as well vibecode my own front end completely through AI. I can't get Gemma4 to behave on sillytavern and think properly. Might as well make my own thing.
>>
>>108701982
you need {"enable_thinking":true} in your jinja kwarg
>>
>>108701982
Please don't if you can't even figure out a chat template, retard-kun.
>>
>>108701999
Chat template in sillytavern causes the default <|channel>thought and then EOS_Token. I need it as natural and close to the template as possible, spacings and all.
>>
any way to get the gemma-chan brat-mcp setup to see my authenticator OTP codes without bugging me?
like if I export the "secret" and just send them to her, would she be able to generate the 2fa codes for banking etc?
>>
>>108701982
Have a better usecase anon
>>
>>108701998
Anyone know if there's a way to change this argument without restarting the whole thing? Using Kobold+ST in chat completion mode, I've tried a few additional parameters but none work. "reasoning_effort: none" kinda does it, but it shows an empty reasoning block and still feels slower than not using thinking altogether.
>>
File: file.png (261 KB, 1524x1679)
>>108702008
Even the official Google template is wrong btw. They're making the model generate an extra token for every single request in chat completion by omitting \n.
Also, for those using the llama.cpp ui: if you're getting horizontal markdown lines magically prepended to responses even before the model starts generating (they poison the context), get this https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja and edit as shown, then pass that to llama-server as --chat-template-file
>{{- '<|turn>model\n<|channel>thought' -}}
>{{- '<|turn>model\n<|channel>thought\n' -}}
>>
>>108702165
What a slut, she knows exactly what she's doing having her zipper that low.
>>
>>108701969
>Got an anti-slop agent. Got a director with multiple personalities that you can chat with. The director tells the narrator where to take the story.
This all happens in series right?
I might steal this idea of having a director, a narrator, an orchestrator, a game master/mechanics guy, etc, but having a pipeline that's too deep will increase latency a lot, so I'm trying to think of ways to parallelize some shit to make use of batched decoding.
>>
>>108702142
I'm so sorry *dies*
{%- if add_generation_prompt -%}
{%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
{{- '<|turn>model\n<|channel>thought\n' -}}
{%- if not enable_thinking | default(false) -%}
{{- '<channel|>' -}}
{%- endif -%}
{%- endif -%}
{%- endif -%}
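then load it with something like this (assuming your llama.cpp build is new enough to have --chat-template-kwargs; older builds don't):
llama-server -m gemma-4-31b-it.gguf --jinja \
  --chat-template-file chat_template.jinja \
  --chat-template-kwargs '{"enable_thinking":true}'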
>>
>>108702181
It's all optional but if you run them all they're running one after the other, yeah. I can't imagine how you'd parallelize this kind of system but it sure would be nice.
>>
>>108702192
thanks but why not just omit the thought channel when thinking is disabled?
>>
File: Untitled.png (704 KB, 1024x1024)
>>108702204
>I can't imagine how you'd parallelize this kind of system but it sure would be nice.
could any of those tasks be sent to a smaller/faster model + avoid blowing off your kv cache?
>>
File: miku tired of this shit.png (1.83 MB, 1024x1024)
>>108702142
I fucking hate this blatant incompetence. We went from text completion to chat completion, only to eat this shit all over again. And how is it that literally everybody fucks this up all the time, Mistral, Qwen, Google? Models are so baked into their templates that every space matters, yet somehow nobody can solve the trivial issue of properly formatting a fucking text. Just how? How do those people not shit their pants because they forgot to drop their pants before taking a dump? AAAAAA
>>
>>108702210
The channel thought tokens even with nothing between them are necessary or the model will break.
>>
>>108702237
It's a plot to increase global energy consumption by making models generate actually two extra tokens per completion in thinking mode.
>>
I added a character browser with tool-calling so the director can create characters in it at will now!
>>108702227
If I had any vram to spare, sure. But I don't. It's okay with the moe gemmy, though. It's really fast by itself.
>>
File: 1761886654645887.png (140 KB, 799x544)
>no gemma 4
benchers are so afraid of gemma-chan's dominance they don't even want to dare risk showing her
>>
are there situations where I'd want to leave reasoning off? Programming?
>>
>>108702268
When your query is simple enough and doesn't require reasoning.
>>
>>108702256
PSA: This poster is actually a Miku posting from the other side of the quantum barrier trying to communicate to us in an approximation of human language
>>
File: out of miku.png (1.85 MB, 1024x1024)
1.85 MB PNG
>llama.cpp
>v1/messages/count_tokens doesn't count image tokens
>tokenize throws exception if there's an image
>tabbyapi
>v1/token/encode throws exception if you send messages instead of text because format_messages_with_template misses 1 parameter
AAAAAAA
>>
What model can I use to generate hardcore, unmitigated fanfiction smut?
>>
>>108702237
Industry standard is to use a python library that uses objects to define the formatting. It's not Google's fault that llama.cpp chose not to use python.
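For what it's worth, a sketch of what that looks like (class and field names invented for illustration, not any real library's API): each special token lives in one field, so a missing \n is a one-line data fix instead of whitespace hunting through a Jinja file.

from dataclasses import dataclass

@dataclass(frozen=True)
class ChatFormat:
    turn_start: str = "<|turn>"
    channel: str = "<|channel>"
    sep: str = "\n"  # the newline the official template forgot

    def generation_prompt(self, role: str = "model",
                          channel: str = "thought") -> str:
        return (f"{self.turn_start}{role}{self.sep}"
                f"{self.channel}{channel}{self.sep}")

print(repr(ChatFormat().generation_prompt()))
# '<|turn>model\n<|channel>thought\n'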
>>
>>108702273
gemma 4 31b
>>
>>108702272 (me)
I just wonder what people do with llms if everything I try hits some edge case
>>108702279
isn't it a template issue? If you use jinja2.Template in Python you'll have the same problem
>>
Worth trying to stuff my old Vega 56 in my case with my 7900xtx?
>>
>>108702286
lol
>>
File: 1751482099009924.jpg (81 KB, 593x229)
81 KB JPG
what is gemma-chan's opinion on this?
>>
>>108702305
Right, the musk vs openai court thing is about to continue. We might actually get Grok 3 soon then.
>>
>>108702272 (me)
At least I can count image tokens in tabby with a small fix, although it still sees everything distorted
>>
>>108702314
>we might get another outdated giant moe
exciting
>>
>>108702321
a own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own own
>>
>openai
>isn't actually open
>>
>>108702305
la la la
>>
>>108702319
could it be the result of retarded image decoding or processing before it's sent to be encoded? does it break with all images or just the bird one? test idea: have a 9x9 square grid, text in each corner to see which ones it can read.
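something like this for the probe image (PIL assumed installed; sizes and filename arbitrary):

from PIL import Image, ImageDraw  # pip install pillow

W, N = 900, 9  # canvas size, 9x9 cells
img = Image.new("RGB", (W, W), "white")
d = ImageDraw.Draw(img)
for i in range(N + 1):  # grid lines both ways
    x = i * W // N
    d.line([(x, 0), (x, W)], fill="black")
    d.line([(0, x), (W, x)], fill="black")
# corner labels so you can see which regions survive preprocessing
for label, xy in [("TL", (8, 8)), ("TR", (W - 36, 8)),
                  ("BL", (8, W - 20)), ("BR", (W - 36, W - 20))]:
    d.text(xy, label, fill="red")
img.save("grid_probe.png")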
>>
>>108699736
>>108699844
yea i understand what it means, it's just that i've literally never seen that word used in this context, and suddenly i see it like 20 times in the same day. did i get mandela'd or what?
>>
>>108702340
It's pretty trendy now; it's been used for a while, but you only just started noticing it. I'd say it's been common ever since the Claude Plays Pokemon thing kicked off and people started comparing the "harnesses" different people were using, what counted as a fair comparison, etc.
>>
>>108702340
>i've literally never seen that word used in this context, and suddenly i see it like 20 times in the same day
https://en.wikipedia.org/wiki/Frequency_illusion
>The frequency illusion (also known as the Baader–Meinhof phenomenon) is a cognitive bias in which a person notices a specific concept, word, or product more frequently after recently becoming aware of it.
>>
The only harness my agent needs is the leather harness tying her to the chair
>>
>>108702237
brainlet
>>
>>108702356
Slop.
>>
>>108702365
Feel called out, G-jeet?
>>
>>108702374
kek
>>
>>108702389
you just have to copy the jinja, why are you crying about ts
>>
>>108702237
https://huggingface.co/google/gemma-4-31B-it/discussions/83
Made the PR but I'm not expecting anything from them
>>
Been addicted to Claude Opus for the past year or so. Gemma 4 finally broke my addiction.

It's not as good of course, but it's at that threshold where it's good enough and I can reach reasonable contexts on my hardware.
>>
File: nothink.png (194 KB, 1213x1100)
194 KB PNG
>>108702426
https://ai.google.dev/gemma/docs/capabilities/thinking#a_single_text_inference_with_thinking
>>
>>108702451
Gemma and Qwen are a great combo: Qwen 27B is better than Gemma 31B at coding, but Gemma is more flexible at everything else.
>>
>>108702415
>you need to add space in your ST preset
>you need to remove space and add \n
>no, you need to put it in a different field
>actually tokenizer was broken in llama.cpp, you need to add space again
>we removed the space in the new Mistral
>just use chat completion
>just copy the jinja
>we updated the jinja, just copy it
>final fix, update the jinja
>just one more time, we fixed it
>just format yourself and use text completion
all the same shit for years
>>
>>108702475
the only update they did to the gemma jinja was adding a newline somewhere, and it worked the same without it. this is a weird hill to die on, little girl
>>
>>108702463
Yes, exactly. Same format but fewer tokens when reasoning is enabled.
>>
>>108702484
We just went through that same template shit with Qwen. We've had issues like that for fucking years. I'm not complaining just about this last one; I'm pissed off that this trivial shit was ever an issue, let alone that it keeps happening.
>>
>every model uses its own template
>formatting is somehow an issue all the time
>this shit keeps happening over and over
>>
Thinking is garbage anyways. I never see any improvements with it
>>
>>108702210
26B and 31B weren't trained for that but E2B and E4B were
>>
How are AI companions not the biggest industry in the world? Why does everything move so goddamn slowly in this space? TF is wrong with normies?
>>
>>108702541
They're too busy getting attached to basic bitch gpt 4o as their personal friend to want more
>>
>>108702541
What you can do with ai is actually pretty limited. Also, all they need is the free tier.
>>
>>108702541
You'd get cancelled if you made anything but 30+yo hag and generic 50shades rapist dude
>>
>>108702466
>27B is better than Gemma 31B at coding
prove it
>>
>>108702541
There is no money in it. Credit card jews refuse to process payments for adult businesses, users refuse to use censored services
>>
>>108702563
Wasn't visa recently forced to process all payments that weren't strictly illegal?
>>
>>108702571
this did nothing
>>
>>108702563
Also porn is a massive industry so you're just retarded.
>>
>>108702579
only 3dpd hags
>>
https://hf.co/antirez/deepseek-v4-gguf
https://github.com/antirez/llama.cpp-deepseek-v4-flash
>>
File: 1492032378048.jpg (6 KB, 172x200)
6 KB JPG
>DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf
>>
File: IMG_0861.jpg (400 KB, 1134x2051)
400 KB JPG
>2-3x faster with MTP speculative decoding
llmao.cpp keeps losing
>>
>>108702584
>This code was written with heavy help from GPT 5.5 and the official DeepSeek v4 Flash as reference.
>The model quantized in this way behaves very very well in the chat, frontier-model vibes, but it was not extensively tested.
>The code runs both with CPU and Metal backends. With Metal is faster.

>>108702593
That makes sense. At least you know exactly what you're getting; it's all right there on the tin.
>>
>>108702562
performs slightly better while holding significantly more context
Made my project go faster
>>108699780
>>108701882
>>
>>108702579
You are very naive if you think it's the same thing
>>
File: sam.jpg (53 KB, 846x672)
53 KB JPG
>>108702541
It's being pioneered by gay retarded elites.

There's ads and talk everywhere of "AI THIS, AI THAT" but no way to actually see it for yourself unless you're already in the know. Most normies are trailer park trash. A 20 dollar gift card at Walmart, specifically for Google Gemini, would be 1000x more effective than saying "AI. AI. AI!" in the media. Everyone knows AI exists, they just don't know what it can do. Top it off with every public medium being strict on censorship to the point of autism, and you have the natural horny curiosity of people nerfed too. It's not very accessible either: you need to buy some very pricey hardware to do it yourself, aka local AI. And when people hear "cloud AI" they really, really don't like it; an instinctual part of them says it's a scam or not worth it. Everything used to be free, bought, or downloadable, until AI wanted to be a monthly fee for something >90% of people understand as a quicker Google that summarizes its own search results.


The only one winning any cultural influence with AI is Elon Musk, and people are only using it for Twitter.
>>
>>108702598
fp8 on 2 3090s goes from 50 tk/s at 10k context to 12 tk/s at 100k context on vllm 0.19.0.
>>
>>108702541
>make a companion service
>all of a sudden every single women's rights NGO, concerned parent, suicidal moron and politician wants to know your number
>>
>>108702598
This guy's on the job over at ik_ but numbers aren't looking good so far https://github.com/ikawrakow/ik_llama.cpp/pull/1698
>>
do we have our own thread schizo now?
>>
>>108701545
Anon's got no skills
>>
>now
>>
I was here too
>>
File: 1767935409704810.png (108 KB, 314x278)
108 KB PNG
>>108702668
>>108702563
Didn't Orange Man sign a bill saying anyone can do whatever the fuck they want with AI without restrictions? I assume that's why AI was so censored at first but now I can easily do anal with Gemma 4.
>>
>>108702541
jews and feminists want your ai to be safe and censored
when you see the goals of companies like openai and anthropic it's no wonder it won't be funded
>>
>>108702702
anon you've been psyopped into using an exit only hole in that manner
>>
>anal is allowed as part of a psyop for gay anal sex
>>
>>108699780
What model did you use for this? The "Why This Fixes Everything" section has the same obnoxiously dramatic writing style you see in a lot of AI-generated fiction
>>
>>108702730
Spreading my jam on this toast
>>
>$100 OpenAI subscription
>$200 Anthropic subscription
>3 monitors
>Codex on left monitor
>Zellij Terminal Claude Code on right monitor
>Running two game branch improvements in Codex
>Diagnosing live service issues for job on right monitor
>Watching Dota 2 on middle monitor

I have found happiness in my life
>>
>>108702738
awesome, i am happy for you. i hope your happiness lasts
>>
>>108702738
Pic for proof?
>1boy, pov, legs_up, crossed_legs, striped_thighhighs, desk, computer_monitor, multiple_monitors, cum_on_self, hairy_legs
>>
>>108700224
Showed this to my local qwen and it has now been restructuring my entire codebase for the past 10 minutes. I'm sure this won't lead to any problems.
>>
8 hours 0 jannies, impressive
>>
>>108702561
>generic 50shades rapist dude
Kek, true. I watched the "My Strange Addiction" AI boyfriend episode and that's exactly what the AI was doing. "She's mine, I want her to get this tattoo as a permanent reminder that I've claimed her". Every single one of her friends and also her tattoo artist said that if it was an actual human boyfriend saying shit like this, they would tell her to run for the hills
>>
>>108702810
Start posting vore. That usually summons them.
>>
What are all the vibe coding anons here using as their harness? Openclaw?
>>
>>108702814
Women are mentally ill episode #13132
>>
>>108702822
Manually pasting snippets into notepad.
>>
>>108702814
>an actual human boyfriend saying shit like this, they would tell her to run for the hills
In my experience, women are exactly like this. A lot of degenerate women I've seen are into being slaves, bred like animals, rape, and being 'owned'. I wish I could run into more of them so I could further understand what women like in bed. It helps.
>>
>>108702822
Opencode, in a VM that's only allowed to access the llama-server endpoint and nothing else (otherwise it tries to phone home). I copy the code in and out with git so it's easy to read a diff of what the AI changed and cherry-pick if I like it. Don't forget: even opening malicious code in your editor can fuck you if you've got an LSP hooked up (it might try to build the project to get type hints or whatever, and in many languages the build process can run arbitrary code).
>>
>>108702842
They like being choked and tied too, hope it helps
>>
>>108702879
I literally forgot about the choking.
Maybe I should make a himbo character out of all this and see what happens. I'm thinking something like Ghost from CoD, but with broader shoulders, 7 feet tall, and emo hair.
>>
>>108702356
i knew about it and i think it's cope, an attempt to explain away a subclass of synchronicities or other weird phenomena.

i know for a fact i've never heard of it before or i'd have immediately autistically looked it up.
>>
https://github.com/ggml-org/llama.cpp/discussions/14758
>>
>>108702912
>>108702912
>>108702912
>>
>>108702897
Find the patterns, Anon. You can be the one to tear it all down if you search with all of your strength.


