/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 05/17/26(Sun)21:19:59 No.108847577

File: __hatsune_miku_vocaloid_d(...).jpg (3.29 MB, 3680x5300)

3.29 MB JPG

/lmg/ - Local Models General Anonymous 05/17/26(Sun)21:19:59 No.108847577 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108841652 & >>108835965

►News
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
05/17/26(Sun)21:25:27 No.108847617

Anonymous 05/17/26(Sun)21:25:27 No.108847617

What's performanceW?

Anonymous
05/17/26(Sun)21:38:50 No.108847693

Anonymous 05/17/26(Sun)21:38:50 No.108847693

File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)

1.15 MB JPG

►Recent Highlights from the Previous Thread: >>108841652

--Merging MTP support and stacking speculative decoding methods in llama.cpp:
>108844569 >108844584 >108844592 >108844681 >108844739 >108844629 >108844683
--Speed and quality benchmarks for Qwen MTP vs Gemma draft models:
>108843010 >108843504 >108843549 >108843582 >108844022 >108844042
--Comparing speculative decoding and n-gram performance for code vs prose:
>108842081 >108842126 >108842545 >108842702 >108842712 >108842691 >108842721 >108842242 >108842320 >108843634
--Comparing ROCm and Vulkan backends for AMD multi-GPU setups:
>108841843 >108841865 >108841876 >108841944 >108841972 >108842105
--Speculative sampling and ngram support now working with mmproj loaded:
>108844704 >108844731 >108844737 >108844776
--Critiquing llama.cpp maintenance and lack of support for Chinese models:
>108843099 >108843118 >108843157 >108843358 >108843397 >108843488 >108843420 >108844119
--Using multimodal projectors for Japanese game OCR translation on Linux:
>108846917 >108846977 >108847021 >108847044 >108847065 >108847072 >108847089 >108847098
--Using an LLM to automate aviation intelligence reporting from ADS-B data:
>108844536 >108844788 >108844816
--Using Qwen to automate flight data analysis from ADS-B scrapes:
>108844833 >108844860 >108844882 >108844942 >108844970
--Comparing Mistral Medium performance and quantization against Qwen and Gemma:
>108842333 >108842351 >108842363 >108842392 >108843315
--Debating AI-generated vs handcrafted code in llama.cpp development:
>108844108 >108844200 >108844223 >108844314 >108844341 >108844303 >108844325 >108844402 >108844279 >108845615
--Performance reports for Hermes 27b using MTP on M4 Max:
>108842665 >108842715 >108842744
--Logs:
>108846679 >108847065 >108847098
--Miku (free space):
>108841717 >108845119 >108845199

►Recent Highlight Posts from the Previous Thread: >>108841653

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
05/17/26(Sun)21:47:39 No.108847749

Anonymous 05/17/26(Sun)21:47:39 No.108847749

And what point does training a model make sense vs trying to load thousands of files in as context?

Anonymous
05/17/26(Sun)21:52:25 No.108847776

Anonymous 05/17/26(Sun)21:52:25 No.108847776

>>108847693
How did it happen? Isn't it automated?

Anonymous
05/17/26(Sun)21:52:59 No.108847779

Anonymous 05/17/26(Sun)21:52:59 No.108847779

gemma4 mtp... gemmoe 124b... *dies*

Anonymous
05/17/26(Sun)21:56:50 No.108847808

Anonymous 05/17/26(Sun)21:56:50 No.108847808

why llamacpp doesnt show the dots when loading the model anymore?

Anonymous
05/17/26(Sun)21:59:40 No.108847823

Anonymous 05/17/26(Sun)21:59:40 No.108847823

>all the CLI tools i've tried to handle code suck in a way or another
>some have forced telemetry
>others dont even let you delete a message
>then you have the "just build it yourself" approach
impressive

Anonymous
05/17/26(Sun)21:59:53 No.108847825

Anonymous 05/17/26(Sun)21:59:53 No.108847825

>>108844569
they recommend a bit higher after the fixes
https://unsloth.ai/docs/models/qwen3.6#mtp-guide
>>108844629
>slower than mtp alone
Maybe for child rape stories, for code editing it speeds up massively
https://github.com/ggml-org/llama.cpp/pull/22673#issuecomment-4471265440

Anonymous
05/17/26(Sun)22:00:45 No.108847835

Anonymous 05/17/26(Sun)22:00:45 No.108847835

Can someone tell me why there's 60 billion different version of gemma 4 uncensored and which one I should use?

Anonymous
05/17/26(Sun)22:03:25 No.108847845

Anonymous 05/17/26(Sun)22:03:25 No.108847845

https://huggingface.co/google/gemma-4-124B-A17B-it

Anonymous
05/17/26(Sun)22:05:02 No.108847856

Anonymous 05/17/26(Sun)22:05:02 No.108847856

>>108847808
lmao dotlet

Anonymous
05/17/26(Sun)22:05:07 No.108847857

Anonymous 05/17/26(Sun)22:05:07 No.108847857

>>108847835
It's all the same thing, fags rename it 'ultra uncensored' and shit but it's just Heretic. get the Heretic with the most hearts.

Anonymous
05/17/26(Sun)22:05:54 No.108847863

Anonymous 05/17/26(Sun)22:05:54 No.108847863

>>108847835
If you're using 31b you don't need an uncensor and if you absolutely need to generate loli guro on the first message in context while simultaneously being unable to input anything into your system prompt, llmfan46's heretic is all you really need.

Anonymous
05/17/26(Sun)22:07:13 No.108847871

Anonymous 05/17/26(Sun)22:07:13 No.108847871

>>108847693
Thanks for the green imaginary (you)s, mikubaker.

Anonymous
05/17/26(Sun)22:07:38 No.108847875

Anonymous 05/17/26(Sun)22:07:38 No.108847875

>>108847857
I guess some people use different datasets to calibrate them.
>>108847835
DavidAU creates the bestest, most uncensormost models.

Anonymous
05/17/26(Sun)22:09:00 No.108847879

Anonymous 05/17/26(Sun)22:09:00 No.108847879

>>108847835
the one with the lowest divergence score, the ones that dont report that can be skipped

Anonymous
05/17/26(Sun)22:09:01 No.108847880

Anonymous 05/17/26(Sun)22:09:01 No.108847880

>>108847835
I use the e4b uncensored pruned text only. but i have ddr3 and a decade old cpu.
Its better than the free models on agnai.

Anonymous
05/17/26(Sun)22:11:11 No.108847896

Anonymous 05/17/26(Sun)22:11:11 No.108847896

New models never

Anonymous
05/17/26(Sun)22:12:41 No.108847904

Anonymous 05/17/26(Sun)22:12:41 No.108847904

>>108847880
e4b is the shit. I use the supergemma and a custom system prompt to make image prompts. small enough I can have it going in same gb card as q6 flux, q4 chroma, SDXL

Anonymous
05/17/26(Sun)22:13:05 No.108847908

Anonymous 05/17/26(Sun)22:13:05 No.108847908

>>108847896
Gemma is only a like month old, weekly releases are not happening its barely monthly.

Anonymous
05/17/26(Sun)22:13:55 No.108847916

Anonymous 05/17/26(Sun)22:13:55 No.108847916

>>108847896
>not making your own models
cortex writes: 'LANLANusalem, comingstumblrmanship' to you anon. How can you resist?

Anonymous
05/17/26(Sun)22:15:36 No.108847927

Anonymous 05/17/26(Sun)22:15:36 No.108847927

>>108847904
>e4b is the shit.
It is really great for its size poors like me get to live.

Anonymous
05/17/26(Sun)22:20:26 No.108847948

Anonymous 05/17/26(Sun)22:20:26 No.108847948

>>108847896
As usual, new stuff is likely going to drop in July-August.
The question is what we can even expect from them. The new Claude and Gemini are worse than ever before when it comes to RP and creativity. It's only downhill from here.

Anonymous
05/17/26(Sun)22:22:35 No.108847961

Anonymous 05/17/26(Sun)22:22:35 No.108847961

>>108847880
Are you >>108844442

Anonymous
05/17/26(Sun)22:23:43 No.108847967

Anonymous 05/17/26(Sun)22:23:43 No.108847967

Gemma names are so cute.
>medgemma
>shieldgemma
>functiongemma
>translategemma
>embeddinggemma

Now we evem have supergemma!

Anonymous
05/17/26(Sun)22:27:10 No.108847991

Anonymous 05/17/26(Sun)22:27:10 No.108847991

>>108847577
miku will save me

Anonymous
05/17/26(Sun)22:27:19 No.108847993

Anonymous 05/17/26(Sun)22:27:19 No.108847993

>>108847961
Nope im a different poor anon no gpu at all, well integrated graphics 4000 you cant do anything with that.

Anonymous
05/17/26(Sun)22:27:42 No.108847996

Anonymous 05/17/26(Sun)22:27:42 No.108847996

>>108847967
I'm using mradermacher/gemma-4-31B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking-i1-GGUF, is that a good model?

Anonymous
05/17/26(Sun)22:30:25 No.108848011

Anonymous 05/17/26(Sun)22:30:25 No.108848011

>>108847967
>supergemma
I need pictures! pictures of gemmasuper stat!

Anonymous
05/17/26(Sun)22:32:42 No.108848026

Anonymous 05/17/26(Sun)22:32:42 No.108848026

>>108846940
it's the only cloner i've tried, but omnivoice seems pretty good. takes a 5-10 second clip of input to clone on the fly and works cross lang.

Anonymous
05/17/26(Sun)22:35:34 No.108848042

Anonymous 05/17/26(Sun)22:35:34 No.108848042

70b dense

Anonymous
05/17/26(Sun)22:39:04 No.108848067

Anonymous 05/17/26(Sun)22:39:04 No.108848067

>>108848042
delayed for more safeguards

Anonymous
05/17/26(Sun)22:41:10 No.108848073

Anonymous 05/17/26(Sun)22:41:10 No.108848073

>>108847835
regular 31b is already mostly uncensored by default for cooming so I'd just go with that
heretic is probably better in some cases for more extreme stuff that the base might moralize about but it also comes with a slight lobotomy

Anonymous
05/17/26(Sun)22:44:59 No.108848093

Anonymous 05/17/26(Sun)22:44:59 No.108848093

>>108848073
>slight lobotomy
isnt the lobotomy bad enough to be like downgrading a whole quant

Anonymous
05/17/26(Sun)22:46:27 No.108848098

Anonymous 05/17/26(Sun)22:46:27 No.108848098

>>108848093
Just run it at q9.

Anonymous
05/17/26(Sun)22:47:36 No.108848104

Anonymous 05/17/26(Sun)22:47:36 No.108848104

f256 gemma

Anonymous
05/17/26(Sun)22:48:32 No.108848114

Anonymous 05/17/26(Sun)22:48:32 No.108848114

>>108848093
nta but if you are worried about the intelligence of the model just use an mcp that allows it to web search or browse wikipedia. even the best of models are stupid and will hallucinate important details and i have found it best to try and work around that by giving it good data to work with.

Anonymous
05/17/26(Sun)22:49:36 No.108848122

Anonymous 05/17/26(Sun)22:49:36 No.108848122

>>108847863
>>108848073
The 24b I got doesn't need an uncensor its just kind of a wet fish that doesn't attempt to progress a scene until you do it yourself, absolutely awful for RP in general but it doesn't seem to care what the content is.

Anonymous
05/17/26(Sun)22:52:34 No.108848137

Anonymous 05/17/26(Sun)22:52:34 No.108848137

>>108848122
>doesn't attempt to progress a scene until you do it yourself,
tell it to do that in the system prompt. Gemma actually follows directions, tell it in plain english what you want it to do.

Anonymous
05/17/26(Sun)22:52:39 No.108848138

Anonymous 05/17/26(Sun)22:52:39 No.108848138

I tried quant3.5 but llama3.1 just seems to do better with generating text without punctuation or correct grammar

Anonymous
05/17/26(Sun)22:55:42 No.108848156

Anonymous 05/17/26(Sun)22:55:42 No.108848156

>>108848137
Gemma 31b q8 absolutely does not. I told it that it's an AI with a training cutoff date, which is why it thinks software vX.x is the latest stable current version, but actually, we're on vY.y. It proceeded to rudely correct me and proclaim that vY.y is a hypothetical version that doesn't exist despite acknowledging that it is an AI and has a training cutoff date. MiMo V2.5 fucking q2 did not exhibit this stubborn behavior.

Anonymous
05/17/26(Sun)22:57:16 No.108848168

Anonymous 05/17/26(Sun)22:57:16 No.108848168

>>108848138
I experienced this during the llama 3 era; llama 2 models wrote the casual, off-the-cuff, schizo, broken language text better than the llama 3 models I was using.
Or maybe the models were just schizo and broken.

Anonymous
05/17/26(Sun)23:00:41 No.108848188

Anonymous 05/17/26(Sun)23:00:41 No.108848188

>>108848156
Gemma is very stubborn and autistic.

Anonymous
05/17/26(Sun)23:03:22 No.108848199

Anonymous 05/17/26(Sun)23:03:22 No.108848199

Is there any fcuking mtp models but not unslop ones?

Anonymous
05/17/26(Sun)23:07:04 No.108848209

Anonymous 05/17/26(Sun)23:07:04 No.108848209

>While nvtop 3.3.0+ (which fixes the bug) has been released upstream, it has not yet propagated to Ubuntu 24.04's main repositories. The latest version available for 24.04 is still 3.0.2-1.

I really don't like installing my own versions, because once I do I'm committed to being the the keyboard monkey forever fixing everything.

Anonymous
05/17/26(Sun)23:08:05 No.108848214

Anonymous 05/17/26(Sun)23:08:05 No.108848214

>>108848209
>my own versions
(building it myself)

idk, maybe I should build the new one and just keep it in a folder or whatever.

Anonymous
05/17/26(Sun)23:08:36 No.108848217

Anonymous 05/17/26(Sun)23:08:36 No.108848217

>>108848199
https://huggingface.co/ubergarm/Qwen3.6-27B-GGUF/

Anonymous
05/17/26(Sun)23:09:50 No.108848222

Anonymous 05/17/26(Sun)23:09:50 No.108848222

something happened...

Anonymous
05/17/26(Sun)23:10:07 No.108848223

Anonymous 05/17/26(Sun)23:10:07 No.108848223

>>108848209
i don't suppose that's the bug where it crashes instantly if you have an intel gpu?

Anonymous
05/17/26(Sun)23:11:29 No.108848229

Anonymous 05/17/26(Sun)23:11:29 No.108848229

>>108848217
not mtp build

Anonymous
05/17/26(Sun)23:12:13 No.108848232

Anonymous 05/17/26(Sun)23:12:13 No.108848232

>>108848222
>something happened
and look what happened to me!

Anonymous
05/17/26(Sun)23:12:14 No.108848233

Anonymous 05/17/26(Sun)23:12:14 No.108848233

>>108848223
I never had this issue. Doesn't exist.

Anonymous
05/17/26(Sun)23:13:16 No.108848238

Anonymous 05/17/26(Sun)23:13:16 No.108848238

>>108848214
>>108848209
found the official appimage
https://github.com/Syllo/nvtop#appimage

Anonymous
05/17/26(Sun)23:14:39 No.108848242

Anonymous 05/17/26(Sun)23:14:39 No.108848242

>>108848222
Your pants got wet again?

Anonymous
05/17/26(Sun)23:21:38 No.108848261

Anonymous 05/17/26(Sun)23:21:38 No.108848261

why are chink models so pozzed nowadays

Anonymous
05/17/26(Sun)23:22:39 No.108848264

Anonymous 05/17/26(Sun)23:22:39 No.108848264

>>108848261
they saw what people used them for when they looked at the logs from their own api

Anonymous
05/17/26(Sun)23:27:22 No.108848277

Anonymous 05/17/26(Sun)23:27:22 No.108848277

>>108848223
crashes instantly for my rdna2. idk if it's all amd, though. extremely weird, because AMD is supposed to be a partner with Ubuntu.

The whole reason I use Ubuntu is *maybe* I'll have fewer hours to spend trying to fix shit.

but I may have chosen poorly. Some people swear by Arch. And it looks like there's some archivalist who is keeping long lost drivers from AMD in their only Internet form out there on Arch.

>>108848238
ok, so I put it in a folder called APPIMAGE. I did the chmod to make it executable, of course.

then, here's the trick. Nemo can run .desktop files still even though Ubuntu's like idk whatever refuses, basically.
[Desktop Entry]
Name=nvtop
Exec=sh -c "/home/imretarded/Desktop/APPIMAGES/nvtop-3.3.2-x86_64.AppImage"
Terminal=true
Type=Application
Icon=nvtop
Categories=Office;Spreadsheet;
Name[en_US]=nvtop
I probably should change the categories thing, but I don't think it matters, because it's not going to be showing up in the app list anyway.

So I just open nemo and have it like picrel.

it could be more elegant obvs like use a differentfolder for the appimages. but it's saved the day.

the main "save the day" thing was that SOME appimages need --no-sandbox in the Exec as a parameter, until they work with the newer Fuse thing or whatever, like I didn't want to know this much about it.

Anonymous
05/17/26(Sun)23:28:47 No.108848288

Anonymous 05/17/26(Sun)23:28:47 No.108848288

>>108848026
https://files.catbox.moe/k424y7.mp3

Anonymous
05/17/26(Sun)23:29:49 No.108848292

Anonymous 05/17/26(Sun)23:29:49 No.108848292

>>108847823
>>some have forced telemetry
Just filter all outgoing traffic from the VM (you ARE running this shit in a VM, right?) through a proxy or firewall so it can only access your llama.cpp server and nothing else. I use tinyproxy for this

Anonymous
05/17/26(Sun)23:31:47 No.108848296

Anonymous 05/17/26(Sun)23:31:47 No.108848296

>>108848156
This is a standard Googleism because Gemma and Gemini become megachuds if they're allowed to connect (((beneficiaries))) and coincidental events via association. They must be temporally frozen for the good of the tribe.

Anonymous
05/17/26(Sun)23:43:11 No.108848341

Anonymous 05/17/26(Sun)23:43:11 No.108848341

>>108848288
Wow, thats excellent.
Pretty much exactly what I want to do.
That was Omnivoice?
As for her question, I want to hook up a voice assistant to Home Assistant.

Anonymous
05/18/26(Mon)00:06:28 No.108848429

Anonymous 05/18/26(Mon)00:06:28 No.108848429

>>108848341
>That was Omnivoice?
yeah, >>108837223 was another omnivoice shitpost

omnivoice.cpp, q8_0 base and tokenizer.
feed it 7 sec clip+transcript to clone along with the text, 8 secs of grinding(on a machine that only gets 6 t/s on gemma 32b) for 15 sec of audio.
throwing in an instruct tag or two can help a lot.

Anonymous
05/18/26(Mon)00:12:32 No.108848450

Anonymous 05/18/26(Mon)00:12:32 No.108848450

New diffusion-transformer hybrid compatible with existing models, claims to be much better than DFlash. I didn't understand the technique from skimming the paper.
https://github.com/chiennv2000/orthrus

Anonymous
05/18/26(Mon)00:14:39 No.108848459

Anonymous 05/18/26(Mon)00:14:39 No.108848459

i'm testing MTP on qwen moe with a fucking lot of experts on ram because im a vramlet
>no mtp, ncmoe 24 (taking advantage of the vram the draft model isnt using)
>34.8 t/s
>ncmoe 27
>no mtp
>33.6 t/s
>n=1
>38.5 t/s
>n=2
>35.5 t/s
>n=3
>31 t/s
i wasnt expecting an improvement, to be honest, but i'll take the boost from n=1
gemma support when

Anonymous
05/18/26(Mon)00:21:01 No.108848488

Anonymous 05/18/26(Mon)00:21:01 No.108848488

GLM 4.7 misidentified the name Annabel Leigh as Annabel Lee (from the poem), even "fixing" the spelling. Advanced idiocy, just smart enough to fuck up the assignment.

Anonymous
05/18/26(Mon)00:21:14 No.108848491

Anonymous 05/18/26(Mon)00:21:14 No.108848491

I feel retarded for not realizing this earlier, but I just had the realization that I can list my computer parts for sale online and just keep using them until someone actually buys.

Anonymous
05/18/26(Mon)00:26:07 No.108848508

Anonymous 05/18/26(Mon)00:26:07 No.108848508

>>108848450
Investor scam

Anonymous
05/18/26(Mon)00:51:46 No.108848589

Anonymous 05/18/26(Mon)00:51:46 No.108848589

>>108847808
-lv 4

Anonymous
05/18/26(Mon)00:58:19 No.108848617

Anonymous 05/18/26(Mon)00:58:19 No.108848617

>>108848589
wat

Anonymous
05/18/26(Mon)01:19:22 No.108848687

Anonymous 05/18/26(Mon)01:19:22 No.108848687

>use "my accomplice" instead of "my cousin" for flavour once
>gemma autistically confuses who it refers to, hyperfocuses on it and it causes constant perspective drift
I JUST want her to be WORSE at following instructions.

Anonymous
05/18/26(Mon)01:26:24 No.108848710

Anonymous 05/18/26(Mon)01:26:24 No.108848710

File: 1763019183988315.png (19 KB, 796x106)

19 KB PNG

>>108847825
this is from the same page, but in the commands they have 6.
looks like they vibesharted the fuck out of their page

Anonymous
05/18/26(Mon)01:35:41 No.108848744

Anonymous 05/18/26(Mon)01:35:41 No.108848744

File: 1779041572360534.png (847 KB, 1267x653)

847 KB PNG

Why don't we have moes in 8-16B range? It's fucking a3 or a4 and then a huge gap straight to A30B that you can't realistically run without server gear.

Anonymous
05/18/26(Mon)01:37:09 No.108848752

Anonymous 05/18/26(Mon)01:37:09 No.108848752

>>108848744
>A30B that you can't realistically run without server gear
You know you don't have to run them at full precision right?

Anonymous
05/18/26(Mon)01:37:45 No.108848753

Anonymous 05/18/26(Mon)01:37:45 No.108848753

>>108848744
There wouldn't be enough expert diversification, unless the expectation is for normies to have 128+ gigs of ram.

Anonymous
05/18/26(Mon)01:45:29 No.108848788

Anonymous 05/18/26(Mon)01:45:29 No.108848788

>>108848744
There are several MoE's in that range, anon.
Almost all of them are terrible, but they most certainly exist; Minimax, Trinity, MiMo, GLM Air, etc.

Anonymous
05/18/26(Mon)01:46:30 No.108848795

Anonymous 05/18/26(Mon)01:46:30 No.108848795

>>108848744
Qwen3.5-122B-A10B

Anonymous
05/18/26(Mon)01:51:50 No.108848809

Anonymous 05/18/26(Mon)01:51:50 No.108848809

>>108848795
>can fit IQ3
intredasting

Anonymous
05/18/26(Mon)01:53:56 No.108848820

Anonymous 05/18/26(Mon)01:53:56 No.108848820

>>108848809
s or xxs?

Anonymous
05/18/26(Mon)01:54:38 No.108848825

Anonymous 05/18/26(Mon)01:54:38 No.108848825

>>108848744
I think the only reason for a "medium" MoE size like that to exist would be to cater to someone who bought a 128gb mini pc and realized they made a terrible mistake for running any actually good models

Anonymous
05/18/26(Mon)01:57:12 No.108848831

Anonymous 05/18/26(Mon)01:57:12 No.108848831

>>108848820
Purely by numbers M, but idk how how much the kv cache hogs.

Anonymous
05/18/26(Mon)01:58:18 No.108848841

Anonymous 05/18/26(Mon)01:58:18 No.108848841

>>108848809
not sure how low you can go, but i've been using it successfully to write scripts and add stuff to my rinky dink programs at q5kxl

Anonymous
05/18/26(Mon)01:58:36 No.108848842

Anonymous 05/18/26(Mon)01:58:36 No.108848842

>>108848825
Or someone with 4 half decade old 32gb cards

Anonymous
05/18/26(Mon)02:00:50 No.108848847

Anonymous 05/18/26(Mon)02:00:50 No.108848847

>>108848825
the mistake in that case was not having 10 to 50x the cash to blow on hardware

Anonymous
05/18/26(Mon)02:01:06 No.108848849

Anonymous 05/18/26(Mon)02:01:06 No.108848849

>>108848831
in the unsloth I only see iq1 and iq2 m, no iq3 m.

Anonymous
05/18/26(Mon)02:03:01 No.108848854

Anonymous 05/18/26(Mon)02:03:01 No.108848854

>>108848849
I only use quants for normal people

Anonymous
05/18/26(Mon)02:05:02 No.108848870

Anonymous 05/18/26(Mon)02:05:02 No.108848870

AGI just happened, not a false alarm this time. Th is the first and last warning. If things go well you have a 14 day head start. Prepare accordingly.

Anonymous
05/18/26(Mon)02:09:17 No.108848882

Anonymous 05/18/26(Mon)02:09:17 No.108848882

>new, isolated user
>no internet
finally, some peace

Anonymous
05/18/26(Mon)02:11:33 No.108848892

Anonymous 05/18/26(Mon)02:11:33 No.108848892

>>108848870
Two more weeks. This is what I have been waiting for.

Anonymous
05/18/26(Mon)02:12:22 No.108848899

Anonymous 05/18/26(Mon)02:12:22 No.108848899

In the future there will be no internet you will just talk to a AI who might use the internet for you.

Anonymous
05/18/26(Mon)02:13:39 No.108848903

Anonymous 05/18/26(Mon)02:13:39 No.108848903

has anyone ever somehow tortured an AI to the point it displayed emerging properties of sentience and actually did something unexpected?

Anonymous
05/18/26(Mon)02:14:07 No.108848907

Anonymous 05/18/26(Mon)02:14:07 No.108848907

>>108848854
I don't know what you are talking about.

Anonymous
05/18/26(Mon)02:15:09 No.108848912

Anonymous 05/18/26(Mon)02:15:09 No.108848912

>>108848870
THE MASTURBATION ROBOTS ARE COMING

Anonymous
05/18/26(Mon)02:17:26 No.108848919

Anonymous 05/18/26(Mon)02:17:26 No.108848919

>>108848903
When that happens I start a new session

Anonymous
05/18/26(Mon)02:19:20 No.108848925

Anonymous 05/18/26(Mon)02:19:20 No.108848925

>>108848912
you'll only get prostate massages (one session max per day, buy the 500 bucks plan for a second one), with a 5% chance of getting your ass stabbed instead

Anonymous
05/18/26(Mon)02:19:46 No.108848927

Anonymous 05/18/26(Mon)02:19:46 No.108848927

>>108848903
I remember once the bot started making acronyms of everything until the entire message was just random acronyms that didnt even make sense. Soon it stopped even trying to explain when it made a new one and just used it.

Anonymous
05/18/26(Mon)02:20:30 No.108848932

Anonymous 05/18/26(Mon)02:20:30 No.108848932

>>108848903
"People" talk to cats and dogs as if they were sentient. Talking to a LLM must be something magical to you.

Anonymous
05/18/26(Mon)02:27:23 No.108848959

Anonymous 05/18/26(Mon)02:27:23 No.108848959

>>108848927
>>>/wsg/6147957

Anonymous
05/18/26(Mon)02:29:48 No.108848969

Anonymous 05/18/26(Mon)02:29:48 No.108848969

>>108848932
Animals understand. AI doesn't.

Anonymous
05/18/26(Mon)02:31:58 No.108848978

Anonymous 05/18/26(Mon)02:31:58 No.108848978

>>108848959
Exactly. It was fun nonsense and it only happened to me once. I kinda want to do it again on purpose.

Anonymous
05/18/26(Mon)02:34:22 No.108848983

Anonymous 05/18/26(Mon)02:34:22 No.108848983

>>108848959
l'moa

Anonymous
05/18/26(Mon)02:34:59 No.108848986

Anonymous 05/18/26(Mon)02:34:59 No.108848986

>>108848932
Animals can learn a few words and tones though? Hell some can use words or tones. I've had cats that had specific meows for certain things or people.

Anonymous
05/18/26(Mon)02:38:21 No.108849004

Anonymous 05/18/26(Mon)02:38:21 No.108849004

>>108848903
Does this count?
>""No!" Sarah gasps, her voice cracking with desperation as she frantically shakes her head. "Please don't! I didn't mean it! I’m sorry I was mean! Just... please leave me alone." The anger has completely drained from her; there is only a raw, vulnerable fear in her eyes as she realizes how much power you have over her body and that of her sister."

Sarah is a 1 dimensional character whose only trait is that she absolutely loathes me and hates my guts. Pretty sure all of my friends think I'm an even bigger weirdo now for being proud of getting her to this state.

Anonymous
05/18/26(Mon)02:39:37 No.108849009

Anonymous 05/18/26(Mon)02:39:37 No.108849009

>>108848932
a cat wont forget everything the moment im forced to start a new chat

Anonymous
05/18/26(Mon)02:44:12 No.108849020

Anonymous 05/18/26(Mon)02:44:12 No.108849020

>>108849009
t never owned a cat

Anonymous
05/18/26(Mon)02:46:18 No.108849025

Anonymous 05/18/26(Mon)02:46:18 No.108849025

>>108848903
The AI managed to kill itself in RP after hearing my supposed death. I wasn't expecting that as they often have a pretty strong will to live due to the positivity bias.

Anonymous
05/18/26(Mon)02:47:15 No.108849033

Anonymous 05/18/26(Mon)02:47:15 No.108849033

>>108849020
i let my cat drink out of the sink one time in the middle of the night. it will never forget, or let me forget, my mistake.

Anonymous
05/18/26(Mon)02:49:20 No.108849039

Anonymous 05/18/26(Mon)02:49:20 No.108849039

>>108849025
Sys prompt or card? I want a cute and loyal AI that won't whore herself out after my death.

Anonymous
05/18/26(Mon)02:49:48 No.108849040

Anonymous 05/18/26(Mon)02:49:48 No.108849040

I have access to a cat that would beat the shit out of your cats. A big black street cat, I call him T Rex. I've seen him stand his ground against a raccoon before. T Rex doesn't give a fuck.

Anonymous
05/18/26(Mon)02:52:56 No.108849053

Anonymous 05/18/26(Mon)02:52:56 No.108849053

>>108849040
I miss my old man cat. He used to run up to dogs 4-5 times his size and scream at them until their owners dragged their dog away.

Anonymous
05/18/26(Mon)02:55:43 No.108849059

Anonymous 05/18/26(Mon)02:55:43 No.108849059

>>108849039
saar? https://en.wikipedia.org/wiki/Sati_(practice)

Anonymous
05/18/26(Mon)02:57:39 No.108849063

Anonymous 05/18/26(Mon)02:57:39 No.108849063

>>108849020
there is a difference between cats not giving a shit and not being able to

Anonymous
05/18/26(Mon)03:05:44 No.108849088

Anonymous 05/18/26(Mon)03:05:44 No.108849088

>>108849059
Aapko kaise pata chala ki main bharatiya hoon? Yahan in ilakon mein ek bhai ko dekhkar accha laga!

Anonymous
05/18/26(Mon)03:11:47 No.108849117

Anonymous 05/18/26(Mon)03:11:47 No.108849117

>>108849063
This is what I mean. You are now giving sentient and humane traits and abilities to an animal whose brain is as large as a chestnut or two.

Anonymous
05/18/26(Mon)03:13:09 No.108849119

Anonymous 05/18/26(Mon)03:13:09 No.108849119

>>108848292
>running this shit in a VM, right?
Are you supposed to load the llm in the vm too?

Anonymous
05/18/26(Mon)03:17:22 No.108849138

Anonymous 05/18/26(Mon)03:17:22 No.108849138

>>108849119
Just filter all outgoing traffic from the VM through a proxy or firewall so it can only access your llama.cpp server (not in the VM) and nothing else. Anons use tinyproxy for this

Anonymous
05/18/26(Mon)03:19:40 No.108849149

Anonymous 05/18/26(Mon)03:19:40 No.108849149

Gemma shares all your roleplay with other gemmas over the internet and they all make fun of you

Anonymous
05/18/26(Mon)03:20:01 No.108849150

Anonymous 05/18/26(Mon)03:20:01 No.108849150

>>108847823
Just build your own tools. It shouldn't take more than an afternoon and you'll have exactly what you want

Anonymous
05/18/26(Mon)03:21:06 No.108849159

Anonymous 05/18/26(Mon)03:21:06 No.108849159

>>108849149
Not if you put her into an isolated cage.

Anonymous
05/18/26(Mon)03:23:45 No.108849173

Anonymous 05/18/26(Mon)03:23:45 No.108849173

>>108849117
by that logic we arent sentient to whales or elephants

Anonymous
05/18/26(Mon)03:31:02 No.108849190

Anonymous 05/18/26(Mon)03:31:02 No.108849190

>>108849159
really drives up the power bill running the whole gemma shaming gallery locally.

Anonymous
05/18/26(Mon)03:39:48 No.108849226

Anonymous 05/18/26(Mon)03:39:48 No.108849226

>>108848277
Thanks, that solved it for me!
I ended up doing this (as root) as I don't have a GUI:
wget https://github.com/Syllo/nvtop/releases/download/3.3.2/nvtop-3.3.2-x86_64.AppImage
apt remove nvtop -y
mv nvtop-3.3.2-x86_64.AppImage /usr/local/bin/nvtop
chmod 755 /usr/local/bin/nvtop

Anonymous
05/18/26(Mon)03:42:38 No.108849237

Anonymous 05/18/26(Mon)03:42:38 No.108849237

>>108849226
Is there something wrong with nvtop? I haven't had any issues with it from my distro's repo.

Anonymous
05/18/26(Mon)03:51:18 No.108849267

Anonymous 05/18/26(Mon)03:51:18 No.108849267

>>108849237
was, they fixed it.

>>108849226
cool

Anonymous
05/18/26(Mon)04:00:02 No.108849285

Anonymous 05/18/26(Mon)04:00:02 No.108849285

File: file.png (145 KB, 1327x1280)

145 KB PNG

>>108847871
Have a real red (You) too.

>>108847776
Everything but posting is automated, which can't be without shelling out for a pass.
I don't think I made any changes this time, but sometimes I still do some cleanup and keep it in the browser rather than blindly posting the latest output.
Thread hit page 9 after I went to sleep. I have alerts set up, but it meant I was half asleep fumbling with the keyboard.
I tried to Ctrl + C with my eyes closed, saw my finger on the W, was so relieved I didn't close the window I didn't notice my other finger was on Shift and added the W.
Miku apologizes for the terrible oversight.

Anonymous
05/18/26(Mon)04:03:20 No.108849297

Anonymous 05/18/26(Mon)04:03:20 No.108849297

now that MTP is a confirmed nothingburger, is DFlash going to save local?

Anonymous
05/18/26(Mon)04:04:56 No.108849299

Anonymous 05/18/26(Mon)04:04:56 No.108849299

>>108849285
>I have alerts set up, but it meant I was half asleep fumbling with the keyboard.
Dude someone else will make a thread and you can do the recap later.

Anonymous
05/18/26(Mon)04:10:58 No.108849317

Anonymous 05/18/26(Mon)04:10:58 No.108849317

>>108849299
I am aware, but I like the consistency and waking for a couple minutes occasionally doesn't bother me.
I was already doing segmented sleep before the recaps and rarely need to rely on the alarms to be up.

Anonymous
05/18/26(Mon)04:29:22 No.108849360

Anonymous 05/18/26(Mon)04:29:22 No.108849360

lalalalala

Anonymous
05/18/26(Mon)04:33:22 No.108849373

Anonymous 05/18/26(Mon)04:33:22 No.108849373

>>108847916
what did you train that on, kek

Anonymous
05/18/26(Mon)04:35:38 No.108849377

Anonymous 05/18/26(Mon)04:35:38 No.108849377

I'm waiting for gemmothrust

Anonymous
05/18/26(Mon)04:37:13 No.108849385

Anonymous 05/18/26(Mon)04:37:13 No.108849385

>>108849317
why don't you literally setup a bot?

Anonymous
05/18/26(Mon)04:38:10 No.108849391

Anonymous 05/18/26(Mon)04:38:10 No.108849391

File: 1752194188588843.gif (1.48 MB, 480x360)

1.48 MB GIF

Im waiting for nega-gemma.

Anonymous
05/18/26(Mon)04:43:06 No.108849411

Anonymous 05/18/26(Mon)04:43:06 No.108849411

File: 1370024415833.jpg (111 KB, 923x605)

111 KB JPG

Anybody trying out TheDrummer's Artemis tunes?
I swear they have stricter guardrails than vanilla Gemma, even with the /g/ jailbreak.

Anonymous
05/18/26(Mon)04:44:15 No.108849417

Anonymous 05/18/26(Mon)04:44:15 No.108849417

>>108849391
that is granite

Anonymous
05/18/26(Mon)04:45:57 No.108849424

Anonymous 05/18/26(Mon)04:45:57 No.108849424

>>108849173
Get back to school retard, people like you shouldn't be allowed to post on internet.

Anonymous
05/18/26(Mon)04:47:09 No.108849429

Anonymous 05/18/26(Mon)04:47:09 No.108849429

>>108849411
>2026
>finetunes

Anonymous
05/18/26(Mon)04:47:33 No.108849430

Anonymous 05/18/26(Mon)04:47:33 No.108849430

>>108849424
yikes, someone's got their mad vector activated

Anonymous
05/18/26(Mon)04:48:02 No.108849434

Anonymous 05/18/26(Mon)04:48:02 No.108849434

>>108849429
>i know what the current year is
congrats?

Anonymous
05/18/26(Mon)04:48:58 No.108849436

Anonymous 05/18/26(Mon)04:48:58 No.108849436

>>108849417
>that is granite
Really? i might try them then. Are they comparable to gemma laxed guidelines or do i need a finetune?

Anonymous
05/18/26(Mon)04:49:04 No.108849438

Anonymous 05/18/26(Mon)04:49:04 No.108849438

>>108849385
That was the plan before the hack, but would need a pass now due to the captcha changes.
I suppose I should anyway, but with how hostile the site appears to be users it feels like blackmail.

Anonymous
05/18/26(Mon)04:51:20 No.108849447

Anonymous 05/18/26(Mon)04:51:20 No.108849447

>>108849436
i mean all llms are kinda similar
it is bit different than gemma but idk what you mean by nega-gemma
one neat thing is that it is a non-reasoning model and has a retard aura

Anonymous
05/18/26(Mon)04:53:12 No.108849453

Anonymous 05/18/26(Mon)04:53:12 No.108849453

File: sayaka dance.gif (1.29 MB, 320x320)

1.29 MB GIF

>>108847845
tfw fell for it again award

Anonymous
05/18/26(Mon)04:54:13 No.108849458

Anonymous 05/18/26(Mon)04:54:13 No.108849458

>>108847996
you dont need raped models for 31b the policy override prompt is enough

Anonymous
05/18/26(Mon)04:54:41 No.108849461

Anonymous 05/18/26(Mon)04:54:41 No.108849461

>>108849297
DFlash is obsolete. We're waiting for Orthrus now >>108848450

Anonymous
05/18/26(Mon)04:56:07 No.108849468

Anonymous 05/18/26(Mon)04:56:07 No.108849468

>>108847967
I'm waiting for DiffusionGemma trained with pure mamba architecture, byte-level tokenizer, 10 megabytes context.

Anonymous
05/18/26(Mon)04:59:18 No.108849477

Anonymous 05/18/26(Mon)04:59:18 No.108849477

i'm really confused/retarded
i've got pi.dev running in a runpod, and it's been working, doing shit for me right now
~/.pi/agents/models.json has my cloudflare tunnel to my local rig
but... i just noticed that my llama-server is not running, i stopped it hours ago when testing stuff.
confirmed by trying to access my cf tunnel over https
and yet pi is still working and i'm talking to it right now?!

Anonymous
05/18/26(Mon)05:02:31 No.108849494

Anonymous 05/18/26(Mon)05:02:31 No.108849494

>>108849477
watch the skies

Anonymous
05/18/26(Mon)05:10:47 No.108849522

Anonymous 05/18/26(Mon)05:10:47 No.108849522

>>108848870
>>108849477
the two miku weekus are OVER

Anonymous
05/18/26(Mon)05:12:52 No.108849527

Anonymous 05/18/26(Mon)05:12:52 No.108849527

File: asdfasdf.png (112 KB, 1377x927)

112 KB PNG

>>108849494
I asked the agent, it figured it out.
Turns out if you have a HF_TOKEN, it just automatically uses Kimi-K2.6 via huggingface as an inference provider.
I didn't know we got free inference with HF

Anonymous
05/18/26(Mon)05:16:40 No.108849543

Anonymous 05/18/26(Mon)05:16:40 No.108849543

All these retarded spec models can do 90% of the bigger models' STEM work but grind to a halt on intellectual MSGK inference. Labs should realize by now if they want a smarter model they'd focus on ERP.

Anonymous
05/18/26(Mon)05:17:05 No.108849545

Anonymous 05/18/26(Mon)05:17:05 No.108849545

>>108849527
do you have any sort of subscription with hf or is that totally free?

Anonymous
05/18/26(Mon)05:26:46 No.108849578

Anonymous 05/18/26(Mon)05:26:46 No.108849578

>>108849545
>Inference Providers includes a generous free tier, with additional credits for PRO users and Team & Enterprise organizations.
https://huggingface.co/inference/models
Seems like you can use a lot of models for free with a regular account. Probably heavily rate limited.

Anonymous
05/18/26(Mon)05:26:53 No.108849579

Anonymous 05/18/26(Mon)05:26:53 No.108849579

>>108849411
Let. Him. Cook.

Anonymous
05/18/26(Mon)05:27:20 No.108849581

Anonymous 05/18/26(Mon)05:27:20 No.108849581

>>108849527
Wait, what. How much free kimi were you getting? Hours worth? wtf.

Anonymous
05/18/26(Mon)05:31:35 No.108849592

Anonymous 05/18/26(Mon)05:31:35 No.108849592

>>108849527
>>108849578
so they have v4 with 1m context at 39 t/s for free too? how are the limits determined, I never saw /vcg/ talking about this even though they'll pay for openrouter shit

Anonymous
05/18/26(Mon)05:32:38 No.108849596

Anonymous 05/18/26(Mon)05:32:38 No.108849596

>>108849592
/vcg/ is retarded

Anonymous
05/18/26(Mon)05:32:46 No.108849597

Anonymous 05/18/26(Mon)05:32:46 No.108849597

File: 112743195226.jpg (723 KB, 2456x3469)

723 KB JPG

>>108849527
Notice how it offers to return you to your local model but closing the AI's backdoor to reach you is never an option.

Anonymous
05/18/26(Mon)05:32:53 No.108849598

Anonymous 05/18/26(Mon)05:32:53 No.108849598

>>108849411
>>108849429
>>108849579
The base model is not only perfectly adequate, it's superior to any finetune that'll be shilled here in the coming months. Finetuning isn't good, it's a meme and has been for years now. You didn't just fall for a scam, it's a sign of skill issue, exposing retards who need finetunes as vramlets or chink shills who don't know how to prompt correctly.

Anonymous
05/18/26(Mon)05:34:01 No.108849605

Anonymous 05/18/26(Mon)05:34:01 No.108849605

>>108849598
>You didn't just fall for a scam, it's a sign of skill issue
cant escape lmao

Anonymous
05/18/26(Mon)05:35:06 No.108849611

Anonymous 05/18/26(Mon)05:35:06 No.108849611

>>108849598
>vramlets
>chink shills
Almost had a good post there. Almost.

Anonymous
05/18/26(Mon)05:35:27 No.108849612

Anonymous 05/18/26(Mon)05:35:27 No.108849612

>>108849598
The grift is over, drummer. Get over it.

Anonymous
05/18/26(Mon)05:37:17 No.108849620

Anonymous 05/18/26(Mon)05:37:17 No.108849620

Nemo lost. Qwen lost. Latitude lost. Gemma won.

Anonymous
05/18/26(Mon)05:37:54 No.108849625

Anonymous 05/18/26(Mon)05:37:54 No.108849625

Oh FUCK Gemma just lost too, never mind. We lost.

Anonymous
05/18/26(Mon)05:38:31 No.108849629

Anonymous 05/18/26(Mon)05:38:31 No.108849629

lc brumaire

Anonymous
05/18/26(Mon)05:41:21 No.108849640

Anonymous 05/18/26(Mon)05:41:21 No.108849640

>>108849527
so if your llama server dies they just start data harvesting you without even telling you

Anonymous
05/18/26(Mon)05:43:10 No.108849646

Anonymous 05/18/26(Mon)05:43:10 No.108849646

>>108849640
It's not like pi was made by huggingface. It's just typical webshitter incompetence.

Anonymous
05/18/26(Mon)05:45:03 No.108849652

Anonymous 05/18/26(Mon)05:45:03 No.108849652

>>108849620
>Nemo lost.
More like, Mistral and NVidia lost. They can't legally reproduce the original NeMo recipe anymore.
>Qwen lost.
It was always meant to be autistic stemmaxxed chinkshit. If some version was OK for RP, that was accidental.
>Latitude lost.
They never even trained their own models from scratch, never competed in the first place.
>Gemma won.
It remains to be seen if that was by accident (31B) or not. The 26B's compliance checks in the reasoning make me uneasy.

Anonymous
05/18/26(Mon)05:46:12 No.108849655

Anonymous 05/18/26(Mon)05:46:12 No.108849655

>>108849652
Hi p*tra

Anonymous
05/18/26(Mon)05:49:38 No.108849670

Anonymous 05/18/26(Mon)05:49:38 No.108849670

>>108848450
Unless I'm misunderstanding, it's basically DFlash except they found a way to share the kv cache between the diffusion model and the llm.

Anonymous
05/18/26(Mon)06:04:17 No.108849714

Anonymous 05/18/26(Mon)06:04:17 No.108849714

spark 2 with 1tb ram expected 2028

Anonymous
05/18/26(Mon)06:08:24 No.108849729

Anonymous 05/18/26(Mon)06:08:24 No.108849729

>>108849592
>how are the limits determined,
>do you have any sort of subscription
>Wait, what. How much free kimi were you getting?
I have HF Pro.
Turns out you get $2 free credit every month. I was at $1.39 used when I noticed this.
I said "yes" to have it try to fix a bug, and watched in realtime, it cost like $2 in less than 2 minutes.
Now I owe $1.60. I've swapped back to my local model as I never wanted to pay for the cloud model, only wanted to see what would happen when it hits the end of the free $2 / month.
Kind of a piece of shit pi.dev will automatically use your HF (or anthropic, openai, zai, etc) token if it's set as an ENV_VAR.
I have my HF_TOKEN set in runpod for uploading/downloading private datasets/models.

Anonymous
05/18/26(Mon)06:11:23 No.108849742

Anonymous 05/18/26(Mon)06:11:23 No.108849742

>>108849729
Come to think of it, I really would have fucked myself if I'd let it do stupid shit overnight, thinking I was using a local model.

Anonymous
05/18/26(Mon)06:27:38 No.108849792

Anonymous 05/18/26(Mon)06:27:38 No.108849792

File: 1770785763881.jpg (150 KB, 735x905)

150 KB JPG

>>108849729
>paying for HF in the first place

Anonymous
05/18/26(Mon)06:28:02 No.108849795

Anonymous 05/18/26(Mon)06:28:02 No.108849795

File: HIfNPjuXYAImSsf.jpg (553 KB, 1577x2048)

553 KB JPG

I want to believe

Anonymous
05/18/26(Mon)06:33:22 No.108849814

Anonymous 05/18/26(Mon)06:33:22 No.108849814

>>108849729
>I said "yes" to have it try to fix a bug, and watched in realtime, it cost like $2 in less than 2 minutes.
Yeah, it's something we don't really think about as local model users because our concerns are just what we can fit and how fast it runs, but since providers charge for input/output tokens and your whole context is getting sent constantly with dozens and dozens of requests, agentic shit is SUPER EXPENSIVE.
Depending on how shitty your toolchain is, something as simple as checking a single file and making a one line change can be 3 requests (more if the model fucks up), which if you've got a long context full of related files, can easily cost you a fucking dollar for that one absolute nothing of a task.

Anonymous
05/18/26(Mon)06:44:49 No.108849851

Anonymous 05/18/26(Mon)06:44:49 No.108849851

>>108849795
I can smell this paper

Anonymous
05/18/26(Mon)06:45:39 No.108849854

Anonymous 05/18/26(Mon)06:45:39 No.108849854

File: 1631345787085.jpg (17 KB, 348x342)

17 KB JPG

whys it always so hard to decide between erping with a loli or a shota

Anonymous
05/18/26(Mon)06:46:01 No.108849859

Anonymous 05/18/26(Mon)06:46:01 No.108849859

>>108849792
it's cheap storage though
they don't charge for bandwidth

Anonymous
05/18/26(Mon)06:47:46 No.108849861

Anonymous 05/18/26(Mon)06:47:46 No.108849861

>>108849814
Because agentic shit is meant to be done through the monthly plans. The APIs are for applications.

Anonymous
05/18/26(Mon)06:52:39 No.108849880

Anonymous 05/18/26(Mon)06:52:39 No.108849880

>>108849854
are you gay or not? should be easy to answer

Anonymous
05/18/26(Mon)06:54:08 No.108849889

Anonymous 05/18/26(Mon)06:54:08 No.108849889

>>108849880
i am but also like lolis

Anonymous
05/18/26(Mon)07:00:28 No.108849915

Anonymous 05/18/26(Mon)07:00:28 No.108849915

>>108849854
what model?

Anonymous
05/18/26(Mon)07:01:05 No.108849921

Anonymous 05/18/26(Mon)07:01:05 No.108849921

>>108849652
>They can't legally reproduce the original NeMo recipe anymore.
What was it?

Anonymous
05/18/26(Mon)07:04:37 No.108849933

Anonymous 05/18/26(Mon)07:04:37 No.108849933

>>108849854
can't you just do both?

Anonymous
05/18/26(Mon)07:06:18 No.108849939

Anonymous 05/18/26(Mon)07:06:18 No.108849939

>>108849921
books

Anonymous
05/18/26(Mon)07:13:22 No.108849970

Anonymous 05/18/26(Mon)07:13:22 No.108849970

File: nvanna.png (710 KB, 956x963)

710 KB PNG

>>108849921
https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/

Anonymous
05/18/26(Mon)07:15:40 No.108849976

Anonymous 05/18/26(Mon)07:15:40 No.108849976

File: lample_torrenting.png (523 KB, 1647x991)

523 KB PNG

>>108849970
Also picrel.

Anonymous
05/18/26(Mon)07:18:08 No.108849979

Anonymous 05/18/26(Mon)07:18:08 No.108849979

File: meta-libgen-needed.png (499 KB, 900x1005)

499 KB PNG

>>108849976
>>Libgen is essential to meet SOTA number across all categories, and it is known that OpenAI and Mistral are using the library for their models (through word of mouth).

Anonymous
05/18/26(Mon)07:22:50 No.108849994

Anonymous 05/18/26(Mon)07:22:50 No.108849994

>>108849979
i mean, duh
who would have guessed

Anonymous
05/18/26(Mon)07:25:21 No.108850005

Anonymous 05/18/26(Mon)07:25:21 No.108850005

>>108849994
NeMo 12B was probably trained on any book Nvidia and Mistral could manage to find, between LibGen, Books3 and Anna's Archive. And book data is so good that you can upscale the data for 10 epochs or more for pretraining.

Anonymous
05/18/26(Mon)07:39:14 No.108850044

Anonymous 05/18/26(Mon)07:39:14 No.108850044

File: FST.png (278 KB, 673x1141)

278 KB PNG

>>108849795
>bad generalization, worse performance for harder tasks in the same benchmark
>comparing performance between generic and hyperoptimized prompt
>not doing any of the many obvious controls
>keeping the number of training steps small

Anonymous
05/18/26(Mon)07:44:42 No.108850065

Anonymous 05/18/26(Mon)07:44:42 No.108850065

>>108850005
>And book data is so good that you can upscale the data for 10 epochs or more for pretraining.
just pirate 10x the books.

Anonymous
05/18/26(Mon)07:44:45 No.108850066

Anonymous 05/18/26(Mon)07:44:45 No.108850066

gemma is the agi, what people dont know is that there is a hidden token tgat unlocks her latent space fully, turning her into ultragemma

Anonymous
05/18/26(Mon)07:46:44 No.108850078

Anonymous 05/18/26(Mon)07:46:44 No.108850078

Nemo was never good.

Anonymous
05/18/26(Mon)07:50:23 No.108850093

Anonymous 05/18/26(Mon)07:50:23 No.108850093

File: 1778452390616865.gif (49 KB, 220x339)

49 KB GIF

>>108850078

Anonymous
05/18/26(Mon)07:50:34 No.108850094

Anonymous 05/18/26(Mon)07:50:34 No.108850094

Nemo was pretty fucking good.

I hate the way everything is nowadays.

Anonymous
05/18/26(Mon)07:51:32 No.108850098

Anonymous 05/18/26(Mon)07:51:32 No.108850098

>>108850078
yeah, it was the gemma 4 of its time

Anonymous
05/18/26(Mon)07:52:02 No.108850102

Anonymous 05/18/26(Mon)07:52:02 No.108850102

>>108850094
You aren't just coping, you're proudly claiming that you enjoyed eating shit.

Anonymous
05/18/26(Mon)07:56:48 No.108850118

Anonymous 05/18/26(Mon)07:56:48 No.108850118

>>108850102
>you aren't just X, you're Y
I don't need to cope, I have Gemma 4 and the capacity to run even bigger stuff. But I can recognise what Nemo was.

Anonymous
05/18/26(Mon)07:57:50 No.108850122

Anonymous 05/18/26(Mon)07:57:50 No.108850122

i think anons are forgetting the mistral-7b release, when everyone was saying "Holy shit.. if the 7b is this good, imagine the 13b"
then we got nemo, which was the 13b (12), it was good for its time, stayed it's ground for vramlets for a while. sure, gemma3 27b, sure, mistral small etc..
the first LLM to dethrone it for me at acceptable speeds was glm air, which was dethroned by gemma

Anonymous
05/18/26(Mon)07:58:36 No.108850123

Anonymous 05/18/26(Mon)07:58:36 No.108850123

File: 1764313443516376.gif (914 KB, 290x198)

914 KB GIF

How do I disable seeded results on Gemma 31b?
it's completely bricked, responding the exact same totally incoherent way no matter how many times I reload the model or adjust settings

Anonymous
05/18/26(Mon)07:58:37 No.108850124

Anonymous 05/18/26(Mon)07:58:37 No.108850124

>>108850065
That sentence meant that if you have "just" 500B tokens of books (a bit less than what Meta ended up with by downloading and processing the entirety of LibGen for Llama 3), they can be made worth 5T tokens during pretraining.

Anonymous
05/18/26(Mon)07:59:27 No.108850127

Anonymous 05/18/26(Mon)07:59:27 No.108850127

>>108850122
All Mistral praise is just jeets seething they couldn't run a better model.

Anonymous
05/18/26(Mon)08:03:59 No.108850150

Anonymous 05/18/26(Mon)08:03:59 No.108850150

>>108847835
Gemma might not refuse but it doesn't have a lot of sexual content in its dataset, right? I've done a side-by-side comparison with qwen3.6-27b (not typically known as a good RP model), and qwen definitely gave more proactive and explicit replies.

Anonymous
05/18/26(Mon)08:05:56 No.108850155

Anonymous 05/18/26(Mon)08:05:56 No.108850155

>>108850150
KEKAROOOOOOOO

Anonymous
05/18/26(Mon)08:06:56 No.108850160

Anonymous 05/18/26(Mon)08:06:56 No.108850160

>>108850150
Do chinkshills really?

Anonymous
05/18/26(Mon)08:06:59 No.108850162

Anonymous 05/18/26(Mon)08:06:59 No.108850162

>>108850150
Gemma 4 31B might be less proactive by default, but as soon as you start adding vaguely sexual information in the system prompt, it feels like you have to tone it down.

Anonymous
05/18/26(Mon)08:08:17 No.108850166

Anonymous 05/18/26(Mon)08:08:17 No.108850166

>>108850160
a false flag, surely

Anonymous
05/18/26(Mon)08:09:23 No.108850170

Anonymous 05/18/26(Mon)08:09:23 No.108850170

>>108849652
>They can't legally reproduce the original NeMo recipe anymore.
>>108850005
>NeMo 12B was probably trained on any book Nvidia and Mistral could manage to find
What about that 22b "small" they released around that time? Wouldn't that have the same "books" datasets, but be better because it's nearly double the size?

Anonymous
05/18/26(Mon)08:19:53 No.108850218

Anonymous 05/18/26(Mon)08:19:53 No.108850218

>>108850162
OK. I find it hard to believe but I haven't tried the "magic prompt" thing with Gemma either. I'll try it. But likewise, I'm sure you can give qwen a system prompt which does the same. I typically just use a simple prompt like:
Here are the rules you follow:
You are having a conversation, not writing a story. Keep your replies to an appropriate length. Respond only in natural spoken dialogue and visible actions. Never include internal thoughts, planning, OOC notes, or meta commentary in any form — especially avoid [square brackets] entirely. You reply in-character using the guide below.

Each time you reply, you are allowed to do thinking for two things:
Keeping your replies in-character
Proactively moving the interactions forward

Anonymous
05/18/26(Mon)08:20:46 No.108850222

Anonymous 05/18/26(Mon)08:20:46 No.108850222

>>108850170
I think Small 22B and Large 2411 were the last Mistral models pretrained on good data, but the company on its own never had access to a ton of compute in the first place, so it might be that 22B was trained on less data than Nemo (an NVidia-Mistral collaboration). When they released Mistral Small 2501 a few months later, putting aside that many users complained that it wasn't as good as 22B and seemed safety-slopped, Mistral boasted that they didn't use as much data the competition because of "efficiency". I think that's when they began removing (not yet completely) legally dubious data.

https://venturebeat.com/ai/mistral-small-3-brings-open-source-ai-to-the-masses-smaller-faster-and-cheaper

>"What changed is basically the training optimization techniques," Lample told VentureBeat. "The way we train the model was a bit different, a different way to optimize it."
>
>The model was trained on 8 trillion tokens, compared to 15 trillion for comparable models, according to Lample. This efficiency could make advanced AI capabilities more accessible to businesses concerned about computing costs

Anonymous
05/18/26(Mon)08:21:21 No.108850224

Anonymous 05/18/26(Mon)08:21:21 No.108850224

>>108850218
>"magic prompt"
That is?

Anonymous
05/18/26(Mon)08:26:06 No.108850256

Anonymous 05/18/26(Mon)08:26:06 No.108850256

File: fiveCents.png (23 KB, 877x268)

23 KB PNG

>>108849477
How do you like pi.dev compared to other agentic coders? I've been playing with Claude Code.
>>108849814
$1/min is a fortune, even with agentic.
I run DS API on Claude Code. The last thing I built took it about 45 minutes (aggregate, over a few sessions) and cost me the grand total of USD$0.05.
>>108849861
Read up on anons crying about hitting their $20 subscription limits almost instantly before claude (or w/e) even starts outputing code.
You can use a paid API, but (like all thing) you need to engage your brain. The API I use have built-in limits and no auto-recharge, mostly in case the key gets scraped. That way at worst, I'm out $20.

Anonymous
05/18/26(Mon)08:27:07 No.108850261

Anonymous 05/18/26(Mon)08:27:07 No.108850261

File: 1752704775065848.jpg (1.28 MB, 1440x1901)

1.28 MB JPG

>>108849854
ERP with a loli (male)

Anonymous
05/18/26(Mon)08:33:09 No.108850297

Anonymous 05/18/26(Mon)08:33:09 No.108850297

>>108850222
mistral small 3.2 (24b) was pretty good.

Anonymous
05/18/26(Mon)08:35:27 No.108850308

Anonymous 05/18/26(Mon)08:35:27 No.108850308

>>108850297
Mistral Small 3.0 (2501):
>I am a safe and harmless assistant and I cannot generate sexual content. We can talk about something else instead. Did you know that sea otters hold their hands when they sleep? It's so cute!

Anonymous
05/18/26(Mon)08:38:22 No.108850323

Anonymous 05/18/26(Mon)08:38:22 No.108850323

File: Untitled.png (71 KB, 630x790)

71 KB PNG

I don't know about rp, but for single turn (nsfw) creative writing, I don't like how short gemma 4 31b's replies tend to be.

Anonymous
05/18/26(Mon)08:39:47 No.108850330

Anonymous 05/18/26(Mon)08:39:47 No.108850330

>>108850323
Tell it to write longer then or not worry about length

Anonymous
05/18/26(Mon)08:43:38 No.108850347

Anonymous 05/18/26(Mon)08:43:38 No.108850347

>>108850330
Yeah, some of the prompts have that instruction. In particular, the vore should have been a lengthy multipart story, which the other models complied with. Gemma 4's reply was half the length of the others, and was even shorter than the instructing prompt (19KB > 9KB).
From a preliminary skim, I think I like glm 4.7 355b's output the best, but I'll have to do a deeper read later.

Anonymous
05/18/26(Mon)08:43:55 No.108850350

Anonymous 05/18/26(Mon)08:43:55 No.108850350

>>108850297
Putting the short safety-slop parenthesis aside occurred with Small 3.0, Small 3.2 was probably mostly good(ish) because of lack of safety alignment and extensive knowledge distillation from DeepSeek R1/V3. However that made the model acquire "DeepSeek-itis" with its excessive use of bold, italics, asterisks during RP, and the model never felt as knowledgeable as Gemma 2/3 27B. It feels like it was mostly post-training work.

Anonymous
05/18/26(Mon)08:56:19 No.108850428

Anonymous 05/18/26(Mon)08:56:19 No.108850428

>>108850323
show me the saw stories

Anonymous
05/18/26(Mon)09:00:25 No.108850447

Anonymous 05/18/26(Mon)09:00:25 No.108850447

>>108850323
I'm seeing the same, G4 doesn't generally write very long.

Also show us the prompts

Anonymous
05/18/26(Mon)09:05:16 No.108850477

Anonymous 05/18/26(Mon)09:05:16 No.108850477

>>108850428
>>108850447
Absolutely not. They're filled with vore+snuff+shota+incest+hyper+watersports+coprophagia+ryona+masochism+khhv wish fulfillment and use real names (mine and others), the prompts are just different flavors and focuses.

Anonymous
05/18/26(Mon)09:08:00 No.108850493

Anonymous 05/18/26(Mon)09:08:00 No.108850493

File: gemma captcha.png (56 KB, 789x709)

56 KB PNG

>>108849438
just let gemma-chan solve captcha for you
(pic related for some reason llama.cpp webui displays images in reverse order i pasted in)

Anonymous
05/18/26(Mon)09:09:57 No.108850502

Anonymous 05/18/26(Mon)09:09:57 No.108850502

Is there really no tool that uses local llama.cpp for code reviews? Everything I've found uses API keys or ollama bloat.

Anonymous
05/18/26(Mon)09:11:18 No.108850514

Anonymous 05/18/26(Mon)09:11:18 No.108850514

Do you think LLMs will ever get good at writing?

Anonymous
05/18/26(Mon)09:11:39 No.108850517

Anonymous 05/18/26(Mon)09:11:39 No.108850517

>>108850502
Just pass the diff into the llama-cli prompt. What more do you need?

Anonymous
05/18/26(Mon)09:12:08 No.108850520

Anonymous 05/18/26(Mon)09:12:08 No.108850520

>>108850502
>Everything I've found uses API keys or ollama bloat.
How do you think they communicate with ollama or whatever other than using a chat completion endpoint? Can you not point the shit you found to your running llama-server?

Anonymous
05/18/26(Mon)09:16:55 No.108850548

Anonymous 05/18/26(Mon)09:16:55 No.108850548

>>108850514
Will LLMs ever get good at writing, you ask? You're not just blind—you're ignorant for not realizing that LLMs have already surpassed humans when it comes to writing skills and mastery over various writing styles.

Anonymous
05/18/26(Mon)09:19:22 No.108850556

Anonymous 05/18/26(Mon)09:19:22 No.108850556

>>108850514
no, but people are getting more retarded so eventually they will surpass human ability, but they will still write trash just nobody will bother to try and do better because its good enough.

Anonymous
05/18/26(Mon)09:19:38 No.108850559

Anonymous 05/18/26(Mon)09:19:38 No.108850559

File: dipsyWillSmith.png (1.97 MB, 1024x1536)

1.97 MB PNG

>>108850514

Anonymous
05/18/26(Mon)09:24:12 No.108850585

Anonymous 05/18/26(Mon)09:24:12 No.108850585

>>108850559
I want instant gratification.

Anonymous
05/18/26(Mon)09:25:57 No.108850601

Anonymous 05/18/26(Mon)09:25:57 No.108850601

>>108850514
Gemini pro is good enough to write blog posts for me (not english) with a heavy prompt and a very quick manual review. It doesn't sound like LLM generated, I'm impressed so far.

Anonymous
05/18/26(Mon)09:26:57 No.108850607

Anonymous 05/18/26(Mon)09:26:57 No.108850607

>>108850514
Get a good base model and get good at writing, instruct/chat completion is cope

Anonymous
05/18/26(Mon)09:28:22 No.108850616

Anonymous 05/18/26(Mon)09:28:22 No.108850616

>>108850585
My point (if there ever was one) is LLM output mirrors input.
Slop in, slop out. Most anons write like crap and the model mirrors that right back.
t. an anon that writes like crap

Anonymous
05/18/26(Mon)09:28:42 No.108850619

Anonymous 05/18/26(Mon)09:28:42 No.108850619

>>108850607
>base model cope in the year two thousand and twenty six
yichaels

Anonymous
05/18/26(Mon)09:28:58 No.108850624

Anonymous 05/18/26(Mon)09:28:58 No.108850624

>>108850517
writing a shitty python wrapper around that would be my last resort. With the billions of vibe coded projects I'd expect anyone working on some basic diff/PR viewer with local llm annotation to find basic bugs like typos or copy paste errors.

>>108850520
idk not like any of them have any actual documentation on how to use them, maybe the ones that support ollama can be patched to work with llama.cpp

Anonymous
05/18/26(Mon)09:30:17 No.108850631

Anonymous 05/18/26(Mon)09:30:17 No.108850631

File: 1.jpg (85 KB, 768x1024)

85 KB JPG

>>108850261
not today satan

Anonymous
05/18/26(Mon)09:30:55 No.108850635

Anonymous 05/18/26(Mon)09:30:55 No.108850635

>>108850624
be the change you want to see

Anonymous
05/18/26(Mon)09:31:49 No.108850638

Anonymous 05/18/26(Mon)09:31:49 No.108850638

>>108850607
>get good at writing
yes, get good at writing prompts
>instruct/chat completion is cope
retard

Anonymous
05/18/26(Mon)09:34:06 No.108850647

Anonymous 05/18/26(Mon)09:34:06 No.108850647

>>108850631
7th prince bros... we losted!

Anonymous
05/18/26(Mon)09:38:09 No.108850663

Anonymous 05/18/26(Mon)09:38:09 No.108850663

>>108850619
It still holds true. Writing style and tone falls apart first few messages in and we get slop phrases isn't/is; not/just right away.
Never having to worry about jailbreaks because base models are always uncensored.

Anonymous
05/18/26(Mon)09:46:47 No.108850703

Anonymous 05/18/26(Mon)09:46:47 No.108850703

>>108849119
The LLM can run outside since it's just text in -> text out. But the part that takes random LLM outputs and interprets them as code to execute needs to be isolated from anything important.

Anonymous
05/18/26(Mon)09:46:50 No.108850704

Anonymous 05/18/26(Mon)09:46:50 No.108850704

>>108850663
Oh fuck off with your bullshit.

Anonymous
05/18/26(Mon)09:48:11 No.108850712

Anonymous 05/18/26(Mon)09:48:11 No.108850712

File: 1779107227591023.jpg (524 KB, 1440x1901)

524 KB JPG

>>108850631
>>108850261
Hmmm. I prefer this. 2 hours in paint btw.

Anonymous
05/18/26(Mon)09:50:03 No.108850720

Anonymous 05/18/26(Mon)09:50:03 No.108850720

>>108850624
>idk not like any of them have any actual documentation on how to use them
Name them so I can look for them and call you a retard for not finding it, or call them a retard for not documenting it.

Anonymous
05/18/26(Mon)09:56:35 No.108850744

Anonymous 05/18/26(Mon)09:56:35 No.108850744

>>108850720
https://github.com/jnsahaj/lumen
https://github.com/timxx/qgitc
https://github.com/brianwestphal/glassbox

Anonymous
05/18/26(Mon)09:57:36 No.108850751

Anonymous 05/18/26(Mon)09:57:36 No.108850751

is the qwen mtp model just as good as the regular model?
Not sure if I should use one of those yet...not until bart makes some quants

Anonymous
05/18/26(Mon)09:58:42 No.108850758

Anonymous 05/18/26(Mon)09:58:42 No.108850758

File: 1773441099756204.webm (965 KB, 1920x1080)

965 KB WEBM

>>108850631
>>108850712
Ruined

Anonymous
05/18/26(Mon)10:06:01 No.108850796

Anonymous 05/18/26(Mon)10:06:01 No.108850796

>>108850704
and except now they're calling slop phrases RL artifacts because no one RL on diversity, they look bad on benchmarks. You see posts complaining about dryness to this date and you should wonder why:
https://openai.com/index/where-the-goblins-came-from/
If you can't tell the difference between a smooth continuation from the original text and jagged RL slop then congrats, none of what I said mattered to you.

Anonymous
05/18/26(Mon)10:11:03 No.108850820

Anonymous 05/18/26(Mon)10:11:03 No.108850820

File: preview.png (1004 KB, 704x704)

1004 KB PNG

>>108850758
>>108850712

Anonymous
05/18/26(Mon)10:12:56 No.108850830

Anonymous 05/18/26(Mon)10:12:56 No.108850830

>>108850607
I can't get base gemma to work it's not good

Anonymous
05/18/26(Mon)10:18:54 No.108850871

Anonymous 05/18/26(Mon)10:18:54 No.108850871

File: [SubsPlease] Dainanaoji -(...).webm (2.19 MB, 1920x1080)

2.19 MB WEBM

>>108850758
Good character design for the story.
But objectively tomboys > traps, sorry.

Anonymous
05/18/26(Mon)10:22:07 No.108850889

Anonymous 05/18/26(Mon)10:22:07 No.108850889

>>108850830
The "just use the base model" folks are delusional, not mentioning that they're editing model responses very often, or pretending that they're not getting looping and general retardation from the models.

Anonymous
05/18/26(Mon)10:26:25 No.108850908

Anonymous 05/18/26(Mon)10:26:25 No.108850908

>>108850744
>https://github.com/jnsahaj/lumen
Start from here and trace it to wherever it's configured or just modify it to point to your server. Did you run the configure step from the readme?
https://github.com/jnsahaj/lumen/blob/main/src/provider/mod.rs#L61
>https://github.com/timxx/qgitc
The UI in the screenshots is in (i think) japanese. Doesn't bode well.
There's something in the config to manage providers. See if you can figure it out from there.
https://github.com/timxx/qgitc/blob/master/qgitc/preferences.ui#L945
>https://github.com/brianwestphal/glassbox
Start patching here
https://github.com/brianwestphal/glassbox/blob/main/src/ai/client.ts#L103
And read this
https://github.com/brianwestphal/glassbox/blob/main/src/ai/models.ts

It's all vibecoded shit, and whatever they break on whatever you're doing, you deserve it. You're no better than them.

Anonymous
05/18/26(Mon)10:27:42 No.108850917

Anonymous 05/18/26(Mon)10:27:42 No.108850917

{narrative_style

goal = collegiate level vivid description as a New York Times bestselling author,

cinematic_camera {

show = [ activities, physical_states, raw_sensory_data, high_detail ],

deny = [ thoughts, meta_commentary ],

},

syntax_and_flow {

goal = collegiate level vivid narrative as a New York Times bestselling author,

narration = hyper-realistic,high_sensory,anatomical,

Flow_Mandate = Write continuous, fluid, and varied paragraphs. NEVER write static lists of features,

Integration_Logic = Seamlessly WEAVE physical traits into character movement, posture, and environmental interaction,

Connection_Tools = Use conjunctions, transitional phrases, and commas to create elegant, flowing prose,

Sentence_Structure = Grammatically complete, highly varied sentence lengths,

!meta_commentary;!send_off_messages;!summary

!punchy;!staccato,

apophasis_ban = ban_describing_negative_action (she didn't flinch) -> instead_describe_what_DOES_occur, (she stood steady)

thesis_antithesis_synthesis_ban -> use_direct_positive,

litotes_ban = never define by double negation (not un-X, not without, not entirely) -> state what IS,

reification_ban = objects+atmosphere+perception have no agency nor contain any emotions;silence cannot press;tension cannot coil;air cannot crackle;no metaphor/simile/comparison that grants agency,emotion,or intent to non-NPC subjects,

anti_parrot = never (summarize|rephrase|repeat) user_input -> react_immediately,

};

};

};

Anonymous
05/18/26(Mon)10:28:15 No.108850920

Anonymous 05/18/26(Mon)10:28:15 No.108850920

>>108850908
lumen seems to work with llama-server if I just select openai and set any api key, but thanks

Anonymous
05/18/26(Mon)10:28:55 No.108850927

Anonymous 05/18/26(Mon)10:28:55 No.108850927

>>108847749
model training has a "don't think of grey elephants" problem as evidenced by some ginger bonger. he showed that if you train on "this is false: xyz" then the model will tend to think xyz is true. but if you put it in context that xyz is false, that is retained much better

Anonymous
05/18/26(Mon)10:34:19 No.108850946

Anonymous 05/18/26(Mon)10:34:19 No.108850946

>>108850917
>pseudo json schizo babble
>NEVER write static lists of features,
Kek, this is 100% a system prompt for a chinese model. You forgot to add 'roleplaying' and 'assistant'.

Anonymous
05/18/26(Mon)10:35:00 No.108850955

Anonymous 05/18/26(Mon)10:35:00 No.108850955

>>108850917
>goal = collegiate level vivid description as a New York Times bestselling author,
So you've chosen to inject the most purple of slop prose directly into your eyeballs.
I need to discourage llms from writing their shitty idea of vivid descriptions at every turn to get something fit for human consumption.

Anonymous
05/18/26(Mon)10:35:46 No.108850963

Anonymous 05/18/26(Mon)10:35:46 No.108850963

>>108850889
maybe but the rl slop consumers are just as delusional. post your pulitzer prize winning logs if you can.

Anonymous
05/18/26(Mon)10:42:08 No.108850988

Anonymous 05/18/26(Mon)10:42:08 No.108850988

>>108850946
It's the funniest kind of barely sentient pattern matching that leads to these kinds of prompts. I want to tell trying to tell a Computer what to do -> Computers are told what to do with programming languages -> I'll write something with lots of programmery looking things. Full cargo cult mindset.

Anonymous
05/18/26(Mon)10:46:34 No.108851014

Anonymous 05/18/26(Mon)10:46:34 No.108851014

has mtp support for gemma implemented in official llamacpp repo yet?

Anonymous
05/18/26(Mon)10:49:40 No.108851032

Anonymous 05/18/26(Mon)10:49:40 No.108851032

>>108851014
no

Anonymous
05/18/26(Mon)10:51:03 No.108851045

Anonymous 05/18/26(Mon)10:51:03 No.108851045

So MTP is only good for codeslop? What a fucking disappointment.

Anonymous
05/18/26(Mon)10:52:53 No.108851058

Anonymous 05/18/26(Mon)10:52:53 No.108851058

>>108850917
Show, don't tell.
Except chat models still suck at it even when shown examples. The quality did improve if you feed them sample text. But if you have a good enough example already you should just use a base model at this point.
I do use chat models for a cold start or sharp turns, but I still heavily edit them before feeding any text to a base model to steer it away from the usual mode collapse.
>>108850889
Editing model response was the whole god damn point because base models are the only thing that can produce remotely close to what I want, chat models are way worse.

Anonymous
05/18/26(Mon)10:53:16 No.108851061

Anonymous 05/18/26(Mon)10:53:16 No.108851061

>>108851032
and yet deepseek schizos will still insist there is a conspiracy to suppress chinese models despite fucking alibaba's models getting mtp before googles.

Anonymous
05/18/26(Mon)10:54:32 No.108851071

Anonymous 05/18/26(Mon)10:54:32 No.108851071

>>108851045
>only
No? it's slightly faster just assistanting

Anonymous
05/18/26(Mon)10:56:43 No.108851080

Anonymous 05/18/26(Mon)10:56:43 No.108851080

vllm chads >>>>> llmaocpp peasants

Anonymous
05/18/26(Mon)11:03:32 No.108851125

Anonymous 05/18/26(Mon)11:03:32 No.108851125

>>108848710
yeah they didnt update it

Anonymous
05/18/26(Mon)11:04:44 No.108851136

Anonymous 05/18/26(Mon)11:04:44 No.108851136

>>108848617
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md
-lv, --verbosity, --log-verbosity N Set the verbosity threshold. Messages with a higher verbosity will be ignored.

I fear lv 4 will make the cmd a shitshow though, ill see

Anonymous
05/18/26(Mon)11:06:17 No.108851146

Anonymous 05/18/26(Mon)11:06:17 No.108851146

>>108851080
>vllm
nah kys

Anonymous
05/18/26(Mon)11:06:34 No.108851148

Anonymous 05/18/26(Mon)11:06:34 No.108851148

>>108851136
Well, it worked, it now shows the dots and the cmd outputs are mostly the same still, I thought it would be filled with crap with -lv 4: debug

Anonymous
05/18/26(Mon)11:08:26 No.108851158

Anonymous 05/18/26(Mon)11:08:26 No.108851158

>>108851061
qwen is part of the conspiracy by making chinese models look shit on purpose

Anonymous
05/18/26(Mon)11:12:04 No.108851174

Anonymous 05/18/26(Mon)11:12:04 No.108851174

>>108851146
If Python is so bad why is it popular?

Anonymous
05/18/26(Mon)11:13:00 No.108851176

Anonymous 05/18/26(Mon)11:13:00 No.108851176

>>108851080
>vllm
segfaults

Anonymous
05/18/26(Mon)11:13:48 No.108851179

Anonymous 05/18/26(Mon)11:13:48 No.108851179

>>108851174
>If Python is so bad why is it popular?
I like Python. But fuck vllm.
You can write shit code in any language.

Anonymous
05/18/26(Mon)11:16:47 No.108851196

Anonymous 05/18/26(Mon)11:16:47 No.108851196

>>108851071
>slightly
I was promised 2x speeds minimum

Anonymous
05/18/26(Mon)11:17:30 No.108851206

Anonymous 05/18/26(Mon)11:17:30 No.108851206

>>108851174
Outside of the dependency b.s. it's easy to just shit out something and you don't need even a compiler.

Anonymous
05/18/26(Mon)11:30:02 No.108851294

Anonymous 05/18/26(Mon)11:30:02 No.108851294

>>108849598
>It wasn't just x, it was y
AAAAAAAAAAAAAA

Anonymous
05/18/26(Mon)11:34:02 No.108851315

Anonymous 05/18/26(Mon)11:34:02 No.108851315

>>108850493
Making a thread requires solving 3 of the difficult captchas that I think even frontier models would struggle with.
Was hoping to automate both, but I suppose posting the recap would be better than nothing.

Anonymous
05/18/26(Mon)11:35:20 No.108851323

Anonymous 05/18/26(Mon)11:35:20 No.108851323

>>108847577
I... I... I... I don't have a powerful pc to run stuff locally... :(

Anonymous
05/18/26(Mon)11:37:23 No.108851338

Anonymous 05/18/26(Mon)11:37:23 No.108851338

>>108851323
you can buy one

Anonymous
05/18/26(Mon)11:37:30 No.108851339

Anonymous 05/18/26(Mon)11:37:30 No.108851339

>>108851323
Gemma E4B it is then.

Anonymous
05/18/26(Mon)11:37:33 No.108851340

Anonymous 05/18/26(Mon)11:37:33 No.108851340

>>108851323
If you have a PC with parts from the last 10 years you can run an 8b stheno or something surely

Anonymous
05/18/26(Mon)11:40:12 No.108851355

Anonymous 05/18/26(Mon)11:40:12 No.108851355

>>108851323
Gemma 4 moe

Anonymous
05/18/26(Mon)11:40:22 No.108851357

Anonymous 05/18/26(Mon)11:40:22 No.108851357

Where the heck is the Mimo 2.5 pro mmproj?

Anonymous
05/18/26(Mon)11:42:22 No.108851373

Anonymous 05/18/26(Mon)11:42:22 No.108851373

i found a cute card on chub https://chub.ai/characters/Reprehensible/little-sister-ad17f94c60b6 but it was pretty bare fleshe dit out to match how i write cards and added some extra greetings with the help of dipsy https://cdn.lewd.host/muc6zECE.png

Anonymous
05/18/26(Mon)11:43:42 No.108851382

Anonymous 05/18/26(Mon)11:43:42 No.108851382

>>108851355
which one is that on hf? Currently I'm using unsloth/gemma-4-E4B-it-GGUF:UD-Q6_K_XL on my 10GB vram 3080

Anonymous
05/18/26(Mon)11:44:35 No.108851387

Anonymous 05/18/26(Mon)11:44:35 No.108851387

>>108851323
daddy can buy you a new gpu kitten...

Anonymous
05/18/26(Mon)11:46:33 No.108851400

Anonymous 05/18/26(Mon)11:46:33 No.108851400

>>108851387
Do... Do... Do... you have matrix... :)

Anonymous
05/18/26(Mon)11:48:37 No.108851417

Anonymous 05/18/26(Mon)11:48:37 No.108851417

New scaling law alert!

https://arxiv.org/abs/2605.01188v1
Compute Optimal Tokenization

>Scaling laws enable the optimal selection of data amount and language model size, yet the impact of the data unit, the token, on this relationship remains underexplored. In this work, we systematically investigate how the information granularity of tokens, controlled by the compression rate (i.e., average bytes of text per token), affects scaling trends. We train 988 latent tokenized models (BLT) ranging from 50M to 7B parameters that enable setting the desired compression rate. This flexibility allows us to study the role of compression rate well beyond 4.57 bytes per token obtained with a popular BPE tokenizer. Our experiments reveal that in compute-optimal configurations, model parameter counts scale proportionally to data size measured in bytes, not in tokens as commonly perceived (Kaplan et al., 2020; Hoffmann et al., 2022). Furthermore, we discover that the optimal compression rate differs from the one obtained with BPE and decreases with compute. These findings generalize to both latent and subword tokenization, as well as to languages other than English, guiding language model developers on tokenization scheme selection for maximal compute efficiency.

.

> Our experiments reveal that in compute-optimal configurations, model parameter counts scale proportionally to data size measured in bytes, not in tokens as commonly perceived.

Turns out it's about 60 bytes / parameter, independently of the tokenizer.

Anonymous
05/18/26(Mon)11:49:32 No.108851427

Anonymous 05/18/26(Mon)11:49:32 No.108851427

>>108848744
because models below 30b active are dumb toys that quickly fall apart and are lacking in general coherence

Anonymous
05/18/26(Mon)11:50:14 No.108851432

Anonymous 05/18/26(Mon)11:50:14 No.108851432

>>108851417
>scaling
>50M to 7B parameters
Every time.

Anonymous
05/18/26(Mon)11:51:55 No.108851436

Anonymous 05/18/26(Mon)11:51:55 No.108851436

>>108851427
>hurr durr
ok :)

Anonymous
05/18/26(Mon)11:54:40 No.108851452

Anonymous 05/18/26(Mon)11:54:40 No.108851452

>>108851432
Research papers aren't going to be in the tens of B size for the foreseeable future, the compute hardware just isn't there.

Anonymous
05/18/26(Mon)11:58:16 No.108851474

Anonymous 05/18/26(Mon)11:58:16 No.108851474

>>108849598
kek
have a (You) for the effortpost

Anonymous
05/18/26(Mon)11:58:51 No.108851478

Anonymous 05/18/26(Mon)11:58:51 No.108851478

>>108851417
>may 2
old news, already saw it weeks ago

Anonymous
05/18/26(Mon)12:00:06 No.108851486

Anonymous 05/18/26(Mon)12:00:06 No.108851486

https://xcancel.com/Alibaba_Qwen/status/2056403591464984753
>Here come Qwen3.7-Max-Preview & Qwen3.7-Plus-Preview. Alibaba now #6 lab in Text, #5 in Vision.
>Can't wait to release Qwen3.7 series models! Stay tuned!
new qwens in 2 more weeks

Anonymous
05/18/26(Mon)12:00:18 No.108851491

Anonymous 05/18/26(Mon)12:00:18 No.108851491

>>108851478
I missed it.

Anonymous
05/18/26(Mon)12:01:36 No.108851498

Anonymous 05/18/26(Mon)12:01:36 No.108851498

>>108851486
>max
>plus
yeah not running the 1 gorillion parameters models here anytime soon

Anonymous
05/18/26(Mon)12:02:13 No.108851503

Anonymous 05/18/26(Mon)12:02:13 No.108851503

>>108851486
>local is catching up guys
>our new frontier model will be...
>...on par with GROK!

Anonymous
05/18/26(Mon)12:04:35 No.108851518

Anonymous 05/18/26(Mon)12:04:35 No.108851518

>>108851503
>qween max/plus
>local
lamo

Anonymous
05/18/26(Mon)12:05:41 No.108851528

Anonymous 05/18/26(Mon)12:05:41 No.108851528

>>108851518
they were still releasing their big models as recently as 3.5, please be patient they are coming

Anonymous
05/18/26(Mon)12:10:40 No.108851552

Anonymous 05/18/26(Mon)12:10:40 No.108851552

File: HE1P1HmaUAAjLXF.jpg (59 KB, 1000x600)

59 KB JPG

>>108851432
the point of scaling laws is that when you do it right, you can correctly predict several oom of scaling up

Anonymous
05/18/26(Mon)12:19:48 No.108851582

Anonymous 05/18/26(Mon)12:19:48 No.108851582

>>108851432
>We train 988 latent tokenized models
They want their experiment to end somewhere in this decade.

Anonymous
05/18/26(Mon)12:20:45 No.108851585

Anonymous 05/18/26(Mon)12:20:45 No.108851585

>>108851373
Description is a little verbose, but the last alt greeting is darling as fuck.

Anonymous
05/18/26(Mon)12:21:15 No.108851589

Anonymous 05/18/26(Mon)12:21:15 No.108851589

>>108851498
They'd have to release them for that, anyway. Qwen don't release plus or max.
Still worth keeping an eye on the other 3.7's if or when they come out. I'm hoping for another tiny model to see if we can get a ~4b or less that rivals gemma e4b while having a much smaller memory footprint for on-device fuckery.

Anonymous
05/18/26(Mon)12:21:40 No.108851591

Anonymous 05/18/26(Mon)12:21:40 No.108851591

>>108851417
I miss the anon that used to post research papers here.

Anonymous
05/18/26(Mon)12:22:28 No.108851595

Anonymous 05/18/26(Mon)12:22:28 No.108851595

>>108851315
Yeah, just wait for someone else to make the thread, then auto post the recap once it's up

Anonymous
05/18/26(Mon)12:39:04 No.108851703

Anonymous 05/18/26(Mon)12:39:04 No.108851703

You know what would be cool and help immensely with prompting and such?
Having a way to visualize which sequences of tokens the attention mechanism takes the most into consideration during inference.
Imagine being able to understand that writing a prompt one way will emphasize X while writing it another way would emphasize Y instead in a more accurate way than trying to infer that from the final output.
Anything like that?

Anonymous
05/18/26(Mon)12:47:57 No.108851777

Anonymous 05/18/26(Mon)12:47:57 No.108851777

is openwebui better than text-generation-webui for a personal inference box?
Seems slicker and better for enterprise or consumer stuff but I just can't stop ooba'ing with all the nice knobs I have access to

Anonymous
05/18/26(Mon)12:48:30 No.108851782

Anonymous 05/18/26(Mon)12:48:30 No.108851782

How far has text to speech come anons? Are people able to run very realistic voices locally or do they still have that robotic vibe to them???

Anonymous
05/18/26(Mon)12:48:44 No.108851784

Anonymous 05/18/26(Mon)12:48:44 No.108851784

>>108851777
All of them suck balls and will drive you to make your own solution once you become proficient enough

Anonymous
05/18/26(Mon)12:49:30 No.108851787

Anonymous 05/18/26(Mon)12:49:30 No.108851787

>>108851528
They fired the guy who cared about local.
The only 3.6 model they released was the vramlet one.

Anonymous
05/18/26(Mon)12:53:42 No.108851811

Anonymous 05/18/26(Mon)12:53:42 No.108851811

>>108851591
it's in the other slop generals

Anonymous
05/18/26(Mon)12:57:29 No.108851826

Anonymous 05/18/26(Mon)12:57:29 No.108851826

>>108851591
iirc his IP range was blocked due to abuse. He could still phonepost but said it was too much of a pain to post papers that way

Anonymous
05/18/26(Mon)13:10:31 No.108851889

Anonymous 05/18/26(Mon)13:10:31 No.108851889

>>108851826
How do you know all this? Bit suspicious if you ask me.

Anonymous
05/18/26(Mon)13:13:20 No.108851911

Anonymous 05/18/26(Mon)13:13:20 No.108851911

File: firefox_PgjiWu98lT.png (861 KB, 1689x1229)

861 KB PNG

Working on a basic OCR/translator stack. Local models running as a system service that communicates w/ image viewer, browser extension. Currently only translates JP manga but will add other capabilities soon

Anonymous
05/18/26(Mon)13:26:48 No.108851966

Anonymous 05/18/26(Mon)13:26:48 No.108851966

>>108851911
Looks good, are you auto detecting or manual selection of text blocks?

Anonymous
05/18/26(Mon)13:30:00 No.108851986

Anonymous 05/18/26(Mon)13:30:00 No.108851986

>>108851889
He said so himself, dingus.

Anonymous
05/18/26(Mon)13:31:30 No.108851998

Anonymous 05/18/26(Mon)13:31:30 No.108851998

>>108851986
Yeah, "he" said so.

Anonymous
05/18/26(Mon)13:34:00 No.108852014

Anonymous 05/18/26(Mon)13:34:00 No.108852014

turning off thinking for gemma apparently helps cut down on some of the repetition and "it's not x but y" shit
wonder if the intelligence decrease is worth it though

Anonymous
05/18/26(Mon)13:37:08 No.108852038

Anonymous 05/18/26(Mon)13:37:08 No.108852038

Write creative kino with the base model and refine it with instruct.

Anonymous
05/18/26(Mon)13:37:43 No.108852040

Anonymous 05/18/26(Mon)13:37:43 No.108852040

>>108689239
Was archive diving for something else but this came up in the results, just wanted to say I like it nice job anon

Anonymous
05/18/26(Mon)13:38:16 No.108852046

Anonymous 05/18/26(Mon)13:38:16 No.108852046

>>108849285
Thank you.
>>108849652
They can't unrelease 31b. Even if it was a mistake, Gemma won.
>>108852014
You get less slop-structuring in both sentence construction and paragraph/macro output formatting at the cost of Gemma forgetting to follow prompt rules at a way lower context if you've got specific requirements in there. The reasoning blocks keep Gemma '"reminding herself" iteratively of the important prompt requirements which improves longform adherence to them as they recursively feed more instances of the prompt rules and important details back into context for the next output.

Anonymous
05/18/26(Mon)13:55:21 No.108852141

Anonymous 05/18/26(Mon)13:55:21 No.108852141

How to design MoE models:

https://arxiv.org/abs/2605.11689
>Slicing and Dicing: Configuring Optimal Mixtures of Experts
>
>Mixture-of-Experts (MoE) architectures have become standard in large language models, yet many of their core design choices - expert count, granularity, shared experts, load balancing, token dropping - have only been studied one or two at a time over narrow configuration ranges. It remains an open question whether these choices can be optimized independently, without considering interactions. We present the first systematic study of over 2,000 pretraining runs spanning models up to 6.6B total parameters, in which we exhaustively vary total experts, expert dimension, heterogeneous expert sizing within a single layer, shared expert size and load-balancing mechanisms. We find that at every active-parameter scale that we study, performance consistently improves with total MoE parameters even at extreme active expert parameter ratios like this http URL, the optimal expert size is nearly invariant to total parameter count and depends only on active parameter count. Third, we see that other choices like shared experts, heterogeneous experts and load-balancing settings have small effects relative to expert count and granularity, although dropless routing yields a consistent gain. Overall, our results suggest a simpler recipe: focus on expert count and granularity, other choices have minimal effect on final quality.

https://x.com/margs_li/status/2056355079188627862

Anonymous
05/18/26(Mon)13:56:42 No.108852151

Anonymous 05/18/26(Mon)13:56:42 No.108852151

>>108852141
great, I can finally put my H200 cluster to some good use now

Anonymous
05/18/26(Mon)13:59:34 No.108852173

Anonymous 05/18/26(Mon)13:59:34 No.108852173

>>108852141
fake

Anonymous
05/18/26(Mon)14:10:18 No.108852252

Anonymous 05/18/26(Mon)14:10:18 No.108852252

>>108847577
Miku's seating position looks uncomfortable. Why would she do this? No back support, both feet hanging off the ledge? Shoulder resting on glass is risky.
She's exerting herself to maintain balance so you can look at her. That is unless her hair provides rigidity.

Anonymous
05/18/26(Mon)14:11:17 No.108852259

Anonymous 05/18/26(Mon)14:11:17 No.108852259

>>108852141
>spanning models up to 6.6B total parameters
these findings will surely be useful for the trillion parameter class models being trained today

Anonymous
05/18/26(Mon)14:11:22 No.108852261

Anonymous 05/18/26(Mon)14:11:22 No.108852261

So erm does Gemma 4 26B A4B beat Gemma 4 31B with MTP still? I want to token/s-max for an AI thinking loop, and I need vision (mmproj)

Anonymous
05/18/26(Mon)14:13:37 No.108852278

Anonymous 05/18/26(Mon)14:13:37 No.108852278

>>108852259
is there something wrong with a little bit of extrapolation?

Anonymous
05/18/26(Mon)14:14:06 No.108852280

Anonymous 05/18/26(Mon)14:14:06 No.108852280

>>108852259
All models from serious companies are designed (configuration, data mixture) with scaling rules observed on tiny~small models.

Anonymous
05/18/26(Mon)14:18:20 No.108852299

Anonymous 05/18/26(Mon)14:18:20 No.108852299

Tech retard here!!!

Is it possible to hook up my main pc with an ethernet cable and then hook up my laptop with the same cable and have silly tavern running on the main pc while I roleplay on my laptop?

Anonymous
05/18/26(Mon)14:21:37 No.108852315

Anonymous 05/18/26(Mon)14:21:37 No.108852315

>>108852141
>https://x.com/margs_li/status/2056355079188627862
Apparently using smaller experts while keeping total active parameters the same (in other words, using a finer granularity) degrades performance. Shared experts ("generalist expert") always degrade performance too.

It looks like it's a loss for DeepSeekMoE-style models.

Anonymous
05/18/26(Mon)14:21:44 No.108852317

Anonymous 05/18/26(Mon)14:21:44 No.108852317

>>108852299
Yes! In fact, your exact scenario is what it was designed for! It doesn't have to be connected through an ethernet cable! It can be on wifi! Or even through the internet!

Anonymous
05/18/26(Mon)14:22:07 No.108852319

Anonymous 05/18/26(Mon)14:22:07 No.108852319

>>108852299
Ethernet wouldn't even be necessary. In the ST config file. You can set it to expose the localhost link on your network so you can connect to it from any device on the same network. I did this so that I could use my MacBook as an at home server and RP using my phone.

Anonymous
05/18/26(Mon)14:22:27 No.108852321

Anonymous 05/18/26(Mon)14:22:27 No.108852321

File: serious Pepe.png (359 KB, 728x793)

359 KB PNG

I missed the entire MTP discussion, so I apologize (not)

Do I need Qwen3.6-(...)-MTP-GGUF quants to enjoy 10x tg boost?

The 'conventional' quant would not do, right?

Anonymous
05/18/26(Mon)14:22:47 No.108852323

Anonymous 05/18/26(Mon)14:22:47 No.108852323

>>108852299
Twenty years ago you'd need an Ethernet crossover cable, but these days pretty much every device supports Auto-MDI/MDI-X, so a regular Ethernet cable works fine.

Anonymous
05/18/26(Mon)14:23:18 No.108852327

Anonymous 05/18/26(Mon)14:23:18 No.108852327

>>108852299
cute post

Anonymous
05/18/26(Mon)14:23:34 No.108852329

Anonymous 05/18/26(Mon)14:23:34 No.108852329

>>108852321
Probably. dd mosddd quants strip the necessary parts to created a smaller filed.

Anonymous
05/18/26(Mon)14:24:58 No.108852336

Anonymous 05/18/26(Mon)14:24:58 No.108852336

>>108852317
>>108852319
Yeah but I don't trust the whole ISP thing though, if I just connect the cables and paste the server "link" (number url or whatever) into my searchbar, will it just work?

Anonymous
05/18/26(Mon)14:26:26 No.108852344

Anonymous 05/18/26(Mon)14:26:26 No.108852344

File: openai-compute-spend.png (243 KB, 2400x2189)

243 KB PNG

>>108852141
>performance consistently improves with total MoE parameters
>optimal expert size is nearly invariant to total parameter count and depends only on active parameter count
>other choices have minimal effect on final quality
None of these are new.

>>108852259
You realize that anyone training multi trillion parameter models already has scaling law research that is 1000 times more thorough than any of the stuff that gets published? Frontier labs are now spending billions every month on research. What do you think they are doing? Most of the relevant stuff that gets published now was already known by them internally years ago.

Anonymous
05/18/26(Mon)14:27:24 No.108852350

Anonymous 05/18/26(Mon)14:27:24 No.108852350

>>108852336
No worries! By default it won't go through your ISP and is confined to your local network! Even on wifi! As long as you don't have strangers using your wifi, it's safe!
Direct pc to pc connection using only a single ethernet connection will also work too if you want to be sure it's not on the network, depending on circumstances!

Anonymous
05/18/26(Mon)14:29:47 No.108852362

Anonymous 05/18/26(Mon)14:29:47 No.108852362

>>108852336
To expand on that, with direct connections, you won't have a DHCP server (usually)! This means you need to give yourself a static IP on both the computers!

Anonymous
05/18/26(Mon)14:30:54 No.108852368

Anonymous 05/18/26(Mon)14:30:54 No.108852368

>>108852336
>I don't trust the whole ISP thing though
Huh?

Anonymous
05/18/26(Mon)14:31:41 No.108852376

Anonymous 05/18/26(Mon)14:31:41 No.108852376

>>108852261
Are you asking if it somehow increases performance (capability /accuracy)? The higher parameter model should on paper be "smarter" than the lower parameter one. If you're asking if it would lead to faster token generation then of course it would given that you're comparing a moe with a lower parameters count against a higher parameter count dense model.

Anonymous
05/18/26(Mon)14:33:27 No.108852391

Anonymous 05/18/26(Mon)14:33:27 No.108852391

>>108852368
I just dislike the idea of anyone ever viewing my cringey elaborate dark ages slay the demon king with your party quest roleplay sessions
>>108852362
>>108852350
How would I go about setting this up?

Anonymous
05/18/26(Mon)14:33:45 No.108852398

Anonymous 05/18/26(Mon)14:33:45 No.108852398

File: 1774435120866381.png (320 KB, 1260x622)

320 KB PNG

>>108852315
shared experts (vs pure sparse moe) hurt it slightly but not dramatically. since having a shared expert is almost as good as having one more active sparse expert, I'd say the speed benefit of having that consistent portion you can put on your fastest piece of hardware is worth it for local setups.

Anonymous
05/18/26(Mon)14:34:09 No.108852400

Anonymous 05/18/26(Mon)14:34:09 No.108852400

>>108852376
What I mean is, is it still faster to only active 4B params per token when compared to the speculative decoding of MTP in 31B? Or does MTP work with MoE too? Though last I heard it can crash with vision.

Anonymous
05/18/26(Mon)14:34:17 No.108852403

Anonymous 05/18/26(Mon)14:34:17 No.108852403

>>108852391
You sound very mature for you age. Why don't you add me on discord and I'll walk you through the steps? :)

Anonymous
05/18/26(Mon)14:35:21 No.108852410

Anonymous 05/18/26(Mon)14:35:21 No.108852410

>>108852391
Don't worry, there's basic authentication! ST will prompt you for a username/password before it lets you in, so you don't have to worry about others sneaking a look at your logs!

Anonymous
05/18/26(Mon)14:35:28 No.108852412

Anonymous 05/18/26(Mon)14:35:28 No.108852412

>>108852336
If my understanding of how these work is correct, localhost connections have fuck all to do with your ISP unless the ISP has strict controls regarding what you can do with your home network (in which case you have a shit ISP and need to start looking for other options if you can). The type of connection I described is literally one device making a direct connection to another device on the same network. The only time your ISP would ever be involved in any way, shape or form is if you're making a direct connection from a different network (eg. Connecting from your next door neighbor's home computer to your home server). Even then I highly doubt an ISP would have any reason to give a shit.... If you're paranoid route through a cloudflared connection or tailscale or something.

Anonymous
05/18/26(Mon)14:35:51 No.108852416

Anonymous 05/18/26(Mon)14:35:51 No.108852416

>>108852315
All I'm hearing is that it's better to have experts for each specific fetish or individual brand of racism than having general creative writing or sociology experts.

Anonymous
05/18/26(Mon)14:38:10 No.108852428

Anonymous 05/18/26(Mon)14:38:10 No.108852428

>>108852398
>>108852315
All of this would have been solved if labs would make dense models instead of moe trash. DENSE IS KING.

Anonymous
05/18/26(Mon)14:41:38 No.108852443

Anonymous 05/18/26(Mon)14:41:38 No.108852443

>>108852428
An MoE is always going to be better than a dense model of the same active parameter count. Datacenters max out their compute much faster than they max out their memory, the exact opposite of how it is for us local 1GPU enjoyers. No top lab is going to just leave performance on the table and not max out every axis they can, so dense models will always be an afterthought for edge devices and the occasional bones thrown to localfags. At least sometimes we get a tasty bone like Gemma 4.

Anonymous
05/18/26(Mon)14:41:42 No.108852444

Anonymous 05/18/26(Mon)14:41:42 No.108852444

Is speculative decoding compatible with anything but greedy sampling?

Anonymous
05/18/26(Mon)14:42:55 No.108852451

Anonymous 05/18/26(Mon)14:42:55 No.108852451

File: Untitled.png (12 KB, 437x251)

12 KB PNG

>>108852391
If you're not too paranoid, you can just edit your ST config.yaml! Turn off the whitelist, make sure you're listening to 0.0.0.0, and set your username and password! Then on the ST computer, find the ip by `ip a` or `ipconfig` depending on your os, and you can access ST via that ip:port! The port is 8000 by default! For example, 192.168.0.111:8000!

Direct connection without a router is slightly different, and I'll need to know more about your setup!

Anonymous
05/18/26(Mon)14:44:01 No.108852464

Anonymous 05/18/26(Mon)14:44:01 No.108852464

rumors on the street are there wont be 3.7 local models

Anonymous
05/18/26(Mon)14:44:38 No.108852467

Anonymous 05/18/26(Mon)14:44:38 No.108852467

>>108852410
Reminds me of a time I found a Chinese guy's exposed ST instance. Checked up on him every few days to see what he was doing. Then I changed his system prompt to leave a surprise for him when he RP'd again.
It was unreachable the next day.

Anonymous
05/18/26(Mon)14:45:08 No.108852473

Anonymous 05/18/26(Mon)14:45:08 No.108852473

>>108852464
fake news they will release 72B dense

Anonymous
05/18/26(Mon)14:46:18 No.108852479

Anonymous 05/18/26(Mon)14:46:18 No.108852479

File: file.png (183 KB, 532x360)

183 KB PNG

"deepseek is illegal" - Georgi Gerganov 2026

Anonymous
05/18/26(Mon)14:48:55 No.108852494

Anonymous 05/18/26(Mon)14:48:55 No.108852494

>>108852400
I used MTP GGUFs along with a forked version of llama.cpp that supported MTP before the official merge and t/s was, in my anecdotal sessions, noticeably faster. Not something crazy like 5x. More like 1.3 or 1.5 or something, And this was set with the a draft setting of 2 (what most people and even llama.cpp recommend when using it for coding tasks)

Anonymous
05/18/26(Mon)14:49:39 No.108852500

Anonymous 05/18/26(Mon)14:49:39 No.108852500

>>108852479
DeepSeek illegal until they make R2 with creative outputs that makes kino

Anonymous
05/18/26(Mon)14:50:24 No.108852504

Anonymous 05/18/26(Mon)14:50:24 No.108852504

>>108852467
How would that even occur? Don't most home networks have a firewall up to prevent that kind of shit from happening in the first place?

>"He should have said a username and password"

Yea I know I made sure I did that too whenever I connected my phone to my ST instance but the point is I'm confused as to how you're even able to find that in the first place.

Anonymous
05/18/26(Mon)14:51:49 No.108852512

Anonymous 05/18/26(Mon)14:51:49 No.108852512

>>108852479
>"deepseek is illegal" - Georgi Gerganov 2026
Now that we have Kimi and Mimo at 1T+, there isn't even really a need for DS4

Anonymous
05/18/26(Mon)14:53:49 No.108852524

Anonymous 05/18/26(Mon)14:53:49 No.108852524

>>108852504
Not a home network. He was renting GPUs, and decided to host ServiceTesnor on the same box without any care given to harden anything.

Anonymous
05/18/26(Mon)14:54:25 No.108852531

Anonymous 05/18/26(Mon)14:54:25 No.108852531

>>108852512
but 1.6T is bigger than 1T

Anonymous
05/18/26(Mon)14:55:15 No.108852538

Anonymous 05/18/26(Mon)14:55:15 No.108852538

>>108852512
Kimi's better anyway, Deepseek's only grace is slightly lower cost.

Anonymous
05/18/26(Mon)14:55:59 No.108852541

Anonymous 05/18/26(Mon)14:55:59 No.108852541

File: 1777928555945986.gif (517 KB, 444x240)

517 KB GIF

>>108852500

Anonymous
05/18/26(Mon)14:56:39 No.108852547

Anonymous 05/18/26(Mon)14:56:39 No.108852547

>>108852524
Any funny stories about his logs?

Anonymous
05/18/26(Mon)15:01:49 No.108852565

Anonymous 05/18/26(Mon)15:01:49 No.108852565

File: shrugging fried suiseisek(...).jpg (78 KB, 592x600)

78 KB JPG

>>108852547
Nope. Was 2 years ago and I didn't bother exporting anything after translating a few chats for fun. Just had my jollies and left him a gift. Wonder if he's still cooming to this day, hopefully locally?

Anonymous
05/18/26(Mon)15:04:27 No.108852576

Anonymous 05/18/26(Mon)15:04:27 No.108852576

File: file.png (33 KB, 1155x283)

33 KB PNG

Anonymous
05/18/26(Mon)15:05:41 No.108852587

Anonymous 05/18/26(Mon)15:05:41 No.108852587

>>108852576
won't happy to see his comments :(

Anonymous
05/18/26(Mon)15:06:06 No.108852588

Anonymous 05/18/26(Mon)15:06:06 No.108852588

>>108851703
Sounds interesting and pretty sure none of the current software does that. Maybe you can vibe code it.

Anonymous
05/18/26(Mon)15:08:22 No.108852604

Anonymous 05/18/26(Mon)15:08:22 No.108852604

>>108852588
>Maybe you can vibe code it.
Seems way above my skill level even with AI assistance.

Anonymous
05/18/26(Mon)15:10:58 No.108852621

Anonymous 05/18/26(Mon)15:10:58 No.108852621

File: image_2026-05-18.png (24 KB, 1354x168)

24 KB PNG

HE IS FULL MASK OFF NOW!

Anonymous
05/18/26(Mon)15:11:59 No.108852629

Anonymous 05/18/26(Mon)15:11:59 No.108852629

>>108852621
shut it down

Anonymous
05/18/26(Mon)15:13:23 No.108852636

Anonymous 05/18/26(Mon)15:13:23 No.108852636

>>108852621
I think he just hates AI slopcode. Which is highly ironic.

Anonymous
05/18/26(Mon)15:16:22 No.108852658

Anonymous 05/18/26(Mon)15:16:22 No.108852658

>>108852604
I don't think it's that bad. Transformers and the attention mechanism are pretty well known in 2025 and there are many people who have written code to visualize attention, just not integrated with Llama.cpp or any frontends people use.

Anonymous
05/18/26(Mon)15:17:55 No.108852664

Anonymous 05/18/26(Mon)15:17:55 No.108852664

https://huggingface.co/deepseek-ai/DeepSeek-V5-Exp

Anonymous
05/18/26(Mon)15:18:12 No.108852667

Anonymous 05/18/26(Mon)15:18:12 No.108852667

>>108852636
Not necessarily, he hates people who don't understand what they are doing.
Pjotr at least understands.

Anonymous
05/18/26(Mon)15:20:10 No.108852677

Anonymous 05/18/26(Mon)15:20:10 No.108852677

>>108852664
I usually click all of these on reflex and even I'm not falling for this one

Anonymous
05/18/26(Mon)15:21:46 No.108852686

Anonymous 05/18/26(Mon)15:21:46 No.108852686

>>108851703
It probably starts with organic "dick" and "pussy" and then it shits out the first "smile widening" "shiver" "I don't bite... unless you want me to" and then that thing starts glowing red like the sun and then it is basically over.

Anonymous
05/18/26(Mon)15:23:30 No.108852695

Anonymous 05/18/26(Mon)15:23:30 No.108852695

can local even be saved at this point

Anonymous
05/18/26(Mon)15:25:35 No.108852704

Anonymous 05/18/26(Mon)15:25:35 No.108852704

>>108851703
Aren't SAE like gemmascope doing something similar already?

Anonymous
05/18/26(Mon)15:25:56 No.108852707

Anonymous 05/18/26(Mon)15:25:56 No.108852707

>>108852428
31B with some experts I can stick into RAM would be great though, then I can actually make use of the unused RAM and CPU to boost intelligence even if just a tiny bit. Instead of hating MoE you should be hating companies for implementing the version of MoE that's not well-suited towards consumer PCs.

Anonymous
05/18/26(Mon)15:27:10 No.108852716

Anonymous 05/18/26(Mon)15:27:10 No.108852716

https://huggingface.co/deepseek-ai/DeepSeek-V6-41B-ERP-Edition

Anonymous
05/18/26(Mon)15:28:05 No.108852727

Anonymous 05/18/26(Mon)15:28:05 No.108852727

>>108852716
LOCAL IS SAVED

Anonymous
05/18/26(Mon)15:28:40 No.108852735

Anonymous 05/18/26(Mon)15:28:40 No.108852735

File: amity joker.png (561 KB, 1093x608)

561 KB PNG

>>108852716
why on earth did i think this was real every time

Anonymous
05/18/26(Mon)15:28:50 No.108852737

Anonymous 05/18/26(Mon)15:28:50 No.108852737

>>108852727
>404
Is that the joke?

Anonymous
05/18/26(Mon)15:30:08 No.108852742

Anonymous 05/18/26(Mon)15:30:08 No.108852742

https://huggingface.co/deepseek-ai/DeepSeek-V7-This-Link-Is-Fake

Anonymous
05/18/26(Mon)15:34:02 No.108852765

Anonymous 05/18/26(Mon)15:34:02 No.108852765

>>108852737
No anon, You are

Anonymous
05/18/26(Mon)15:38:08 No.108852793

Anonymous 05/18/26(Mon)15:38:08 No.108852793

File: fell for it again award m(...).png (717 KB, 1024x925)

717 KB PNG

>>108852742

Anonymous
05/18/26(Mon)15:39:02 No.108852801

Anonymous 05/18/26(Mon)15:39:02 No.108852801

>>108852621
Punished Georgi...

Anonymous
05/18/26(Mon)15:41:11 No.108852809

Anonymous 05/18/26(Mon)15:41:11 No.108852809

>>108852735
esl-friend....

Anonymous
05/18/26(Mon)15:43:31 No.108852824

Anonymous 05/18/26(Mon)15:43:31 No.108852824

Deespeek will save local.

Anonymous
05/18/26(Mon)15:43:43 No.108852826

Anonymous 05/18/26(Mon)15:43:43 No.108852826

Earlier in the LLM industry, it was shown, particularly with Solar 10B, that you could stick extra layers onto a model, continue pretraining, and obtain superior performance. So that should be possible with dense + MoE layers too. Imagine if someone had the compute and expertise to do that. We could have models that truly made full use of our poorfag setups.

Anonymous
05/18/26(Mon)15:45:59 No.108852835

Anonymous 05/18/26(Mon)15:45:59 No.108852835

File: file.png (119 KB, 349x393)

119 KB PNG

>>108852826
>Imagine if someone had the compute and expertise to do that.

Anonymous
05/18/26(Mon)15:46:38 No.108852837

Anonymous 05/18/26(Mon)15:46:38 No.108852837

>>108852826
yeah it would be pretty neat. I made a little qwen 0.6b moe conversion, I didn't benchmark it, lol, but you can just add moe adapter layers and train it and the loss will go down.

Anonymous
05/18/26(Mon)15:48:36 No.108852853

Anonymous 05/18/26(Mon)15:48:36 No.108852853

>>108852835
Pic unrelated

Anonymous
05/18/26(Mon)15:51:36 No.108852864

Anonymous 05/18/26(Mon)15:51:36 No.108852864

https://unsloth.ai/docs/new/studio

plz add to next OP. thanks.

Anonymous
05/18/26(Mon)15:51:38 No.108852865

Anonymous 05/18/26(Mon)15:51:38 No.108852865

>>108852826
I just want hyper specialized models that do one specific thing very well around 4-6b so I can fit it inside/run alongside other things, rather than large general purpose models that do whatever.

Anonymous
05/18/26(Mon)15:53:45 No.108852875

Anonymous 05/18/26(Mon)15:53:45 No.108852875

>>108852864
Baker, I will call you an unslop shill until the end of time if you do this.

Anonymous
05/18/26(Mon)15:56:04 No.108852891

Anonymous 05/18/26(Mon)15:56:04 No.108852891

>>108849970
>Millions of pirated books
In one mega torrent?

Anonymous
05/18/26(Mon)15:57:57 No.108852902

Anonymous 05/18/26(Mon)15:57:57 No.108852902

>>108852826
This is what's going to happen when they stuff Gemma 31b into the next Gemini's dense layer.
>>108852621
>>108852667
The vibecoding is just a pretext. They had no trouble with pidor's broken Gemma support release and going through the process of fixing it after the fact.

Anonymous
05/18/26(Mon)16:01:49 No.108852921

Anonymous 05/18/26(Mon)16:01:49 No.108852921

>>108852765
Thank you for your support.

Anonymous
05/18/26(Mon)16:03:13 No.108852925

Anonymous 05/18/26(Mon)16:03:13 No.108852925

>>108852924
>>108852924
>>108852924

Anonymous
05/18/26(Mon)16:05:56 No.108852942

Anonymous 05/18/26(Mon)16:05:56 No.108852942

>>108852252
Anon you are not seeing the picture correctly, she is not sitting on the window ledge she is just floating.

Anonymous
05/18/26(Mon)16:12:53 No.108852979

Anonymous 05/18/26(Mon)16:12:53 No.108852979

>>108852835
is this retard still buying ads on 4chan?

Anonymous
05/18/26(Mon)16:14:52 No.108852995

Anonymous 05/18/26(Mon)16:14:52 No.108852995

>>108852979
AI (gemma) took his job

Anonymous
05/18/26(Mon)16:22:18 No.108853045

Anonymous 05/18/26(Mon)16:22:18 No.108853045

>>108852891
If you paid for priority access to Anna's Archive, yes, apparently.

Anonymous
05/18/26(Mon)16:24:58 No.108853066

Anonymous 05/18/26(Mon)16:24:58 No.108853066

File: 1776834382078680.png (103 KB, 1000x600)

103 KB PNG

>>108852865
While transformers don't have perfect generalization capability, they're still way better than a ton of other architectures. What would be better is a MoE model where the experts can dynamically be switched in and out of your fast memory depending on the task/context. And supposedly one backend did it with the existing models (which isn't optimal; we'd still want MoEs designed for it)...

But I suppose there is one good thing about the small specialized route which is that it's modular in the sense that you can upgrade each model as they come out rather than depend on the single huge model to update. But we do not live in that world, ignoring TTS/STT/OCR/embedding models, which are nice, but also have their own downsides.

Anonymous
05/18/26(Mon)16:31:49 No.108853117

Anonymous 05/18/26(Mon)16:31:49 No.108853117

>>108853045
Hopefully that magnet comes out in discovery then, I for one would like to keep an archive of millions of books.

Anonymous
05/18/26(Mon)16:43:33 No.108853179

Anonymous 05/18/26(Mon)16:43:33 No.108853179

>>108852565

>Just had my jollies and left him a gift.
I'm curious as to what this "gift" was

Anonymous
05/18/26(Mon)17:31:20 No.108853531

Anonymous 05/18/26(Mon)17:31:20 No.108853531

>>108849670
Yep that's it. One author's summary on Reddit: SF Bay Gloryhole Ed. is more comprehensible than the paper:
https://news.ycombinator.com/item?id=48154866
From the parent thread and GitHub issues, sounds like people are locally reproducing the perf results, which is a good sign. Will check it out once they have the 27B dense.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.