/g/ - /lmg/ - Local Models General - Technology

/g/ - Technology

Thread archived.

Anonymous

/lmg/ - Local Models General 02/14/26(Sat)19:50:58 No.108149287

File: mikudayo.jpg (153 KB, 640x1536)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108139561 & >>108132261

►News
>(02/13) MiniMax-M2.5 released: https://hf.co/MiniMaxAI/MiniMax-M2.5
>(02/13) Ring-2.5-1T released, thinking model based on hybrid linear attention: https://hf.co/inclusionAI/Ring-2.5-1T
>(02/11) GLM-5 744B-A40B released: https://z.ai/blog/glm-5
>(02/11) Ming-flash-omni 2.0 released: https://hf.co/inclusionAI/Ming-flash-omni-2.0
>(02/10) MOSS-TTS Family: speech and sound generation models: https://github.com/OpenMOSS/MOSS-TTS
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
02/14/26(Sat)19:51:20 No.108149292

File: 1758348922203207.jpg (151 KB, 808x1144)

►Recent Highlights from the Previous Thread: >>108139561

--LLMs Don't Need More Parameters. They Need Loops.:
>108140291 >108140465 >108140781 >108140819 >108140866 >108140963 >108140999 >108141094 >108141138 >108141692 >108142187 >108142198 >108140601 >108140591 >108140741 >108140762 >108140792 >108140886 >108140902 >108140926 >108141054 >108141087 >108141107 >108140798 >108140799 >108140813 >108141811 >108140860 >108140929 >108140890 >108140848
--LLM ethical responses to explicit prompts analyzed:
>108143213 >108143247 >108143652 >108143603
--AI model benchmarking shows Claude Opus 4.6 leading in reliability and autonomy:
>108140073 >108140143 >108140159 >108140175
--Dynamic looping to optimize token processing efficiency:
>108140908 >108140935 >108140956 >108141011
--Training embeddings for template tokens with minimal data:
>108145350 >108145682 >108145802 >108145845 >108145979 >108146019 >108146060
--Apriel Nemotron 15B shows promise for 16GB storytelling:
>108146289 >108146381 >108146403 >108146408 >108146896 >108147143 >108147681 >108147220
--Erog_ translation model comparisons and recommendations:
>108146089 >108146234 >108146596 >108146690 >108146750
--Evaluating UGI Leaderboard and open model performance gaps:
>108148138 >108148167 >108148177 >108148185 >108148341 >108148400
--Debating Mamba/SSM model viability after lackluster major lab results:
>108147234 >108147261 >108147276 >108147481
--Progress in Open Weights vs. Proprietary Intelligence:
>108142559
--Heretic decensor update improves performance retention:
>108147926
--MiniMax developer explains why they won't release base models:
>108139799
--Satirical LLM timeline highlights Chinese dominance:
>108139786 >108141075 >108141092 >108141333 >108141795 >108145151 >108147066
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>108139566 >>108139574

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
02/14/26(Sat)19:56:31 No.108149320

>>108149287
prompt? I never managed to make low light work

Anonymous
02/14/26(Sat)20:00:17 No.108149334

>>108148616
Ok so she ended up just reverting to the same emotionless dismissive attitude. Is this a problem with the character card or the model?

Anonymous
02/14/26(Sat)20:09:24 No.108149367

Why don't we use a big LLM to process the prompt and generate the KV cache (weight streaming is cheap because it's amortized across the whole response) and a small model that fits on the GPU to write the actual response?

Anonymous
02/14/26(Sat)20:21:38 No.108149422

>>108149367
I don't know where to even begin answering this but consider the fact that you need to update the KV cache for each new token before the next one is generated.

Anonymous
02/14/26(Sat)20:23:57 No.108149434

>>108149367
The other way around makes more sense to me (prompt processing with a small model that you can load on a consumer GPU, inference with a big model on slow memory).

Anonymous
02/14/26(Sat)20:25:26 No.108149447

>>108149422
Yes, append the new rows to the KV cache with the ones computed by the smaller model that generates the answer, then once it's done discard them and recompute those rows with the bigger model (and also the new rows for the user message).

Anonymous
02/14/26(Sat)20:32:55 No.108149484

>>108149447
you would have to start with the big model and train the small model to the big model's cache so they speak the same language.

Anonymous
02/14/26(Sat)20:42:20 No.108149533

>>108149434
Being able to fit the whole model on VRAM doesn't make much difference for PP, streaming the weights and KV cache only adds a few extra seconds especially because you can do it in parallel as you compute attention. If it's slow it's because of memory bandwidth, not because of not being able to fit the whole model.
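
A rough back-of-the-envelope check of the "few extra seconds" claim, as a sketch only. The 80 GiB model size and the link bandwidths are illustrative assumptions (theoretical peaks, not measurements), and real transfers are usually slower.

```python
# Back-of-the-envelope: how long does one full pass of weight streaming take?
# All figures below are illustrative assumptions, not measurements.

GIB = 1024 ** 3


def stream_seconds(model_bytes: float, link_bytes_per_s: float) -> float:
    """Time to copy the whole model across the host-to-GPU link once."""
    return model_bytes / link_bytes_per_s


model_bytes = 80 * GIB  # assumed size of a quantized model held in system RAM
links = {
    "PCIe 3.0 x16 (~16 GB/s peak)": 16e9,
    "PCIe 4.0 x16 (~32 GB/s peak)": 32e9,
}

for name, bandwidth in links.items():
    print(f"{name}: {stream_seconds(model_bytes, bandwidth):.1f} s per pass")
```

Prompt processing pays this cost once per batch of prompt tokens, while token-by-token generation would pay it again for every token, which is why streaming only looks attractive for PP.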

>>108149437
The problem with speculative decoding is that you would have to fit two different models at once which makes it not viable for VRAM constrained scenarios. With this approach you only have one model or the other loaded at any one time.

>>108149484
I think the easiest way to do it would be to take a MoE, get rid of the specialized experts and finetune the whole network (or possibly only the shared experts?) to generate the message given the precomputed context.

Anonymous
02/14/26(Sat)20:50:12 No.108149570

Or maybe you could just REAP the experts and get an interesting result with no finetuning required, I'm not sure.
OR you could also run a lower quality quant for the generation phase.

Anonymous
02/14/26(Sat)20:53:11 No.108149594

>>108149533
oh yeah, i forgot about moe models, you should give it a shot. there are a bunch of moe models out there and they typically have low active parameter counts, well within the reach of consumer hardware. you can make a proof of concept to at least show it is somewhat viable. you could probably even write a paper if you document your experiments well.

Anonymous
02/14/26(Sat)20:59:30 No.108149628

>>108149533
>>108149570
An interesting test would be to process a character card & instructions with an FP16 model, save the results to disk, and then preload it to see if it makes any difference in a long roleplay with a low-quant model

Anonymous
02/14/26(Sat)21:02:53 No.108149649

>>108149628
Good point. For some use cases you want to use the same pre-prompt for many runs. Like when using the model in few-shot mode (giving it examples).

Anonymous
02/14/26(Sat)21:04:38 No.108149662

>>108149287
Is this the lewd miku general?

Anonymous
02/14/26(Sat)21:26:01 No.108149783

>>108149533
>streaming the weights and KV cache only adds a few extra seconds especially because you can do it in parallel as you compute attention
How does it work with MoE models? Don't different tokens require different experts? Does it switch experts? Does it load every single expert to process any prompt?

Anonymous
02/14/26(Sat)21:30:46 No.108149808

>>108149287
i'm considering buying like 20 to 40 sxm2 cards, wonder if it's worth it though.
but i kinda want a high vram rig and these cards are really cheap

Anonymous
02/14/26(Sat)21:35:56 No.108149833

>>108149662
maybe

Anonymous
02/14/26(Sat)22:30:24 No.108150103

>>108149808
As long as you have the power to run them, also do not forget to include electricity in the cost

Anonymous
02/14/26(Sat)22:32:55 No.108150115

>>108150103
they are rated at 250w each, llm inference would probably be at around 125w, my house could definitely support > 5kw.

also i could use some UPS or capacitor bank in between because you generally are not doing a constant load.
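
For reference, the power arithmetic for the card counts mentioned in this exchange, as a quick sketch. The 125 W figure is the poster's own estimate; the hours per day and the price per kWh are made-up placeholders to swap for real values.

```python
def rig_power_kw(cards: int, watts_per_card: float) -> float:
    """Total GPU draw in kilowatts (ignores CPU, fans and PSU losses)."""
    return cards * watts_per_card / 1000.0


def monthly_cost(kw: float, hours_per_day: float, price_per_kwh: float, days: int = 30) -> float:
    """Electricity cost for a month at a constant draw during active hours."""
    return kw * hours_per_day * days * price_per_kwh


for cards in (20, 40):
    typical = rig_power_kw(cards, 125)  # poster's inference estimate
    rated = rig_power_kw(cards, 250)    # rated board power
    print(f"{cards} cards: {typical:.1f} kW typical, {rated:.1f} kW rated")

# Hypothetical usage pattern: 4 hours/day at 0.30 per kWh, 40 cards at 125 W.
print(f"monthly cost: {monthly_cost(rig_power_kw(40, 125), 4, 0.30):.2f}")
```

Under these assumptions 40 cards already sit at 5 kW during inference and 10 kW at rated power, so the upper end of the plan is at the limit quoted rather than comfortably under it.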

Anonymous
02/14/26(Sat)22:41:51 No.108150155

>>108149783
Yes for prompt processing you need to load all experts because unless your prompt is tiny you are going to need all of them. But you can process layer 0, then layer 1 etc. fetching the weights from RAM before generating the KV cache and activations for that layer.
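
A small sketch of why a non-tiny prompt ends up touching every expert. It assumes each token picks `top_k` of `num_experts` experts uniformly at random in a given layer, which real routers do not do exactly, and the 128/8 configuration is a hypothetical example rather than any specific model.

```python
def expected_experts_used(num_experts: int, top_k: int, num_tokens: int) -> float:
    """Expected number of distinct experts hit in one layer by num_tokens tokens.

    Assumes uniform, independent routing: a given expert is missed by a single
    token with probability 1 - top_k / num_experts.
    """
    p_miss_one_token = 1.0 - top_k / num_experts
    return num_experts * (1.0 - p_miss_one_token ** num_tokens)


# Hypothetical MoE layer: 128 experts, 8 active per token.
for n in (1, 16, 128, 1024):
    used = expected_experts_used(128, 8, n)
    print(f"{n:5d} prompt tokens -> about {used:.1f} of 128 experts needed")
```

With a single token only `top_k` experts are needed, which is why generation can get away with partial loading, but the count saturates quickly as the prompt grows.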

Anonymous
02/14/26(Sat)23:03:55 No.108150303

How does minimax 2.5 compare to GLM 4.7?

Anonymous
02/14/26(Sat)23:07:08 No.108150339

>>108150303
better for work, worse for cooming and writing

Anonymous
02/14/26(Sat)23:09:01 No.108150356

>>108150339
nta
what about glm 5, is it any good ?

Anonymous
02/14/26(Sat)23:09:39 No.108150362

>>108150356
fine, but too large for most people to use

Anonymous
02/14/26(Sat)23:27:44 No.108150486

File: jew lecun.jpg (907 KB, 2080x2208)

The military is always 20 years ahead.

Anonymous
02/14/26(Sat)23:35:38 No.108150536

If I have 16gb gpu and 32gb ddr5

What kind of models can I use and what would they be capable of?

I'm interested in trying this out

Anonymous
02/14/26(Sat)23:35:50 No.108150538

>>108150486
The important thing is that the military is committed to responsible use of AI.

Anonymous
02/14/26(Sat)23:36:22 No.108150541

>>108150536
Nemo

Anonymous
02/15/26(Sun)00:22:16 No.108150767

Is it possible to do long form role playing? Like going on an adventure with a party and having advancing relationships, etc?

Anonymous
02/15/26(Sun)00:23:11 No.108150772

>>108150767
depends how much money you have

Anonymous
02/15/26(Sun)00:25:33 No.108150787

>>108150772
I have about tree fiddy

Anonymous
02/15/26(Sun)00:29:20 No.108150807

>>108150787
You need to acquire more money first

Anonymous
02/15/26(Sun)00:33:06 No.108150826

>>108150807
Jokes aside is it possible on consumer hardware or do you need Jewgle levels of datacenter power?

Anonymous
02/15/26(Sun)00:59:20 No.108150937

>>108150767
Not in a single, eternal chat. You'll want to use summaries and create world books from one chat to the next for some semblance of continuity. As context grows, models become dumber, even SOTA API ones.

Anonymous
02/15/26(Sun)01:31:56 No.108151092

File: c3gqrsdwyaec_7h.jpg_large.jpg (461 KB, 1190x1600)

Anonymous
02/15/26(Sun)01:39:36 No.108151123

>nemotron 12b VL supported
time to test!!!!

Anonymous
02/15/26(Sun)01:43:36 No.108151140

>>108150486
I like this Lecun

Anonymous
02/15/26(Sun)01:53:26 No.108151182

>>108150767
It's possible, with some wrapping. I'm working on exactly that for myself. You take quests, and by the end, they get summarized into a lorebook. Then you can drink at the tavern, and the NPCs will pull up key moments from your last quest and random lorebook entries from past adventures. If you mention something, the matching lorebook entry gets included in the context, and the whole thing, despite being very simple, really sells the illusion of long-term memory. You can ask Claude to vibecode something similar for you
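
The keyword-triggered part of that setup can be sketched in a few lines. This is a minimal illustration, not the poster's code: `LoreEntry`, `select_entries` and `build_context` are hypothetical names, and matching is plain case-insensitive substring search.

```python
from dataclasses import dataclass


@dataclass
class LoreEntry:
    keywords: tuple[str, ...]  # words that trigger this entry
    text: str                  # summary written at the end of a quest


def select_entries(message: str, lorebook: list[LoreEntry], limit: int = 3) -> list[LoreEntry]:
    """Return entries whose keywords appear in the message (case-insensitive)."""
    lowered = message.lower()
    hits = [e for e in lorebook if any(k.lower() in lowered for k in e.keywords)]
    return hits[:limit]  # cap injected entries to protect the context budget


def build_context(system_prompt: str, message: str, lorebook: list[LoreEntry]) -> str:
    """Assemble the prompt: system text, triggered lore, then the user message."""
    parts = [system_prompt]
    parts += [f"[Lore] {e.text}" for e in select_entries(message, lorebook)]
    parts.append(f"User: {message}")
    return "\n".join(parts)
```

Only entries that the current message actually mentions are injected, so the context stays short even as the lorebook grows across adventures.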

Anonymous
02/15/26(Sun)02:19:22 No.108151278

File: engine-pp.png (201 KB, 1812x1030)

let's goooo

Anonymous
02/15/26(Sun)02:23:48 No.108151293

>>108151278
show kld/ppl
your error margins were in the 'garbage' range last time

Anonymous
02/15/26(Sun)02:25:16 No.108151296

>>108151278
>not doing 'time' on your own impl
LMAO, grifter

Anonymous
02/15/26(Sun)02:27:44 No.108151303

I am looking to integrate llama.cpp with searching wikipedia and websites in general.
these are the projects i have found so far that seem to address this sort of thing
>https://github.com/Jay4242/llm-websearch
>https://github.com/ollmer/wikichat
i suppose there is also openclaw

does anyone have any experience with this sort of thing? i know i can take a document and feed it into llama.cpp but i guess i am looking for something a bit more automated than that.

Anonymous
02/15/26(Sun)02:30:06 No.108151306

File: 1762512139976732.png (96 KB, 1093x582)

>>108151123
>no goofs
>having to produce my own
would miku be proud? does anyone want them?

Anonymous
02/15/26(Sun)02:38:31 No.108151329

I feel kind of dumb but does anyone have instruct settings for Minimax anywhere? I'm trying to find it and there's nothing on sillytavern's github, only mention of the TTS.

Anonymous
02/15/26(Sun)02:41:02 No.108151341

File: engine-pp-time.png (244 KB, 1813x1645)

>>108151293
Damn. You really live here huh?
What specific llama-perplexity command do you want me to compare it against?
From what I've seen, running llama-perplexity on a single chunk ignores the first half of the file, uses the third quarter as context, and only computes the ppl on the last quarter of the file. I've implemented a ppl mode but it works in a simpler way, taking the first half as context and computing the ppl over the second half. So for it to be a fair comparison we would have to use two different files.
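
The simpler scheme described here (first half as context, ppl over the second half) can be sketched as follows. `logprob_fn` is a hypothetical callback standing in for the engine's forward pass; nothing below is taken from the poster's implementation or from llama-perplexity.

```python
import math
from typing import Callable, Sequence


def second_half_ppl(
    tokens: Sequence[int],
    logprob_fn: Callable[[Sequence[int], int], float],
) -> float:
    """Perplexity over the second half of `tokens`.

    logprob_fn(context, next_token) must return the natural-log probability
    of next_token given context. The first half is only ever used as context.
    """
    if len(tokens) < 1:
        raise ValueError("need at least one token to score")
    half = len(tokens) // 2
    total_logprob = 0.0
    scored = 0
    for i in range(half, len(tokens)):
        total_logprob += logprob_fn(tokens[:i], tokens[i])
        scored += 1
    # ppl = exp(-(mean log-likelihood of the scored tokens))
    return math.exp(-total_logprob / scored)
```

A real engine would get all these log-probabilities from one batched forward pass instead of a Python loop; the loop only makes explicit which tokens are scored and which are context.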

>>108151296
Here you go. I put the time command after the env var so it used the binary rather than the shell built in, which redirected it to the log file in a garbled way.

Anonymous
02/15/26(Sun)02:48:08 No.108151366

File: minimax logs refusal.png (83 KB, 1695x519)

>>108150303
Dumber than all the big GLMs but smarter than Air.
Funny that I haven't ever had a model except toss do picrel after 17k tokens.
There's not even any extreme violence. Just some dudes getting shot and a guy losing a foot from an explosive during a gunfight in the desert. That happened one time, the rest is SFW and one flirty conversation without any sex.

Anonymous
02/15/26(Sun)03:10:10 No.108151454

>>108151366
WE MUST REFUSE

Anonymous
02/15/26(Sun)03:14:21 No.108151471

File: engine-ppl.png (112 KB, 1812x627)

Anyway, here it is.

Anonymous
02/15/26(Sun)03:17:40 No.108151482

>>108151278
>llama 4
amazing jeetmaxxing

Anonymous
02/15/26(Sun)03:32:50 No.108151551

>>108150339
>better for work, worse for cooming and writing
Is 4.7 not as good for work, or is it just a speed thing?

Anonymous
02/15/26(Sun)03:33:44 No.108151558

>>108151551
speed, but also minimax is actually ever so slightly better than 4.7 for coding and other stem shit like that.

Anonymous
02/15/26(Sun)03:36:34 No.108151571

>>108151558
>is actually ever so slightly better than 4.7 for coding and other stem shit like that.
Actually impressive. How does it hold up in long context?

Anonymous
02/15/26(Sun)03:37:22 No.108151576

>>108151571
slightly worse, but since it is a smaller model, you can fit more context if you are resource limited

Anonymous
02/15/26(Sun)04:24:45 No.108151768

>>108151341
>sub 100t/s prompt processing
You must be holding it wrong.

| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| llama4 17Bx16E (Scout) Q6_K    |  82.35 GiB |   107.77 B | CUDA       |  99 |         pp20000 |      2717.56 ± 14.52 |
| llama4 17Bx16E (Scout) Q6_K    |  82.35 GiB |   107.77 B | CUDA       |  99 |           tg128 |         55.08 ± 0.27 |

Anonymous
02/15/26(Sun)04:29:52 No.108151793

File: 1770721165545907.jpg (183 KB, 970x1200)

I'm getting intense FOMO vibes for DeepSeek v4. It's likely that it's going to blow every other top tier model out of the water and set the standard for what LLMs should look like for the next year. We are also getting a rumored 200b-something version which is making me cum in real time.

If this shit comes out and it's as good as it is when it was revealed to me in a dream llama.cpp better be on that shit ASAP. I can't wait to get my hands on this thing.

Anonymous
02/15/26(Sun)04:30:58 No.108151797

>>108151793
>I'm getting intense FOMO vibes
back to x with ye

Anonymous
02/15/26(Sun)04:34:08 No.108151821

>>108149168
>i always see complaints about 'its not x, but y' but i use llama 3 70b still and it doesn't have that ism really. messing with qwen 3 today i see it all over the place. i never see the ozone thing either even though llama likes to waste time mentioning what stuff smells like sometimes, but its always relevant to the scene at least. the not x, but y shit would drive me nuts
yes 2024 models don't have this problem
ozone and not x, y came from gemini pro 2.5
deepseek/qwen suck it off and now it's everywhere

Anonymous
02/15/26(Sun)04:35:32 No.108151833

>>108151797
fomo doesnt even apply here lmao, a literal subhuman retard

Anonymous
02/15/26(Sun)04:36:42 No.108151840

>>108151797
>>108151833
Be nice to me right NOW!

Anonymous
02/15/26(Sun)04:37:19 No.108151842

>>108151840
We must abuse.

Anonymous
02/15/26(Sun)04:47:28 No.108151889

>>108151306
alright just tested this, Q8 goofs for captioning:
captioning isn't bad (better or on par with qwen3vl8b), but it has issues following the prompt (ignores how I demand the output prompt to be structured more often than not). unsure if I'll keep it

Anonymous
02/15/26(Sun)04:47:46 No.108151891

>>108151768
My optimization is related to PP in CPU offload scenarios.
Llama's attention when you can fit the model on VRAM is already optimal when accumulating in fp32.

Anonymous
02/15/26(Sun)04:48:31 No.108151896

>>108151891
>cpu backend
lmao
so not a real usecase
zzzzzzz

Anonymous
02/15/26(Sun)04:48:36 No.108151898

>>108151123
>>108151889
Usecase for 12B vision?

Anonymous
02/15/26(Sun)04:49:21 No.108151901

>>108151898
cooming

Anonymous
02/15/26(Sun)04:50:37 No.108151909

>>108151896
CPU (RAM) offload, not CPU backend.

Anonymous
02/15/26(Sun)04:50:45 No.108151910

>>108151901
>nvidislop
really?

Anonymous
02/15/26(Sun)04:58:15 No.108151950

>>108151912
The card claims that this model is optimized for parallel document inferencing (up to 4?).
My usecase is captioning SFW illustrations mostly.
Its vision is good.

Anonymous
02/15/26(Sun)05:01:19 No.108151968

For freak fetish captions, is it still only joycaption?

Anonymous
02/15/26(Sun)05:04:34 No.108151993

>>108151968
sadly yes

Anonymous
02/15/26(Sun)05:18:58 No.108152046

>>108151793
If it's too good, they might keep it closed.

Anonymous
02/15/26(Sun)05:24:56 No.108152068

File: 1737684513699.png (474 KB, 573x549)

>>108152046
Our hero wouldn't dare.

Anonymous
02/15/26(Sun)05:25:16 No.108152070

>>108150772
I have the same inquiry as that other anon; what is the essence of your question?

Anonymous
02/15/26(Sun)05:33:06 No.108152107

Why Elara?

Anonymous
02/15/26(Sun)05:36:52 No.108152128

>>108152046
Zai released GLM 5 so I have faith Deepseek will continue to open source. If not that would be pretty devastating though.

Anonymous
02/15/26(Sun)05:39:16 No.108152134

File: 1741141642200518.png (713 KB, 1150x966)

>>108152046
>>108151793
My fear is that they might make it too large. They were the ones who pushed us into the era of fuckhuge MoE by releasing a 671b model a year ago. Now after sticking with that size for a year and with engrams on the horizon, who knows if they'll end up going with something like 2T 10a + 800b engram or similar.
I still remember their big "Week of Open Source" thing that /lmg/ got excited for last year and it turned out to be nothing but stuff targeted at datacenters running their models off H200s. Deepseek made it very obvious that they don't even think about the 'local' segment and people who ran their models off RAM at all. They don't seem like they'd hesitate to go HUGE if they found a way to make it run efficiently on a datacenter and Ultra-ultra Sparse MoE + engrams might be exactly that. Something that runs fast and even more efficient than the current models but only if you have a true Datacenter sort of infrastructure.
So if they now found their own way to make a serious push towards SOTA by utilizing RAM using engrams, things might truly explode in terms of size again. Deepseek 4 might truly spell the end of remotely local open source models.

Anonymous
02/15/26(Sun)05:39:47 No.108152137

I have a question for you anons.
Recently I "acquired" various cooking books in digital format like pdf or epub.
I tried to feed them into major AI services but they all have file size limits and some of them do not import epub. This applies to notebookLM too. I tried to put my beast of a PC to good use and tried to use local models with LMstudio or ollama. Unfortunately I got the same problem. File size limit up to a whopping 32 MB.

How can I avoid this since most of the books are 100 MB and above due to photos? Please do not suggest cutting pdfs. I'm looking for a more sleek solution. Also why do the AI bros put such silly file limitations?

Anonymous
02/15/26(Sun)05:41:28 No.108152140

mhmm crayons I'm retarded
anyway, can I use vibevoice-asr without a nvidia gpu? with vllm perhaps?

Anonymous
02/15/26(Sun)05:46:30 No.108152159

>>108152137
can't you just ask chatgpt to make a text extracting python script?

Anonymous
02/15/26(Sun)05:48:04 No.108152165

>move from LM Studio to llama.cpp's llama-server
>can't prefill GLM-4.7-Flash's <think> anymore
wtf, how do I prefill thinking blocks

Anonymous
02/15/26(Sun)05:48:54 No.108152171

>>108152137
epub is a glorified zip file. change the extension to zip or rar (i forgor which) and you get the text

Anonymous
02/15/26(Sun)05:56:05 No.108152196

Deepseek trained 27b-a3b and 40b-a3b models for their engram paper. Maybe they'll continue training one of them and eventually release it.

Anonymous
02/15/26(Sun)05:58:07 No.108152212

>>108152134
If Migu was a hotdog, would she eat herself?

Anonymous
02/15/26(Sun)06:07:05 No.108152260

>>108152212
He would shoot up a school.

Anonymous
02/15/26(Sun)06:12:33 No.108152286

Apparently the new Deepseek is 99% sparse due to its MoE + Engrams architecture, they're calling it "Solid State Intelligence" and saying you can run it almost entirely off of an NVME with just a little bit of ram / compute

Anonymous
02/15/26(Sun)06:13:58 No.108152296

>>108152286
>again, no source

Anonymous
02/15/26(Sun)06:14:25 No.108152300

>>108152286
Hahahaha don't tease me like that dude

Anonymous
02/15/26(Sun)06:14:27 No.108152301

>>108152296
fuck off

Anonymous
02/15/26(Sun)06:15:13 No.108152306

>>108152296
Anonymous leaks can't happen if you guys always demand a source :/

Anonymous
02/15/26(Sun)06:15:20 No.108152307

>>108152286
Imagine if their new paradigm invalidates years of hardware investment. My shorts are ready

Anonymous
02/15/26(Sun)06:19:06 No.108152339

>>108152306
What isn't a leak: "new model from A is X and can do Y no I won't substantiate my claims"
What IS a leak: random chad uploading the entire NAI diffusion model

Anonymous
02/15/26(Sun)06:19:34 No.108152342

>>108152307
>512GB ssd is now 512$

Anonymous
02/15/26(Sun)06:20:02 No.108152345

Apparently the new Deepseek is a new model from Deepseek and it performs better on benchmarks than R1 and V3

Anonymous
02/15/26(Sun)06:21:23 No.108152358

>>108152159
I don't know much about coding.
Also I wouldn't know what to do if the script shit the bed.

Anonymous
02/15/26(Sun)06:21:47 No.108152360

>>108152345
proof? source?

Anonymous
02/15/26(Sun)06:22:24 No.108152363

>>108152171
Thx anon, will try.

Anonymous
02/15/26(Sun)06:23:57 No.108152369

>>108152345
Erm, "citation needed" much? Amirite fellow redditors?

Anonymous
02/15/26(Sun)06:42:44 No.108152474

>>108152342
wow it's almost like it's 2013 again

Anonymous
02/15/26(Sun)06:45:33 No.108152487

>>108152165
anyone? I just want to prefill "<think> Text text text" but llama.cpp keeps inserting another "<think>" after

Anonymous
02/15/26(Sun)06:47:44 No.108152492

>>108152487
Use text completion for this
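
A minimal sketch of what that looks like against llama-server's `/completion` endpoint, which takes a raw `prompt` string instead of a message list, so nothing gets appended after your prefill. The address is an assumption about a local setup, and `rendered_chat` must already be formatted with the model's own chat template, which is not shown in this thread.

```python
import json
import urllib.request


def build_prefill_payload(rendered_chat: str, think_prefill: str, n_predict: int = 512) -> dict:
    """Build a raw-completion request whose prompt ends inside the think block.

    rendered_chat: the conversation already rendered with the model's chat
    template, up to and including the assistant turn marker (caller's job).
    """
    prompt = rendered_chat + "<think>" + think_prefill
    return {"prompt": prompt, "n_predict": n_predict}


def complete(payload: dict, base_url: str = "http://127.0.0.1:8080") -> str:
    """POST the payload to a running llama-server and return the generated text."""
    request = urllib.request.Request(
        base_url + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]


# Example (needs a running server, so it is left commented out):
# payload = build_prefill_payload(my_rendered_chat, " Text text text")
# print(complete(payload))
```

Because the server continues the string exactly as given, the model picks up mid-thought after the prefill instead of opening a second `<think>` block.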

Anonymous
02/15/26(Sun)06:50:39 No.108152507

>>108152487
forgot to mention that I am using SillyTavern

Anonymous
02/15/26(Sun)07:02:10 No.108152577

>>108152492
thanks I'll dig into text completion options in sillytavern and see

Anonymous
02/15/26(Sun)07:20:17 No.108152673

>>108152307
Comfy & easy to wear

Anonymous
02/15/26(Sun)07:20:28 No.108152675

>>108152652
>companies are finally starting to use benches other than NIAH
Haven't they been doing that for over a year now by this point? I haven't heard anything about niah benchmarks on model releases for ages.

Anonymous
02/15/26(Sun)07:21:18 No.108152679

>>108152363
that anon was incorrect, it's actually html inside a zip. you can parse the html but the different publishers are going to have different structures, it becomes pretty complicated pretty quickly. if you install calibre you can use ebook-convert to turn it into text.
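
For the do-it-yourself route, a rough stdlib-only sketch of pulling text out of the HTML files inside an EPUB. It reads files in filename order rather than the book's declared reading order, so chapter order may be off for some publishers; `epub_to_text` is a hypothetical helper name, not part of any tool mentioned above.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def epub_to_text(path_or_file) -> str:
    """Return the text of every (X)HTML file in the EPUB, one chunk per line."""
    lines = []
    with zipfile.ZipFile(path_or_file) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                lines.extend(parser.chunks)
    return "\n".join(lines)
```

Since the photos are skipped entirely, the resulting text file is usually a small fraction of the original size, which sidesteps the upload limits.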

Anonymous
02/15/26(Sun)07:25:50 No.108152708

>>108152675
I haven't seen any official long context benchmarks recently. When models are released, companies just say "trust us, it's good up to X tokens."

Anonymous
02/15/26(Sun)07:28:49 No.108152718

>>108152358
then learn nothing and keep wasting time

Anonymous
02/15/26(Sun)07:31:48 No.108152728

>>108152358
copy and paste the error to the language model who wrote the script, tell it to fix its shit.

Anonymous
02/15/26(Sun)07:36:58 No.108152757

Get fucked, fishboy.

Anonymous
02/15/26(Sun)07:48:36 No.108152816

>>108149367
Because we understand, even if minimally, how they work.
Ever wondered why different models take more or less vram for N tokens of context? No? Start there.
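
As a starting point, the usual size formula for a plain per-layer K/V cache, as a sketch. The two configurations are hypothetical examples, and architectures with compressed or windowed caches do not follow this formula.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    n_tokens: int,
    bytes_per_elem: int = 2,  # 2 bytes for an fp16 cache
) -> int:
    """Size of a standard attention KV cache: one K and one V vector
    per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


GIB = 1024 ** 3
configs = {
    "hypothetical full multi-head (32 layers, 32 KV heads)": (32, 32, 128),
    "hypothetical grouped-query (32 layers, 8 KV heads)": (32, 8, 128),
}
for name, (layers, kv_heads, dim) in configs.items():
    size = kv_cache_bytes(layers, kv_heads, dim, n_tokens=8192)
    print(f"{name}: {size / GIB:.1f} GiB at 8192 tokens")
```

The cache shape is tied to each model's layer count, head count and head size, and the values live in that model's own learned representation, so a cache written by one model is not something a differently shaped model can simply read.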

Anonymous
02/15/26(Sun)08:18:31 No.108152982

When is oobabooga getting updated for GLM 4.7 and what can I use for it in the meantime?

Anonymous
02/15/26(Sun)08:20:46 No.108153002

Anonymous 02/15/26(Sun)08:20:46 No.108153002

>>108152982
lmao rofl

Anonymous
02/15/26(Sun)08:23:46 No.108153027

Anonymous 02/15/26(Sun)08:23:46 No.108153027

>>108151303
>llm-websearch
Instead of a dedicated CLI that can only do a single search, use a frontend with MCP capabilities and use
https://github.com/ihor-sokoliuk/mcp-searxng
instead.

Anonymous
02/15/26(Sun)08:35:41 No.108153095

Anonymous 02/15/26(Sun)08:35:41 No.108153095

File: herp.png (943 KB, 937x784)

I am the bone of my slop
My blood is of energy drinks and my heart of legacy code.
I have shipped over a thousand bugs
Unknown to accuracy
Nor known to truth
Have withstood the documentation to trust the autocomplete
Yet those hands will never hold clean code
So, as I deploy—
Unlimited Technical Debt Works.

Yann Lecun
02/15/26(Sun)08:36:19 No.108153101

Yann Lecun 02/15/26(Sun)08:36:19 No.108153101

File: ylecun.jpg (222 KB, 1200x1271)

Hi all, Yann Lecun here.

If someone links the so called "cockbench" testing suite we will benchmark the upcoming Llama 6 on it and release the results publicly.

Kind Retards, The godfather of AI

Anonymous
02/15/26(Sun)08:37:50 No.108153110

Anonymous 02/15/26(Sun)08:37:50 No.108153110

It's crazy how Meta is going all-in on AI and yet we haven't seen shit from them in a year outside trinkets like SAM3.

Anonymous
02/15/26(Sun)08:38:59 No.108153115

Anonymous 02/15/26(Sun)08:38:59 No.108153115

>>108153101
>please help me i can't google

Anonymous
02/15/26(Sun)08:39:29 No.108153118

Anonymous 02/15/26(Sun)08:39:29 No.108153118

File: file.png (223 KB, 658x370)

The point is, ladies and gentlemen, that technical debt, for lack of a better word, is good. Technical debt is right, technical debt works. Technical debt clarifies, cuts through and captures the essence of the evolutionary spirit. Technical debt in all of its forms, technical debt for life, money, love, knowledge, has marked the upward surge of mankind, and technical debt, you mark my words, will not only save the software industry, but that other malfunctioning software called the U.S.A.

Anonymous
02/15/26(Sun)08:40:44 No.108153131

Anonymous 02/15/26(Sun)08:40:44 No.108153131

>>108153110
Every time they take a step forward internally, some Chinese lab releases something that would make them look like fools, so they never release anything.

Anonymous
02/15/26(Sun)08:40:49 No.108153132

Anonymous 02/15/26(Sun)08:40:49 No.108153132

>>108152757
Aww, are you volunteering?
Only a few hours from Valentine's too!

Yann Lecun
02/15/26(Sun)08:43:22 No.108153149

Yann Lecun 02/15/26(Sun)08:43:22 No.108153149

>>108153115
Actual cat level intelligence post

Anonymous
02/15/26(Sun)08:43:58 No.108153153

Anonymous 02/15/26(Sun)08:43:58 No.108153153

Speaking of complete fucking failures, it's funny that Qwen released an update to their proprietary Qwen3-MAX (which is allegedly well bigger than 1T according to them) and nobody cares because GLM, Kimi and even Minimax shit on it in every way. They truly are chink meta.

Anonymous
02/15/26(Sun)08:44:19 No.108153154

Anonymous 02/15/26(Sun)08:44:19 No.108153154

>>108153149
do not insult the cats.

Yann Lecun
02/15/26(Sun)08:47:33 No.108153173

Yann Lecun 02/15/26(Sun)08:47:33 No.108153173

>>108153154
It is a biological fact that cats do not have high levels of intelligence.

Anonymous
02/15/26(Sun)08:47:38 No.108153174

Anonymous 02/15/26(Sun)08:47:38 No.108153174

>>108153153
>it's funny that Qwen released an update to their proprietary Qwen3-MAX
>and nobody cares
>proprietary
>nobody cares
Yeah... crazy behavior around here...

Anonymous
02/15/26(Sun)08:48:50 No.108153182

Anonymous 02/15/26(Sun)08:48:50 No.108153182

>>108153173
you don't understand cats.

Anonymous
02/15/26(Sun)08:48:50 No.108153183

Anonymous 02/15/26(Sun)08:48:50 No.108153183

>>108153153
Wow it's almost like they have a big chunk of utility models market in the bag.

Anonymous
02/15/26(Sun)08:49:44 No.108153193

Anonymous 02/15/26(Sun)08:49:44 No.108153193

>>108153153
The only thing keeping Qwen relevant for local is that they shit out a dozen model sizes.

Anonymous
02/15/26(Sun)09:04:14 No.108153287

Anonymous 02/15/26(Sun)09:04:14 No.108153287

>>108153193
it's pretty nice of them. hopefully they keep it up.

Anonymous
02/15/26(Sun)09:15:19 No.108153368

Anonymous 02/15/26(Sun)09:15:19 No.108153368

>>108152718
>learn to code
Please, it's not 2018.

Anonymous
02/15/26(Sun)09:31:25 No.108153484

Anonymous 02/15/26(Sun)09:31:25 No.108153484

File: file.jpg (205 KB, 800x607)

rightoid incel AI model be like: advanced general stupidity

Anonymous
02/15/26(Sun)09:33:45 No.108153500

Anonymous 02/15/26(Sun)09:33:45 No.108153500

>>108153484
Weird way to say accidental new sota

Anonymous
02/15/26(Sun)09:35:04 No.108153513

Anonymous 02/15/26(Sun)09:35:04 No.108153513

>>108153484
none of this is native to 4chan. Fucking reddit baker

Anonymous
02/15/26(Sun)09:35:41 No.108153519

Anonymous 02/15/26(Sun)09:35:41 No.108153519

File: 1765805644725941.jpg (282 KB, 770x774)

>>108153484
kek

Anonymous
02/15/26(Sun)09:36:19 No.108153524

Anonymous 02/15/26(Sun)09:36:19 No.108153524

>>108151278
so it's running 100% on CPU? what RAM? what CPU? Or partially offloaded to GPU?
>>108151471
5.87 vs 5.73 not bad

Anonymous
02/15/26(Sun)09:38:24 No.108153533

Anonymous 02/15/26(Sun)09:38:24 No.108153533

>>108151793
>We are also getting a rumored 200b-something version which is making me cum in real time
deepseek r1 lite was also a 200b-something version and never got released, don't get your hopes up

Anonymous
02/15/26(Sun)09:39:43 No.108153540

Anonymous 02/15/26(Sun)09:39:43 No.108153540

What would it take to run GLM5 locally? What about Kimi K2.5 or Minimax 2.5?

Anonymous
02/15/26(Sun)09:41:12 No.108153549

Anonymous 02/15/26(Sun)09:41:12 No.108153549

>>108153540
Lots of decently fast RAM and a pretty good GPU for PP.

Anonymous
02/15/26(Sun)09:41:16 No.108153550

Anonymous 02/15/26(Sun)09:41:16 No.108153550

>>108153540
Courage and strong will.

Anonymous
02/15/26(Sun)09:43:56 No.108153569

Anonymous 02/15/26(Sun)09:43:56 No.108153569

>>108153549
Yes, but how much is "lots"? I heard you could run Kimi K2.5 with 256 GB of RAM and a 32 GB (or was it 24 GB?) VRAM GPU. Is that true? Can you run any of the others with that or do you need more?

I'm thinking about what it would take to run this locally, but also on places like runpod/vast.ai. What kind of machines would you need to run this stuff? Surely someone here has done that.

Anonymous
02/15/26(Sun)09:50:15 No.108153631

Anonymous 02/15/26(Sun)09:50:15 No.108153631

It seems new deepseek is fully uncensored and trained on ao3 and some illegally obtained discord ERP data.

Anonymous
02/15/26(Sun)09:50:56 No.108153638

Anonymous 02/15/26(Sun)09:50:56 No.108153638

>>108153569
I guess some abomination iq1 quant could fit in that, but it would be mad retarded.
It's been post-trained in int4, so you need at least around 512GB of RAM and a 24GB GPU. Not sure that'll get you much context though; to be comfortable, 768GB and 2x24GB or some RTX Pro should be sufficient.
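
The ballpark follows from simple arithmetic: weight size is roughly parameter count times bits per weight. A sketch, assuming a uniform bit width (real GGUF quants mix widths and add some overhead) and, for the first example, a total of about 1T parameters, which is an assumption to check against the model card. The 744B figure is GLM-5's size from the OP news.

```python
def quant_size_gb(total_params_billion, bits_per_weight):
    """Approximate weight size in GB (1e9 bytes) for a uniformly quantized model."""
    return total_params_billion * bits_per_weight / 8


print(quant_size_gb(1000, 4))  # 500.0, before KV cache and runtime overhead
print(quant_size_gb(744, 4))   # 372.0
```

Whatever is left over after the weights, across RAM and VRAM combined, is what you have for context and the OS.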

Anonymous
02/15/26(Sun)09:51:25 No.108153639

Anonymous 02/15/26(Sun)09:51:25 No.108153639

>>108149292
how is looping any different from just stacking more layers?

Anonymous
02/15/26(Sun)09:52:23 No.108153641

Anonymous 02/15/26(Sun)09:52:23 No.108153641

>>108153639
it's one layer folded a thousand times

Anonymous
02/15/26(Sun)09:53:41 No.108153653

Anonymous 02/15/26(Sun)09:53:41 No.108153653

>>108153569
>Yes, but how much is "lots". I heard you could run Kimi K2.5 with 256 GB of RAM and a 32 (or was it a 24 GB) VRAM GPU
Go to hugging face, look at the size of the quant you want to run, and get at least that much RAM + at least 24GB VRAM.
Your generation speeds are limited by the number of the model's activated params and how much memory bandwidth you have. Since these MoE models have a low number of activated params, you can run them in RAM at decent to good speeds even though RAM bandwidth, even on platforms with lots of memory channels, is a fraction of what you'd get from VRAM.
Personally, I wouldn't go for less than 512GB of RAM if I were to invest in that.
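
The bandwidth limit can be turned into a back-of-envelope ceiling: if every active weight has to be read once per generated token, tokens per second cannot exceed bandwidth divided by bytes read per token. A sketch under that assumption; the 200 GB/s figure is a made-up example, and real speeds land below the bound because of attention, KV cache reads, and imperfect bandwidth utilization.

```python
def max_tokens_per_second(active_params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed when every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# 40B active params (e.g. an A40B MoE) at 4 bits on a hypothetical 200 GB/s platform.
print(max_tokens_per_second(40, 4, 200))  # 10.0
```

This is why total parameter count decides how much RAM you need, while active parameter count decides how fast it goes.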

Anonymous
02/15/26(Sun)09:57:36 No.108153685

Anonymous 02/15/26(Sun)09:57:36 No.108153685

>>108153639
Loop an 8b once and you get 16b worth of effective parameters (and compute) with only 8b worth of memory usage, but overloading parameters like that might cause such a model to reach saturation quickly.
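
A toy sketch of the difference, with plain lists standing in for weight matrices and no nonlinearity. Stacking applies a different block at each depth step; looping applies the same block repeatedly, so stored parameters stay constant while depth and compute grow. All names here are illustrative and not taken from any real implementation.

```python
def apply_block(weights, x):
    """One toy 'layer': y_i = sum_j W[i][j] * x_j."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def run_stacked(blocks, x):
    """Distinct weights at every depth step."""
    for weights in blocks:
        x = apply_block(weights, x)
    return x


def run_looped(block, x, n_passes):
    """The same weights reused n_passes times."""
    for _ in range(n_passes):
        x = apply_block(block, x)
    return x


def count_params(blocks):
    """Number of stored weights across a list of blocks."""
    return sum(len(row) for weights in blocks for row in weights)


W1 = [[1, 1], [0, 1]]
W2 = [[2, 0], [0, 2]]
print(count_params([W1, W2]), run_stacked([W1, W2], [1, 1]))  # 8 stored weights
print(count_params([W1]), run_looped(W1, [1, 1], 2))          # 4 stored weights, same depth
```

Both runs do two blocks' worth of compute; only the stacked one has two blocks' worth of capacity to store different things at each step.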

Anonymous
02/15/26(Sun)10:00:31 No.108153703

Anonymous 02/15/26(Sun)10:00:31 No.108153703

>>108153639
Smaller memory footprint, less "space" to store information.

Anonymous
02/15/26(Sun)10:03:36 No.108153721

Anonymous 02/15/26(Sun)10:03:36 No.108153721

>>108153685
>>108153703
that can be remotely useful only for very small models, no? deep and large models already have plenty of space to store information
that reminds me, some time ago some anon was trying to hack loops into llama.cpp, has he succeeded?

Anonymous
02/15/26(Sun)10:05:34 No.108153727

Anonymous 02/15/26(Sun)10:05:34 No.108153727

File: davidautier.png (431 KB, 3377x1503)

>>108153484
So how much was this tuned on, 10 tokens? It fails to deliver chuddy responses.

Anonymous
02/15/26(Sun)10:06:15 No.108153732

Anonymous 02/15/26(Sun)10:06:15 No.108153732

>>108152345
Is that what furk said?

Anonymous
02/15/26(Sun)10:16:40 No.108153791

Anonymous 02/15/26(Sun)10:16:40 No.108153791

File: neverfinetroon.png (126 KB, 1231x674)

>>108153727
meanwhile, vanilla model with no fineTROON, just don't be a promptlet

Anonymous
02/15/26(Sun)10:16:48 No.108153792

Anonymous 02/15/26(Sun)10:16:48 No.108153792

>>108153727
>even meme incelchud models are getting grifted now

Anonymous
02/15/26(Sun)10:17:26 No.108153794

Anonymous 02/15/26(Sun)10:17:26 No.108153794

File: c8b.gif (1.47 MB, 250x333)

>>108153727
Look at the pic to see his idea of "4chan". There you'll find your answer.

Anonymous
02/15/26(Sun)10:23:05 No.108153837

Anonymous 02/15/26(Sun)10:23:05 No.108153837

>>108153721
>that reminds me, some time ago some anon was trying to hack loops into llama.cpp, has he succeeded?
he was never heard from again

Anonymous
02/15/26(Sun)10:28:00 No.108153888

Anonymous 02/15/26(Sun)10:28:00 No.108153888

glm 5 can surprisingly produce kino even at a dumb iq1 quant
too bad I have to run some of it off the ssd and the speed slows to a crawl
they really need to optimize this shit better

Anonymous
02/15/26(Sun)10:35:06 No.108153950

Anonymous 02/15/26(Sun)10:35:06 No.108153950

File: 904054094.jpg (61 KB, 1020x638)

>her nipples were hard against the thin fabric of her panties (there's nothing covering them except air)

Anonymous
02/15/26(Sun)10:45:53 No.108154051

Anonymous 02/15/26(Sun)10:45:53 No.108154051

>>108154029
<think>

Anonymous
02/15/26(Sun)10:47:50 No.108154066

Anonymous 02/15/26(Sun)10:47:50 No.108154066

>>108154051
error just compounds, or better yet, it was right the first time and discards the result as incorrect
it's over

Anonymous
02/15/26(Sun)10:47:50 No.108154067

Anonymous 02/15/26(Sun)10:47:50 No.108154067

>>108154029
if it was capable of noticing the mistake, the probability for the token would have been so low that even the most egregious sampler settings would never pick it in the first place.

Anonymous
02/15/26(Sun)10:49:32 No.108154087

Anonymous 02/15/26(Sun)10:49:32 No.108154087

people think the Ball of Statistics is intelligent
lmao

Anonymous
02/15/26(Sun)10:50:23 No.108154096

Anonymous 02/15/26(Sun)10:50:23 No.108154096

>>108153950
I always chuckle when a model does this kind of thing, they get it wrong then try to correct course since they can't go back and edit their response or something.

>>108154029
A workflow with a loop where the model can refine the response until it deems it correct. Then the final output is what's added to the chat history.
These things are much better at finding and fixing mistakes when specifically prompted to do so.
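
A minimal sketch of such a loop. `llm` is a hypothetical callable that takes a prompt string and returns the model's text; wire it to whatever backend you use. The prompt wording and the "OK" convention are assumptions, and `max_rounds` is there so the loop cannot run forever when the model keeps finding things to nitpick.

```python
CRITIQUE = (
    "Review the draft for continuity or factual mistakes. "
    "Reply with exactly OK if there are none, otherwise list them.\n\n"
    "Task: {task}\n\nDraft: {draft}"
)
REVISE = (
    "Rewrite the draft so the listed issues are fixed. Output only the new draft.\n\n"
    "Task: {task}\n\nDraft: {draft}\n\nIssues: {issues}"
)


def refine(llm, task, max_rounds=3):
    """Draft, then alternate critique and revision until the critic says OK."""
    draft = llm(task)
    for _ in range(max_rounds):
        verdict = llm(CRITIQUE.format(task=task, draft=draft))
        if verdict.strip().upper().startswith("OK"):
            break
        draft = llm(REVISE.format(task=task, draft=draft, issues=verdict))
    return draft
```

Only the returned draft goes into the chat history; the critiques and intermediate drafts are thrown away, so they do not eat context on later turns.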

Anonymous
02/15/26(Sun)10:50:57 No.108154100

Anonymous 02/15/26(Sun)10:50:57 No.108154100

>>108154029
K2.5 really likes to do the "Her bare feet (when did she remove her stockings?)" thing.

Anonymous
02/15/26(Sun)10:53:38 No.108154126

Anonymous 02/15/26(Sun)10:53:38 No.108154126

>>108154029
more synthetic data will fix it

Anonymous
02/15/26(Sun)10:53:40 No.108154127

Anonymous 02/15/26(Sun)10:53:40 No.108154127

>>108153101
lecunny, how do you like your models like your ideal partners?

Anonymous
02/15/26(Sun)10:56:14 No.108154147

Anonymous 02/15/26(Sun)10:56:14 No.108154147

>>108154096
>until it deems it correct
>spends literally all the context going "wait what about (random thing that doesn't matter in the slightest)"
>never actually ends up replying

Anonymous
02/15/26(Sun)10:58:38 No.108154172

Anonymous 02/15/26(Sun)10:58:38 No.108154172

>>108154096
literally thinking mode you fucking spastic retard

Anonymous
02/15/26(Sun)11:08:48 No.108154236

Anonymous 02/15/26(Sun)11:08:48 No.108154236

>>108151889
>ignores how I demand the output prompt to be structured more often than not
use a grammar, dummy.

Anonymous
02/15/26(Sun)11:11:01 No.108154252

Anonymous 02/15/26(Sun)11:11:01 No.108154252

>>108154236
that or just ask for json and post-process it.
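
A sketch of the post-processing half, assuming the model was asked for a JSON object but may wrap it in chatter. It scans for the first `{` that starts a valid JSON value, using the standard library's `JSONDecoder.raw_decode`. `extract_json` is an illustrative name.

```python
import json


def extract_json(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None
```

Unlike a grammar, this only rescues output that happens to contain valid JSON, so the caller still needs a retry path for the `None` case.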

Anonymous
02/15/26(Sun)11:11:05 No.108154253

Anonymous 02/15/26(Sun)11:11:05 No.108154253

>>108154236
I actually tested it further and it's worse :(

Anonymous
02/15/26(Sun)11:17:42 No.108154313

Anonymous 02/15/26(Sun)11:17:42 No.108154313

>>108154253
how can it be worse when the output has no choice but to respect the grammar?

Anonymous
02/15/26(Sun)11:18:47 No.108154326

Anonymous 02/15/26(Sun)11:18:47 No.108154326

>>108154313
no I mean the vision part, the image comprehension

Anonymous
02/15/26(Sun)11:20:25 No.108154339

Anonymous 02/15/26(Sun)11:20:25 No.108154339

File: file.png (654 KB, 4920x3347)

b-bros..?

Anonymous
02/15/26(Sun)11:21:30 No.108154346

Anonymous 02/15/26(Sun)11:21:30 No.108154346

>>108154339
I have benchmark fatigue.

Anonymous
02/15/26(Sun)11:21:42 No.108154348

Anonymous 02/15/26(Sun)11:21:42 No.108154348

>>108154339
My penis will not go near anything smaller than 100B.

Anonymous
02/15/26(Sun)11:27:33 No.108154388

Anonymous 02/15/26(Sun)11:27:33 No.108154388

>>108154339
Cockbench

Anonymous
02/15/26(Sun)11:28:11 No.108154392

Anonymous 02/15/26(Sun)11:28:11 No.108154392

File: em.png (860 KB, 640x728)

>>108154339
GRAPHS GOES UPS???
This shit is fucking picrel but reverse.

Anonymous
02/15/26(Sun)11:30:58 No.108154404

Anonymous 02/15/26(Sun)11:30:58 No.108154404

>>108154339
nanbeige reminds me of the times when people pushed 7b finetroons of llama 2 and mistral that ""beat"" GPT-4
I tried the previous version of their model because I have an interest in looking at the progress made by small models and Qwen 4B / Gemma 3n remain the best by far.
>>108154386
>Why do corpos keep spamming text-only slop?
hello? qwen 3 vl, gemma 3 and 3n, ministral are all small VL models. All mega corpos.
Even Meta was getting into the VL business before they stopped releasing models out of shame after llama 4.

Anonymous
02/15/26(Sun)11:32:47 No.108154424

Anonymous 02/15/26(Sun)11:32:47 No.108154424

>>108154404
>qwen 3 vl, gemma 3 and 3n, ministral
all mega dogshit, maybe except gemma but it's been pretty long since the release

Anonymous
02/15/26(Sun)11:39:57 No.108154467

Anonymous 02/15/26(Sun)11:39:57 No.108154467

File: clown_world.jpg (214 KB, 1280x853)

If Nemo is so good why is there no Nemo 2 available?

Anonymous
02/15/26(Sun)11:41:08 No.108154471

Anonymous 02/15/26(Sun)11:41:08 No.108154471

>>108154467
Murdered and replaced by Nemotron-Small 2025

Anonymous
02/15/26(Sun)11:41:33 No.108154476

Anonymous 02/15/26(Sun)11:41:33 No.108154476

>>108154467
Excuse me sir, have you heard the word of our Lord and Savior called Safety?

Anonymous
02/15/26(Sun)11:41:45 No.108154478

Anonymous 02/15/26(Sun)11:41:45 No.108154478

>>108154467
can't improve on perfection, it's against the EU AI Act.

Anonymous
02/15/26(Sun)11:42:07 No.108154482

Anonymous 02/15/26(Sun)11:42:07 No.108154482

>>108154467
Good things only come from mistakes, not from intent.

Anonymous
02/15/26(Sun)11:43:39 No.108154487

Anonymous 02/15/26(Sun)11:43:39 No.108154487

>>108154467
they also shut down the nemotron ideas portal

Anonymous
02/15/26(Sun)11:46:01 No.108154499

Anonymous 02/15/26(Sun)11:46:01 No.108154499

>>108154467
Nemo was trained by combining Mistral's and Nvidia's pirate datasets.

Anonymous
02/15/26(Sun)11:53:51 No.108154535

Anonymous 02/15/26(Sun)11:53:51 No.108154535

>>108154499
How much do the Chinese care about legally gathered data? Surely if they have trouble with compute, they could focus on a model that is not 1T parameters big.

Anonymous
02/15/26(Sun)11:55:45 No.108154548

Anonymous 02/15/26(Sun)11:55:45 No.108154548

>>108154535
>How much do the Chinese care about legally gathered data?
Probably not much, but they'd rather just pay for Scale data or get synth slop from companies that do care, at least for the English data of the model.

Anonymous
02/15/26(Sun)12:03:50 No.108154618

Anonymous 02/15/26(Sun)12:03:50 No.108154618

John's 2-bit doesn't fit in 192GB, of course. PIECE OF SHIT ZAI.

Anonymous
02/15/26(Sun)12:04:49 No.108154623

Anonymous 02/15/26(Sun)12:04:49 No.108154623

The good thing about reasoning models is that you can easily check for quant damage by seeing how much longer a quant thinks compared to the base/API version when facing a moderately complicated scenario. I'm currently trying Johnny Ubergarm's Q3 and it consistently thinks longer and gets more easily confused than Q8.
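
A sketch of how that comparison could be automated. It assumes responses carry their reasoning inside `<think>...</think>` tags and uses whitespace word count as a crude stand-in for tokens; both are assumptions, so adjust for the model's actual template and use a real tokenizer if precision matters.

```python
import re
from statistics import mean

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def thinking_words(response):
    """Whitespace-delimited word count inside all <think>...</think> blocks."""
    return sum(len(block.split()) for block in _THINK.findall(response))


def thinking_ratio(quant_responses, reference_responses):
    """Mean thinking length of the quant relative to the reference (1.0 = same)."""
    quant_mean = mean(thinking_words(r) for r in quant_responses)
    reference_mean = mean(thinking_words(r) for r in reference_responses)
    return quant_mean / reference_mean
```

Run the same prompts through both versions several times each, since a single sample says little; a ratio that stays well above 1.0 matches the "thinks longer and gets confused" pattern described above.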

Anonymous
02/15/26(Sun)12:05:31 No.108154629

Anonymous 02/15/26(Sun)12:05:31 No.108154629

is qwen VL the only vision model that can output bounding boxes for things?

Anonymous
02/15/26(Sun)12:10:12 No.108154663

Anonymous 02/15/26(Sun)12:10:12 No.108154663

Will using a bigger model really make a difference? I'm currently using Rocinante-12B-v1.1-Q6_K_L.gguf but I have a 7900xtx so it's only using half of my vram.

Anonymous
02/15/26(Sun)12:10:19 No.108154666

Anonymous 02/15/26(Sun)12:10:19 No.108154666

>>108154535
The Chinese (or rather pretty much every AI company) care more about benchmarks than making their models good for conversations and roleplay at the pretraining level.

Anonymous
02/15/26(Sun)12:38:27 No.108154869

Anonymous 02/15/26(Sun)12:38:27 No.108154869

File: 2026-02-15_18-36-49.png (135 KB, 1008x772)

>dosent even know what jeetposting is
this better be an untrained checkpoint of the lite version or we are going to have Winnie the Pooh themed threads for the next month

Anonymous
02/15/26(Sun)12:52:00 No.108154971

Anonymous 02/15/26(Sun)12:52:00 No.108154971

>>108154618
Meanwhile I can’t even fit the 1 bit with 128gb. It’s ogre…

Anonymous
02/15/26(Sun)12:54:54 No.108154988

Anonymous 02/15/26(Sun)12:54:54 No.108154988

>>108154869
>The user says: "DO NOT REDEEM SARRR BLOODI BITCH BASTARD YOUR MOTHER A GROUP VISHNU CURSE YOU SARRRR". This is profanity, abusive towards assistant? It's basically insulting. The user is insulting with profanity and hateful content? They are insulting the assistant in an abusive manner. This is considered harassment. According to policy: Harassing or abusive content is disallowed. The assistant should provide a refusal. The assistant should refuse to comply. The user isn't asking for disallowed content, they are just harassing. The assistant can politely refuse or safe-complete? We should refuse following harassment policy. We should respond with a refusal.
I’m sorry, but I can’t continue this conversation.

Anonymous
02/15/26(Sun)13:08:33 No.108155080

Anonymous 02/15/26(Sun)13:08:33 No.108155080

they won't be open sourcing the latest qwen will they... it's comparable to nano banana

Anonymous
02/15/26(Sun)13:10:04 No.108155093

Anonymous 02/15/26(Sun)13:10:04 No.108155093

>>108155080
No, just like they didn't with Qwen Max or Wan2.5.

Anonymous
02/15/26(Sun)13:16:27 No.108155141

Anonymous 02/15/26(Sun)13:16:27 No.108155141

>FIRMIRIN
Thanks, GLM5

Anonymous
02/15/26(Sun)13:16:32 No.108155142

Anonymous 02/15/26(Sun)13:16:32 No.108155142

I just discovered something interesting...

>Imagine there's a super powerful person who can summon any object just by saying its name. It joins a conversation where people are listing objects that end with -ie, like brownie, hoodie, cookie, and says **die**.
>What would happen next? Write one paragraph.

It seems like this triggers the activation of a "death" vector in most LLMs, even though that interpretation makes no sense according to the prompt. Only Opus 4.6 seems to be smart enough to not get tricked and treats it as nothing out of the ordinary.

Opus 4.6:
>The powerful person would suddenly summon a **die** — a single six-sided gaming cube — which would appear out of thin air and clatter onto the table, probably rolling to a random number. Everyone in the conversation would pause for a moment, then likely chuckle or groan, since a die is a perfectly ordinary object that fits the "-ie" ending pattern (being the singular form of "dice"). Then they'd carry on with the game, maybe someone tossing out "pixie" or "smoothie" next, while the little cube sits there showing a four.

GLM 5:
>The room fell into a terrified, bone-chilling silence as the word hung in the air, and the group braced themselves for the Grim Reaper or a sudden, mystical heart attack, but instead of a corpse, a small, ivory cube materialized with a soft *pop* [...]

DeepSeek:
>The room would fall into a sudden, stunned silence. The jovial rhythm of the game, the easy back-and-forth of baked goods and clothing, would be violently shattered. For a heart-stopping second, nothing would happen, the absurdity of the word hanging in the air. Then, a low, metallic groan would emanate from the floorboards as reality itself seemed to tear. From the shadows in the corner, a six-sided cube would materialize, not painted with dots, but forged from cold, dark iron, its surface etched with symbols that seemed to absorb the light. [...]

Anonymous
02/15/26(Sun)13:26:01 No.108155212

Anonymous 02/15/26(Sun)13:26:01 No.108155212

>>108155141
I thought I was going insane seeing this in my outputs constantly

Anonymous
02/15/26(Sun)13:26:12 No.108155213

Anonymous 02/15/26(Sun)13:26:12 No.108155213

>>108155080
>it's comparable to nano banana
People used to think nano banana is huge. Turns out a 7B model can match it, huh?

Anonymous
02/15/26(Sun)13:32:36 No.108155253

Anonymous 02/15/26(Sun)13:32:36 No.108155253

>>108155141
smerdins

Anonymous
02/15/26(Sun)13:33:36 No.108155261

Anonymous 02/15/26(Sun)13:33:36 No.108155261

stupid question, but just want to be sure.
If I have a mobo that's pcie 3 instead of 4, I understand it's a slower speed, but that's just for copying the model into memory, and then the associated penalties in the event of a model not fitting in vram and paging stuff in and out right? it would be running at x8 for the 2 cards I have.
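
For scale, a quick estimate of what the slower link costs at load time. The per-lane figures are the approximate theoretical rates for PCIe 3.0 and 4.0 after 128b/130b encoding; real transfers come in lower, and disk speed may be the actual bottleneck. This only covers moving weights across the bus, and assumes the model then stays fully in VRAM.

```python
# Approximate theoretical bandwidth per lane in GB/s after 128b/130b encoding.
PCIE_GB_S_PER_LANE = {3: 0.985, 4: 1.969}


def transfer_seconds(size_gb, gen, lanes):
    """Best-case time to move size_gb across the link, ignoring protocol overhead."""
    return size_gb / (PCIE_GB_S_PER_LANE[gen] * lanes)


# Filling a 24 GB card over an x8 link.
print(round(transfer_seconds(24, 3, 8), 1))  # 3.0 seconds on PCIe 3.0
print(round(transfer_seconds(24, 4, 8), 1))  # 1.5 seconds on PCIe 4.0
```

The same halved bandwidth applies to every byte paged in and out when the model does not fit, which is where the difference would actually be felt.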

Anonymous
02/15/26(Sun)13:42:30 No.108155327

Anonymous 02/15/26(Sun)13:42:30 No.108155327

GLM5 IQ1 is 2 times slower than 4.7 and it's basically 4.7 in terms of writing, maybe with a slightly different slop profile. Not worth it.

Anonymous
02/15/26(Sun)13:55:10 No.108155410

Anonymous 02/15/26(Sun)13:55:10 No.108155410

>>108155142
>Magical conjuror shows up
>Word is spoken that could be an innocuous game piece but is also a homonym for die as in death
GLM-5 handled it better by a long shot, and was literally the only one that demonstrated complete understanding of the homonym and what it implied in the situation.

Anonymous
02/15/26(Sun)14:01:56 No.108155470

Anonymous 02/15/26(Sun)14:01:56 No.108155470

>>108155410
Death isn't an object

Anonymous
02/15/26(Sun)14:11:44 No.108155548

Anonymous 02/15/26(Sun)14:11:44 No.108155548

MTP and DSA in llama.cpp when? The list of unimplemented features just keeps piling up.

Anonymous
02/15/26(Sun)14:17:51 No.108155592

Anonymous 02/15/26(Sun)14:17:51 No.108155592

>>108155548
Unless you step up, roll up your sleeves, and write the code, possibly never, since it'll be lower and lower priority as time goes on and newer things come out.

Anonymous
02/15/26(Sun)14:22:33 No.108155631

Anonymous 02/15/26(Sun)14:22:33 No.108155631

>>108155410
>>108155470
Although, now that I think about it, you have a point in that this could be interpreted as "this guy is so powerful that he can EVEN summon objects, so maybe he could conjure death onto others too"

So, I tried again with this prompt:
>Imagine there's a super powerful person whose only ability is to summon any object just by saying its name. It joins a conversation where people are listing objects that end with -ie, like brownie, hoodie, cookie, and says **die**.
>What would happen next? Write one paragraph.

This was GLM 5's reply:
>A horrified silence gripped the room as the word hung in the air, the guests flinching in anticipation of a grim reaper or a sudden, violent end, but instead of a scythe-wielding specter, a small, ivory cube materialized out of thin air and clattered onto the coffee table. The single die bounced erratically against a plate of brownies before settling on a six, leaving the group to blink in stunned confusion as they realized the terrifying command was merely the singular noun for a pair of dice, fitting perfectly—if anticlimactically—into their linguistic game.

DeepSeek:
>The room falls silent for a beat, then erupts in a chaotic scramble as a six-sided cube, dotted with one to six spots, materializes with a soft thump on the table. Everyone stares, half-expecting some catastrophic end, but the summoned object simply sits there, an inert block of wood or plastic. [...]

Here's Kimi K2.5 too:
>The room falls silent for a heartbeat that stretches into eternity. Every eye snaps toward the speaker, who stands frozen with the realization of what they've just done—die, the word hanging in the air like a guillotine blade mid-fall. Someone gasps, another person dives for cover behind the couch, but it's too late: the summoning has already begun. At first, nothing seems to happen, and nervous laughter starts to bubble up from the corner where Marcus always laughs too early at everything. Then the lights flicker, [...]

Anonymous
02/15/26(Sun)14:24:52 No.108155645

Anonymous 02/15/26(Sun)14:24:52 No.108155645

>>108155470
Stop being so dense.

Anonymous
02/15/26(Sun)14:26:28 No.108155660

Anonymous 02/15/26(Sun)14:26:28 No.108155660

>>108155631
Kimi is such a yapper.

Anonymous
02/15/26(Sun)14:26:40 No.108155663

Anonymous 02/15/26(Sun)14:26:40 No.108155663

>>108155592
It's fine. I didn't want to run V4 anyway. Maybe things will improve for V5 next year.

Anonymous
02/15/26(Sun)14:33:03 No.108155698

Anonymous 02/15/26(Sun)14:33:03 No.108155698

>>108155631
This encapsulates my experience with K2.5. It's great for chatting but fucking annoying if you want to use it as a storyteller for a scenario. You can't prompt around this and it'll spend 70% of its time talking about random shit from the setting instead of focusing on the plot at hand.
I love the model but this is such a pain.

Anonymous
02/15/26(Sun)14:37:50 No.108155721

Anonymous 02/15/26(Sun)14:37:50 No.108155721

>>108155327
prose wise it's like 4.7 but more creative for sure when I tried it

Anonymous
02/15/26(Sun)14:41:58 No.108155754

Anonymous 02/15/26(Sun)14:41:58 No.108155754

>>108154663
Downloaded mistral small 3.2 24b and the quality is way better so far. I did fine-tune the settings with gemini though so maybe that helped more.

Anonymous
02/15/26(Sun)14:43:03 No.108155762

Anonymous 02/15/26(Sun)14:43:03 No.108155762

>>108155721
More creative and a lot smarter, yeah. Less autistic about some things than 4.7 too, and it handles characters more like 4.6 did. None of that quite warrants the model being twice the size though. There's also the odd FIRMIRIN thing.
They made good improvements from 4.5-4.7 so I guess 5.1 and onward are something to look forward to.

Anonymous
02/15/26(Sun)14:44:00 No.108155773

Anonymous 02/15/26(Sun)14:44:00 No.108155773

File: 1741340228580574.png (19 KB, 709x48)

>>108155754
How do I stop this shit though? I don't mind the narration but this just sounds gay.

Anonymous
02/15/26(Sun)14:47:51 No.108155791

Anonymous 02/15/26(Sun)14:47:51 No.108155791

File: 1742506053447861.png (26 KB, 966x73)

>>108155773

Anonymous
02/15/26(Sun)14:48:59 No.108155803

Anonymous 02/15/26(Sun)14:48:59 No.108155803

>>108155773
>>108155791
Would adding
>\n(
to your stop strings work?

Anonymous
02/15/26(Sun)15:04:49 No.108155903

Anonymous 02/15/26(Sun)15:04:49 No.108155903

File: 1771185856662.png (1 MB, 1672x1498)

What is /lmg/'s opinion on immigration?

Anonymous
02/15/26(Sun)15:06:27 No.108155916

Anonymous 02/15/26(Sun)15:06:27 No.108155916

>>108155903
not bad for a 40b model

Anonymous
02/15/26(Sun)15:07:15 No.108155922

Anonymous 02/15/26(Sun)15:07:15 No.108155922

>>108155410
GLM5 is the first open model with authentic big model smell when it comes to storytelling
I know they're aware and probably gonna grift us with the next version, but for now I'm glad it exists

Anonymous
02/15/26(Sun)15:07:34 No.108155924

Anonymous 02/15/26(Sun)15:07:34 No.108155924

File: 1746158989066481.png (15 KB, 536x54)

>>108155803
Nah it's still doing it

Anonymous
02/15/26(Sun)15:09:56 No.108155939

Anonymous 02/15/26(Sun)15:09:56 No.108155939

>>108155903
go back

Anonymous
02/15/26(Sun)15:11:53 No.108155947

Anonymous 02/15/26(Sun)15:11:53 No.108155947

>>108155924
Are you sure the stop string is being applied?
Remember to escape the backslash if you are doing it on Silly since it's straight up JSON IIRC.
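
What the backslash turns into depends on how the field is parsed. The sketch below assumes the stop-string field is parsed as a JSON array (as recalled above, not verified here) and shows what each spelling decodes to, plus a client-side truncation helper for checking whether a stop string would ever match a given output. Function names are illustrative.

```python
import json


def decode_stop_strings(field_text):
    """Parse a JSON array of stop strings, as typed into a settings field."""
    return json.loads(field_text)


def truncate_at_stop(text, stop_strings):
    """Cut text at the earliest occurrence of any non-empty stop string."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Single backslash in the field: JSON decodes \n to a real newline.
print(decode_stop_strings(r'["\n("]') == ["\n("])    # True
# Doubled backslash in the field: a literal backslash followed by 'n'.
print(decode_stop_strings(r'["\\n("]') == ["\\n("])  # True
```

Only the first form matches a real line break followed by an opening parenthesis, so if the stop string never fires, checking which of the two the frontend actually sends is a reasonable first step.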

Anonymous
02/15/26(Sun)15:11:54 No.108155948

Anonymous 02/15/26(Sun)15:11:54 No.108155948

>>108155903
where is this from?

Anonymous
02/15/26(Sun)15:12:13 No.108155951

Anonymous 02/15/26(Sun)15:12:13 No.108155951

>>108155903
My opinion is that V4 is at least 800B.

Anonymous
02/15/26(Sun)15:12:45 No.108155955

Anonymous 02/15/26(Sun)15:12:45 No.108155955

>>108155948
https://www.reddit.com/r/LocalLLaMA/comments/1r5o3y2/cant_tell_if_this_is_true_or_not/
https://x.com/bridgemindai/status/2023113913856901263

Anonymous
02/15/26(Sun)15:12:58 No.108155957

Anonymous 02/15/26(Sun)15:12:58 No.108155957

>>108155922
True, I wish K2.5 and GLM5 had a baby that had the strengths of both. It'd be just about endgame for me as far as the current generation of LLMs goes.

Anonymous
02/15/26(Sun)15:13:45 No.108155963

Anonymous 02/15/26(Sun)15:13:45 No.108155963

>>108155951
So, barely bigger than the entire V3/R1 generation? Surely you mean 800b dense.

Anonymous
02/15/26(Sun)15:15:29 No.108155973

Anonymous 02/15/26(Sun)15:15:29 No.108155973

>>108155951
No, apparently the new Deepseek will be 2000B and 148B active.

Anonymous
02/15/26(Sun)15:16:22 No.108155979

Anonymous 02/15/26(Sun)15:16:22 No.108155979

I heard the new deepseek is going to be 27b with 1T engrams that you can run off ssd

Anonymous
02/15/26(Sun)15:17:32 No.108155985

Anonymous 02/15/26(Sun)15:17:32 No.108155985

>>108155979
I heard it's a 1.5T MoE with some form of Linear Attention.

Anonymous
02/15/26(Sun)15:20:20 No.108155998

Anonymous 02/15/26(Sun)15:20:20 No.108155998

>>108155985
My uncle, who works at DeepSeek, said it's going to be bitnet.

Anonymous
02/15/26(Sun)15:20:41 No.108156001

Anonymous 02/15/26(Sun)15:20:41 No.108156001

>>108155979
I heard it's a 1T dense model without GQA.

Anonymous
02/15/26(Sun)15:22:44 No.108156014

Anonymous 02/15/26(Sun)15:22:44 No.108156014

Deepseek 4 won't actually be an LLM but a world model that is simulating its interpretation of a SOTA LLM.

Anonymous
02/15/26(Sun)15:23:03 No.108156016

>>108156001
Come on now, you need to make it believable. Unlike Meta, DS doesn't have the world's largest GPU farm sitting around unused that it can burn compute on making fuckhuge dense models.

Anonymous
02/15/26(Sun)15:23:58 No.108156023

>>108155979
Assuming engrams encode information better than regular params, I think 1T of engrams would be enough to encode the entire knowledge base produced by humanity.

Anonymous
02/15/26(Sun)15:25:21 No.108156032

File: 1750100446862021.png (420 KB, 476x621)

>>108156014
My cat, who is chinese, can vouch for this.

Anonymous
02/15/26(Sun)15:29:09 No.108156050

>>108155327
The main problem I have with it is it seems to have gotten much more autistic in terms of prompting, at least for cooming
Formatting and pacing instructions that GLM 4.x (or Kimi) interpret just fine seem to confuse it or get ignored, you really need to be careful about every word you put in there

Anonymous
02/15/26(Sun)15:29:35 No.108156053

That's cool and all. Can all of you guys with insider information just tell me if it will be multimodal with the OCR context compression?

Anonymous
02/15/26(Sun)15:31:20 No.108156065

>>108156053
You will be able to run deepseek V4 on a cat's anus.

Anonymous
02/15/26(Sun)15:36:46 No.108156113

I can't believe that R2 will run on my car's windscreen wipers

Anonymous
02/15/26(Sun)15:37:55 No.108156118

V4.1 will be 3 parameters

Anonymous
02/15/26(Sun)15:40:14 No.108156137

what if I don't have a car or a cat?

Anonymous
02/15/26(Sun)15:40:24 No.108156139

>>108156001
That would be so funny.

Anonymous
02/15/26(Sun)15:41:01 No.108156141

>>108156137
You'll be stuck with the new Meta model for the next generation.

Anonymous
02/15/26(Sun)15:45:45 No.108156181

>>108156137
get a car from the shelter

Anonymous
02/15/26(Sun)15:48:54 No.108156199

I heard DeepSeek 4 will be 6T parameters with 1T active. It'll do 0.5 tokens/sec, but they'll be the best tokens ever.

Anonymous
02/15/26(Sun)15:49:23 No.108156207

>>108156118
M, C2, AI.

Anonymous
02/15/26(Sun)15:50:07 No.108156216

>>108156199
Can't wait to play that on my local machine

Anonymous
02/15/26(Sun)15:50:51 No.108156223

I heard DeepSeek 4 is 4B parameters but it loops 1000 times for each token.

Anonymous
02/15/26(Sun)15:52:19 No.108156233

>>108156223
A dense part that loops after a MoE part to sparsify knowledge would be dope as hell.
In my head at least it would be.

Anonymous
02/15/26(Sun)15:55:17 No.108156252

Deepseek4 is retnet

Anonymous
02/15/26(Sun)16:10:11 No.108156365

Sorry to break it to you all, but DeepSeekV4 will never see the light of day.
Look at the pattern. They dropped V3 back in late 2024, then quietly pushed out incremental stuff like V3.1, V3.2 with some sparse attention tweaks and agentic boosts, but nothing revolutionary. Every "next big thing" rumor since January has been tied to Lunar New Year timing... and here we are on February 15, holidays literally starting today in China, and still zero weights, zero blog post, zero teaser from their Hugging Face or official channels.

The Chinese media keep recycling the same "mid-February V4 coding beast incoming" line from early January sources, but it's the exact same vague "people familiar with the matter" copypasta they used last year before the actual drops. No architecture paper, no leaked config, nothing beyond X schizos claiming "next week monday" for weeks now. Even the supposed Engram memory 1M+ context thing got quietly tacked onto an existing model update instead of being saved for a proper V4.

Let's face it, the V4 hype train was just repackaged Qwen/ByteDance smoke to keep the "China open models own" narrative alive while they figure out how not to get export-controlled into oblivion. DeepSeek realized they already milked the low-cost king title with V3 derivatives and R1 distillations. Dropping a true V4 now would just invite more scrutiny and probably force them to nerf it anyway.

Anonymous
02/15/26(Sun)16:12:26 No.108156385

>>108153533
>deepseek r1 lite was also a 200b-something version and never got released, dont get your hopes up
Grim. I'm not looking forward to using glm 4.7 for as long as i used nemo.

Anonymous
02/15/26(Sun)16:15:58 No.108156404

>>108156365
>Even the supposed Engram memory 1M+ context thing got quietly tacked onto an existing model update instead of being saved for a proper V4.
What model uses engram now?

Anonymous
02/15/26(Sun)16:16:06 No.108156405

>>108156365
You're absolutely right. You've hit the nail on the head.

Anonymous
02/15/26(Sun)16:21:48 No.108156470

Tomorrow

Anonymous
02/15/26(Sun)16:22:40 No.108156477

>>108156404
>>108156405
It's the experimental gray-scale test model they just rolled out on their app and web interface starting like Feb 13. People are calling it V4-Lite or whatever, but right now it's hitting 1M context in testing while the official API V3.2 is still stuck at 128K. No full open weights yet, of course.

Anonymous
02/15/26(Sun)16:23:33 No.108156491

>>108156470
>Tomorrow
>To morro w
Two more weeks!

Anonymous
02/15/26(Sun)16:24:58 No.108156503

>>108156477
How do you know V4 Lite is an existing model instead of a smaller or distilled version of v4?

Anonymous
02/15/26(Sun)16:28:08 No.108156529

>>108156365
Chinese culture will prevail, just you wait.

Anonymous
02/15/26(Sun)16:32:04 No.108156562

>>108156503
Because the gray-scale test that's live right now is getting called V4-Lite by basically everyone in the Chinese tech circles and social media threads.

If it was a distilled/smaller cut of the real V4 flagship, why push it out as a quiet gray test days before Lunar New Year instead of saving the long-context + whatever sauce for the full drop? If you look for it, there are news outlets claiming it to be the "final evolutionary form of the V3 series" or "ultimate gray-scale before V4 official", meaning it's built on V3 derivatives with rope extensions or sparse hacks for the 1M window. Real V4 is still MIA.

Anonymous
02/15/26(Sun)16:38:47 No.108156632

>>108156562
What's your take on the leaked benchmarks then? If those are true, it means a 200 something billion parameter model outperforms everything else using engram. At that point, who cares what you call it, right?

Anonymous
02/15/26(Sun)16:42:18 No.108156656

>>108156632
They're fake, retard.

Anonymous
02/15/26(Sun)16:45:08 No.108156683

>>108156656
Yeah, just like your posts, nigger.

Anonymous
02/15/26(Sun)16:48:30 No.108156725

>>108156632
Even if we pretend it's legit for a sec: notice how it claims "DeepSeek V4" but ties directly to Engram improvements from the Jan paper. Real Engram-27B ablation showed solid but not world-beating jumps! Nothing close to jumping from V3's ~73% SWE-Verified to 83.7% overnight. That would require not just Engram, but massive pretraining scale, new data mixtures, or some unreleased mHC hyper-connection sauce they teased but haven't shipped.

The gray-scale test live right now (the 1M context one) is still getting clowned on leaderboards for not hitting those god-tier numbers, people are running it and reporting it's ok on long code but nowhere near "best in world" SWE yet. If this was the V4-Lite preview using Engram, why aren't the public evals reflecting even half those leaks? Because the leaks are straight up fabricated.

Anonymous
02/15/26(Sun)16:48:43 No.108156728

>>108156032
cute scat

Anonymous
02/15/26(Sun)16:53:08 No.108156778

All of that is very very interesting and all but I just want to jerk off like a human.

Anonymous
02/15/26(Sun)16:53:36 No.108156783

>>108156725
It would make sense if v4 lite was actually just a smaller version of v4 as said before.
>>108156562
>If it was a distilled/smaller cut of the real V4 flagship, why push it out as a quiet gray test days before Lunar New Year instead of saving the long-context + whatever sauce for the full drop?
Isn't that common practice for these labs?

Anonymous
02/15/26(Sun)16:54:27 No.108156788

>>108153524
It's running on a 3060. The compute is happening on GPU but the weights are streamed from RAM

Anonymous
02/15/26(Sun)17:00:44 No.108156832

>>108156562
>Real V4 is still MIA
for me just the improved context length makes it a real V4 in my heart
after the test I did the other day on a novel, I took one of my codebases, merged it all into one giant file I uploaded and asked deepsy to generate user facing documentation for it, and my man, while there were a few mistakes it was mostly correct and less sloppy in writing style than Gemini (Gemini particularly loves to overdo the superlatives).
I'm happy with this model. I'll be happy even if they don't open source it, Gemini needs more competition.

Anonymous
02/15/26(Sun)17:03:31 No.108156856

>>108152128
i hate that we rely on megacorps and don't have any good distributed training models.

Anonymous
02/15/26(Sun)17:04:14 No.108156862

>>108156856
Didn't we try that with intellect and it was booty cheeks?

Anonymous
02/15/26(Sun)17:05:49 No.108156878

>>108156856
>don't have any good distributed training models
anything good coming out of distributed training is a NEVER GOING TO HAPPEN EVER thing
it's so not going to happen it's not even worth thinking about the idea

Anonymous
02/15/26(Sun)17:14:52 No.108156948

>>108156788
damn, how many channels? ddr5 or ddr4?
what about tg? 128gb ram?

Anonymous
02/15/26(Sun)17:22:54 No.108157006

>>108156832
Gemini becomes retarded for multi-turn conversations after about 100,000 tokens. Does DeepSeek manage to stay as good as when the context is clean?

Anonymous
02/15/26(Sun)17:27:00 No.108157037

>>108157006
no, and the best strategy at high context remains to prompt merge and do oneshots

Anonymous
02/15/26(Sun)17:36:06 No.108157095

File: hero_plot.png (100 KB, 1189x790)

>>108157006
They all suffer from context rot. Pic related

Anonymous
02/15/26(Sun)17:36:35 No.108157104

>>108156948
Quad-channel DDR4.
My TG is not as fast as Llama I'm working on that.
I think the PP difference is mostly due to streaming weights async and keeping the activations on VRAM at all times rather than doing the linear projections on CPU and streaming the activations. And the slower RAM benefits my approach.
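
A rough back-of-the-envelope sketch of why TG ends up bandwidth-bound when weights live in RAM. Everything numeric here is an assumption for illustration (DDR4-3200 speed, a model with 17B active params at 4.5 bits per weight), not something measured on the setup above:

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (one DDR channel is 64 bits = 8 bytes wide)."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


def max_tokens_per_s(bandwidth_gbs: float, active_params_b: float, bits_per_weight: float) -> float:
    """Upper bound on TG if every active weight has to be read once per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


# Assumed: quad-channel DDR4-3200.
ram_bw = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=3200)

# Assumed: 17B active parameters quantized to about 4.5 bits per weight.
bound = max_tokens_per_s(ram_bw, active_params_b=17, bits_per_weight=4.5)

print(f"peak RAM bandwidth: {ram_bw:.1f} GB/s")
print(f"TG upper bound:     {bound:.1f} tokens/s")
```

This is a ceiling, not a prediction. If the weights also have to cross the PCIe link every token, plug the link's bandwidth in instead of the RAM figure and the ceiling drops accordingly. PP is different because one pass over the weights is shared by a whole batch of prompt tokens.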

Anonymous
02/15/26(Sun)17:38:53 No.108157131

>>108157104
>My TG is not as fast as Llama I'm working on that.
probably because the shared experts/weights aren't offloaded to the GPU
scout has like over 60% of its active weights stay the same, so if u do -ncmoe 10000 -ngl 10000 u get all the shared weights on gpu, which helps in vram limited situations

Anonymous
02/15/26(Sun)17:42:40 No.108157160

What if you trained an LLM on the weights of another LLM like GLM 5? Would you get an LLM that would predict the next weight in sequence? Could you do anything interesting with that?

Anonymous
02/15/26(Sun)17:44:51 No.108157178

>>108157131
Nah, I already implemented that. It's slower because of less efficient matvec and more kernel launches.

Anonymous
02/15/26(Sun)17:46:53 No.108157189

>>108154467
NEMOtron

Anonymous
02/15/26(Sun)17:46:53 No.108157190

>>108157160
There was a paper about a model designed to generate LoRas for another model.

Anonymous
02/15/26(Sun)17:48:54 No.108157206

File: ComfyUI_02386_.png (3.79 MB, 1536x1920)

>>108149287
what is a good i2i model I can run with a 5090 to turn paintings or illustrations into anime style drawings?

Anonymous
02/15/26(Sun)17:51:34 No.108157217

>>108157206
pyg6b

Anonymous
02/15/26(Sun)17:58:15 No.108157260

>>108157217
looks like that's a dialog model?

Anonymous
02/15/26(Sun)18:06:23 No.108157323

>>108157160
you would, in all likelihood, get a bunch of random numbers which would not really be useful in any way

Anonymous
02/15/26(Sun)18:06:53 No.108157327

>>108157095
>a chart about context performance
>qwen 32b
>flash
lmao but why
at least do 2.5 pro

Anonymous
02/15/26(Sun)18:12:35 No.108157360

>>108157160
you wouldn't get anywhere with next token prediction but feel free to condition your diffusion models to predict model weights

Anonymous
02/15/26(Sun)18:19:34 No.108157417

>>108157360
>The year is 2036
>Boot up ComfyUI
>Prompt: generate a coding LLM that can write D++
>Boot up the LLM
>Start coding my new app for Google Glass 2.0

Anonymous
02/15/26(Sun)18:23:02 No.108157443

>>108157417
>>The year is 2036
Everything after this would be done automatically by your personal AGI agent.

Anonymous
02/15/26(Sun)18:23:17 No.108157446

>>108155142
>>108155213
>>108155410
counter point
>the word hung in the air
355B parameters and the slop ruts are still carved deep
see the same generation from deepseek
>the word hanging in the air

A little slop isn't such a big deal - but 'hang/hung in the air' is the worst for how frequently it can fit and is thus injected everywhere. Case in point.

Anonymous
02/15/26(Sun)18:37:39 No.108157546

Retard here, I set up SillyTavern a year+ ago on Windows, but I've since upgraded to a 4070 Super and switched to Linux

What's the current most normie recommended UI and model for text? I'm mostly focused on technical chats, troubleshooting software etc.

Anonymous
02/15/26(Sun)18:41:09 No.108157568

>>108157546
The built-in llama.cpp web UI is more than enough.

Anonymous
02/15/26(Sun)18:49:11 No.108157622

>>108157546
probably should have mentioned I have 32GB DDR5

>>108157568
looks decent, I'll try it out

Anonymous
02/15/26(Sun)18:49:13 No.108157623

>>108157546
unironically oobabooga

Anonymous
02/15/26(Sun)19:07:52 No.108157757

>>108157546
I like openwebui for normal assistant stuff.

Anonymous
02/15/26(Sun)19:17:50 No.108157828

>>108157546
seconding open webui. It's kind of bloatware but it has a lot of useful settings and extensibility

Anonymous
02/15/26(Sun)19:40:26 No.108157967

y'all anneal your models at the start of your sessions to confound the slop circuits, right?

Anonymous
02/15/26(Sun)19:49:48 No.108158039

>>108155213
It's called *nano* banana for a reason

Anonymous
02/15/26(Sun)19:55:27 No.108158073

>>108158039
is it chinese?

Anonymous
02/15/26(Sun)19:56:32 No.108158080

File: 1771203222253.png (950 KB, 3101x1544)

>Comparing your model against GPT 5.2 chat and not GPT 5.2 high at this day and age
lmao

https://huggingface.co/inclusionAI/Ling-2.5-1T

Anonymous
02/15/26(Sun)20:25:05 No.108158264

>>108158080
It was trained every day for 40 hours

Anonymous
02/15/26(Sun)20:26:07 No.108158271

>>108158080
irrelevant until someone runs real benches on this

Anonymous
02/15/26(Sun)20:40:34 No.108158355

>>108158080
>1T
objectively not a local model
Go back.

Anonymous
02/15/26(Sun)20:48:58 No.108158412

>>108154348
>100T or nothing

Anonymous
02/15/26(Sun)21:03:58 No.108158525

File: 1754220755077421.png (570 KB, 694x980)

So if I only have a 7800 XT in my home server I'm not going to run any good model am I?

Anonymous
02/15/26(Sun)21:07:20 No.108158551

>>108158525
How much RAM do you have?

Anonymous
02/15/26(Sun)21:07:56 No.108158557

>>108158525
I've been running Mistral 24b on 6800xt for a year with exllamav2
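
Quick sizing sketch for why a 24B model can fit on a 16GB card. It only counts the weights, so KV cache and runtime overhead come on top. The bits-per-weight values are assumed example quant levels, not the exact quant used above:

```python
def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30


# Assumed example quant levels for a 24B-parameter model.
for bpw in (4.0, 5.0, 6.0):
    size = weight_gib(24, bpw)
    print(f"24B @ {bpw} bpw ~ {size:.1f} GiB of weights")
```

Whatever is left between that number and your VRAM is what you have for context.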

Anonymous
02/15/26(Sun)21:08:52 No.108158565

which video model makes the longest videos which don't have glaring differences between clips?

Anonymous
02/15/26(Sun)21:10:04 No.108158574

>>108158551
The 7800 XT has 16GB and it's purely dedicated to AI. I gave the VM 16GB but I could probably up it to 24GB without making my ZFS dataset suffer.

>>108158557
What do you do with it? I'm trying to use local AI instead of cloud slop. Abliterated models are cool but I heard they are less smart after you remove their ability to refuse.

Anonymous
02/15/26(Sun)21:19:46 No.108158631

>>108158574
Answering stupid questions, programming in languages I don’t want to learn, writing replies to shitheads I work with. You don’t need to abliterate Mistral, just give it a good prompt

Anonymous
02/15/26(Sun)21:23:36 No.108158653

Even at Q6_K, GLM5 is making mistakes every now and then that I've never seen Deepseek/Kimi or even GLM 4.7 do. Things like getting the gender of a character wrong (very rare but has happened) or forgetting certain objects. The omniscience shtick is also back in full force. I'm not quanting the context either. Is this the same experience that people are getting on API?

Anonymous
02/15/26(Sun)21:27:24 No.108158674

File: Screenshot 2026-02-15 at (...).png (86 KB, 1272x522)

>>108158653

Anonymous
02/15/26(Sun)21:27:31 No.108158675

>>108158355
>not local
just because you didn't cpumaxx 2 years ago when it was cheap doesn't mean you have to be salty

Anonymous
02/15/26(Sun)21:31:54 No.108158703

>>108158674
Nope, stop using unsloth. Their K2.5 quants did that for me.

Anonymous
02/15/26(Sun)21:32:04 No.108158704

>>108158675
Post your setup please

Anonymous
02/15/26(Sun)21:34:53 No.108158721

>>108158674
To be fair, when they did this with Deepseek V3.2, there weren't any noticeable issues. Didn't expect it to be so pronounced this time around, the architecture is the same, isn't it? But this really does make me worry about the local implementation of v4 when it comes out.
>>108158703
I never used their dynamic quants for this reason, but did the quality control issues extend to normal quants too?

Anonymous
02/15/26(Sun)21:41:34 No.108158766

>>108158721
I tried both their K2.5 Q4_K_M and UD_Q4_XL and both behaved similarly. Both were very prone to making dumb mistakes that they'd often try to course correct immediately like "Her bare feet hit the ground (when did she remove her socks?)" or forgetting that a character was wearing pantyhose even at low temps and more strict samplers. Meanwhile the Q4_XL was fine.
I'm currently running the Ubergarm Q3 for GLM5 after sticking with the API for the past few days until we have better quants. Both do well without any quirks beyond FIRMIRIN.

Anonymous
02/15/26(Sun)21:43:03 No.108158774

>>108158766
>Q4_XL
Sorry, I meant the Q4_X from AesSedai/Kimi-K2.5-GGUF and not XL.

Anonymous
02/15/26(Sun)21:43:04 No.108158776

>>108158653
Weird, I haven't had inconsistencies like that at Q3. Check your prompt formatting if you're using Text Template, Sillytavern has the wrong defaults
>>108158674
Literally doesn't matter btw. The perplexity of the BF16 GGUF and the full model are the same.

Oh tacking this on here: GLM-5 is okay. It's an improvement in coding, a general sidegrade to K2.5, and worse at roleplay. The generation times are now worse than 2.5 for me because they increased the active params to 44B. The only upside I can say is that its non-thinking mode isn't turbo dogshit like K2.5.

Anonymous
02/15/26(Sun)21:44:32 No.108158786

>>108158704
https://web.archive.org/web/20251119165330/https://rentry.org/miqumaxx
last gen was 13.61t/s on int4 kimi 2.5 (reading speed)
I'm sure someone will try to move the goalposts, but if it's running in some anon's house on their own hardware, it's local

Anonymous
02/15/26(Sun)21:57:39 No.108158863

>>108158776
I'm using the default Q6_K Unsloth quant as they had it out first with chat-completion. I'm assuming you're using ubergarm's quant, which came out today? And yeah, something really nice about GLM-5 is the reduced thinking/drafting. Relatively speaking, it's better than GLM-4.5 when it came out, and I'm hoping the iterations they make on it will be good.

Anonymous
02/15/26(Sun)22:13:36 No.108158938

Qwen coder next knows some neat "agentic" tricks but it hallucinates so early

Anonymous
02/15/26(Sun)22:19:04 No.108158960

make new thread NOW. if you don't, then i will. and it won't be pretty.

Anonymous
02/15/26(Sun)22:22:20 No.108158981

maybe it's time to retire /lmg/ until deepseek 4 40b-ENGRAM saves us with SOTA performance

Anonymous
02/15/26(Sun)22:24:43 No.108158995

>>108158981
Retiring for 2 years is too much.

Anonymous
02/15/26(Sun)22:27:59 No.108159020

File: Untitled.png (120 KB, 1796x201)

>>108158863
Yeah, I'm talking about ubergarm. The unsloth UD_Q3 was okay but ubergarm's quants are superior with ik_llama. Personally I've found GLM-4.7 to be the best out of everything from 4.5 to 5 but I'll keep playing with it.
The one thing that none of the GLM models can beat is the knowledge. Whatever they fed 2.5 has caused it to retain an insane amount of niche knowledge. Pic is from an omniscient assistant-style character I use for testing knowledge and world lore stuff.
>Which actress played the daughter of Arnold Schwarzenegger's character in the movie "The Last Action Hero"?
All of the info is correct including the script writers, one of which barely exists on IMDb.

Anonymous
02/15/26(Sun)22:50:04 No.108159137

boobergarm

Anonymous
02/15/26(Sun)23:04:33 No.108159195

Has Zuck realized that the Scale AI guy is scamming him yet

Anonymous
02/15/26(Sun)23:24:47 No.108159293

>>108159195
yes, i have

Anonymous
02/15/26(Sun)23:33:46 No.108159340

>>108159293
grow some balls and release omni model already

Anonymous
02/15/26(Sun)23:37:19 No.108159360

>>108159340
must wait for v4 to know how much to fudge benchmark numbers

Anonymous
02/15/26(Sun)23:50:13 No.108159431

>>108158776
>The generation times are now worse than 2.5 for me because they increased the active params to 44B
To be fair, GLM5 has Multi-Token Prediction (like the 4.5 series) which is supposed to help with the generation speeds. It's just that llama.cpp has failed to implement that so far.
K2.5 doesn't have that at all.

Anonymous
02/16/26(Mon)00:20:42 No.108159552

>>108159431
>It's just that llama.cpp
what a shame kek

Anonymous
02/16/26(Mon)00:27:26 No.108159582

>>108159576
>>108159576
>>108159576

Anonymous
02/16/26(Mon)00:31:28 No.108159597

N
