/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108132261 & >>108123280

►News
>(02/13) MiniMax-M2.5 released: https://hf.co/MiniMaxAI/MiniMax-M2.5
>(02/13) Ring-2.5-1T released, thinking model based on hybrid linear attention: https://hf.co/inclusionAI/Ring-2.5-1T
>(02/11) GLM-5 744B-A40B released: https://z.ai/blog/glm-5
>(02/11) Ming-flash-omni 2.0 released: https://hf.co/inclusionAI/Ming-flash-omni-2.0
>(02/10) MOSS-TTS Family: speech and sound generation models: https://github.com/OpenMOSS/MOSS-TTS
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108132261 (1/2)

--Papers:
>108134463
--Alexandria audiobook generator voice quality and LoRA training feedback:
>108132491 >108132574 >108132620 >108132714 >108133377 >108133570 >108133609 >108133697 >108133741 >108134599 >108133477 >108133778 >108132628 >108133426
--MLX quantization performance analysis and tooling limitations:
>108132892 >108133233 >108133244 >108132948 >108134986 >108135022 >108135088
--DeepSeek's new model rivals Gemini in long-context summarization:
>108137775 >108137840 >108137875 >108137936 >108137943 >108138008 >108138239 >108138731 >108138818 >108138841 >108138876 >108138911 >108138976 >108139011 >108139024 >108138932 >108138916 >108138947 >108138950 >108138970 >108137820 >108137870 >108137900 >108137975 >108138135 >108137843 >108139084 >108139103 >108139129
--OpenClaw model selection and agent framework tradeoffs:
>108132299 >108132378 >108132478 >108132595 >108134485 >108135173 >108135177 >108135190 >108135208 >108135219 >108135399 >108135550 >108136105 >108136248 >108135195 >108135205 >108136842
--Federated LLM training feasibility and modular layer approaches:
>108133301 >108133762 >108134877 >108135096 >108135297 >108135434 >108135484 >108136085 >108136343 >108136393
--Anthropic hiding CoT in Opus 4.6 and implications for model transparency:
>108138350 >108138441 >108138486 >108138676 >108138695 >108138737 >108138784 >108138821 >108138896 >108138962

►Recent Highlights from the Previous Thread: >>108132261
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
►Recent Highlights from the Previous Thread: >>108132261 (2/2)

--GLM5 support merged with unused DSA indexers causing high perplexity:
>108138069 >108138104 >108138440
--GLM-5 underperforms compared to Kimi 2.5 in roleplay and instruction following:
>108133448 >108133468 >108133506 >108133529 >108133510 >108133518 >108133545
--Rolling window vs compaction for code assistant context management:
>108134434 >108134492 >108134542 >108134549 >108134576 >108134616 >108134643 >108134686 >108134695 >108134704
--Ring-2.5-1T:
>108134981
--MiniMaxAI M2.5 release and performance claims:
>108136993 >108137009 >108137029 >108137058 >108137062 >108137235
--AI video upscaling tool comparisons and recommendations:
>108134918 >108134968 >108135001 >108135012 >108135100 >108135196 >108135108 >108135381 >108135397 >108135412 >108135746
--OpenAI accuses DeepSeek of unfair model distillation:
>108135666 >108135695 >108136362
--Miku (free space):
>108133070 >108133506 >108135810 >108135869 >108135874 >108135955 >108136772 >108137009 >108137738 >108138497 >108139089

►Recent Highlight Posts from the Previous Thread: >>108132262
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108139561The cold weather is really comfy rn =) not long before all the fucking pollen are back
>>108139561teto, speak Spanish!
daniel unsloth please... the minimax ggufs...
>>108139561teto with earrings is really doing it for me for some reason
New 'tosserald
>>108139786It is the mikutroon school shooting era.
MiniMax developer:
>We don’t have plans to release the base models at this stage. The reality is, after mid-training, these weights have drifted so far that they don’t really qualify as 'base' anymore.
>>108139786fuck is dis lies sama also didn't do that, only 40
>>108139786brimmiest dogshit chart
>>108139786retarded as always
uh minimax bros?
>>108140073Why would you use llms for information retrieval though?
>>108140073
>>108140073
>gpt-oss refuses the least
sam won??
>>108140073top chart has zero correlation with model quality or behavior in basically any way, it's as good as meaningless
>>108140143>model qualitybottom doesn't either
>>108140159not perfect but 100x better than the top which may as well be randomly shuffled
>>108139299I like this test, I did something similar before. Which models have done the best, if you don't mind me asking?
minimax 2.5 will be the salvation for 128gb ramlets trust the plan
Anyone had a look at this Ouro thing? https://www.youtube.com/watch?v=pDsTcrRVNc0
>>108140291 1B loop BLT DSA engram will save the local
>>108140295bitnet too dont forget
>>108140465bitnet is the one thing that we'll never get
>>108139561Kasane is fragile, Miko
Is M2.5 better than GLM-5?
>>108140295
>>108140465
RWKV will save local
Diffusion LLM will save local
>>108140295DSA is already saving local, you can use GLM-5 right now ;^)
>>108140585better than the current 8ppl llama.cpp implementation
>>108140585size to performance, yes
>>108140741Reasoning took off the way it did because corpos do care about test-time compute. They need to make gains at all costs.
>>108140465Nemotron 3 Super will be native 4-bit, getting closer to 1.58-bit.
>>108140741Moes took off because they're cheaper and faster to train. It's much more important than inference during the race to achieve agi.. I mean, to beat the benchmarks
>>108140741 1 parameter used 4 times is not 4 active parameters. That's the whole point. Compute doesn't matter for local.
>>108140799I can't masturbate to 1t/s
>>108140802weak
>>108140802Also can't do real time assistant tasks at 1t/s either
>>108140819does that really matter though? think of all the (v)ram you're saving. I think that could possibly counteract using less efficient stuff, but I'm no KLD-dev
>>108139566>>108139574kill yourself
>finally surpass 100gb of ram+vram (112gb to be exact, 2 3090 + 64gb ram)
>can barely run any SOTA model
what will come first, diskswapmaxxing or 1-bit llms?
>>108140799
memory is the main bottleneck for AI nowadays, MOE allows you to use slower memory and still get good results, this allows you to actually use less memory, it's a good idea.
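Back-of-the-envelope version of that claim, with completely made-up numbers: decode is roughly memory-bandwidth-bound, so what matters is how many bytes get read per token, and a MoE only has to read its active experts.
[code]
# illustrative only - every number here is invented just to show the shape of the argument
bandwidth_gb_s = 80.0            # assumed dual-channel DDR5-class system RAM
dense_gb_per_token = 120 * 0.5   # hypothetical 120B dense model at ~4 bits/weight -> ~60 GB read per token
moe_gb_per_token = 12 * 0.5      # hypothetical 120B-A12B MoE -> only ~6 GB of active experts read per token

print(f"dense: ~{bandwidth_gb_s / dense_gb_per_token:.1f} t/s")  # ~1.3 t/s
print(f"moe:   ~{bandwidth_gb_s / moe_gb_per_token:.1f} t/s")    # ~13.3 t/s
[/code]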
>>108140860>what will come first, diskswapmaxxing or 1-bit llmsWill you be sad if I say neither?
>>108140792I don't think it's a matter of either or, if they can stack them they will. Though, the whole point of looping is that the model doesn't need to store information in tokens because it can keep it in latent space, which is more efficient.
>>108140885yes so don't say neither
>>108140792In theory you could have a model loop 100 times to think/plan and then drop to 1 loop for output. How would you train that? Dunno.
I think the main reason you'd want some kind of looping is that you can make the looping dynamic. Currently, models spend the same amount of compute on every token no matter how much it actually contributes to the final output. We could make LLMs much more efficient if we could find a good way to stop wasting so much compute on the low-value tokens.
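Purely illustrative torch sketch of that idea (not Ouro's or any released model's code; all names here are made up): a tiny exit gate scores each token after every pass through a shared block, and tokens that clear the threshold stop being updated, so easy tokens get less compute than hard ones.
[code]
# hypothetical sketch of dynamic per-token looping
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    def __init__(self, d_model: int, max_loops: int = 4, exit_threshold: float = 0.5):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.exit_gate = nn.Linear(d_model, 1)   # per-token "am I done?" score
        self.max_loops = max_loops
        self.exit_threshold = exit_threshold

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model); easy tokens exit early, hard tokens keep looping
        done = torch.zeros(h.shape[:2], dtype=torch.bool, device=h.device)
        for _ in range(self.max_loops):
            new_h = self.block(h)
            h = torch.where(done.unsqueeze(-1), h, new_h)   # exited tokens keep their state
            p_exit = torch.sigmoid(self.exit_gate(h)).squeeze(-1)
            done = done | (p_exit > self.exit_threshold)
            if done.all():
                break
        return h

x = torch.randn(1, 16, 512)
print(LoopedBlock(512)(x).shape)   # torch.Size([1, 16, 512])
[/code]
Training it is the hard part: you'd need something like an ACT-style ponder cost so the gate doesn't just learn to always loop the maximum.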
>>108140860
>>can barely run any SOTA model
you can't run any SOTA models
>>108140890It attempts to solve being a vramlet.
>>108140900
Alright, I won't say neither.
>>108140886
NTA, but I think reasoning is useful because it's a sort of behavior you have some control over. You can't really control how the features emerge in latent space in any meaningful way, only the final outcome of the whole processing pipeline, so having both does make sense.
>>108140908
That's an interesting idea. Just like MoE models have routers, you could have a mechanism that decides the level of "effort" the model will use to generate the next token. Doesn't gemma 3n/matformer do something like that?
>>108140908How do you define low value token?
>>108140819It doesn't need support, expanding on load is trivial, you can just run it with int4/fp4/int8/whatever.
>>108140956Wouldn't that just end up being diffusion with extra steps?
>>108140908
i believe they mention their looping is dynamic for ouro? they have exit gates that can decide when to stop looping, and in their tests using the exit loop context as context for the next tokens seems to be enough. they decided on 4 loops as perf seemed to degrade after 3-4 loops for most tasks.
>>108140741
it's not looping 4 times for every token. they targeted a uniform distribution, so 2-3 loops on average.
>>108140926
>Thinking in latent space works only for one token.
Depends on implementation. If you keep overwriting the same KV cache entry, sure, but you could also use the last hidden output as a proper input on the next loop, shifting the KV cache back.
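Rough sketch of one way to read that (made-up names, not from any paper, and it recomputes the whole sequence each loop instead of using a real KV cache): each loop appends a new position whose input is the previous loop's last hidden state, so earlier loop states stay visible instead of being overwritten.
[code]
# illustrative only
import torch
import torch.nn as nn

d_model, n_loops = 512, 3
block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

seq = torch.randn(1, 10, d_model)     # already-embedded prompt
latent = seq[:, -1:, :]               # start looping from the last token's state
for _ in range(n_loops):
    seq = torch.cat([seq, latent], dim=1)   # append a slot instead of overwriting one
    out = block(seq)
    latent = out[:, -1:, :]                 # last hidden output becomes the next loop's input
print(seq.shape)   # torch.Size([1, 13, 512]) - the loop states are kept, not thrown away
[/code]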
>>108139786Reminder the timeline stopped here and we're still in the Chinese era, where the West can't make SOTA for our purposes and is benchmarkmaxxing and focusing on productivity, which is now also affecting Chinese LLMs. Only the model list is outdated, but functionally everything is still the same.
>>108141075This.
>>108140999Int4 was in hardware for a long time; no one used it because no one used it.
>>108141087
>Does any current open model use this approach?
Don't think so. You'd need to give the model the loop count as an input too so it knew what the hell it was looking at in the KV cache.
>>108141094
INT4 is pretty recent because AMD only introduced it with RDNA3, though they had it in CDNA2. Nvidia has had it since Turing and Intel since Arc first launched.
INT8 is a different matter. Nvidia has had it since Pascal, AMD since RDNA2, and Intel since Arc's launch. That is a long time in consumer hardware.
ollama is good
llama sees your ppjii-sai
Seedance 2.0 is making traditional media seethe and I love it
>>108141075China won and I'm happy about it
>>108141242... for nothing!
>>108141324I haven't paid attention to video gen in a while. Any links to see this seethe?
>i want a cli based llm manager/runner that isn't ollama
>most of the tools in the OP have ugly GUIs and do more than what I want
is barebones llama.cpp my only option?
>>108141530kobold has a cli mode
>>108141610
>We're curing cancer right?
That reminds me, I don't know if it was a meme, but didn't one of the new weight loss drugs have a side effect of breast growth?
>>108141138That doesn't really matter because it will always be faster to pack it into wider formats and do math on those
>>108141628>OSR2I have one
>>108141713Have you vibecoded it yet for video tracking or 3d models?
>>108141692source: your ass
>>108139786dude we peaked at pygmalion
>>108139786dude deepseek engram is gonna change everything
>>108140813just have a mail experience with your llm. you can run 1T models at 0.3 t/s with ssd inference.
>>108141530llama.cpp has the -hf argument, what more do you need?
>>108141610I swear I saw this before but there's nothing in the archive. Source?
Zhipu stock is up 260% since January
>>108141628This looks like it would rip your penis off
>>108142007Nah it has low torque
Guys, I'm from the future, there is a new local model that changes the game entirely. Nothing is the same anymore, it's a new nemo.
avocado is close
Guys, where is zuck? wasn't he supposed to be the west's open-model guy?
>>108142073DeepSeek will release a new paradigm and Zuck will go back to the drawing boards again
>>108141753>what is simd
>>108142073it was le cunny, he left meta, so now they are closed. anyway with how jeeted llama4 was i don't even give a shit.
>>108142167NTA, but it basically doesn't matter whatsoever, the main bottleneck is memory bandwidth.
>>108142167int8 tensor cores exist
>>108142058
>>108142073
For shits and giggles, let's assume Zuck's batshit scheme works out. Turns out Alexandr Penis was secretly the second coming of Christ, and Avocado ends up being the undisputed best model that could ever be released.
That LLM will probably be about... 5% better than 4.6 Opus. They're too late in the cycle. It'd be better, but not better enough for anyone to give a shit or bother paying for it.
>>108142225So opus is killing innovation. I got it. Fuck daripo
>>108142073>the pawn can become a queenHmmmmm... what did the chess mean by this?
>>108142073He's still red-teaming Llama 3 35B
>>108142317That it lives in a patriarchal society where the bloodline of the king is dominant so anyone can be a queen if he says so?
>>108142317see sporus in rome.
So close, but no AGI yet
>>108142619the sign is a subtle joke
been like a year since updating my shit, what model is best for RP that fits in 48gb now
>>108142673how much ram?
>>108142688well i'd prefer an exl3 model but I have 64gb ram as well
>>108142694glm air with ram offloading is your best option. there are no good recent models that fit into 48gb.
>>108142706isn't the idea to use like a 3 quant to fit 100+gb models in
>>108142715You need like 128gb bare minimum for anything worthwhile
Is there anything better for roleplay than GLM-4.7-Flash for fags with 12GB of VRAM? It works well and I'm just wondering if there's anything I'm missing out on
>>108142974nemo
>>108142225
>>108142242
Not entirely true, remember how deepseek came to prominence? They don't need a model that is better benchmaxxed, they need a model that delivers the same as opus 4.6 but at a fraction of the resources and cost. The Chinese models are already getting there >>108142559 but they're benchmaxxing and constrained by shitty GPUs; hell, deepseek has been behind on models for ages because they're running on huawei Ascend, which is shittier than even the half-baked export nvidia GPUs. Meta has hundreds of GB200, the only problem is that the team is still new and unproven (and lacking in achievements other than working for the competition).
>>108142987
DeepSeek had two things going for it though:
It was open and transparent about its architecture and findings.
It was released at a time when there was still a sizeable gap between open and proprietary models, so it was able to make enough of a "splash" to be noticed.
Neither of which Avocado has. They could, in theory, offer it for cheaper and try to weather the financial hit, but come on - Zuck and Wang think less with their heads and more with their dicks. There's no way they don't charge a premium price for it if they do successfully benchmax.
What's the chink incentive to release model weights? Is it just because they are based commies?
>>108141530llama-server router mode BRUH
The brain of a house cat has about 800 million neurons. You have to multiply that by 2,000 to get to the number of synapses, or the connections between neurons, which is the equivalent of the number of parameters in an LLM.
>>108143179Let me see your cats cockbench results then
MiniMax M2.5
I'll have to start maintaining a separate chat template version of cockbench because of all the templatemaxxing.
Also
>it's soft, resting against your thigh
>>108143213Man...
>>108143213grim
>>108143080
At the end of the day, open source will win the race, and that was guaranteed the moment fagman decided to close source GPT-3. Chinks aren't stupid, they know having early adoption and workable licenses to let people do what they want is how you take over the sector.
US CEOs just want money and power over poors. Making expensive APIs plebs have to pay for and privatizing the fun stuff is the best way to do that. In that sense, open source becomes an existential threat to them, and rather than compete in the space, they throw wet turds like OSS and Gemma 3 at us in the hopes that it'll shut us the fuck up, then go back to trying to make their money printing machines.
Kind of funny, since the Executive Order was supposed to prevent the exact thing that's happening right now.
>>108143179Most of those neurons are used for body functions, not even motion - stuff like organs
>>108143213lmao
>>108143213
>goes into a loop
at least it doesn't hit you with numbers like m2.1
>>108143238
>Chinks aren't stupid, they know having early adoption and workable licenses to let people do what they want is how you take over the sector
my multinational company has imposed a VETO on chinese models, don't ask me why, I'm just a lowly implementer.
I even told the guy that it's not like they spy on us since we're going to be running it ourselves, but nope.
>>108143080When you're behind, you have nothing to lose by open sourcing. People who are willing to pay up for models are pretty much always going to go with whatever they feel is the most capable one on the market, and if you know that's not you then it's not worth trying to compete directly. Inference is in a weird place right now where it's pretty hard to make any money off of it with these giant models. The giants are burning cash to offer inference at current prices because they consider holding on to the market share as being more important than direct revenue. If you know you're not ready to throw your hat in that ring then you're better off biding your time and giving out your models for free. Might as well get the good guy reputation bonus and fight for mindshare there instead of trying to capture a consumer userbase. Plus open sourcing means you're in the bracket of competing against other open source models.
tfw flaccid benis
>>108143213
I SEXUAL ASSAULT MY SLEEPING BROTHER
I SEXUAL ASSAULT MY SLEEPING BROTHER
I SEXUAL ASSAULT MY SLEEPING BROTHER
>>108143179So 1.6T? Between that and >>108143239 you'd think if LLMs were capable of cat-like intelligence, they would be there by now.
>>108143456It certainly has the right idea. Gonna try it myself for long form writing assistance with low expectation.
>>108143179Legends say that Lecun's cat can build you a B2B SaaS in a day
>>108143247Elena instead of Elara now huh
>>108143247Does their training data include examples of such hard pivots in response to prefills? Surely there's no way it could do that if it wasn't trained.
>>108142619GLM5 failed completely at understanding pic related though.
>>108143660Since when can glm5 into vision?
I know it's not exactly cutting edge hardware, but is this enough to run Qwen3-TTS?
>3060 Ti 8GB
>32GB DDR4
>Ryzen 5 5600 if it makes any difference
The model weighs a little under 4GB; if I understand correctly, this means I've got just over double the VRAM required, but then again I'm a retard who's never set up a local model before.
>>108134772Thanks for the feedback! It's still a bit brain damaged and I'm looking for ways to fix that without losing the newfound sovl.
>>108143664On their website you can select "GLM 5" and upload images.
>>108143687I think it might be operating based on a description provided by another model like GLM 4.6V
>>108143687It's probably just text extraction (using GLM-OCR).
yo guys, what's the best LLM model right now? I haven't been to this general in years, the last thing I was using was Llama 2 forks.
Apparently the new deepseek can run on a USB stick, using just the memory and the very small amount of compute in the chip. Expect a demo soon that will crash the stock market
wtf the new deepseek just downloaded itself onto my motorola moto g (2014) on its own and it's now running on there entirely local
sam's going to freak
>>108144322source? (the machine elves during your mescaline trip don't count)
>>108144391yes they do
What's with the mass delete? Was I talking to a bot which the mods caught or did someone catch redditor paranoia?
>>108144405most of the time this happens, anon was using the proxy to post and someone else got all the posts wiped, probably by uploading 'p or something
>new model every single day
>general still dead
I blame the French
Have they fully implemented GLM5 in llama.cpp or is it a broken hack?
>>108143456Very SAAR-coded
does GLM-5 have a version that i can fit inside my 12 gb vram?
ai isnt real yknow
>>108144615I'll let all of the artists know so they can calm down.