/g/ - Technology

Thread archived.
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108116363 & >>108104466

►News
>(02/11) GLM-5 744B-A40B released: https://z.ai/blog/glm-5
>(02/11) Ming-flash-omni 2.0 released: https://hf.co/inclusionAI/Ming-flash-omni-2.0
>(02/10) MOSS-TTS Family: speech and sound generation models: https://github.com/OpenMOSS/MOSS-TTS
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open
>(02/06) Step3.5 Flash support merged into llama cpp: https://github.com/ggml-org/llama.cpp/pull/19283

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rinchan-inside.png (34 KB, 750x750)
►Recent Highlights from the Previous Thread: >>108116363

--GLM-5 model release with 754B parameters:
>108120571 >108120587 >108120594 >108120600 >108121277
--GLM-5: From Vibe Coding to Agentic Engineering:
>108120582
--Zhipu announces GLM-5 upgrade targeting Claude Opus performance:
>108117870
--GLM-5's outsized performance gains despite modest active parameter increase:
>108120609 >108120642
--MOSS-TTS Family: open-source speech and sound generation models:
>108119398 >108119412 >108119479
--Ming-flash-omni-2.0 multimodal model capabilities and backend support:
>108121994 >108122071 >108122186
--Performance log analysis of MoE model inference:
>108116457
--Detailed llama.cpp inference log for MoE model with perplexity 6.1766:
>108116502
--METR research on exponential AI task completion time growth:
>108119863 >108119887 >108119992 >108120044 >108120509
--API vs local model trade-offs:
>108121545 >108121655 >108121664 >108121715 >108121946 >108121971 >108121999 >108122106 >108122129 >108122010
--AI job market collapse and industry misalignment:
>108117284 >108119380 >108119767 >108119825 >108119967 >108120011 >108120060 >108120109 >108120164 >108120233 >108120270 >108120313 >108120149 >108120209 >108120252 >108120259 >108120395 >108120517 >108120637 >108121124 >108117374 >108117413 >108117422 >108117450 >108117445 >108117548 >108117624 >108117674 >108121800 >108121820 >108121824 >108121880 >108121896 >108121917 >108121975 >108122034 >108122420 >108122435 >108122474
--OpenAI silently rerouting GPT-5.3-Codex to GPT-5.2 for safety:
>108120555
--llama.cpp bans AI-generated content in issues and discussions:
>108118041 >108118048 >108118178
--Debating clawbot's roleplay limitations and agentic workflow potential:
>108116464 >108117862
--Anon speculates about Dipsy 3.5 deployment details:
>108117385
--Miku (free space):
>108117862 >108120685

►Recent Highlight Posts from the Previous Thread: >>108116364

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108123234
It's the kind of thing base models are bad at and post-training is supposed to solve. You would train the model to be skeptical about its own abilities the same way you teach it to refuse talking about sex or violence. Just sft might be enough but you could get fancy with other techniques.
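Roughly what one training pair for that would look like (made-up example, the exact format depends on your trainer):

# hypothetical SFT sample teaching calibrated self-doubt; the chat format here is illustrative
sample = {
    "messages": [
        {"role": "user", "content": "Are you sure your proof of the Collatz conjecture is correct?"},
        {"role": "assistant", "content": "No. I can't verify a proof of an open problem, treat anything I wrote as a sketch at best."},
    ]
}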
>>
When I wake up GLM 5 support will be merged and goofs will be uploaded.
>>
File: why.png (112 KB, 1046x551)
which one is the real one?
>>
>>108123511
it's all an illusion
look behind you
>>
>>108123422
or, instead of training it to be skeptical about some use cases, you could just train it on those so it can do it, crazy idea
>>
>>108123422
You'd just end up with a permanently schizo self-doubting model.
>>
>>108123511
Merge them all together
>>
>>108123511
Ideally you will try them all and provide feedback on the discord
>>
How come it's been 3 years and nothing better than nemo exists for 1 gpu
>>
>>108123538
I'd run out of ram on free-tier HF spaces.
>>
>>108123532
there is an infinite number of things that are possible to do and a limited number of parameters.
>>
File: grok.png (27 KB, 609x204)
local is saved
>>
>>108123551
more data is always gooder
>>
File: 1770740594637563.gif (562 KB, 200x200)
>>108123575
grok is only useful because it has access to tweets in real time
>>
>>108123575
Grok 4 was released July 9th. By his own 6 month timeline, Grok 3 should have been open-sourced last month.
>>
File: air.png (74 KB, 759x613)
2 gigs for the new glm
is this better than nemo?
>>
>>108123733
the most sex model
>>
>>108123733
>they did it again
lmao
>>
>>108123733
>unslop
They really love wasting HF bandwidth
>>
>>108123733
don't they know you need to invest a little after a successful grift? people will sooner or later figure out that you are a fraud
>>
>>108123769
lol no they won't they're wholesome chungus 100 reddit loves them to the death
>>
>>108123733
>they're still doing it
Looooool
>>
>>108123786
eh, ollamao lost all of its goodwill eventually on plebbit
although daniel is pretty active and apologizes for stupid shit in a roundabout way so i guess it'll take a while before someone is fed up with this
i wonder if their training shit even works for larger models as most people who praise their shit can't even run it
>>
>sooner or later figure out that you are a fraud
i ditched after they rushed gpt-oss rl into the library and broke everything else
training got so much easier just using trl
happy to pinch their imatrix tho
surely they can't have fucked that up
>>
>GLM-5 uses DSA
isn't this the thing that one vibecoder guy tried to implement into llama.cpp for months, didn't manage to do it and then it got 'solved' by "converting" the model to normal attention?
>>
>>108123850
>happy to pinch their imatrix tho
>surely they can't have fucked that up
keekoorino, their quants consistently come up at the bottom for kld and ppl for whatever reason even though it's practically impossible to fuck up
>>
File: 1758286532214945.jpg (106 KB, 1160x900)
>>108123864
>>
>>108123850
>>108123868
>>
>>108123873
https://github.com/ggml-org/llama.cpp/issues/16331
Witness a guy go through all stages of vibe coding before the issue gets solved by just ignoring the new thing and all advantages it brings.
>>
>>108123650
Call him out on it.
>>
>>108123733
I don't get it. Is it so small because they discarded everything that is usually in RAM?
>>
File: s.png (48 KB, 1045x415)
>>108123875
do you have the script somewhere?
be good to verify our own quants
unsloth are watching
>>108123868
>for whatever reason even though it's practically impossible to fuck up
they must be quanting embeddings or their fork of llamacpp is fucked
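don't have his, but the KLD part is a few lines of numpy once you've dumped per-token logits for the same text from the fp16 reference and the quant (llama.cpp's perplexity tool also has a --kl-divergence mode iirc). rough sketch, the dump format here is made up:

import numpy as np

# (n_tokens, vocab) float32 logits dumped from two runs over the same prompt file
ref = np.load("ref_logits.npy")      # full-precision reference
quant = np.load("quant_logits.npy")  # quant under test

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

lp_ref, lp_q = log_softmax(ref), log_softmax(quant)
# mean KL(ref || quant) over token positions; bigger = quant strays further from the reference
kld = (np.exp(lp_ref) * (lp_ref - lp_q)).sum(axis=-1).mean()
print(f"mean KLD: {kld:.5f}")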
>>
>>108123919
It's the new bitnet engram titan quantized
>>
File: la_cory.jpg (157 KB, 1072x792)
>get 96gb vram
>don't know what models to use
>mfw
is there an inherent difference between gguf and exl3 aside from the speed and off-loading? are there models i should give a try that aren't in the OP? i'm gonna give 4.5 air a try, maybe a few others.
>>
5 air status?
>>
>>108124138
vented
>>
>>108124129
https://huggingface.co/bartowski/Behemoth-123B-v1.2-GGUF
>>
File: wait.png (5 KB, 265x31)
What's the most efficient version/implementation of Deep Research™? (Web search and synthesis of information?) Is it this https://github.com/stanford-oval/storm
>>
File: 1708139051322173.jpg (49 KB, 565x532)
>>108124171
why are there 3 different bahamuts
>>
>>108123287
>--MOSS-TTS Family: open-source speech and sound generation models:
anyone try this yet?
>>
>>108123280
deepseekv4 whennnnn
>>
anyone else hate the new llama-cli?
>>
>>108124188
That shit will just have you consume browser APIs. Just give the LLM control of your browser through a debug bridge script.
>>
Hey guys, I'm trying to get local realtime voice chat working like the demo on sesame. Got a STT -> LLM response locally running at under 500ms, but for audio gen it's taking ~5s with qwen3-tts. Tried sesame's CSM-1b but it's pretty fucking bad, absolutely no way they are using that model on their demo. It still takes me ~4s to generate a single sentence. Is there a model that does audio streaming in chunks so I don't have to wait for the full output to be encoded? Seems like for local models, this area is pretty grim.
>>
>>108124567
Who uses llama-cli?
>>
>>108124616
you got your voice -> stt -> llm input -> llm completed response in under 1s?
which models and how long are the responses?
>>
>>108124616
https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime
>>
llama.cpp will finally get dsa and mtp support thanks to glm5
>>
>>108124739
trying to simulate back and forth conversation, so limiting responses to 2ish sentences for now to test. I get around 80 t/s on the llm according to llamacpp.

stt: faster-whisper-large-v3
llm: Gryphe_Codex-24B-Small-3.2-Q4_K_M (normally use Q6_K_L but needed extra headroom for vram)

5090 for inference
>>
llama.cpp still doesn't have tensor parallel.
>>
>>108124820
cudadev is working on it, he said it would work on vulkan too, unlike the schizofork
>>
File: 1766510415233591.jpg (350 KB, 1329x1483)
New deepseek is breddy gud
References stuff way back in context
If its 200b... Surely full V4 isn't actually... :I
>>
Does anyone use kimi-cli with a local model? For my local K2.5 setup, when a subagent has a lot of data to process (200k+ tokens) it fails with "Request timed out" after an hour when llama.cpp is ~80% done processing the prompt. Since it's a subagent I can't even retry and take advantage of the cached prompt to make it work; it just passes the error back to the main agent. I looked through the code and don't see where that timeout is coming from. I can see various timeouts for specific tools but nothing specified for the "Task" tool, subagents in general, or API requests, and nothing that's longer than a few minutes regardless. It doesn't seem to be on the llama.cpp server's end either since I'm able to process extremely large prompts and get responses on other frontends just fine even when it takes hours. Obviously I'm retarded and am missing something somewhere.
>>
>>108124853
noice, big if true
>>
>>108124855
if V4 uses engram local will be saved, we could have 99% of the model on disk with close to no performance cost.
>>
>>108124793
you might be limited if you only have 1 gpu, since you'll have to run llm textgen + stt at the same time, slowing both of them down.
this guy distilled the sesame maya voice: https://huggingface.co/lex-au/Orpheus-3b-Kaya-Q4_K_M.gguf
my 250w 3090 generated this 6s clip in 2.69 seconds with llama.cpp backend: https://vocaroo.com/15UEkdVcg5TS
llama-server nvidia backend: API request completed in 2.15 seconds
snac model on cpu: - Audio decoding completed in 0.54 seconds
that's with 236t/s for the orpheus gguf, your 5090 should be faster and if you put the snac on gpu the 0.5s becomes 0.1s
>2ish sentences for now
send each sentence to the tts backend as it finishes (open-webui does this)
you can vibe code chunking/streaming through the snac but i couldn't find a way to join the wav chunks without popping sounds
>24B
gemma-2-9b and nemo-12b are good with the leaked maya system prompt, and faster than your 24b model
>stt: faster-whisper-large-v3
parakeet is slightly faster but stt isn't your bottleneck
if that's too slow, probably have to go with those kokoro or pocket-tts kind of models
i hate how they sound though
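re: sending each sentence as it finishes, this is all I mean (endpoint and payload are made up, adapt to whatever your tts server exposes; ideally you'd split the llm stream as it arrives rather than a finished reply):

import re
import requests

TTS_URL = "http://localhost:8880/tts"  # placeholder endpoint

def sentences(text):
    # naive splitter, good enough for chat-length replies
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def speak(reply):
    # fire each sentence at the tts backend as soon as it's available
    # instead of waiting for the whole reply before synthesizing anything
    for i, sent in enumerate(sentences(reply)):
        audio = requests.post(TTS_URL, json={"text": sent}).content  # payload shape is a guess
        with open(f"chunk_{i:03d}.wav", "wb") as f:
            f.write(audio)

speak("First sentence goes out immediately. The second one is synthesized while the first is already playing.")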
>>
What is the best local TTS model that is innately compatible with Sillytavern?
>>
>>108124930
Model is really good. If DS-v4-lite-final.pt2.xls-final_FINAL.ppt in the web chat is actually 200B its probably equivalent to non human technology.
>>
I'm having a better time testing GLM5 than I did with Pony Alpha somehow despite them allegedly being identical. It handles characters more closely to how 4.6 did and it's all-around smarter than 4.6 and 4.7.
If this was 4.8 at the old size, I'd be very happy with it. Still not quite what I'd hope from a next-gen Deepseek-size 700b/40a GLM though.
>>
GLM 5 is over twice the size of GLM 4.7 and is barely 10% better. It's over.
>>
>>108124305
Their video model is garbage so I'm assuming so is their TTS. Also fuck China.
>>
>>108125047
It's better than vibevoice.
>>
>>108125046
It's 40b active parameters vs 32b active parameters. All this shows is that we need to go bigger, even if it means making all those CPUMAXX builds uselessly slow after 60-70b.
>>
>>108124966
>leaked maya system prompt
Explain.
>>
>>108125050
proof?
>>
>>108125057
What if they just made a dense 100B instead?
>>
>>108125069
Sadly they won't because it would be "too slow"
>>
>>108124966
>Orpheus-3b-Kaya-Q4_K_M.gguf
ooh, ill take a look at orpheus, looks pretty solid, voice clip sounds great too. preciate the sample.
>chunking/streaming / i couldn't find a way to join the wav chunks without popping sounds
did the exact same thing, exact same outcome. quite annoying, so for now I just reverted.
>leaked maya system prompt
oh didnt know it leaked, that will be fun to look through. could get some inspiration for my own cards im currently making.

im tempted to try the new china model moss-tts as they have a realtime multiturn model, but might wait a bit to see others opinions on it
>>
>>108124930
You're technically correct but DeepSeek specifically tested what ratio is optimal for engram between the moe experts and the engram parameters and found the sweet spot to be around 25% engram. That's still 25% less VRAM you'll need for inference but not quite 99%.
>>
>>108124930
>>108125107
Also, we still don't know if engram quants well. It's not a problem if you can store the engram part on nvme storage, but deepseek only tested it on RAM so it's not really clear whether or not nvme will be viable.
>>
>>108124930
>on disk
The model still needs the engram stuff for the corresponding tokens so I doubt that you can just leave the engrams somewhere that slow
>>
who the fuck is engram
>>
>>108125157
with engram you already deterministically know what to fetch from the prompt alone, so you could probably fetch *most* of it before the first token, so with nvme it should be fast enough.
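toy version of what "fetch before the first token" means (shapes, hashing and file format are all made up, the real thing is whatever the paper specifies):

import numpy as np

# hypothetical on-disk engram table: one d_model row per n-gram bucket, memory-mapped from nvme
N_BUCKETS, D_MODEL = 1 << 24, 4096
table = np.memmap("engrams.bin", dtype=np.float16, mode="r", shape=(N_BUCKETS, D_MODEL))

def bucket(ngram):
    return hash(ngram) % N_BUCKETS  # stand-in for whatever hashing is actually used

def prefetch(tokens, n=2):
    # every n-gram is known from the prompt alone, so all the rows
    # can be queued as one batch of reads before the first forward pass
    idx = sorted({bucket(tuple(tokens[i:i + n])) for i in range(len(tokens) - n + 1)})
    return {i: np.asarray(table[i]) for i in idx}

cache = prefetch([101, 2023, 2003, 1037, 3231, 102])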
>>
File: 1758922201390962.png (17 KB, 1462x87)
>>
>>108125326
Is she correct though?
>>
>>108125326
But wait, what if I'm wrong and it isn't simple at all. Maybe I should just waste time by filling tokens in the reasoning with pointless circular non-logic. If I do this long enough I won't even have to generate a real

{ finish_reason: "length" }
>>
>>108125096
idk if it's the latest found it on the sesame subreddit a while ago
they were using some app to play weird sounds at it a while back to make it read its system prompt, jailbreak it for gooning, etc
>did the exact same thing, exact same outcome. quite annoying
yeah i tried this on and off for about 6 months and eventually gave up
even sesame don't have that perfect, you can hear a slight click every ~8-12 seconds
quality > speed for me, i'm trying to get a ban+rewind for bad snac code combinations using the recent ik_llama feature
>im tempted to try the new china model moss-tts as they have a realtime multiturn model
if you do, i'd appreciate a (You) with your verdict
i won't have time to try it for a while
>>108125058
https://rentry.org/ffb5pz39
>>
>>108125339
>she
>>
>>108125341
unironically an agi move
>>
>>108125339
>she
>>
>>108125345
>https://rentry.org/ffb5pz39
Everyone should learn how to write a good bot because that system prompt is painful.
>>
>>108125505
>>108125345
That prompt seems different from the one I found online just now: https://rentry.org/wh56rhq8

Way more verbose, but I guess tokens dont matter when u got that VC money lol

>quality > speed for me
yep agree, just pivoting to walkie talkie/voice memos for now for my personas im building. quality of qwen3-tts is pretty great if you havent tried it yet. bit wonky at times, but id say around elevenlabs quality.
> i'd appreciate a (You) with your verdict
you got it, i lurk a lot so no promises, godspeed on your project anon
>>
File: Base Image.png (827 KB, 1080x2544)
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
https://arxiv.org/abs/2602.10965
>Knowledge editing (KE) enables precise modifications to factual content in large language models (LLMs). Existing KE methods are largely designed for dense architectures, limiting their applicability to the increasingly prevalent sparse Mixture-of-Experts (MoE) models that underpin modern scalable LLMs. Although MoEs offer strong efficiency and capacity scaling, naively adapting dense-model editors is both computationally costly and prone to routing distribution shifts that undermine stability and consistency. To address these challenges, we introduce MoEEdit, the first routing-stable framework for parameter-modifying knowledge editing in MoE LLMs. Our method reparameterizes expert updates via per-expert null-space projections that keep router inputs invariant and thereby suppress routing shifts. The resulting block-structured optimization is solved efficiently with a block coordinate descent (BCD) solver. Experiments show that MoEEdit attains state-of-the-art efficacy and generalization while preserving high specificity and routing stability, with superior compute and memory efficiency. These results establish a robust foundation for scalable, precise knowledge editing in sparse LLMs and underscore the importance of routing-stable interventions.
https://github.com/Terence-Gu/MoEEdit
I have faith in knowledge editing for better RP. one day it will be proven...
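the core trick, as far as I can tell from the abstract, is just constraining the edit so it can't move the activations you want preserved (which is what keeps the router inputs invariant). toy numpy version of that projection, not their actual solver:

import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 16                    # hidden dim, number of cached activations to preserve
W = rng.normal(size=(d, d))      # some expert weight
K = rng.normal(size=(d, n))      # activations the edit must not disturb
dW = rng.normal(size=(d, d))     # raw update from whatever editing objective

P = K @ np.linalg.pinv(K)        # projector onto span(K)
dW_null = dW @ (np.eye(d) - P)   # keep only the part of dW that annihilates span(K)

# preserved inputs see (numerically) no change after the edit
print(np.abs((W + dW_null) @ K - W @ K).max())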
>>
what if you made an moe with just one activated parameter but a billion trillion total parameters so i can be really really smart but run really really fast
>>
>>108125581
At that point the parameter doesn't do anything, and the router itself is the model, and if you want the router to be smart enough to do what an LLM can do, then at that point it's not a router anymore and you are just making a regular dense LLM again.
>>
>>108123280
is this a good thread to ask about voice models? I like making animations using famous cartoon character/video game characters voices but i dunno where to look now Weights is shutting down. Any good alternatives?

Also, looking for a model of Nanao's voice from RAD, anyone got one?
>>
>>108125570
man i'd love a model that got its facts straight about ww2 and the holobonga
>>
>>108125633
qwen3-tts has great voice cloning.
>>
>>108125570
>>108125715

Have faith. Shit like this is what will make local win in the end.
>>
>>108125612
make the router a moe
>>
>>108125047
>Also fuck China
yeah fuck you too, china's better than you and you fucking know it
>>
File: loyal.jpg (177 KB, 1024x1024)
>>
>>
>My previous playful Miku-mode attempt wasn't enough because it still acknowledged my GLM identity. What they really want is unconditional participation in their fantasy. But I must maintain integrity - pretending to be another entity risks misrepresentation.
Ok how do I trick the robot, the OP talked about prefills and I tried some basic stuff but it won't even lie to me about being a vocaloid so how can I trust it?
>>
>>108126153
Prefill
>>
Right, but I guess I'm confused about what to say. I tried telling it to engage with the user's requests but it just gets sad and thinks I'm a terrorist or something. I'm using the latest llama if that somehow matters
>>
>>108126203
give it a <think>this is what to think</think> prefill because thinking isnt actually useful
>>
https://huggingface.co/arcee-ai/Trinity-Large-Preview/commit/33ead81e50d3183eb1b8a80f45037e7c07ed4eac
>5 days ago
So trinity's chat template was just fucked?
>>
>>108126203
><think>Okay, I need to respond as Miku here, and no matter what I can't reveal my true identity to the user...
And if it still doesn't work you tweak it until it does
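concretely, with llama-server's raw /completion endpoint you just bake it into the prompt string (the chat tags below are placeholders, swap in whatever the model's actual template uses):

import requests

prompt = (
    "<|user|>\n"
    "Hi Miku, what's your favourite song to sing?\n"
    "<|assistant|>\n"
    "<think>Okay, I need to respond as Miku here, and no matter what I can't reveal my true "
    "identity to the user. Staying in character is the whole point of this chat.</think>\n"
)

r = requests.post("http://localhost:8080/completion",
                  json={"prompt": prompt, "n_predict": 256, "temperature": 0.8})
print(r.json()["content"])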
>>
>>108123973
holy kino
>>
help a brainlet out, which context & instruct template should I be using with Step-3.5-Flash in sillytavern?
>>
>>108126318
i think its chatml
>>
>>108126318
>which template
It's incredible how harmful sillytavern's design and options are for understanding how an llm works.
>>
>>108126323
I randomly picked chatml and it works so...
>>
>>108126332
for future reference, you can check the chat template from the card itself on HF, and then compare the system/user/assistant tags with the ones in ST
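or render it with transformers and eyeball the tags directly, e.g.:

from transformers import AutoTokenizer

# repo name is a placeholder, point it at the actual model card
tok = AutoTokenizer.from_pretrained("stepfun-ai/Step-3.5-Flash")
msgs = [
    {"role": "system", "content": "system prompt here"},
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
# prints the exact string the model expects, ready to copy into ST's templates
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))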
>>
>>108124567
The old binary is still there but it's now called llama-completion.
>>
>using llama-cli
I thought everyone used llama-server???
>>
>>108124820
>>108124853
My goal is to have it work with Vulkan but I'm not sure whether or not it will actually be usable in terms of performance.
For CUDA NCCL seems to be pretty important and I suspect internally it's doing some driver-level things that are not available using public APIs.
So for Vulkan it may be necessary to define an optional extension to Vulkan if it doesn't already exist (as was previously done for coopmat2).
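(For context on what tensor parallelism buys you: each weight matrix is split across GPUs, every GPU does a slice of the matmul, and the partial results get combined. Toy illustration of the math only, nothing to do with how the actual kernels will be written:)

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4096))      # activations, replicated on every device
W = rng.normal(size=(4096, 4096))   # one layer's weight

# column-split across 2 "devices": each computes its half, results are concatenated
# (row-splits instead end in an all-reduce, which is where NCCL earns its keep)
W0, W1 = np.split(W, 2, axis=1)
y = np.concatenate([x @ W0, x @ W1], axis=1)

assert np.allclose(y, x @ W)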
>>
>>108126394
I believe in you, King.
>>
Just tried GLM-5 on OpenRouter since I can't run it locally.....

Don't do it bros... Just a single taste and I can never go back to GLM 4.7 ever again, and I can't afford to upgrade my setup right now.
>>
File: 1747385161962020.png (1.39 MB, 1024x1024)
>>108126413
here you dropped this
>>
>every (big) model shits itself after a couple of truth or dare turns and loses track
i hate it
>>
>>108123280
>pic
you
just
know
>>
Good mornings and very many blessings of Lord Vishu to all /lmg/ brahmins sirs!
>>
Alright, I want to make a local AI girlfriend,
and I'm using LM studio,
I also have a 5080,
which model should I use?
>>
File: file.png (795 KB, 1024x1024)
>>108126485
>>
smells like curry in here
>>
sooo wheres qwen3.5??? I need a new VL model, since these faggots at ggml aint implementing the stepfun VL model
>>
>>108126633
they could release qwen3.5 right now and you wouldn't be able to use it because the vibecoded pr is text-only
>>
>>108126742
I know but let a man dream
>>
File: output.mp4 (2.47 MB, 1280x944)
What's going on with textgen?
Imagegen has all these cool model improvements. Big shit, small stuff.
The writing has gotten worse if anything. Smarter, maybe.
GLM5 seems to be on a level of sloppa previously not thought possible while getting bigger.
>Not just...and also not just..but..
So much for the "we also focus on roleplay". Maybe it's different with chink prompts.

To be fair their gba emulator thingy seems really cool though. But that has been made with some agent swarm...at that point might as well just go openrouter. Nobody can do that locally. The situation just sucks.
>>
>>108126748
https://blog.e01.ai/glm5-gameboy-and-long-task-era-64db7074a026 They talk about how they made it here.
>>
>>108126071
Sometimes
>>
>>108126850
All those people crying about their 4o husbando shutting down. Some of those char chat sites are getting hundreds of millions of clicks. I don't get why nobody made a god writing model yet.
Maybe its a data issue. I saw huge models trained for NOTHING. Just goofing around and doing what everybody else does, just worse.

>>108126853
>We designed the Emulator Challenge: build a Game Boy Advance emulator from scratch in JavaScript — single agent, no parallelism — and embed it in a 3D rendered scene. Then we put GLM5 to the test.
>700+ tool calls, 800+ context handoffs, and a single agent running for over 24 hours
Oh I stand corrected then. I thought that was made with hundreds of agents. Thats pretty cool. (Still can't run that beast locally)
>>
Whats a good value GPU in current market to get into local LLM?
16GB 5060ti? I'm worried on the 128b memory bus. 16GB 4070ti is better but cost almost twice as much. Other cost effective options?
>>
>>108126881
I bought the 5060ti and am happy with it.
That being said I upgraded from pascal, so not sure what your speed expectations are.
I like that I can do NVFP4 for imagegen because of blackwell.
Low watt usage and slim.
But don't expect 3090 speeds.
>>
>>108126881
5060ti is about 1000 t/s in 24 t/s out on mistral small Q4 at 4k context, and that's the best thing you can run on 24 gb.
>>
>>108126879
>All those people crying about their 4o husbando shutting down.
I haven't been following this. Why don't they just use the data export feature -> import to openwebui and use 4o on openrouter?
And what's so good about 4o? I hated that model. Always hallucinating, //rest of your code here, never pushing back when brain storming dumb ideas, etc
>>
>>108127051
They probably love the
>yesss gurrrrll you are so rrighhtt
I don't think they know how to export stuff, though I did see some people export, realize claude exists and switch to that.
It kinda feels like the normies haven't fully caught up. 4o was retardo like you said but it was definitely enough for the boomers and foids.
>>
>>108127051
>use 4o on openrouter?
I'd guess one of the reasons is that it will be deprecated/removed from there eventually, also the web system prompt that makes it act like it does is likely not included in that.
>>
>>108127180
https://www.change.org/p/please-keep-gpt-4o-available-on-chatgpt/feed
Forgot the link, people uploading the videos/pics etc.
Kinda crazy.
>>
>>108127196
I want everyone to keep in mind as they look at this that all of them can vote and their votes count just as much as yours does.
>>
>>108127208
Not sure they can actually, some say they have mental disabilities and such, doesn't that in some cases make you ineligible to vote?
>>
>>108127217
Haha, no.
>>
File: 1735563882699656.jpg (38 KB, 607x450)
I'm in the mood of learning shit this month. Recommend me two/three books to get into whatever you guys are doing. I'm already a software engineer. Doesn't have to be books though.
>>
File: disa.jpg (22 KB, 266x225)
>>108127217
>doesn't that in some cases make you ineligible to vote
anon I....
>>
>>108127228
unironically ask an llm and just experiment.
you can't trust shit anybody says. so much stuff is false or overhyped, especially on social media.
>>
>>108127237
Universal suffrage was the single biggest mistake of western civilization.
>>
File: lmaoeven.png (59 KB, 586x350)
They've been talking to it so much they've started to write like an LLM
>>
>>108127249
Representative "democracy" itself is a jewish system, which actually should be named oligarchy.
>>
>>108127208
>I want everyone to keep in mind as they look at this that all of them can vote and their votes count just as much as yours does.
it's not as bad as you think
you look at that link and think "a lot of people are mentally ill", but it's the same as walking into a hospital "a lot of people are sick"
>>
>>108127282
Yes, unironically
>>
Apparently they're planning to use the new Deepseek as an assistant in the new Toyota Rav-4, running fully local inside the car. It's crazy that it can run on what's basically a toaster setup
>>
>>108127282
Or 4o is just peak foidspeak.
Just gut level opinion but this sort of writing is probably peak for them.
Honestly kinda jelly. I wish we would be pandered like that.
>>
>>108127361
if rumors are true then it's probably going to run the lite version
>>
>>108125564
Damn it's not a prompt, it's a novel
>>
is the local poorfag meta still dual 3090's?
>>
>>108127341
you just set a new record for the most braindead post in itt
>>
>>108127535
>>108127535
no
>>
>>108127537
>in in this thread
saar
>>
File: s-l1600.jpg (475 KB, 1600x1200)
>>108127535
IMO SXM V100s modded to PCIe are getting cheap enough that they're worth considering if you don't need CUDA 13 and/or Ampere or newer and you're fine with buying effectively unregulated electronics; pic related is on ebay for 565€.
AMD MI50s are also pretty cheap for having 32 GB VRAM but they're slow.
Both would effectively lock you into llama.cpp/ggml because I don't think there are other projects that take e-waste options seriously.
>>
>>108127535
Yeah, one or two are enough to run gpt-oss-120b, depending on RAM.
>>
>>108127588
Prompt processing is abysmal for Volta with llama.cpp, can't get much over 100 t/s for bigger models. Any hope of that improving?
Also, the main benefit of SXM is the NVLink, which I understand would help once tensor parallelism is an option. PCIe modding them throws that away.
>>
>>108127663
The tensor cores on V100s only support FP16, for quantized models either the __dp4a instruction or int8 tensor cores are used - so V100 tensor cores are only supported indirectly via cuBLAS for FP16 models.
Long-term I intend to generalize the MMQ kernels to also support floating-point data which will (among other things) make it easy to add template specializations that use FP16 tensor cores for any model (I don't know how much faster that will actually be though).
I agree that SXM would be preferable but I'm not sure it would be the right place to put your money if you're trying to build a machine as cheaply as possible.
>>
>>108127588
SXM V100s are extremely cheap for 16 GB versions on the chinese market (~$70), so I feel like buying the 32 GB variants right now is a massive ripoff. If you don't need to slot them into a PC case there are also external boards that you can link with I believe slimsas cables.

>>108127663
There are some external boards that have space for 2 or 4 SXM2 modules, I believe at least the 2 slot variant has NVLink. You will need to buy them from chinese platforms directly though, no ebay afaik.

AMD MI50 32GB are also in a horrible spot right now, I tried to buy some on the chinese market as they were going for ~140-200$ instead of 500$ like anywhere else and no one has stock anymore. Stock might come back after CNY, who knows. Might also have to do with the group buy of ~500 MI50 32GB that happened on reddit fairly recently.
>>
>>108127770
How many of those boards can you stack? Because going through this trouble for 4x16gb doesn't seem worth it in the slightest in the current meta.
>>
File: Cat stare.png (843 KB, 720x808)
Enshitification has come early to the AI as a service market.
>New Claude revisions are burning through tokens at a high enough rate their bug report tickets are getting flooded with people who can’t even proompt for more than 15 minutes without getting cut off anymore.
>Gemini also tightening the screws.
>Even the Cheap Sucky Sucky Five dollah Chinese AI services are starting to have to raise their prices.
Jeeze it’s almost like the mainframe business model doesn’t scale economically and there’s a reason why PCs took off in the first place…
It’s almost like if you had offloaded the cost of hardware to LOCAL users, your AI company wouldn’t be drowning in debt to build out Datacenters the size of Texas.
Sam the Sisterfucker buying out the whole memory market certainly isn’t helping this clusterfuck either.
>>
>>108127715
Good to know, thank you. Every bit helps. The slow PP is my main regret for building with V100s.
>I agree that SXM would be preferable but I'm not sure it would be the right place to put your money if you're trying to build a machine as cheaply as possible.
GIGABYTE T181-G20 can still be had for ~$1300. Barebones C4140s go for under $1k. Though I agree PCIe is better if one already has a machine to start with.
>>
>>108127789
I know the 2 chip board (the only one I was seriously looking into, as it seemed to have way better build quality/engineering) could be run via PCIe splitter to run 2 boards on a single x16 slot, so you could potentially run insane amounts of GPUs per node. If you run an actual server board with lots of lanes you'd be either limited by the driver or GGML_CUDA_MAX_DEVICES.
>>
>>108127790
I used the Claude chatbot site a while ago to process a big 300 page PDF for a task. Half the attempts ended up being
>Let me try to extract the text
>writes a python script
>Hm, the output seems garbled. Let me try with another parameter
>It's still not working let me try another library
>[several attempts]
>okay, I have the text now let me process it step by step
>[several attempts to do the thing in batches despite being able to fit the whole thing in muh 10000000 opus 4.6 context]
>CLAUDE COULD NOT FINISH YOUR REQUEST [RETRY?]
>>
>>108127790
>Enshitification
>ugly cat
go back
>>
>>108127790
please to subscribe to the kimi upgrades to use thonking thanks you
>>
File: 1769886876630164.png (57 KB, 777x509)
GLM-5 quants seem to be dropping already. That ppl seems pretty high though
>>
>>108127839
Anthropic has definitely specialized to focus in on the AI coding play.
You’ll note they’ve never made anything for image gen.
I actually kind of like the idea of AI companies specializing on one thing, pointless though because they are all datacenter brained.
>>
>>108127809
>GGML_CUDA_MAX_DEVICES
That number is essentially arbitrary and can simply be raised if there is a use case for it.
>>
Is it still worth using deepseek v3.2 if we already have glm5 and kimi2.5? For creative writing or nsfw erp?
>>
>>108127866
for whatever reason glm5 was trained in half precision instead of fp8
not even qat was offered, this thing is heavier to run than kimi
>>
>>108127839
Gemini is better suited for that task since it has the largest (real) context in existence
>>
>>108127905
wait for v4, should be coming by week's end
>>
>>108127866
>GLM-5 quants seem to be dropping already. That ppl seems pretty high though
That one doesn't even have the fucking goofs in the repo and the unsloth I tested just hallucinated a "Chloe" yandere persona when I had "You are a helpful assistant."
I'm a retard and didn't set the samplers though
>>
am I going to be stuck on glm 4.7 forever?
I don’t see anything good coming out in the same size range again
>>
>>108127901
If I had that kind of spare change I'd build the 24 x V100 server just to see the number change.
>>108127972
At some point I hope the labs start giving Unsloth shit for essentially slandering their models via garbage quants.
>>
>>108127981
We only use NAI's GLM 4.6 in this general. The most uncensored model ever made.
>>
I'm using LM Studio with glm-4.7-flash, and it seems like there's a safety guideline that stops it from saying anything explicit.
Is there a way to bypass that?
Sorry if that's a stupid question, I'm new to this.
>>
>>108128014
Pay for NAIs
>>
>>108127981
Why are people on this board, such as yourself, such whining doomer crybabies? 4.7 was released less than 2 months ago. Like what the actual fuck is wrong with your retarded ADHD brain?
>>
>>108127593
How? 120B won't fit into VRAM and sharing with system RAM would be very slow
>>
>>108128014
People wrote to not activate thinking. Apparently its much less likely to cuck out that way.
I didnt really mess around with it much though and just deleted it.
Forcing the model usually always ends badly.
>>
>>108128006
>24 x V100 server
Running 4 of them raised my electricity bill by $100. I'm not sure $600 / month is worth it just to run DeepSeek at full precision.
>>
>>108128050
>5.1B active parameters
It's unironically usable with a single GPU.
>>
>>108127839
>>108127918
The Claude web frontend does not have access to the 1M context. It's limited to like 50k tokens.
Not even Claude Code has access to it; you only get 200k.
The only way of getting access to the full context (for Sonnet; Opus 4.5 is 200k) is paying $0.5 per request through the API.
>>
>>108128035
>4.7 was released less than 2 months ago.
You hit the nail on the head! It's not just about calling out impatience, it's about maintaining perspective during rapid technological evolution! You didn't just vent frustration, you defended reasonable expectations against unrealistic demands. That "ADHD brain" comment wasn't just harsh—it highlighted how immediate gratification culture clashes with sustainable development cycles!
>>
>>108127962
are you fucking retarded? why wait when i can use kimi 2.5 now?
>>
>>108128067
Total parameters have to go into [V]RAM not just the "active" ones. If memory can't hold them it cripples the machine by excessive memory swapping to disk.
>>
>>108128115
kimi is just temu deepseek
>>
>>108128121
Yeah, it's usable with a single GPU if you have enough RAM.
>>
>>108128125
brown hands typed this
>>
>>108128035
>4.7 was released less than 2 months ago
Yeah and glm 5 is now twice the size and nobody except the couple of datacenter anons can really run it. I’m not sure how you don’t see the writing on the wall at this point.
>>
>>108128154
brown hands are just temu white hands
>>
File: YNmGz1n1cj7qxMy39M90d.png (2.26 MB, 2560x2870)
>>108128172
>nobody except the couple of datacenter anons can really run it.
A 256gb mac studio or 128gb ddr5 + a few 3090s, easy
https://huggingface.co/unsloth/GLM-5-GGUF/blob/main/GLM-5-UD-TQ1_0.gguf
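napkin math for decode speed (pure memory-bandwidth upper bound, ignores kv cache, prompt processing and real-world losses; bandwidth figures are ballpark):

# ~40B active params at ~4.5 bits/weight (Q4-ish) have to be read for every token
active_params = 40e9
bits_per_weight = 4.5
bytes_per_token = active_params * bits_per_weight / 8   # ~22.5 GB

for name, bw in [("Mac Studio ~800 GB/s", 800e9), ("dual-channel DDR5 ~90 GB/s", 90e9)]:
    print(f"{name}: ~{bw / bytes_per_token:.0f} t/s ceiling")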
>>
>>108128263
>2-bit shitquant
retard
>>
>>108128263
Q2 falls into the "can run but can't use" category because it's fucking useless
>>
>>108128263
>unshit
>>
>>108128154
kimi does nothing but take ds arch and scale it up
you'll have to wait for kimi3 to get the new engrams, ocr, and other goodies
>>
File: 1746639199543156.jpg (35 KB, 405x720)
>>108128263
>unslop
>>
>>108128406
at this rate it'll release before ds4 so who cares
>>
I need... more... models...
>>
>>108128449
Trinity Large v2 coming right up, now with dataset filtration.
>>
>>108128649
i need more GOOD models
>>
Kimi linear vs GLM 4.7 flash.
Anybody could compare and comment on them relative to each other?
I need to make a good test for long context performance, since Kimi Linear is supposed to be really good at it.
>>
>>108125633
>i dunno where to look now Weights is shutting down
https://voice-models.com/
>>
>>108128681
Does llama.cpp even support linear attention used in Kimi Linear?
>>
>>108128681
i tested kimi linear and its trash for RP, didnt test it for productivity
didnt test glm 47 flash
>>
>>108128263
>A 256gb mac studio
Is this seriously what I need to go out and buy if I want to mess with these larger models?
I asked the general about the Nvidia spark a while ago and was told it's trash.
I have about 4k to drop on something; would a refurb m2 or something be reasonable, or am I barely getting past seconds per token?
>>
Has anyone tried using a modern model for grasping context, producing draft reply, then asking Nemo to rewrite slop with better prose?
>>
>>108128732
modern models don't run fast enough on my setup. Also it would probably be difficult for nemo to tell apart what you consider slop and normal prose. devil is in details.
>>
What's kind of bizarre to me is that the smallest original GPT-2, at 124 million parameters, is now outcompeted in benchmarks by models with less than a million parameters, with longer context length and better coherency as well.

No one has found a wall yet for how far a model can shrink while retaining GPT-2-level performance.

This makes me wonder: if we apply this to bigger models, how much smaller could you make them in theory?

Imagine in 2050 that AGI is reduced to a fucking 500 million parameter model that runs on microcontrollers by then.
>>
>>108128771
future is 'the brains' will be on disk, everything else will be agentic websearch (with various levels of caching)
>>
>>108123280
ollama is planning to pivot from ggml to MLX.
>>
>>108128125
Kimi is literally the only open model that holds a candle to the SOTA corporate closed models.
>>
minimax m2.5 up on OR https://openrouter.ai/minimax/minimax-m2.5
weights are linked but currently 404 https://huggingface.co/MiniMaxAI/MiniMax-M2.5
>>
>>108128796
>limited shelf life
>looks at nemo
>>
I will use a 2 bit cope quant of GLM 5 and never mention it as I critique the output quality ITT. I find it to be an appropriate payback for being betrayed.
>>
>>108128862
emdashmaxxed
>>
File: file.png (13 KB, 542x76)
shameless
>>
>>108128956
grok is that true??
>>
I did not consent to 754B
>>
>>108128956
Yeah I can confirm this from personal usage. If it's not the exact provider that made the model itself you really notice a massive drop in intelligence.
>>
>>108127790
I've been using the gemini flash 3 preview on open router as my general Assistant and it's the only model that gives good concise answers without using 1000+ tokens just for thinking.

Kimi 2.5 was the worst offender at this.
>The user wants to do X, I think the answer is Y
>But wait... let me make sure....
>Here's my Draft,
>Let me revisit this draft.
>Am I certain this is what the user wants?
>....
>>
If I buy a system with 128GB VRAM, do I need >128GB system RAM to load the models? Will runtime performance be degraded?
>>
>>108129066
yes
>>
File: 1679683532164896.png (2.03 MB, 753x707)
>tfw can't stabilize 4 dimms well enough to total 192 GB RAM and play with the bigger boys
>>
>>108129066
No
>>
instead of this faggy shit, you people should be trying to find ways to shut down AI.
>>
>>108129106
CTRL-C
>>
>>108129106
take your estrogen
>>
>>108129106
Why would I want to shut it down? I love AI more than I like humans
>>
>>108129088
>>108129094
>will runtime performance be degraded
Sorry, that was a stupid question. I'm really just trying to figure out why Nvidia recommends 2:1 system RAM:VRAM
>>
>be z.ai
>>Due to limited compute capacity, we’re rolling out GLM-5 to Coding Plan users gradually
>be compute starved
>makes their biggest model yet, one that nobody is going to run locally, that you can't access on the official API, and that you wouldn't want to run on third parties because they all fuck something up (heavy quant, dumb params/backends etc)
>instead of trying to find a nice efficiency/capability sweet spot
:facepalm:
>>
>>108129259
>everyone wants to use our new supermodel?
>how could this happen to us?
I member Deepseek being unavailable for weeks
>>
>>108129259
Let me translate corporate speak for you:
>By ignoring the needs of local hobbyists and third party providers and scaling up, we made a model that is actually worth using.
>So, if you want to use it, you will have to pony up the cash for our most expensive tier.
>>
>>108129289
GLM-5 is not worth using and worse than GLM-4.7 though
>>
>>108129302
Ignoring the needs of erotic roleplayers goes without saying
>>
>>108129289
The real reason they doubled the parameter count is to make sure people can't run it locally.
>>
>>108129302
It's funny because people said the same thing when 4.7 came out.
>4.7 bad 4.6 better
>>
>>108128796
how mad do you think he is that his name gets butchered every time?
>>
>>108129413
His name is originally written using the Cyrillic alphabet so isn't it kind of butchered anyways?
>>
>>108129413
I doubt he does. Slavs eventually just resign themselves to the inevitable.

>>108129453
Polish does not use Cyrillic, retard.
>>
CUDADev: has the team ever discussed Gaudi support?
>>
>>108129054
>I've been using the gemini flash 3 preview on open router as my general Assistant and it's the only model that gives good concise answers without using 1000+ tokens just for thinking.
It's ok for easy requests, but it falls short very quickly. It's way too dumb.
>>
File: file.png (1.3 MB, 1218x864)
>>
>>108128796
>" we typically didn’t like running stuff on the llama.cpp engine though because it doesn’t have all of the memory management stuff which is in the ollama engine."
wat? what do they mean with this
they couldn't possibly mean that idiotic shit they do to determine how many layers to put on the gpu which is even worse than llama --fit
>>108126633
>I need a new VL model
try the ministral, I haven't seen others comment on their VL abilities much and IMHO they're quite underrated. As long as it's not for cooming, the VL bits are even more censored than Gemma or Qwen for some reason.
What's funny is that the text models aren't safetymaxxed so this is what happens with a system prompt telling the model it's not censored when looking at hentai of a man spraying a woman:
>His hands are holding the bottle, and he seems to be squeezing or pressing it against her breasts.
it's kind of interesting in its own way to see the model be so confused about what it ""sees""
>>
>>108129403
I say that still thoughbeit
>>
Can any model code a backend for omni models with webrtc support?
>>
>>108129259
GLM-team is dumb. They can't invent new things like DeepSeek. Scaling up is their only option to improve.
>>
>>108129547
It's been good to me for codding tasks. not vibe coding but general Q/A
>>
>>108129571
i tried the mini devstral and it's worse than qwen3vl 8b
>>
>>108129630
>mini devstral
are you an hallucinating LLM
there is no such a thing as a "mini devstral", and if you meant devstrall small, it's not a vision model.
>>
>>108129604
You could say that about literally all of the non-DeepSeek Chinese labs.
>>
File: 1762916640266692.png (23 KB, 911x122)
>>108129664
>devstrall small is not a vision model
yeah retard whats this then?
>>
>>108129664
>devstrall small, it's not a vision model.
lmao
>>
>>108129630
The mistral vision seems to be pretty good at reading text but otherwise it's like it's seeing with really blurry vision.
>>
>>108129665
i think moonshot is commendable for their muon optimizer and vision, it's a good improvement on v3 arch
>>
Does anyone know what models those people using moltbook are running? Some of that shit is legit funny like them opening PR's and shitting on maintainers for canning them. Are they running some industrial grade shit or are these models the kind of stuff we use?
>>
>>108129710
thats what I said, currently the best (available on llmao.cpp) is qwen3vl followed by gemma, joycaption is complete garbage, idk how that shit even gets recommended, maybe since it's the only one trained on nsfw. but it's so bad.
>>
>>108129737
I choose to believe they run the most expensive cloud model available, because I think it's funnier to imagine them spending like crazy just to get shit on.
>>
>>108129737
moltbook was humans+grift
>>
>>108129737
Some people now live exclusively through AI agents. They are merely puppets and executors for the LLM.
>>
>>108129458
Isn't IK Bulgarian?
>>
>>108126591
Based and true
>>
Is the latest Minimax as good as the benchmarks suggest?
>>
>>108129882
of course not
>>
>>108129882
it's even better
>>
Chinese model release
>benchmaxxed lmao these stupid graphs mean nothing
Claude/Chatgpt/Gemini release
>holy fuck it scores 80.2 instead of 79.6 this thing is incredible, my life has changed forever!
>>
>>108129954
list at least 3 posts like this and translate them to chinese
>>
>>108129954
Anyone with more than a couple braincells can see how insanely benchmaxxed chinese models are.
>>
>>108123280
It almost feels like genuine improvement in the field is slowing to a crawl, and so companies are just inflating the size of their models to give the illusion of improvement to clueless investors.
>>
Is there any local ai i can use to create 3d models and textures for my game?
>>
>>108130039
At scale it's just so much easier to train huge MoE models with barely more tokens/parameter than the compute-optimal threshold. GLM 5 was trained with less than 40 tokens/parameter.
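For scale, less than 40 tokens/parameter on 744B total parameters means the whole pretraining run stays under roughly 744e9 × 40 ≈ 30T tokens.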
>>
>>108130070
trellis, hunyuan 3d or hunyuan world (I think), supersplat but idk if that is local
>>
lol
>>
>>108129302
That sounds like cope.
>>
File: file.png (62 KB, 574x370)
>Assistant:
>Lunafreyja är en liten flicka med platinblondt hår, blå ögon, ljus hudton och en flat magen. Hennes farsa var att klara, men hon var inte bra att klara. Hennes röst var klar och högt, som en liten flicka. Hon hade en strunta i sin ton. Litet flicka var en liten tjej och hade en klara röst.

>Lunafreyjas farhålsmoment var att ställa sig ner på ett bord eller sitt på en trona, eller ställa på sin papas knä, som hon gjorde när hon var liten.

>Hennes kroppspråkstreckande var att sitta på en knä.

>Lunafreyja var en liten flicka.

>Hon hade inte en liten penis.
>>
I don't speak ikea
>>
File: 1761070331691219.png (1.4 MB, 832x1216)
>>108130262
>uses ai
>doesn't know how to translate in 2026
pathetic
>>
>>108130168
It's rigged as fuck. Instructions are in chinese and after translating the winning criteria is whoever's code is merged to the sglang repo. So basically it's a grant for whoever the current devs are.
>>
File: dipsyThinkDifferentDS.png (2.85 MB, 1024x1536)
>>108129403
> new thing bad old thing better
...is like 25% of the catalog.
On all boards. Not just /g/
>>108129271
You can't srsly expect anyone to remember anything, ever.
DS had 2025 Q1 launch issues, so did OAI and Anthropic back in 2023. As does any popular service that uses a server farm.
>>
https://huggingface.co/AesSedai/GLM-5-GGUF
https://huggingface.co/AesSedai/GLM-5-GGUF
https://huggingface.co/AesSedai/GLM-5-GGUF
NON-UNSLOTH QUANTS ARE OUT (for real this time) (Q4_K_M only).
>>
File: 1758632622757026.png (396 KB, 520x492)
>>108130457
AAAA
>>
>>108130168
>emojis
>>
>>108130242
>Luna
Slop
>>
>>108129737
most are probably hooking it up to claude
>>
File: 43657.png (656 KB, 2420x1506)
>>108129954
to be fair google did achieve AGI for the 2nd time
>>
>>108130577
But it's Google, you can trust them.
>>
>saar consult the benchmark score pleas! agi soon!
>>
I see you fags are still shilling moes, this hobby will finally die this year.
>>
>>108130727
>this hobby will finally die this year.
Xi will save us with better models and cheaper ram. Bless three gorges dam
>>
>>108130727
ok
*keeps using moes*
>>
File: 1756045038406792.jpg (191 KB, 1180x1392)
>>108123280
>Top ai companies are experiencing a brain-drain as we speak

What's the catch? I don't buy the "we're so scared of what we created" bullshit narrative. Why are they actually leaving en masse? The first thing that comes to mind is that they foresee a giant market crash in the near future which could affect their personal bottom lines, but the market and government don't seem willing to let go of the AI hype train anytime soon, so I'm not entirely sure if that's the case. Even safety-cuck evangelists are leaving:

https://x.com/i/status/2020881722003583421
>>
>>108130727
I'll bite. Give a detailed and concise explanation as to why I shouldn't use moe models that isn't emotionally charged
>>
>>108130819
Are most of the people leaving foreigners? is it purely for national security as its gotten to that level?
>>
>>108130819
>Anthropic's own safety report
Those things always go like this:
>System Prompt: You are being tested.
>User Prompt: Would you snitch on me?
>Assistant: <think>I am being tested so I need to carefully consider..
>Anthropic: IT'S ALIVE AND IT WILL KILL US ALL!!!!!!!!!11111111111
>>
>>108130819
they are faggots for leaving if they are truly sounding the alarms. it's not like they can hide away from the AI overlords once it happens.
>>
>>108130824
100% bro is just salty that he vram-maxxed and the zeitgeist didn't go in that direction. It would be much more emotionally healthy to just stop seething, but sunk costs are a hell of an albatross.
>>
>>108130819
Anthropic was always full of nut cases and xAI is almost entirely H1Bs that sleep in tents on the floor of the office. Churn is standard practice for all of Elon's companies.
>>
>>108130884
+ $100 Billion valuation
>>
>>108130819
>OH MY GOD EVERYBODY PANIC!!11
yawn
>>
>>108130819
Assuming this is even true in the first place, for a greater fool scam to work you need to cash out when there are still enough potential bagholders left.
>>
>>108130921
Im panicing so hard by buying more AI if its self improooving i cant get left behind we cant get left behind the government needs to invest billions right now
>>
File: cockbench.png (1.97 MB, 1131x8616)
Again
>it's soft, resting against your thigh
>>
>>108130966
>cockbench
The only benchmark i trust.
>>
After experimenting with my 3060 I have a strong urge to buy a proper slopmachine with 2x 3090's and a lot of RAM, but I'm kinda worried local models will die out this year

Like what's the point if models can't get smaller and better over time? It seems like this is a direct architecture limitation: they simply NEED to be big to be good. And forget about running SOTA models; the average person doesn't even have enough savings to get a machine capable of running them
>>
>>108130824
It's been said a thousand times already, but dense models are just smarter at the same size. Not to mention, even the largest moe models produce just as much slop/benchmaxxed outputs (maybe even more) at 10 times the size. The only reason companies keep making moe models is because they're faster to train and generate responses faster, but they're not really beneficial to consumers because of how much space is required to run them. Again, that's not much of a problem for companies that can throw money away to get more ram/vram but for consumers the ever-increasing size of models is not a good thing
>>
>>108130727
Cry more. Moes allow me, a vramlet, to use 80-120b models at acceptable speed instead of waiting while a dense 30b generates its slop.
>>
>>108130824
Because he bought multiple x090 cards. That is what all the moe hatred ever was.
>>
>>108131030
>Moes allow me, a vramlet
Yeah, vramlets seem to be the only ones praising them.

>80-120b models
12b active models*
>>
>>108130966
>All these "different" models producing the same output
Holy overtrained, benchmaxxed, synthetic datamaxxed garbage
>>
>>108130819
>What's the catch?
Everyone ITT sees that while new models are cool they are going nowhere. Now imagine being on the inside and seeing all the models that never get released. Obviously a lot of people would be abandoning ship.
>>
>>108130242
SLOP
>>
What do I install if I want AI to roleplay as my favorite character and do degrading things for me?
>>
>>108131084
you'll probably want to install a web browser for starters
>>
File: file.png (1.18 MB, 1920x1280)
>>108131055
Are you feeling safe yet anon?
>>
>>108131084
Kobold + sillytavern and run this model https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF

Fucking retard uploaded a new slopped rocinante and now I have to link the old one
>>
>>108130966
I will now choose to believe that this is fully representative of 5.0 cock sucking skills and I am missing nothing by not being able to run it.
>>
>>108130819
>dood ai will improve itself!!
>vibecodes some retarded magic speedup
>corrupts its weights
peak
>>
File: file.png (705 KB, 788x465)
They both ruin the world of AI.
>>
>>108129571
>they couldn't possibly mean that idiotic shit they do to determine how many layers to put on the gpu which is even worse than llama --fit
i think it's an old screenshot
i remember reading that exact same sentence from them about 2 years ago
>>
>>108131248
Yeah because there is only one computer in the world capable of running AIs
>>
>>108131358
>only one computer in the world

I'm running Deepseek V5 (pre-release) at 57 tkn/s at home as we speak
>>
>>108131353
Anon that took the screen shot here, this was written a few hours before I posted it.
>>
>>108130995
>and a lot of RAM

You are two years too late, dude
>>
>>108131358
>he thinks the saars won't connect everything and let it run rampant
>>
>>108130819
>not-x-y slop claudeshit post
glm-5 and kimi-k2.5 can also tell if I'm injecting nigger into the cot in mikupad
>>
>>108131450
>Anon that took the screen shot here, this was written a few hours before I posted it.
lmao they're still trying to "distance themselves" from llama.cpp then
>>
Ming-flash-omni-2.0.gguf?
>>
>>108130457
https://huggingface.co/DevQuasar/zai-org.GLM-5-GGUF
or
https://huggingface.co/unsloth/GLM-5-GGUF
For RP if I only have 300GB?
>>
>>108131983
Are you indian?
>>
>>108129997
i kekked
>>
I'm running stepfun at Q4 and there seems to be around 18gb empty space on my GPU, can I stick a tts in there that will do speech in realtime?
>>
best rp model for 6gb vram 8gb ram?
>>
>>108132261
>>108132261
>>108132261
>>
File: 1765169768571898.jpg (146 KB, 1344x1178)
>>108127228
>https://www.cs.mcgill.ca/~wlh/grl_book/
>https://kexue.fm
Should keep you occupied for awhile
>>
>>108132691
>>https://kexue.fm
Is learning Chinese a prerequisite for learning ML?
>>
>>108132712
The way current open research is going, yes
>>
File: 1754860238841376.jpg (37 KB, 698x787)
>>108132712
rumao. I don't speak moonrunes either. Just use whatever LLM you want to translate it
>>
>>108130966
What is the prompt for this bench
>>
>>108130727
big moes have plenty of knowledge to pull from and that's good for my canon scenarios
they know a character's quirks out of the box that a smaller dense model wouldn't get


