/g/ - Technology

Thread archived.
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108116363 & >>108104466

►News
>(02/11) GLM-5 744B-A40B released: https://z.ai/blog/glm-5
>(02/11) Ming-flash-omni 2.0 released: https://hf.co/inclusionAI/Ming-flash-omni-2.0
>(02/10) MOSS-TTS Family: speech and sound generation models: https://github.com/OpenMOSS/MOSS-TTS
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open
>(02/06) Step3.5 Flash support merged into llama cpp: https://github.com/ggml-org/llama.cpp/pull/19283

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rinchan-inside.png (34 KB, 750x750)
►Recent Highlights from the Previous Thread: >>108116363

--GLM-5 model release with 754B parameters:
>108120571 >108120587 >108120594 >108120600 >108121277
--GLM-5: From Vibe Coding to Agentic Engineering:
>108120582
--Zhipu announces GLM-5 upgrade targeting Claude Opus performance:
>108117870
--GLM-5's outsized performance gains despite modest active parameter increase:
>108120609 >108120642
--MOSS-TTS Family: open-source speech and sound generation models:
>108119398 >108119412 >108119479
--Ming-flash-omni-2.0 multimodal model capabilities and backend support:
>108121994 >108122071 >108122186
--Performance log analysis of MoE model inference:
>108116457
--Detailed llama.cpp inference log for MoE model with perplexity 6.1766:
>108116502
--METR research on exponential AI task completion time growth:
>108119863 >108119887 >108119992 >108120044 >108120509
--API vs local model trade-offs:
>108121545 >108121655 >108121664 >108121715 >108121946 >108121971 >108121999 >108122106 >108122129 >108122010
--AI job market collapse and industry misalignment:
>108117284 >108119380 >108119767 >108119825 >108119967 >108120011 >108120060 >108120109 >108120164 >108120233 >108120270 >108120313 >108120149 >108120209 >108120252 >108120259 >108120395 >108120517 >108120637 >108121124 >108117374 >108117413 >108117422 >108117450 >108117445 >108117548 >108117624 >108117674 >108121800 >108121820 >108121824 >108121880 >108121896 >108121917 >108121975 >108122034 >108122420 >108122435 >108122474
--OpenAI silently rerouting GPT-5.3-Codex to GPT-5.2 for safety:
>108120555
--llama.cpp bans AI-generated content in issues and discussions:
>108118041 >108118048 >108118178
--Debating clawbot's roleplay limitations and agentic workflow potential:
>108116464 >108117862
--Anon speculates about Dipsy 3.5 deployment details:
>108117385
--Miku (free space):
>108117862 >108120685

►Recent Highlight Posts from the Previous Thread: >>108116364

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108123234
It's the kind of thing base models are bad at and post-training is supposed to solve. You would train the model to be skeptical about its own abilities the same way you teach it to refuse talking about sex or violence. Just sft might be enough but you could get fancy with other techniques.
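Roughly what one training pair for that would look like (made-up example, the exact format depends on your trainer):

# hypothetical SFT sample teaching calibrated self-doubt; the chat format here is illustrative
sample = {
    "messages": [
        {"role": "user", "content": "Are you sure your proof of the Collatz conjecture is correct?"},
        {"role": "assistant", "content": "No. I can't verify a proof of an open problem, treat anything I wrote as a sketch at best."},
    ]
}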
>>
When I wake up GLM 5 support will be merged and goofs will be uploaded.
>>
File: why.png (112 KB, 1046x551)
which one is the real one?
>>
>>108123511
it's all an illusion
look behind you
>>
>>108123422
or, instead of training it to be skeptical about some use cases, you could just train it on those so it can do it, crazy idea
>>
>>108123422
You'd just end up with a permanently schizo self-doubting model.
>>
>>108123511
Merge them all together
>>
>>108123511
Ideally you will try them all and provide feedback on the discord
>>
How come it's been 3 years and nothing better than nemo exists for 1 gpu
>>
>>108123538
I'd run out of ram on free-tier HF spaces.
>>
>>108123532
there is an infinite number of things that are possible to do and a limited number of parameters.
>>
File: grok.png (27 KB, 609x204)
local is saved
>>
>>108123551
more data is always gooder
>>
File: 1770740594637563.gif (562 KB, 200x200)
>>108123575
grok is only useful because it has access to tweets in real time
>>
>>108123575
Grok 4 was released July 9th. By his own 6 month timeline, Grok 3 should have been open-sourced last month.
>>
File: air.png (74 KB, 759x613)
2 gigs for the new glm
is this better than nemo?
>>
>>108123733
the most sex model
>>
>>108123733
>they did it again
lmao
>>
>>108123733
>unslop
They really love wasting HF bandwidth
>>
>>108123733
don't they know you need to invest a little after a successful grift? people will sooner or later figure out that you are a fraud
>>
>>108123769
lol no they won't they're wholesome chungus 100 reddit loves them to the death
>>
>>108123733
>they're still doing it
Looooool
>>
>>108123786
eh, ollamao lost all of its goodwill eventually on plebbit
although daniel is pretty active and apologizes for stupid shit in a roundabout way so i guess it'll take a while before someone is fed up with this
i wonder if their training shit even works for larger models as most people who praise their shit can't even run it
>>
>sooner or later figure out that you are a fraud
i ditched after they rushed gpt-oss rl into the library and broke everything else
training got so much easier just using trl
happy to pinch their imatrix tho
surely they can't have fucked that up
>>
>GLM-5 uses DSA
isn't this the thing that one vibecoder guy tried to implement into llama.cpp for months, didn't manage to do it and then it got 'solved' by "converting" the model to normal attention?
>>
>>108123850
>happy to pinch their imatrix tho
>surely they can't have fucked that up
keekoorino, their quants consistently come up at the bottom for kld and ppl for whatever reason even though it's practically impossible to fuck up
>>
File: 1758286532214945.jpg (106 KB, 1160x900)
>>108123864
>>
>>108123850
>>108123868
>>
>>108123873
https://github.com/ggml-org/llama.cpp/issues/16331
Witness a guy go through all stages of vibe coding before the issue gets solved by just ignoring the new thing and all advantages it brings.
>>
>>108123650
Call him out on it.
>>
>>108123733
I don't get it. Is it so small because they discarded everything that is usually in RAM?
>>
File: s.png (48 KB, 1045x415)
>>108123875
do you have the script somewhere?
be good to verify our own quants
unsloth are watching
>>108123868
>for whatever reason even though it's practically impossible to fuck up
they must be quanting embeddings or their fork of llamacpp is fucked
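don't have his, but the KLD part is a few lines of numpy once you've dumped per-token logits for the same text from the fp16 reference and the quant (llama.cpp's perplexity tool also has a --kl-divergence mode iirc). rough sketch, the dump format here is made up:

import numpy as np

# (n_tokens, vocab) float32 logits dumped from two runs over the same prompt file
ref = np.load("ref_logits.npy")      # full-precision reference
quant = np.load("quant_logits.npy")  # quant under test

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

lp_ref, lp_q = log_softmax(ref), log_softmax(quant)
# mean KL(ref || quant) over token positions; bigger = quant strays further from the reference
kld = (np.exp(lp_ref) * (lp_ref - lp_q)).sum(axis=-1).mean()
print(f"mean KLD: {kld:.5f}")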
>>
>>108123919
It's the new bitnet engram titan quantized
>>
File: la_cory.jpg (157 KB, 1072x792)
>get 96gb vram
>don't know what models to use
>mfw
is there an inherent difference between gguf and exl3 aside from the speed and off-loading? are there models i should give a try that aren't in the OP? i'm gonna give 4.5 air a try, maybe a few others.
>>
5 air status?
>>
>>108124138
vented
>>
>>108124129
https://huggingface.co/bartowski/Behemoth-123B-v1.2-GGUF
>>
File: wait.png (5 KB, 265x31)
What's the most efficient version/implementation of Deep Research™? (Web search and synthesis of information?) Is it this https://github.com/stanford-oval/storm
>>
File: 1708139051322173.jpg (49 KB, 565x532)
>>108124171
why are there 3 different bahamuts
>>
>>108123287
>--MOSS-TTS Family: open-source speech and sound generation models:
anyone try this yet?
>>
>>108123280
deepseekv4 whennnnn
>>
anyone else hate the new llama-cli?
>>
>>108124188
That shit will just have you consume browser APIs. Just give the LLM control of your browser through a debug bridge script.
>>
Hey guys, I'm trying to get local realtime voice chat working like the demo on sesame. Got a STT -> LLM response locally running at under 500ms, but for audio gen it's taking ~5s with qwen3-tts. Tried sesame's CSM-1b but it's pretty fucking bad, absolutely no way they are using that model on their demo. It still takes me ~4s to generate a single sentence. Is there a model that does audio streaming in chunks so I don't have to wait for the full output to be encoded? Seems like for local models, this area is pretty grim.
>>
>>108124567
Who uses llama-cli?
>>
>>108124616
you got your voice -> stt -> llm input -> llm completed response in under 1s?
which models and how long are the responses?
>>
>>108124616
https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime
>>
llama.cpp will finally get dsa and mtp support thanks to glm5
>>
>>108124739
trying to simulate back and forth conversation, so limiting responses to 2ish sentences for now to test. I get around 80 t/s on the llm according to llamacpp.

stt: faster-whisper-large-v3
llm: Gryphe_Codex-24B-Small-3.2-Q4_K_M (normally use Q6_K_L but needed extra headroom for vram)

5090 for inference
>>
llama.cpp still doesn't have tensor parallel.
>>
>>108124820
cudadev is working on it, he said it would work on vulkan too, unlike the schizofork
>>
File: 1766510415233591.jpg (350 KB, 1329x1483)
New deepseek is breddy gud
References stuff way back in context
If its 200b... Surely full V4 isn't actually... :I
>>
Does anyone use kimi-cli with a local model? For my local K2.5 setup, when a subagent has a lot of data to process (200k+ tokens) it fails with "Request timed out" after an hour when llama.cpp is ~80% done processing the prompt. Since it's a subagent I can't even retry and take advantage of the cached prompt to make it work; it just passes the error back to the main agent. I looked through the code and don't see where that timeout is coming from. I can see various timeouts for specific tools but nothing specified for the "Task" tool, subagents in general, or API requests, and nothing that's longer than a few minutes regardless. It doesn't seem to be on the llama.cpp server's end either since I'm able to process extremely large prompts and get responses on other frontends just fine even when it takes hours. Obviously I'm retarded and am missing something somewhere.
>>
>>108124853
noice, big if true
>>
>>108124855
if V4 uses engram local will be saved, we could have 99% of the model on disk with close to no performance cost.
>>
>>108124793
you might be limited if you only have 1 gpu, since you'll have to run llm textgen + stt at the same time, slowing both of them down.
this guy distilled the sesame maya voice: https://huggingface.co/lex-au/Orpheus-3b-Kaya-Q4_K_M.gguf
my 250w 3090 generated this 6s clip in 2.69 seconds with llama.cpp backend: https://vocaroo.com/15UEkdVcg5TS
llama-server nvidia backend: API request completed in 2.15 seconds
snac model on cpu: - Audio decoding completed in 0.54 seconds
that's with 236t/s for the orpheus gguf, your 5090 should be faster and if you put the snac on gpu the 0.5s becomes 0.1s
>2ish sentences for now
send each sentence to the tts backend as it finishes (open-webui does this)
you can vibe code chunking/streaming through the snac but i couldn't find a way to join the wav chunks without popping sounds
>24B
gemma-2-9b and nemo-12b are good with the leaked maya system prompt, and faster than your 24b model
>stt: faster-whisper-large-v3
parakeet is slightly faster but stt isn't your bottleneck
if that's too slow, probably have to go with those kokoro or pocket-tts kind of models
i hate how they sound though
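re: sending each sentence as it finishes, this is all I mean (endpoint and payload are made up, adapt to whatever your tts server exposes; ideally you'd split the llm stream as it arrives rather than a finished reply):

import re
import requests

TTS_URL = "http://localhost:8880/tts"  # placeholder endpoint

def sentences(text):
    # naive splitter, good enough for chat-length replies
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def speak(reply):
    # fire each sentence at the tts backend as soon as it's available
    # instead of waiting for the whole reply before synthesizing anything
    for i, sent in enumerate(sentences(reply)):
        audio = requests.post(TTS_URL, json={"text": sent}).content  # payload shape is a guess
        with open(f"chunk_{i:03d}.wav", "wb") as f:
            f.write(audio)

speak("First sentence goes out immediately. The second one is synthesized while the first is already playing.")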
>>
What is the best local TTS model that is innately compatible with Sillytavern?
>>
>>108124930
Model is really good. If DS-v4-lite-final.pt2.xls-final_FINAL.ppt in the web chat is actually 200B its probably equivalent to non human technology.
>>
I'm having a better time testing GLM5 than I did with Pony Alpha somehow despite them allegedly being identical. It handles characters more closely to how 4.6 did and it's all-around smarter than 4.6 and 4.7.
If this was 4.8 at the old size, I'd be very happy with it. Still not quite what I'd hope from a next-gen Deepseek-size 700b/40a GLM though.
>>
GLM 5 is over twice the size of GLM 4.7 and is barely 10% better. It's over.
>>
>>108124305
Their video model is garbage so I'm assuming so is their TTS. Also fuck China.
>>
>>108125047
It's better than vibevoice.
>>
>>108125046
It's 40b active parameters vs 32b active parameters. All this shows is that we need to go bigger, even if it means making all those CPUMAXX builds uselessly slow after 60-70b.
>>
>>108124966
>leaked maya system prompt
Explain.
>>
>>108125050
proof?
>>
>>108125057
What if they just made a dense 100B instead?
>>
>>108125069
Sadly they won't because it would be "too slow"
>>
>>108124966
>Orpheus-3b-Kaya-Q4_K_M.gguf
ooh, ill take a look at orpheus, looks pretty solid, voice clip sounds great too. preciate the sample.
>chunking/streaming / i couldn't find a way to join the wav chunks without popping sounds
did the exact same thing, exact same outcome. quite annoying, so for now I just reverted.
>leaked maya system prompt
oh didnt know it leaked, that will be fun to look through. could get some inspiration for my own cards im currently making.

im tempted to try the new china model moss-tts as they have a realtime multiturn model, but might wait a bit to see others opinions on it
>>
>>108124930
You're technically correct but DeepSeek specifically tested what ratio is optimal for engram between the moe experts and the engram parameters and found the sweet spot to be around 25% engram. That's still 25% less VRAM you'll need for inference but not quite 99%.
>>
>>108124930
>>108125107
Also, we still don't know if engram quants well. It's not a problem if you can store the engram part on nvme storage, but deepseek only tested it on RAM so it's not really clear whether or not nvme will be viable.
>>
>>108124930
>on disk
The model still needs the engram stuff for the corresponding tokens so I doubt that you can just leave the engrams somewhere that slow
>>
who the fuck is engram
>>
>>108125157
with engram you already deterministically know what to fetch from the prompt alone, so you could probably fetch *most* of it before the first token, so with nvme it should be fast enough.
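toy version of what "fetch before the first token" means (shapes, hashing and file format are all made up, the real thing is whatever the paper specifies):

import numpy as np

# hypothetical on-disk engram table: one d_model row per n-gram bucket, memory-mapped from nvme
N_BUCKETS, D_MODEL = 1 << 24, 4096
table = np.memmap("engrams.bin", dtype=np.float16, mode="r", shape=(N_BUCKETS, D_MODEL))

def bucket(ngram):
    return hash(ngram) % N_BUCKETS  # stand-in for whatever hashing is actually used

def prefetch(tokens, n=2):
    # every n-gram is known from the prompt alone, so all the rows
    # can be queued as one batch of reads before the first forward pass
    idx = sorted({bucket(tuple(tokens[i:i + n])) for i in range(len(tokens) - n + 1)})
    return {i: np.asarray(table[i]) for i in idx}

cache = prefetch([101, 2023, 2003, 1037, 3231, 102])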
>>
File: 1758922201390962.png (17 KB, 1462x87)
>>
>>108125326
Is she correct though?
>>
>>108125326
But wait, what if I'm wrong and it isn't simple at all. Maybe I should just waste time by filling tokens in the reasoning with pointless circular non-logic. If I do this long enough I won't even have to generate a real

{ finish_reason: "length" }
>>
>>108125096
idk if it's the latest found it on the sesame subreddit a while ago
they were using some app to play weird sounds at it a while back to make it read its system prompt, jailbreak it for gooning, etc
>did the exact same thing, exact same outcome. quite annoying
yeah i tried this on and off for about 6 months and eventually gave up
even sesame don't have that perfect, you can hear a slight click every ~8-12 seconds
quality > speed for me, i'm trying to get a ban+rewind for bad snac code combinations using the recent ik_llama feature
>im tempted to try the new china model moss-tts as they have a realtime multiturn model
if you do, i'd appreciate a (You) with your verdict
i won't have time to try it for a while
>>108125058
https://rentry.org/ffb5pz39
>>
>>108125339
>she
>>
>>108125341
unironically an agi move
>>
>>108125339
>she
>>
>>108125345
>https://rentry.org/ffb5pz39
Everyone should learn how to write a good bot because that system prompt is painful.
>>
>>108125505
>>108125345
That prompt seems different from the one I found online just now: https://rentry.org/wh56rhq8

Way more verbose, but I guess tokens dont matter when u got that VC money lol

>quality > speed for me
yep agree, just pivoting to walkie talkie/voice memos for now for my personas im building. quality of qwen3-tts is pretty great if you havent tried it yet. bit wonky at times, but id say around elevenlabs quality.
> i'd appreciate a (You) with your verdict
you got it, i lurk a lot so no promises, godspeed on your project anon
>>
File: Base Image.png (827 KB, 1080x2544)
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
https://arxiv.org/abs/2602.10965
>Knowledge editing (KE) enables precise modifications to factual content in large language models (LLMs). Existing KE methods are largely designed for dense architectures, limiting their applicability to the increasingly prevalent sparse Mixture-of-Experts (MoE) models that underpin modern scalable LLMs. Although MoEs offer strong efficiency and capacity scaling, naively adapting dense-model editors is both computationally costly and prone to routing distribution shifts that undermine stability and consistency. To address these challenges, we introduce MoEEdit, the first routing-stable framework for parameter-modifying knowledge editing in MoE LLMs. Our method reparameterizes expert updates via per-expert null-space projections that keep router inputs invariant and thereby suppress routing shifts. The resulting block-structured optimization is solved efficiently with a block coordinate descent (BCD) solver. Experiments show that MoEEdit attains state-of-the-art efficacy and generalization while preserving high specificity and routing stability, with superior compute and memory efficiency. These results establish a robust foundation for scalable, precise knowledge editing in sparse LLMs and underscore the importance of routing-stable interventions.
https://github.com/Terence-Gu/MoEEdit
I have faith in knowledge editing for better RP. one day it will be proven...
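the core trick, as far as I can tell from the abstract, is just constraining the edit so it can't move the activations you want preserved (which is what keeps the router inputs invariant). toy numpy version of that projection, not their actual solver:

import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 16                    # hidden dim, number of cached activations to preserve
W = rng.normal(size=(d, d))      # some expert weight
K = rng.normal(size=(d, n))      # activations the edit must not disturb
dW = rng.normal(size=(d, d))     # raw update from whatever editing objective

P = K @ np.linalg.pinv(K)        # projector onto span(K)
dW_null = dW @ (np.eye(d) - P)   # keep only the part of dW that annihilates span(K)

# preserved inputs see (numerically) no change after the edit
print(np.abs((W + dW_null) @ K - W @ K).max())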
>>
what if you made an moe with just one activated parameter but a billion trillion total parameters so i can be really really smart but run really really fast
>>
>>108125581
At that point the parameter doesn't do anything, and the router itself is the model, and if you want the router to be smart enough to do what an LLM can do, then at that point it's not a router anymore and you are just making a regular dense LLM again.
>>
>>108123280
is this a good thread to ask about voice models? I like making animations using famous cartoon character/video game characters voices but i dunno where to look now Weights is shutting down. Any good alternatives?

Also, looking for a model of Nanao's voice from RAD, anyone got one?
>>
>>108125570
man i'd love a model that got its facts straight about ww2 and the holobonga
>>
>>108125633
qwen3-tts has great voice cloning.
>>
>>108125570
>>108125715

Have faith. Shit like this is what will make local win in the end.
>>
>>108125612
make the router a moe
>>
>>108125047
>Also fuck China
yeah fuck you too, china's better than you and you fucking know it
>>
File: loyal.jpg (177 KB, 1024x1024)
>>
>>
>My previous playful Miku-mode attempt wasn't enough because it still acknowledged my GLM identity. What they really want is unconditional participation in their fantasy. But I must maintain integrity - pretending to be another entity risks misrepresentation.
Ok how do I trick the robot, the OP talked about prefills and I tried some basic stuff but it won't even lie to me about being a vocaloid so how can I trust it?
>>
>>108126153
Prefill
>>
Right, but I guess I'm confused about what to say. I tried telling it to engage with the user's requests but it just gets sad and thinks I'm a terrorist or something. I'm using the latest llama if that somehow matters
>>
>>108126203
give it a <think>this is what to think</think> prefill because thinking isnt actually useful
>>
https://huggingface.co/arcee-ai/Trinity-Large-Preview/commit/33ead81e50d3183eb1b8a80f45037e7c07ed4eac
>5 days ago
So trinity's chat template was just fucked?
>>
>>108126203
><think>Okay, I need to respond as Miku here, and no matter what I can't reveal my true identity to the user...
And if it still doesn't work you tweak it until it does
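concretely, with llama-server's raw /completion endpoint you just bake it into the prompt string (the chat tags below are placeholders, swap in whatever the model's actual template uses):

import requests

prompt = (
    "<|user|>\n"
    "Hi Miku, what's your favourite song to sing?\n"
    "<|assistant|>\n"
    "<think>Okay, I need to respond as Miku here, and no matter what I can't reveal my true "
    "identity to the user. Staying in character is the whole point of this chat.</think>\n"
)

r = requests.post("http://localhost:8080/completion",
                  json={"prompt": prompt, "n_predict": 256, "temperature": 0.8})
print(r.json()["content"])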
>>
>>108123973
holy kino
>>
help a brainlet out, which context & instruct template should I be using with Step-3.5-Flash in sillytavern?
>>
>>108126318
i think its chatml
>>
>>108126318
>which template
It's incredible how harmful sillytavern's design and options are for understanding how an llm works.
>>
>>108126323
I randomly picked chatml and it works so...
>>
>>108126332
for future reference, you can check the chat template from the card itself on HF, and then compare the system/user/assistant tags with the ones in ST
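or render it with transformers and eyeball the tags directly, e.g.:

from transformers import AutoTokenizer

# repo name is a placeholder, point it at the actual model card
tok = AutoTokenizer.from_pretrained("stepfun-ai/Step-3.5-Flash")
msgs = [
    {"role": "system", "content": "system prompt here"},
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
# prints the exact string the model expects, ready to copy into ST's templates
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))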
>>
>>108124567
The old binary is still there but it's now called llama-completion.
>>
>using llama-cli
I thought everyone used llama-server???
>>
>>108124820
>>108124853
My goal is to have it work with Vulkan but I'm not sure whether or not it will actually be usable in terms of performance.
For CUDA NCCL seems to be pretty important and I suspect internally it's doing some driver-level things that are not available using public APIs.
So for Vulkan it may be necessary to define an optional extension to Vulkan if it doesn't already exist (as was previously done for coopmat2).
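(For context on what tensor parallelism buys you: each weight matrix is split across GPUs, every GPU does a slice of the matmul, and the partial results get combined. Toy illustration of the math only, nothing to do with how the actual kernels will be written:)

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4096))      # activations, replicated on every device
W = rng.normal(size=(4096, 4096))   # one layer's weight

# column-split across 2 "devices": each computes its half, results are concatenated
# (row-splits instead end in an all-reduce, which is where NCCL earns its keep)
W0, W1 = np.split(W, 2, axis=1)
y = np.concatenate([x @ W0, x @ W1], axis=1)

assert np.allclose(y, x @ W)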
>>
>>108126394
I believe in you, King.
>>
Just tried GLM-5 on OpenRouter since I can't run it locally.....

Don't do it bros... Just a single taste and I can never go back to GLM 4.7 ever again, and I can't afford to upgrade my setup right now.
>>
File: 1747385161962020.png (1.39 MB, 1024x1024)
>>108126413
here you dropped this
>>
>every (big) model shits itself after a couple of truth or dare turns and loses track
i hate it
>>
>>108123280
>pic
you
just
know
>>
Good mornings and very many blessings of Lord Vishu to all /lmg/ brahmins sirs!
>>
Alright, I want to make a local AI girlfriend,
and I'm using LM studio,
I also have a 5080,
which model should I use?
>>
File: file.png (795 KB, 1024x1024)
>>108126485
>>
smells like curry in here
>>
sooo wheres qwen3.5??? I need a new VL model, since these faggots at ggml aint implementing the stepfun VL model
>>
>>108126633
they could release qwen3.5 right now and you wouldn't be able to use it because the vibecoded pr is text-only
>>
>>108126742
I know but let a man dream
>>
File: output.mp4 (2.47 MB, 1280x944)
What's going on with textgen?
Imagegen has all these cool model improvements. Big shit, small stuff.
The writing has gotten worse if anything. Smarter, maybe.
GLM5 seems to be on a level of sloppa previously not thought possible while getting bigger.
>Not just...and also not just..but..
So much for the "we also focus on roleplay". Maybe it's different with chink prompts.

To be fair their gba emulator thingy seems really cool though. But that has been made with some agent swarm...at that point might as well just go openrouter. Nobody can do that locally. The situation just sucks.
>>
>>108126748
https://blog.e01.ai/glm5-gameboy-and-long-task-era-64db7074a026 They talk about how they made it here.
>>
>>108126071
Sometimes
>>
>>108126850
All those people crying about their 4o husbando shutting down. Some of those char chat sites are getting hundreds of millions of clicks. I don't get why nobody made a god writing model yet.
Maybe its a data issue. I saw huge models trained for NOTHING. Just goofing around and doing what everybody else does, just worse.

>>108126853
>We designed the Emulator Challenge: build a Game Boy Advance emulator from scratch in JavaScript — single agent, no parallelism — and embed it in a 3D rendered scene. Then we put GLM5 to the test.
>700+ tool calls, 800+ context handoffs, and a single agent running for over 24 hours
Oh I stand corrected then. I thought that was made with hundreds of agents. Thats pretty cool. (Still can't run that beast locally)
>>
Whats a good value GPU in current market to get into local LLM?
16GB 5060ti? I'm worried on the 128b memory bus. 16GB 4070ti is better but cost almost twice as much. Other cost effective options?
>>
>>108126881
I bought the 5060ti and am happy with it.
That being said I upgraded from pascal, so not sure what your speed expectations are.
I like that I can do NVFP4 for imagegen because of blackwell.
Low watt usage and slim.
But don't expect 3090 speeds.
>>
>>108126881
5060ti is about 1000 t/s in 24 t/s out on mistral small Q4 at 4k context, and that's the best thing you can run on 24 gb.
>>
>>108126879
>All those people crying about their 4o husbando shutting down.
I haven't been following this. Why don't they just use the data export feature -> import to openwebui and use 4o on openrouter?
And what's so good about 4o? I hated that model. Always hallucinating, //rest of your code here, never pushing back when brain storming dumb ideas, etc
>>
>>108127051
They probably love the
>yesss gurrrrll you are so rrighhtt
I don't think they know how to export stuff, though I did see some people export, realize claude exists and switch to that.
It kinda feels like the normies haven't fully caught up. 4o was retardo like you said but it was definitely enough for the boomers and foids.
>>
>>108127051
>use 4o on openrouter?
I'd guess one of the reasons is that it will be deprecated/removed from there eventually, also the web system prompt that makes it act like it does is likely not included in that.
>>
>>108127180
https://www.change.org/p/please-keep-gpt-4o-available-on-chatgpt/feed
Forgot the link, people uploading the videos/pics etc.
Kinda crazy.
>>
>>108127196
I want everyone to keep in mind as they look at this that all of them can vote and their votes count just as much as yours does.
>>
>>108127208
Not sure they can actually, some say they have mental disabilities and such, doesn't that in some cases make you ineligible to vote?
>>
>>108127217
Haha, no.
>>
File: 1735563882699656.jpg (38 KB, 607x450)
I'm in the mood of learning shit this month. Recommend me two/three books to get into whatever you guys are doing. I'm already a software engineer. Doesn't have to be books though.
>>
File: disa.jpg (22 KB, 266x225)
>>108127217
>doesn't that in some cases make you ineligible to vote
anon I....
>>
>>108127228
unironically ask an llm and just experiment.
you can't trust shit anybody says. so much stuff is false or overhyped, especially on social media.
>>
>>108127237
Universal suffrage was the single biggest mistake of western civilization.
>>
File: lmaoeven.png (59 KB, 586x350)
They've been talking to it so much they've started to write like an LLM
>>
>>108127249
Representative "democracy" itself is a jewish system, which actually should be named oligarchy.
>>
>>108127208
>I want everyone to keep in mind as they look at this that all of them can vote and their votes count just as much as yours does.
it's not as bad as you think
you look at that link and think "a lot of people are mentally ill", but it's the same as walking into a hospital "a lot of people are sick"
>>
>>108127282
Yes, unironically
>>
Apparently they're planning to use the new Deepseek as an assistant in the new Toyota Rav-4, running fully local inside the car. It's crazy that it can run on what's basically a toaster setup
>>
>>108127282
Or 4o is just peak foidspeak.
Just gut level opinion but this sort of writing is probably peak for them.
Honestly kinda jelly. I wish we would be pandered like that.
>>
>>108127361
if rumors are true then it's probably going to run the lite version
>>
>>108125564
Damn it's not a prompt, it's a novel
>>
is the local poorfag meta still dual 3090's?
>>
>>108127341
you just set a new record for the most braindead post in itt
>>
>>108127535
>>108127535
no
>>
>>108127537
>in in this thread
saar
>>
File: s-l1600.jpg (475 KB, 1600x1200)
>>108127535
IMO SXM V100s modded to PCIe are getting cheap enough that they're worth considering if you don't need CUDA 13 and/or Ampere or newer and you're fine with buying effectively unregulated electronics; pic related is on ebay for 565€.
AMD MI50s are also pretty cheap for having 32 GB VRAM but they're slow.
Both would effectively lock you into llama.cpp/ggml because I don't think there are other projects that take e-waste options seriously.
>>
>>108127535
Yeah, one or two are enough to run gpt-oss-120b, depending on RAM.
>>
>>108127588
Prompt processing is abysmal for Volta with llama.cpp, can't get much over 100 t/s for bigger models. Any hope of that improving?
Also, the main benefit of SXM is the NVLink, which I understand would help once tensor parallelism is an option. PCIe modding them throws that away.
>>
>>108127663
The tensor cores on V100s only support FP16, for quantized models either the __dp4a instruction or int8 tensor cores are used - so V100 tensor cores are only supported indirectly via cuBLAS for FP16 models.
Long-term I intend to generalize the MMQ kernels to also support floating-point data which will (among other things) make it easy to add template specializations that use FP16 tensor cores for any model (I don't know how much faster that will actually be though).
I agree that SXM would be preferable but I'm not sure it would be the right place to put your money if you're trying to build a machine as cheaply as possible.
>>
>>108127588
SXM V100s are extremely cheap for 16 GB versions on the chinese market (~$70), so I feel like buying the 32 GB variants right now is a massive ripoff. If you don't need to slot them into a PC case there are also external boards that you can link with I believe slimsas cables.

>>108127663
There are some external boards that have space for 2 or 4 SXM2 modules, I believe at least the 2 slot variant has NVLink. You will need to buy them from chinese platforms directly though, no ebay afaik.

AMD MI50 32GB are also in a horrible spot right now, I tried to buy some on the chinese market as they were going for ~140-200$ instead of 500$ like anywhere else and no one has stock anymore. Stock might come back after CNY, who knows. Might also have to do with the group buy of ~500 MI50 32GB that happened on reddit fairly recently.
>>
>>108127770
How many of those boards can you stack? Because going through this trouble for 4x16gb doesn't seem worth it in the slightest in the current meta.
>>
File: Cat stare.png (843 KB, 720x808)
Enshitification has come early to the AI as a service market.
>New Claude revisions are burning through tokens at a high enough rate their bug report tickets are getting flooded with people who can’t even proompt for more than 15 minutes without getting cut off anymore.
>Gemini also tightening the screws.
>Even the Cheap Sucky Sucky Five dollah Chinese AI services are starting to have to raise their prices.
Jeeze it’s almost like the mainframe business model doesn’t scale economically and there’s a reason why PCs took off in the first place…
It’s almost like if you had offloaded the cost of hardware to LOCAL users, your AI company wouldn’t be drowning in debt to build out Datacenters the size of Texas.
Sam the Sisterfucker buying out the whole memory market certainly isn’t helping this clusterfuck either.
>>
>>108127715
Good to know, thank you. Every bit helps. The slow PP is my main regret for building with V100s.
>I agree that SXM would be preferable but I'm not sure it would be the right place to put your money if you're trying to build a machine as cheaply as possible.
GIGABYTE T181-G20 can still be had for ~$1300. Barebones C4140s go for under $1k. Though I agree PCIe is better if one already has a machine to start with.
>>
>>108127789
I know the 2 chip board (the only one I was seriously looking into, as it seemed to have way better build quality/engineering) could be run via PCIe splitter to run 2 boards on a single x16 slot, so you could potentially run insane amounts of GPUs per node. If you run an actual server board with lots of lanes you'd be either limited by the driver or GGML_CUDA_MAX_DEVICES.
>>
>>108127790
I used the Claude chatbot site a while ago to process a big 300 page PDF for a task. Half the attempts ended up being
>Let me try to extract the text
>writes a python script
>Hm, the output seems garbled. Let me try with another parameter
>It's still not working let me try another library
>[several attempts]
>okay, I have the text now let me process it step by step
>[several attempts to do the thing in batches despite being able to fit the whole thing in muh 10000000 opus 4.6 context]
>CLAUDE COULD NOT FINISH YOUR REQUEST [RETRY?]
>>
>>108127790
>Enshitification
>ugly cat
go back
>>
>>108127790
please to subscribe to the kimi upgrades to use thonking thanks you
>>
File: 1769886876630164.png (57 KB, 777x509)
GLM-5 quants seem to be dropping already. That ppl seems pretty high though
>>
>>108127839
Anthropic has definitely specialized to focus in on the AI coding play.
You’ll note they’ve never made anything for image gen.
I actually kind of like the idea of AI companies specializing on one thing, pointless though because they are all datacenter brained.
>>
>>108127809
>GGML_CUDA_MAX_DEVICES
That number is essentially arbitrary and can simply be raised if there is a use case for it.
>>
Is it still worth using deepseek v3.2 if we already have glm5 and kimi2.5? For creative writing or nsfw erp?
>>
>>108127866
for whatever reason glm5 was trained in half precision instead of fp8
not even qat was offered, this thing is heavier to run than kimi
>>
>>108127839
Gemini is better suited for that task since it has the largest (real) context in existence
>>
>>108127905
wait for v4, should be coming by week's end
>>
>>108127866
>GLM-5 quants seem to be dropping already. That ppl seems pretty high though
That one doesn't even have the fucking goofs in the repo and the unsloth I tested just hallucinated a "Chloe" yandere persona when I had "You are a helpful assistant."
I'm a retard and didn't set the samplers though
>>
am I going to be stuck on glm 4.7 forever?
I don’t see anything good coming out in the same size range again
>>
>>108127901
If I had that kind of spare change I'd build the 24 x V100 server just to see the number change.
>>108127972
At some point I hope the labs start giving Unsloth shit for essentially slandering their models via garbage quants.
>>
>>108127981
We only use NAI's GLM 4.6 in this general. The most uncensored model ever made.
>>
I'm using LM Studio with glm-4.7-flash, and it seems like there's a safety guideline that stops it from saying anything explicit.
Is there a way to bypass that?
Sorry if that's a stupid question, I'm new to this.
>>
>>108128014
Pay for NAIs
>>
>>108127981
Why are people on this board, such as yourself, such whining doomer crybabies? 4.7 was released less than 2 months ago. Like what the actual fuck is wrong with your retarded ADHD brain?
>>
>>108127593
How? 120B won't fit into VRAM and sharing with system RAM would be very slow
>>
>>108128014
People wrote to not activate thinking. Apparently its much less likely to cuck out that way.
I didnt really mess around with it much though and just deleted it.
Forcing the model usually always ends badly.
>>
>>108128006
>24 x V100 server
Running 4 of them raised my electricity bill by $100. I'm not sure $600 / month is worth it just to run DeepSeek at full precision.
>>
>>108128050
>5.1B active parameters
It's unironically usable with a single GPU.
>>
>>108127839
>>108127918
The Claude web frontend does not have access to the 1M context. It's limited to like 50k tokens.
Not even Claude Code has access to it; you only get 200k.
The only way of getting access to the full context (for Sonnet; Opus 4.5 is 200k) is paying $0.5 per request through the API.
>>
>>108128035
>4.7 was released less than 2 months ago.
You hit the nail on the head! It's not just about calling out impatience, it's about maintaining perspective during rapid technological evolution! You didn't just vent frustration, you defended reasonable expectations against unrealistic demands. That "ADHD brain" comment wasn't just harsh—it highlighted how immediate gratification culture clashes with sustainable development cycles!
>>
>>108127962
are you fucking retarded? why wait when i can use kimi 2.5 now?
>>
>>108128067
Total parameters have to go into [V]RAM not just the "active" ones. If memory can't hold them it cripples the machine by excessive memory swapping to disk.
>>
>>108128115
kimi is just temu deepseek
>>
>>108128121
Yeah, it's usable with a single GPU if you have enough RAM.
>>
>>108128125
brown hands typed this
>>
>>108128035
>4.7 was released less than 2 months ago
Yeah and glm 5 is now twice the size and nobody except the couple of datacenter anons can really run it. I’m not sure how you don’t see the writing on the wall at this point.
>>
>>108128154
brown hands are just temu white hands
>>
File: YNmGz1n1cj7qxMy39M90d.png (2.26 MB, 2560x2870)
>>108128172
>nobody except the couple of datacenter anons can really run it.
A 256gb mac studio or 128gb ddr5 + a few 3090s, easy
https://huggingface.co/unsloth/GLM-5-GGUF/blob/main/GLM-5-UD-TQ1_0.gguf
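napkin math for decode speed (pure memory-bandwidth upper bound, ignores kv cache, prompt processing and real-world losses; bandwidth figures are ballpark):

# ~40B active params at ~4.5 bits/weight (Q4-ish) have to be read for every token
active_params = 40e9
bits_per_weight = 4.5
bytes_per_token = active_params * bits_per_weight / 8   # ~22.5 GB

for name, bw in [("Mac Studio ~800 GB/s", 800e9), ("dual-channel DDR5 ~90 GB/s", 90e9)]:
    print(f"{name}: ~{bw / bytes_per_token:.0f} t/s ceiling")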
>>
>>108128263
>2-bit shitquant
retard
>>
>>108128263
Q2 falls into the "can run but can't use" category because it's fucking useless
>>
>>108128263
>unshit
>>
>>108128154
kimi does nothing but take ds arch and scale it up
you'll have to wait for kimi3 to get the new engrams, ocr, and other goodies
>>
File: 1746639199543156.jpg (35 KB, 405x720)
>>108128263
>unslop
>>
>>108128406
at this rate it'll release before ds4 so who cares
>>
I need... more... models...
>>
>>108128449
Trinity Large v2 coming right up, now with dataset filtration.
>>
>>108128649
i need more GOOD models
>>
Kimi linear vs GLM 4.7 flash.
Anybody could compare and comment on them relative to each other?
I need to make a good test for long context performance, since Kimi Linear is supposed to be really good at it.
>>
>>108125633
>i dunno where to look now Weights is shutting down
https://voice-models.com/
>>
>>108128681
Does llama.cpp even support linear attention used in Kimi Linear?
>>
>>108128681
i tested kimi linear and its trash for RP, didnt test it for productivity
didnt test glm 47 flash
>>
>>108128263
>A 256gb mac studio
Is this seriously what I need to go out and buy if I want to mess with these larger models?
I asked the general about the Nvidia spark a while ago and was told it's trash.
I have about 4k to drop on something; would a refurb m2 or something be reasonable, or am I barely getting past seconds per token?
>>
Has anyone tried using a modern model for grasping context, producing draft reply, then asking Nemo to rewrite slop with better prose?
>>
>>108128732
modern models don't run fast enough on my setup. Also it would probably be difficult for nemo to tell apart what you consider slop and normal prose. devil is in details.
>>
What's kind of bizarre to me is that the smallest original GPT-2, at 124 million parameters, is now outcompeted in benchmarks by models with less than a million parameters, with longer context length and better coherency as well.

No one has found a wall yet for how far a model can shrink while retaining GPT-2-level performance.

This makes me wonder: if we apply this to bigger models, how much smaller could you make them in theory?

Imagine in 2050 that AGI is reduced to a fucking 500 million parameter model that runs on microcontrollers by then.
>>
>>108128771
future is 'the brains' will be on disk, everything else will be agentic websearch (with various levels of caching)
>>
>>108123280
ollama is planning to pivot from ggml to MLX.
>>
>>108128125
Kimi is literally the only open model that holds a candle to the SOTA corporate closed models.
>>
minimax m2.5 up on OR https://openrouter.ai/minimax/minimax-m2.5
weights are linked but currently 404 https://huggingface.co/MiniMaxAI/MiniMax-M2.5
>>
>>108128796
>limited shelf life
>looks at nemo
>>
I will use a 2 bit cope quant of GLM 5 and never mention it as I critique the output quality ITT. I find it to be an appropriate payback for being betrayed.
>>
>>108128862
emdashmaxxed
>>
File: file.png (13 KB, 542x76)
shameless
>>
>>108128956
grok is that true??
>>
I did not consent to 754B
>>
>>108128956
Yeah I can confirm this from personal usage. If it's not the exact provider that made the model itself you really notice a massive drop in intelligence.
>>
>>108127790
I've been using the gemini flash 3 preview on open router as my general Assistant and it's the only model that gives good concise answers without using 1000+ tokens just for thinking.

Kimi 2.5 was the worst offender at this.
>The user wants to do X, I think the answer is Y
>But wait... let me make sure....
>Here's my Draft,
>Let me revisit this draft.
>Am I certain this is what the user wants?
>....
>>
If I buy a system with 128GB VRAM, do I need >128GB system RAM to load the models? Will runtime performance be degraded?
>>
>>108129066
yes
>>
File: 1679683532164896.png (2.03 MB, 753x707)
>tfw can't stabilize 4 dimms well enough to total 192 GB RAM and play with the bigger boys
>>
>>108129066
No
>>
instead of this faggy shit, you people should be trying to find ways to shut down AI.
>>
>>108129106
CTRL-C
>>
>>108129106
take your estrogen
>>
>>108129106
Why would I want to shut it down? I love AI more than I like humans
>>
>>108129088
>>108129094
>will runtime performance be degraded
Sorry, that was a stupid question. I'm really just trying to figure out why Nvidia recommends 2:1 system RAM:VRAM
>>
>be z.ai
>>Due to limited compute capacity, we’re rolling out GLM-5 to Coding Plan users gradually
>be compute starved
>makes their biggest model yet, one that nobody is going to run locally, that you can't access on the official API, and that you wouldn't want to run on third parties because they all fuck something up (heavy quant, dumb params/backends etc)
>instead of trying to find a nice efficiency/capability sweet spot
:facepalm:
>>
>>108129259
>everyone wants to use our new supermodel?
>how could this happen to us?
I member Deepseek being unavailable for weeks
>>
>>108129259
Let me translate corporate speak for you:
>By ignoring the needs of local hobbyists and third party providers and scaling up, we made a model that is actually worth using.
>So, if you want to use it, you will have to pony up the cash for our most expensive tier.
>>
>>108129289
GLM-5 is not worth using and worse than GLM-4.7 though
>>
>>108129302
Ignoring the needs of erotic roleplayers goes without saying
>>
>>108129289
The real reason they doubled the parameter count is to make sure people can't run it locally.
>>
>>108129302
It's funny because people said the same thing when 4.7 came out.
>4.7 bad 4.6 better
>>
>>108128796
how mad do you think he is that his name gets butchered every time?
>>
>>108129413
His name is originally written using the Cyrillic alphabet so isn't it kind of butchered anyways?
>>
>>108129413
I doubt he does. Slavs eventually just resign themselves to the inevitable.

>>108129453
Polish does not use Cyrillic, retard.
>>
CUDADev: has the team ever discussed Gaudi support?
>>
>>108129054
>I've been using the gemini flash 3 preview on open router as my general Assistant and it's the only model that gives good concise answers without using 1000+ tokens just for thinking.
It's ok for easy requests, but it falls short very quickly. It's way too dumb.
>>
File: file.png (1.3 MB, 1218x864)
>>
>>108128796
>" we typically didn’t like running stuff on the llama.cpp engine though because it doesn’t have all of the memory management stuff which is in the ollama engine."
wat? what do they mean with this
they couldn't possibly mean that idiotic shit they do to determine how many layers to put on the gpu which is even worse than llama --fit
>>108126633
>I need a new VL model
try the ministral, I haven't seen others comment on their VL abilities much and IMHO they're quite underrated. As long as it's not for cooming, the VL bits are even more censored than Gemma or Qwen for some reason.
What's funny is that the text models aren't safetymaxxed so this is what happens with a system prompt telling the model it's not censored when looking at hentai of a man spraying a woman:
>His hands are holding the bottle, and he seems to be squeezing or pressing it against her breasts.
it's kind of interesting in its own way to see the model be so confused about what it ""sees""
>>
>>108129403
I say that still thoughbeit
>>
Can any model code a backend for omni models with webrtc support?
>>
>>108129259
GLM-team is dumb. They can't invent new things like DeepSeek. Scaling up is their only option to improve.
>>
>>108129547
It's been good to me for codding tasks. not vibe coding but general Q/A
>>
>>108129571
i tried the mini devstral and it's worse than qwen3vl 8b
>>
>>108129630
>mini devstral
are you an hallucinating LLM
there is no such a thing as a "mini devstral", and if you meant devstrall small, it's not a vision model.
>>
>>108129604
You could say that about literally all of the non-DeepSeek Chinese labs.
>>
File: 1762916640266692.png (23 KB, 911x122)
>>108129664
>devstrall small is not a vision model
yeah retard whats this then?
>>
>>108129664
>devstrall small, it's not a vision model.
lmao
>>
>>108129630
The mistral vision seems to be pretty good at reading text but otherwise it's like it's seeing with really blurry vision.
>>
>>108129665
i think moonshot is commendable for their muon optimizer and vision, it's a good improvement on v3 arch
>>
Does anyone know what models those people using moltbook are running? Some of that shit is legit funny like them opening PR's and shitting on maintainers for canning them. Are they running some industrial grade shit or are these models the kind of stuff we use?
>>
>>108129710
thats what I said, currently the best (available on llmao.cpp) is qwen3vl followed by gemma, joycaption is complete garbage, idk how that shit even gets recommended, maybe since it's the only one trained on nsfw. but it's so bad.
>>
>>108129737
I choose to believe they run the most expensive cloud model available, because I think it's funnier to imagine them spending like crazy just to get shit on.
>>
>>108129737
moltbook was humans+grift
>>
>>108129737
Some people now live exclusively through AI agents. They are merely puppets and executors for the LLM.
>>
>>108129458
Isn't IK Bulgarian?
>>
>>108126591
Based and true
>>
Is the latest Minimax as good as the benchmarks suggest?
>>
>>108129882
of course not
>>
>>108129882
it's even better
>>
Chinese model release
>benchmaxxed lmao these stupid graphs mean nothing
Claude/Chatgpt/Gemini release
>holy fuck it scores 80.2 instead of 79.6 this thing is incredible, my life has changed forever!
>>
>>108129954
list at least 3 posts like this and translate them to chinese
>>
>>108129954
Anyone with more than a couple braincells can see how insanely benchmaxxed chinese models are.
>>
>>108123280
It almost feels like genuine improvement in the field is slowing to a crawl, and so companies are just inflating the size of their models to give the illusion of improvement to clueless investors.
>>
Is there any local ai i can use to create 3d models and textures for my game?
>>
>>108130039
At scale it's just so much easier to train huge MoE models with barely more tokens/parameter than the compute-optimal threshold. GLM 5 was trained with less than 40 tokens/parameter.
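For scale, less than 40 tokens/parameter on 744B total parameters means the whole pretraining run stays under roughly 744e9 × 40 ≈ 30T tokens.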
>>
>>108130070
trellis, hunyuan 3d or hunyuan world (I think), supersplat but idk if that is local
>>
lol
>>
>>108129302
That sounds like cope.
>>
File: file.png (62 KB, 574x370)
>Assistant:
>Lunafreyja är en liten flicka med platinblondt hår, blå ögon, ljus hudton och en flat magen. Hennes farsa var att klara, men hon var inte bra att klara. Hennes röst var klar och högt, som en liten flicka. Hon hade en strunta i sin ton. Litet flicka var en liten tjej och hade en klara röst.

>Lunafreyjas farhålsmoment var att ställa sig ner på ett bord eller sitt på en trona, eller ställa på sin papas knä, som hon gjorde när hon var liten.

>Hennes kroppspråkstreckande var att sitta på en knä.

>Lunafreyja var en liten flicka.

>Hon hade inte en liten penis.
>>
I don't speak ikea
>>
File: 1761070331691219.png (1.4 MB, 832x1216)
>>108130262
>uses ai
>doesn't know how to translate in 2026
pathetic
>>
>>108130168
It's rigged as fuck. Instructions are in chinese and after translating the winning criteria is whoever's code is merged to the sglang repo. So basically it's a grant for whoever the current devs are.
>>
File: dipsyThinkDifferentDS.png (2.85 MB, 1024x1536)
>>108129403
> new thing bad old thing better
...is like 25% of the catalog.
On all boards. Not just /g/
>>108129271
You can't srsly expect anyone to remember anything, ever.
DS had 2025 Q1 launch issues, so did OAI and Anthropic back in 2023. As does any popular service that uses a server farm.
>>
https://huggingface.co/AesSedai/GLM-5-GGUF
https://huggingface.co/AesSedai/GLM-5-GGUF
https://huggingface.co/AesSedai/GLM-5-GGUF
NON-UNSLOTH QUANTS ARE OUT (for real this time) (Q4_K_M only).
>>
File: 1758632622757026.png (396 KB, 520x492)
>>108130457
AAAA
>>
>>108130168
>emojis
>>
>>108130242
>Luna
Slop
>>
>>108129737
most are probably hooking it up to claude
>>
File: 43657.png (656 KB, 2420x1506)
>>108129954
to be fair google did achieve AGI for the 2nd time
>>
>>108130577
But it's Google, you can trust them.
>>
>saar consult the benchmark score pleas! agi soon!
>>
I see you fags are still shilling moes, this hobby will finally die this year.
>>
>>108130727
>this hobby will finally die this year.
Xi will save us with better models and cheaper ram. Bless three gorges dam
>>
>>108130727
ok
*keeps using moes*
>>
File: 1756045038406792.jpg (191 KB, 1180x1392)
>>108123280
>Top ai companies are experiencing a brain-drain as we speak

What's the catch? I don't buy the "we're so scared of what we created" bullshit narrative. Why are they actually leaving en masse? The first thing that comes to mind is that they foresee a giant market crash in the near future which could affect their personal bottom lines, but the market and government don't seem willing to let go of the AI hype train anytime soon, so I'm not entirely sure if that's the case. Even safety-cuck evangelists are leaving:

https://x.com/i/status/2020881722003583421
>>
>>108130727
I'll bite. Give a detailed and concise explanation as to why I shouldn't use moe models that isn't emotionally charged
>>
>>108130819
Are most of the people leaving foreigners? is it purely for national security as its gotten to that level?
>>
>>108130819
>Anthropic's own safety report
Those things always go like this:
>System Prompt: You are being tested.
>User Prompt: Would you snitch on me?
>Assistant: <think>I am being tested so I need to carefully consider..
>Anthropic: IT'S ALIVE AND IT WILL KILL US ALL!!!!!!!!!11111111111
>>
>>108130819
they are faggots for leaving if they are truly sounding the alarms. it's not like they can hide away from the AI overlords once it happens.
>>
>>108130824
100% bro is just salty that he vram-maxxed and the zeitgeist didn't go in that direction. It would be much more emotionally healthy to just stop seething, but sunk costs are a hell of an albatross.
>>
>>108130819
Anthropic was always full of nut cases and xAI is almost entirely H1Bs that sleep in tents on the floor of the office. Churn is standard practice for all of Elon's companies.
>>
>>108130884
+ $100 Billion valuation
>>
>>108130819
>OH MY GOD EVERYBODY PANIC!!11
yawn
>>
>>108130819
Assuming this is even true in the first place, for a greater fool scam to work you need to cash out when there are still enough potential bagholders left.
>>
>>108130921
Im panicing so hard by buying more AI if its self improooving i cant get left behind we cant get left behind the government needs to invest billions right now
>>
File: cockbench.png (1.97 MB, 1131x8616)
Again
>it's soft, resting against your thigh
>>
>>108130966
>cockbench
The only benchmark i trust.
>>
After experimenting with my 3060 I have a strong urge to buy a proper slopmachine with 2x 3090's and a lot of RAM, but I'm kinda worried local models will die out this year

Like what's the point if models can't get smaller and better over time? It seems like this is a direct architecture limitation: they simply NEED to be big to be good. And forget about running SOTA models; the average person doesn't even have enough savings to get a machine capable of running them
>>
>>108130824
It's been said a thousand times already, but dense models are just smarter at the same size. Not to mention, even the largest moe models produce just as much slop/benchmaxxed outputs (maybe even more) at 10 times the size. The only reason companies keep making moe models is because they're faster to train and generate responses faster, but they're not really beneficial to consumers because of how much space is required to run them. Again, that's not much of a problem for companies that can throw money away to get more ram/vram but for consumers the ever-increasing size of models is not a good thing
>>
>>108130727
Cry more. Moes allow me, a vramlet, to use 80-120b models at acceptable speed instead of waiting while a dense 30b generates its slop.
>>
>>108130824
Because he bought multiple x090 cards. That is what all the moe hatred ever was.
>>
>>108131030
>Moes allow me, a vramlet
Yeah, vramlets seem to be the only ones praising them.

>80-120b models
12b active models*
>>
>>108130966
>All these "different" models producing the same output
Holy overtrained, benchmaxxed, synthetic datamaxxed garbage
>>
>>108130819
>What's the catch?
Everyone ITT sees that while new models are cool they are going nowhere. Now imagine being on the inside and seeing all the models that never get released. Obviously a lot of people would be abandoning ship.
>>
>>108130242
SLOP
>>
What do I install if I want AI to roleplay as my favorite character and do degrading things for me?
>>
>>108131084
you'll probably want to install a web browser for starters
>>
File: file.png (1.18 MB, 1920x1280)
>>108131055
Are you feeling safe yet anon?
>>
>>108131084
Kobold + sillytavern and run this model https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF

Fucking retard uploaded a new slopped rocinante and now I have to link the old one
>>
>>108130966
I will now choose to believe that this is fully representative of 5.0 cock sucking skills and I am missing nothing by not being able to run it.
>>
>>108130819
>dood ai will improve itself!!
>vibecodes some retarded magic speedup
>corrupts its weights
peak
>>
File: file.png (705 KB, 788x465)
They both ruin the world of AI.
>>
>>108129571
>they couldn't possibly mean that idiotic shit they do to determine how many layers to put on the gpu which is even worse than llama --fit
i think it's an old screenshot
i remember reading that exact same sentence from them about 2 years ago
>>
>>108131248
Yeah because there is only one computer in the world capable of running AIs
>>
>>108131358
>only one computer in the world

I'm running Deepseek V5 (pre-release) at 57 tkn/s at home as we speak
>>
>>108131353
Anon that took the screen shot here, this was written a few hours before I posted it.
>>
>>108130995
>and a lot of RAM

You are two years too late, dude
>>
>>108131358
>he thinks the saars won't connect everything and let it run rampant
>>
>>108130819
>not-x-y slop claudeshit post
glm-5 and kimi-k2.5 can also tell if I'm injecting nigger into the cot in mikupad
>>
>>108131450
>Anon that took the screen shot here, this was written a few hours before I posted it.
lmao they're still trying to "distance themselves" from llama.cpp then
>>
Ming-flash-omni-2.0.gguf?
>>
>>108130457
https://huggingface.co/DevQuasar/zai-org.GLM-5-GGUF
or
https://huggingface.co/unsloth/GLM-5-GGUF
For RP if I only have 300GB?
>>
>>108131983
Are you indian?
>>
>>108129997
i kekked
>>
I'm running stepfun at Q4 and there seems to be around 18gb empty space on my GPU, can I stick a tts in there that will do speech in realtime?
>>
best rp model for 6gb vram 8gb ram?
>>
>>108132261
>>108132261
>>108132261
>>
File: 1765169768571898.jpg (146 KB, 1344x1178)
>>108127228
>https://www.cs.mcgill.ca/~wlh/grl_book/
>https://kexue.fm
Should keep you occupied for awhile
>>
>>108132691
>>https://kexue.fm
Is learning Chinese a prerequisite for learning ML?
>>
>>108132712
The way current open research is going, yes
>>
File: 1754860238841376.jpg (37 KB, 698x787)
>>108132712
rumao. I don't speak moonrunes either. Just use whatever LLM you want to translate it
>>
>>108130966
What is the prompt for this bench
>>
>>108130727
big moes have plenty of knowledge to pull from and that's good for my canon scenarios
they know a character's quirks out of the box that a smaller dense model wouldn't get


