/g/ - Technology






File: winter miku.png (1.79 MB, 768x1344)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107412042 & >>107405479

►News
>(12/02) Mistral Large 3 and Ministral 3 released: https://mistral.ai/news/mistral-3
>(12/01) Trinity Nano (6B-A1B) and Mini (26B-A3B) released: https://arcee.ai/blog/the-trinity-manifesto
>(12/01) Merged: model: support Ministral3 #17644: https://github.com/ggml-org/llama.cpp/pull/17644
>(12/01) DeepSeek V3.2 and Speciale released: https://hf.co/deepseek-ai/DeepSeek-V3.2-Speciale
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: migu.jpg (52 KB, 736x649)
►Recent Highlights from the Previous Thread: >>107412042

--Troubleshooting persistent prompt caching in llama-server with slot API endpoints:
>107415735 >107415774 >107416053 >107416084 >107416666 >107416689 >107416734 >107416757 >107415832 >107416062 >107416103 >107415898 >107415914
--Mistral model's formatting issues:
>107416083 >107416554 >107416580 >107416560 >107416641
--Z-Image-Turbo Chinese prompt optimization with low VRAM solutions:
>107415693 >107415700 >107415702 >107415718 >107415725 >107416187 >107416711
--Critique of Mistral's model release strategy vs Llama 4:
>107420881 >107420930 >107420989 >107421043 >107421050 >107420983 >107420984 >107421070 >107421055
--Designing neural networks for specific cognitive functions:
>107412634 >107412692 >107412725 >107412778
--RTX 5070ti suitability for local models and overcoming token repetition challenges:
>107412315 >107412386 >107412419 >107412495 >107412541 >107412595 >107414188
--DDR5 price spike and DRAM market implications for consumers and tech companies:
>107421733 >107421959 >107421962 >107421993 >107421996 >107421969 >107422007
--AI model performance and usability debates:
>107420176 >107420784 >107421608 >107421728 >107421998 >107422102 >107422145 >107422210 >107422360
--Minimalist AI/ML learning resource recommendations for webdev transitioning to AI:
>107418678 >107418849 >107418882 >107418913 >107419160 >107419241
--Testing VibeVoice's Japanese and high-pitched voice capabilities:
>107419296 >107419308 >107419333 >107419339 >107419364 >107419372
--Perceived stagnation in LLM pre-training advancements:
>107416311 >107416340 >107416369 >107416402
--Tetris clone generated by Ministral 3 14B Q4 using pygame:
>107417044
--Nvidia's involvement in MistralAI's new model training:
>107421114
--Miku (free space):
>107416083 >107416641 >107418984 >107419160 >107419653 >107421986

►Recent Highlight Posts from the Previous Thread: >>107412048

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
https://huggingface.co/NousResearch/Hermes-4.3-36B-GGUF
>Hermes 4.3 36B is a frontier, hybrid-mode reasoning model based on ByteDance Seed 36B base
>Seed 36B base
first time I've heard of that model
>>
>>107423050
Funny. Second time I've read that post.
>>
>>107423174
almost as if it's useless to ask a question on the end of a thread or something
>>
File: 1761117575776279.png (10 KB, 957x596)
Why doesn't Ministral work in ooba?
>>
>>107423191
you're using ooba trying to run a new model
>>
>>107423206
Yeah you're right. Silly me! Haha!
>>
>>107423050
>hybrid-mode reasoning
Into the trash it goes. Seriously though, a dense 36B reasoner with 512K context might be good for low-VRAM agentic coding.
>>
>>107423188
>ask a question
You didn't ask any question.
It didn't take much to find it
>https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Base
>>
>>107423221
>You didn't ask
it was implied, but only the 3 digit IQ could get that, sorry for your loss
>>
>>107423230
How much IQ do you need to use hugging face's search, then?
>>
>>107423239
how much IQ do you need to understand that I wanted to hear people's genuine opinion of that base model, and not a model card shilling their own product? are you retarded or something?
>>
>>107422991
>jacket
>scarf
>bare thighs
Is Miku retarded?
>>
>>107423217
dense is fucking shit bro
no reason to use it over moe
>>
>>107423245
>blablabla
>Never heard of it. Is it any good?
>>
>>107423270
>Yes, I'm retarded and autistic, and I can't read the room
yeah I know
>>
>>107423261
That's why Maverick 402B was better than the shitty old and dense 405B, right?
>>
>>107423191
The best decision you can make in this hobby is to move from ooba (piece of shit) to llama.cpp. It is like moving from windows to troonix if troonix was actually good.
>>
>>107423247
yes
>>
>>107423308
dense 405b was fucking shit too, they're both about equally bad
qwen showed with qwen3-32b dense and 30b-a3b that dense is utterly worthless and only helps people cope with overspending on hardware, since the latter was superior
>>
>>107423247
that's just women in general
>>
>>107423284
Communicating effectively is very useful, anon. You should try it.
>>
>>107423326
superior on benches?
>>
>>107423352
it was an effective communication, but you're too retarded to get it, not my fault your IQ is too low, blame god for that or something
>>
need magistral large 3
>>
>>107423375
need mistral medium
>>
>>107423363
It shows. You got a lot of useful replies.
>>
>>107423410
All because of tismo derailing.
>>
File: G7NSq1aW0AAeot-.jpg (192 KB, 1586x1036)
lol. mistral large 3 is slightly better than a 9B / llama 4 maverick. Not even close to glm / deepseek / kimi...
>>
>>107423410
>You got useful replies.
I did yeah, again, are you retarded or something?
>>107423217
>>107423261
>>
If someone had actually supported and quanted the cohere with vision, i would have used it. Nobody did that tho.
I do need to revisit the ones that work since everything else is parrotmaxxed. Not like we're getting a large or have anything to look forward to anymore.
>>
>>107423435
that is what happens when EU law forces you to strip all copyrighted data out of your datasets.
>>
File: 1711104757843628.jpg (123 KB, 768x1024)
>every model is bloated 1000TB moe
>they're still all slopped garbage
>nobody bothers to finetune anymore which was the theoretical advantage of local models
>rammaxxing was already a meme and is now fully dead
is it time to accept that it's really, truly over this time?
>>
>>107423435
>their own medium is quite a bit higher
wow I know migastral is a thinker but still
>>
>>107423473
just stop being poor, though ram costs an arm and a leg now
>>
>>107423440
You could have gotten that out of the model card, anon. The one you could have found yourself.
>>
File: crysad.jpg (136 KB, 1887x1742)
>>107423473
Looks pretty over yea. Be happy with what you got and maximize it.
>>
Sirs when Ganesh Gemma 4 to increase Bharati izzat?
>>
>>107423483
>You could have gotten that out of the model card, anon.
all right he's genuinely braindead >>107423245
>how much IQ do you need to understand that I wanted to hear people's genuine opinion of that base model, and not a model card shilling their own product?
>>
>>107423435
>20b gpt-oss at the same level as R1
nice benchie bro
>>
>>107423435
artificial analcysts literally gushed over reflection 70b. none of their benches mean a thing.
>>
izzat!
get it it's indian so much kek lmao
>>
>>107423435
>GPT-OSS-20B above GPT-5.1
>>
>>107423533
Well, yeah, they were given Claude behind a wrapper
>>
>>107423534
Good Morning!
>>
>>107423534
SAAAAR
>>
>>107423547
>experts
Didn't notice. They're grifters supreme.
>>
>>107423473
>is it time to accept that it's really, truly over this time?
DeepSeek-V4-BitNet-671B-Omni (80GB) will drop for Christmas and save local
>>
>>107423534
SAAR DO NOT REDEEM MY IZZAT
>>
why can we not have medium sized models
>>
>>107423585
GLM Air? GPT-OSS-120B?
>>
File: 1744642411599485.png (57 KB, 608x695)
>>
>>107423594
yeah, but actually good medium sized models
>>
>>107423534
???
>>
>>107423585
They eat into API profits.
>>
>>107423534
izzat respect deez nuts ahhahahaha
peak delhi humor activated bhai, full izzat loot li tune
>>
>>107423595
>feb 2025
unc posting prehistoric cave paintings
>>
>>107423611
keek
>>
>Transformers doesn’t support deepseek_v32
Over before it started
>>
>>107423595
>>107423611
it's timeless, nothing has changed
someone is still using dr*mmers tunes, guaranteed
>>
>>107423642
Yeah that someone is me it's the only thing that works all newer shit is just cope.
>>
>>107423659
For all the shit people talked, he's the last tuner left. Actually tried his hand at air and cranked out a bunch of larges that top out the UGI leaderboard.
>>
Is it possible to make a finetune of glm air on consumer hardware?
>>
>>107423762
no
>>
>>107423762
Depends on what you count as "consumer". A stack of 96gb 6000s, sure.
>>
>>107423762
Basically no, but it's not that expensive to rent a few big boy GPUs. Your dataset and strategy is what matters not so much the hardware
>>
>>107423776
>>107423781
This guy claims he did it with a single blackwell 6000 but that seems impossible. https://huggingface.co/Green-eyedDevil/Monika-106B-GGUFs
>>
File: file.png (46 KB, 847x151)
whats causing the llm to do this

this is qwen vl 2b trying to describe an image
>>
>>107423852
temp is too high presumably
>>
>>107423852
repetition penalty
>>
>>107423852
Repetition penalty. Turn that shit off.
>>
>>107423856
>>107423858
is it the repetition in the prompt? i do have some repeating sentences
>>
>>107423872
It's a setting on whatever you use to run the model. Either the front or the backend. Disable it.
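If your backend speaks the llama.cpp server API, the neutral values look something like this (the URL, port and prompt are assumptions, adjust for whatever you actually run):

# Neutral sampler settings: 1.0 / 0.0 mean the penalties are effectively off.
# The endpoint below is a placeholder for a local llama.cpp-style server.
import requests

requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Describe the attached image.",
    "repeat_penalty": 1.0,       # classic repetition penalty disabled
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
})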
>>
File: file.png (6 KB, 668x60)
>>107423895
nice found it thanks
>>
>>107423834
Unsloth was promoting some new feature recently for finetuning deepseek with little vram so it was probably that
>>
Getting tired of the usual models I keep around for workshopping story ideas to write, give me some suggestions
>inb4 smolm, toss 20b, petra 13b, starling, pyg
>>
>>107424320
toss 120b
>>
>>107424320
https://huggingface.co/roneneldan/TinyStories-1M
>>
>>107424328
I mean I guess I could, if I want to wait for it to ramble for 8k tokens about its policy on a story that's designed to be ToS friendly. Actual suggestion?
>>107424350
I've used 32bs that become retarded in 1k tokens, but thank you for your very constructive contribution
btw this retarded behavior is why virtually the entire thread hates you, find another hobby
>>
>>107424391
GLM Air is basically the only semi-decent and realistically runnable model available right now. Check back in 2 weeks.
>>
>>107423834
>Trained with Axolotl on my Blackwell Pro 6000 Max-Q. 12 rank, 24 alpha, 3 epochs. Took about 45 hours.
It's probably qlora. Minuscule rank.
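For scale, a rank-12 adapter like that is roughly this in peft/bitsandbytes (a sketch with an assumed base model id and target modules, not his actual Axolotl config):

# Hypothetical QLoRA setup at rank 12 / alpha 24; frozen 4-bit base, tiny trainable adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize the frozen base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.5-Air",                 # assumed base model id
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(
    r=12, lora_alpha=24, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module list
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()         # a tiny fraction of the 106B total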
>>
>>107423191
LOOOOL did you draw this?
>>
>>107424413
That's what made me open the thread, I was swapping between a bunch, regenerating model replies and was like "welp guess I have to try air" and it made a very stupid mistake within 700 tokens. The rest of it was fine, but it's just irritating. I was figuring maybe someone has a non major company model they like using for an occasional breath of fresh air, but I guess it's all just meta, qwen, mistral or whatever
>>
>>107423585
qwen 30b and other 20b-30b models
also qwen 80b
there's 48b model too, but no gguf
next med sized qwen will be around 30b-60b
yes all moe because moe is luv
>>
>>107424449
Yes
>>
>>107424467
What kind of mistake? You could edit your character card or system prompt around it. There is no such thing as a perfect model, so you basically have to play whack a mole in order to make any model actually usable.
>>
>>107423834
>>107424437
Why do grifters pretend that their loras are tunes?
>>
>>107424496
It conflated two completely separate pieces of information about the setting as somehow related to one another. Can't even say it's the quant either, since it's nearly 5bpw. Also raw completions, with just a system message stuck at the beginning to give it a very short amount of information on the fact that it is to act as a writing workshop. Tone down the sycophancy, provide constructive criticism, that sort of thing. Wouldn't cause it to just mix up two totally irrelevant pieces of info. I think I'm going to cut and paste the response into a file, then re-dl one of my old favorites that was particularly retarded but took things in interesting directions and compare
>>
>>107424582
they have always been synonymous.
>>
>>107424582
They ARE tunes.. just very superficial ones.
>>
>>107424582
because admitting to making negligible changes on small models using a public dataset and a gamer gpu has negative effects on donation revenue
>>
File: Antigravity_ngtwJEPTLH.png (93 KB, 1506x1005)
What's a good IDE with agents?
Similar to antigravity, but something that is open source and I can use my local (ollama) models?
I'm enjoying antigravity a lot but I'm frequently running into model limits

>inb4 its just a visual studio fork
>>
>>107425021
its just a visual studio fork
get the real thing with roo cline
>>
>>107424582
>Why do grifters pretend that their loras are tunes?

He's not exactly shilling that model. Looks like some guy just finetuned (LoRA *is* a way of fine tuning) GLM to make it work better with his software.

And I create them all the time. If I have a repetitive task I need to keep sending to a larger model, I'll often create a dataset and create an adapter for a smaller model.

Or for TTS models, if you want to add more voices, it's much easier to create a few adapters and load them on the fly.

You can load/unload them without having to swap models.

What I don't like, is when people train an adapter, merge it into the base model, then only release the full fucking model on HF.
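For anyone wondering what the load/unload workflow looks like, a minimal peft sketch (adapter names, paths and the base model are made up for illustration):

# Hot-swap LoRA adapters on one loaded base model without reloading weights.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407", device_map="auto")
model = PeftModel.from_pretrained(base, "adapters/summarize", adapter_name="summarize")
model.load_adapter("adapters/extract-json", adapter_name="extract-json")

model.set_adapter("summarize")     # route generations through one adapter...
# ... generate ...
model.set_adapter("extract-json")  # ...then switch without touching the base weights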
>>
>>107425021
As the other anon said. Visual Studio Code with an extension like Cline, Roo, Continue, Gemini Assistant, etc.
Or just use a cli.
>>
>>107425047
>What I don't like, is when people train an adapter, merge it into the base model, then only release the full fucking model on HF.
elaborate on this? would you prefer that both the merged model and lora are released?
>>
>>107425073
>>107425040
Ok thanks, I'll look into those.
Which one is the top choice for local agents?
I can't pay for a service but I have a 16gb vram card and I'm downloading Qwen3 coder.
I guess it should be good enough for simple tasks and I'll hit the bigger models when needed
>>
File: 1761834532357711.gif (1.38 MB, 1364x1364)
I'm running Qwen 2.5-14B on a RTX 3070 with 8GB of RAM through KoboldCPP and then roleplaying (NSFW) through SillyTavern.
The results are fine for one on ones, although if the context gets full it starts repeating itself a ton and I have to restart with a summary instead, which is annoying but acceptable. However, the results for multi-character roleplays are pretty atrocious.
Is this something I can improve by changing models? Or is it an issue with the character cards/settings? Or is my hardware simply not sufficient? Any guidance is welcome, even if it's unrelated to my specific question (like telling me my model is a terrible choice because of some shit I'm completely ignorant of).
>>
>>107425155
>would you prefer that both the merged model and lora are released?

Yes! I often end up trying to extract it myself with mergekit.

That would make life so much easier for those of us who load/unload LoRAs depending on the task we're doing. Not to mention saving on storage costs.

I wonder how many merged copies of llama-3 70b are sitting on people's drives.
>>
>>107425222
Roo is good for tasks, Continue for autocomplete and chat.
Qwen3 coder should be fine for simple stuff, you could stretch it if you have a lot of regular ram.
Not really local, but there's https://github.com/router-for-me/CLIProxyAPI to scam the free tiers on the CLIs.
>>
>>107424941
remember how rich undi got. he drives a hummer now. no time for us peasants.
>>
>>107425298
I myself like it when people provide loras alongside their finetunes so I can merge them into another finetune I like in the same arch, but at least with llamacpp, if you try to run a model with a lora unmerged, it runs on cpu and fucks your gen speed. Maybe this isn't the case with other frameworks (doubtful, or it's completely negligible if both are fully run in vram), but that's my guess as to why people don't bother sharing the lora and instead just share the full weights
>>
>>107425047
>If I have a repetitive task I need to keep sending to a larger model
examples? always looking for inspiration from how others use theirs
>>
>>107425366
lora works fine in transformers and exllama. VLLM too. It's only llama.cpp that can't hang. They used to let you merge the lora into quantized GGUF but that fell by the wayside.
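e.g. a minimal vLLM sketch applying an unmerged lora per request (model id, adapter name and path are placeholders):

# Serve the base model once, attach the adapter per request instead of merging it.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="mistralai/Mistral-Nemo-Instruct-2407", enable_lora=True)
params = SamplingParams(temperature=0.8, max_tokens=256)
out = llm.generate(
    "Write a short scene set in winter.",
    params,
    lora_request=LoRARequest("my-style-lora", 1, "adapters/my-style"),
)
print(out[0].outputs[0].text)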
>>
Does /lmg/ have any /lmg/-approved loli cards for ERP?
>>
>>107425418
It was wonky but I quantized loras, then merged them with the actual model via mergekit, though it's been a while and a lot has changed even in the last year. But if you're stuck on a non nvidia card from before AI really started being a thing you're not going to be using vllm or exl3 now. I do wish loras were more like sdxl loras where it was just a small ass thing that didn't suck ass but had an effect. Control vectors at best sort of came close to that, but that's a whole other irritating beast that isn't even supported on most models
>>
>>107423762
Do you mean group chats? They're trickier to get to work well. Something like
>Now I'm going to reply from {{char}}'s point of view, writing in third person.
can help, but the model might just be too dumb.
>>
>>107425539
Sure, she's named Elara on chub
>>
>>107424320
https://huggingface.co/NousResearch/Hermes-4.3-36B-GGUF
>>
>>107425021
Zed, if you don't want another vs code fork/plugin
>>
>>107425294
>'m running Qwen 2.5-14B
Please, try nemo or Qwen 3 30B a3b.
Nobody should be running Qwen 2.5.
Note that the 30B model won't fit your VRAM but that's fine. It is a MoE and you should configure koboldcpp to have the experts running in RAM/CPU.
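For reference, with plain llama-server the same idea looks roughly like this (gguf filename and context size are placeholders; the flags are the ones that show up elsewhere in this thread, check koboldcpp's own options for its equivalent):

# Rough llama-server launch for a MoE: dense/attention layers go to the GPU,
# the MoE expert tensors stay in system RAM.
import subprocess

subprocess.run([
    "llama-server",
    "-m", "Qwen3-30B-A3B-Q4_K_M.gguf",   # placeholder filename
    "--n-gpu-layers", "99",              # offload every layer that fits...
    "--cpu-moe",                         # ...except the expert weights, kept on CPU
    "--ctx-size", "16384",
])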
>>
>>107425723
>won't fit your VRAM but that's fine
I seriously doubt this, because when I first set it up I did it incorrectly and it wasn't using my GPU, just my CPU and RAM. It was almost comically slow, often taking like >5 minutes to write a couple of sentences.
>>
>>107425771
>often taking like >5 minutes to write a couple of sentences.
That is what moetards would call fast as fuck
>>
>>107425771
Read about the differences between a dense and a sparse model, specifically mixture of experts.
The 30B model will run faster than the 14B model that's running only partially on your GPU.
>>
>>107425771
install linux
>>
>>107425539
***Elara Voss***
>>
>>107425809
The 14B is running ENTIRELY on my GPU, not just partially. It is unbearably slow otherwise.

>>107425812
I have a dual boot system but see no reason it would matter which OS I'm using.
>>
>>107425868
Dense models get unbearably slow on a normal computer if they have to touch RAM. MoE models can stay decently fast even if they don't fit in VRAM.
>>
>>107425956
I really don't understand why you're advocating for this. You're saying it's slow, but not as slow as it would be otherwise? That doesn't make it good.
>>
>>107425981
You can run smarter models at speeds that are still above reading speed.
>>
>>107425868
>ENTIRELY on my GPU,
Ah, got it. I misunderstood then.
Well, I stand by my recommendations regardless.
Especially Nemo.
>>
Has anyone tried the new 14b mistral? Is it better than Magistral?
>>
>>107426032
maybe
>>
>>107425981
Some people get really weird about the dense vs MoE thing. Basically, a MoE only uses a certain portion of its size at a time.
So 30B-A3B would run about as fast as a regular 3B when offloaded. If you ran a MoE with 14B active, it would be as slow as your 14B.
Whether 30B-A3B is as smart as a 3B or a 30B you would have to try for yourself.
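Back-of-the-envelope: generation is mostly memory-bandwidth bound, so the speed ceiling scales with the bytes of active weights read per token. The numbers below are illustrative assumptions, not benchmarks:

# Rough bandwidth-bound token rate: bandwidth divided by active bytes per token.
def rough_tok_per_s(active_params_b, bytes_per_param, bandwidth_gb_s):
    active_gb = active_params_b * bytes_per_param
    return bandwidth_gb_s / active_gb

ddr5_bw = 60        # GB/s, assumed dual-channel DDR5
q4_bytes = 0.56     # ~4.5 bits per weight quant

print(rough_tok_per_s(3,  q4_bytes, ddr5_bw))   # 30B-A3B: ~36 t/s upper bound
print(rough_tok_per_s(14, q4_bytes, ddr5_bw))   # dense 14B: ~8 t/s upper bound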
>>
>>107426032
lol no it's 24b pruned
>>
>>107426044
That's only partially true. If you only have some experts on RAM, then only the tokens generated with those experts will be slow; that's why partial offload works significantly better with MoE models
>>
>>107426044
Compute time doesn't lie. When you gen images and add more steps, the pics often look better.
So now you take an A3B and it's really really fast. I'm sure the two are completely unrelated and it's exactly like a real 30b.
>>
>>107426384
>So now you take an A3B and its really really fast. I'm sure the two are completely unrelated and it's exactly like a real 30b.
compared to old 30bs like qwen qwq, yes. its almost as if architectural improvements allow for many other different types of improvements
>>
>>107426384
beats the 32b on benchmarks
>>
>>107426397
Ahh benchmarks, the ultimate test of quality. And architecture. Like mistral managed to ass up deepseek. Maybe it's the curse of killing wendy.
>>
>>107426004
I'm not sure a smarter model would really make a difference for smut, but I can try I guess.

>>107426027
Nemo seems to be a framework, and I'm not going to train my own LLM. Why would you even recommend such a thing?

>>107426044
Like above, I'll try I guess. But it just seems more relevant for someone trying to cheat on their homework or something.
>>
>he doesn't know nemo
>>
>>107426547
>Nemo seems to be a framework
In the off chance you are not trolling, search for mistral nemo.
>>
>>107426609
>mistral nemo
Why would you not say this? It's completely different from NeMo itself.
This is like
>Have you tried Unreal?
>That's a game engine, not a game.
>he doesn't know Fortnite
>>
>>107426686
>That's a game engine, not a game.
Fucking kids these days, man
>>
>>107426686
Since you started you had very strong opinions on stuff you clearly know nothing about. Do you still not understand what a MoE is?
>Have you tried Unreal?
>That's a game engine, not a game.
kek
>>
>>107426686
Masterfully created bait
>>
>>107426702
>>107426740
>>107426747
I genuinely can't tell if you all just decide to troll everyone who comes in or are too terminally online to realize other people don't have the same perspective as you.
Googling 'nemo llm' gives this: https://github.com/NVIDIA-NeMo/NeMo
Not the model found by googling 'mistral nemo' here: https://mistral.ai/news/mistral-nemo
There is no possible way a reasonable person without prior knowledge would know that they should add 'mistral' to the search.
>>
Is it possible to merge GLM Air with GLM 4.6?
>>
>>107426782
The next thing you say is that you didn’t know Unreal Engine was named after a game
>>
File: m.png (151 KB, 1031x747)
>>107426871
Not again...
>>
>>107426906
but is it doe
>>
>>107426871
Someone tried to do that with SVD distillation. It was hilarious, he got glazed hard by Gemini and thought he'd done something "clever".

Turns out all he did was convert the GLM-4.5-air weights to FP32 lol

No, you can't merge them, they're different architectures. Best you can do is generate slop-logs from 4.6 and SFT Air with them.
>>
Why do merges even exist?
>>
>>107426983
Mainly to fulfill the merger's desire to contribute with minimal effort. Merges are the product of yet another contraption created to fling shit at the wall in a novel way. They can undeniably change the way the resulting model behaves, but the changes are not always positive. When they are, great; see Mythomix/Max. When they aren't anything special, the shitty model should be deleted.
The problems arise when the mergers try to grift and peddle their bucket of shit, claiming it has value, and perpetuating their delusions of being helpful.
>>
>>107423191
how did you manage to draw me?
>>
>>107425539
loli miku
>>
why can't we have stuff like 100B-A50B moes?
>>
>>107427535
that would be antisemitic
>>
>>107427535
Closest to that is Grok 2, 270b total, 115b active.
>>
>>107427535
not sparse enough to be worth it
>>
>>107427535
Or, hear me out, a dense 50B model. Impossible, I know. But imagine.
>>
>>107427636
more sparsity just makes models dumber for stuff like generalization even if they have access to tons of knowledge
>>
>>107427771
>dense 50B
Completely pointless. Too big for a high end consumer card, too small for 2 high end consumer/1 enterprise card.
>>
my lungs are collapsing, i need air
>>
>>107427535
Furthermore, it should be majority dense parameters so that the experts are more for knowledge augmentation than being relied on for the model's main intelligence.

>>107427771
There's no reason to go full dense when you've got perfectly fine and fast RAM sitting right there to use for knowledge augmentation experts. You did buy RAM before the price hikes, didn't you?
>>
>>107427771
Maybe the sun and moon are the same size when viewed from the earth, but a good size to performance ratio model that fits neatly in a tricked-out consumer setup is just not in the cards. The non-existence isn’t a conspiracy, just a sad reality
>>
>>107427870
Have your Miku use your body for practicing mouth to mouth resuscitation.
>>
I tried out TOSS 120b "derestricted" after all the recent shilling and this is the last time i'm falling for the abliteration meme. What's the deal with all the people acting like it's some big development? The model is STILL gigacucked, and it becomes very evident how hard they scrubbed its dataset clean whenever it has to write something vaguely spicy.
I suppose the technique might be convenient to eliminate refusals with less filtered models, but I can't believe people are really using it with stuff like gemma or TOSS and thinking it actually fixes them
>>
the people need air
>>
>>107428133
Only poor people need air
>>
>>107428151
i need a model i can run in fp16
>>
>>107428151
imagine being such a poorfag that you still have physical needs lmao
>>
>>107428157
Gemma 4b has you covered.
>>
>>107428459
i have 4 blackwell pros. enough for air fp16, but not 4.6 full fp16
>>
>>107428465
2 PSU?
>>
>>107428505
max qs
>>
i don't understand much about the AI lingo so forgive me:
right now i'm using perplexity pro (got it for free) but i don't like the censorship and want to replace it with a local llm
i downloaded lm studio and i have an m4 macbook air with 24gb of ram, what should i be running as a "general" text model (so like chatgpt, i can use it to research stuff, do math, generate code, just chat, etc, a jack of all trades sort of thing)
is gpt-oss good?
>>
>>107429309
Gemma 4 is all you need
>>
>>107429309
There's mistral-small, qwen-3-30ba3b, gemma 27b or 12b, gpt-oss...
You'll have to test them yourself to see if they're good enough for you.
>>
>>107429323
this? or am i looking at the wrong one? i'm a layman but i very much doubt something released 150 days ago is the best option in this space lol, shit gets released and made obsolete every other week
>>107429346
i'm using gpt-oss 20b right now, seems good enough but i'd like to know if there's better stuff i should be running instead
>>
>>107429359
>better stuff
I mentioned a few models. Try them. Keep the one(s) you like. Better is subjective.
gemma-4 is not out, but there's gemma-3 27b and 12b.
>>
>>107429323
Sorry, Gemma has been deemed too unamerican. Please understand
>>
>107416103
Is slot id a lottery? Can I choose which slot to start with at server start? This was a fresh start, and it keeps picking id 3, which I don't know BEFORE I start processing some prompt. But I want to load a pre-cached one
slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 1 | processing task
slot update_slots: id 3 | task 1 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 289
slot update_slots: id 3 | task 1 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 1 | prompt processing progress, n_tokens = 225, batch.n_tokens = 225, progress = 0.778547
slot update_slots: id 3 | task 1 | n_tokens = 225, memory_seq_rm [225, end)
slot update_slots: id 3 | task 1 | prompt processing progress, n_tokens = 289, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id 3 | task 1 | prompt done, n_tokens = 289, batch.n_tokens = 64
slot update_slots: id 3 | task 1 | created context checkpoint 1 of 8 (pos_min = 224, pos_max = 224, size = 75.376 MiB)
srv log_server_r: request: GET /props 127.0.0.1 200
srv log_server_r: request: GET /slots 127.0.0.1 200
srv log_server_r: request: GET /slots 127.0.0.1 200
srv log_server_r: request: GET /slots 127.0.0.1 200
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv stop: cancel task, id_task = 1
slot release: id 3 | task 1 | stop processing: n_tokens = 470, truncated = 0
srv update_slots: all slots are idle
>>
>>107429359
The one on Gentoo is the best one
>>
>>107429427
With a single slot btw
--ctx-size 131072 -np 1 
--n-gpu-layers 99
--no-warmup
--cpu-moe
--jinja
--slots
--slot-save-path "/home/ai/Desktop/KVCACHE/"
--port 8000
>>
So given that western RAM companies are now jerking their dicks to Altman and his incestual billionaire posse and giving the middle finger to consumers, what the hell does that mean for the future of local?
Are we going to be sucking Steve Jobs decayed teat? Are we going to be stocking up on 3090s? Are we just biding our time until either the bubble pops or Xi swoops in and undercuts the API virgins in price at every corner?
>>
>>107429473
its either waiting for the chinks or building your own hardware from scratch like terry did with temple os
>>
>>107429473
It's temporary artificial scarcity. Altman doesn't have the cash to make good on the orders he made. If he can't find investors to foot the bill, the orders will have to be canceled and that capacity that he reserved will go back on the market.
>>
>>107429427
>>107429470
Weird. I always get id 0. Even if i try to force it to something else with id_slot in the request it does nothing. Tried saving a few caches with -np 2, then loading them with -np 1 to see if there's some slot id saved but i couldn't make it happen. Always id 0 with -np 1.
Are you 100% sure you're running the correct script?
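For what it's worth, with --slot-save-path set you should be able to drive the slots API explicitly instead of trusting the LRU picker; something like this is what I'd expect to work (filenames are placeholders):

# Persist a slot's KV cache to disk, then later restore it into a known slot
# and pin the request to that slot with id_slot.
import requests

base = "http://127.0.0.1:8000"

# after processing a prompt, save slot 3's cache:
requests.post(f"{base}/slots/3?action=save", json={"filename": "prefix.bin"})

# later (even after a restart), restore it into slot 0 and use that slot:
requests.post(f"{base}/slots/0?action=restore", json={"filename": "prefix.bin"})
requests.post(f"{base}/completion", json={"prompt": "...", "id_slot": 0, "n_predict": 64})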
>>
Damn, imagefags eating so good and here we are having fucking nothing.
Unless you can run the really big moe monsters or enjoy the qwen/gemma slop.
No mid range dense model or like a fast 120b moe model that is for something else than math and tool calling.
>>
>"something about this" *She gestured vaguely between them.*
Yep it's slopping time.
>>
https://huggingface.co/openai/ChatGPT-6-2T-A300B-GGUF
>>
>>107430027
catpic
>>
File: 1762888821172375.jpg (190 KB, 434x509)
I read somewhere that GLM4.6 Air will be a smaller model, but I can't find a source.
>>
>>107430186
https://www.chinatalk.media/p/the-zai-playbook
>Zixuan Li: For our next generation, we are going to launch 4.6 Air. I don’t know whether it will be called Mini, but it is a 30-billion-parameter model. It becomes a lot smaller in a couple of weeks. That’s all for 2025.
>
>For 2026, we are still doing experiments, like what I said, trying to explore more. We are doing these experiments on smaller models, so they will not be put into practice in 2026. However, it gives us a lot of ideas on how we are going to train the next generation. We will see. When this podcast launches, I believe we already have 4.6 Air, 4.6 Mini, and also the next 4.6 Vision model.
>>
>>107430204
two more weeks status?
>>
>>107430219
>Nov 21, 2025
In a few days it will be two weeks.
>>
>>107429676
As far as llama-server is concerned, YES

I have other things loaded on the GPU at the same time, but unrelated
>>
>>107430204
The wording is a little strange here, but I understand they're going to be releasing 4.6 Air and 4.6 Mini as well as a vision model. I wonder if both Air and Mini will retain the RP of 4.6.
>>
Prediction for gpu prices for the next two years?
>>
>>107430204
>4.6-Air is 30 billion parameters
HAHAHAHAHAHAHAHAHAHA
Local is over
>>
>>107430410
They may or may not remain the same. One of those two.
>>
>>107430430
See >>107430333
>>
>>107430430
retard
>>
>>107430410
If there's one thing you can count on, it's that things can always get worse.
>>
>>107430430
upgrade from 8gb vram
>>
Is local going to have to hope for optimizations or new architectures for the next 2 years? Next-gen Memory and GPU will be unaffordable when they come out, if they even come out with this shift towards data-centers.
>>
>>107430661
No. No one wants you to be able to run models locally. Open weights releases will just get smaller.
>>
>>107430204
>>107430430
the Z team is the same team that made Z-image turbo and they destroyed the competition with a 6b model, they know how to make great small models
>>
File: airft.png (85 KB, 1058x820)
>>107423762
lol, lmao even
>>
Is it just me or is ministral absolutely fucking useless?
Absolutely worthless built-in general knowledge, and it constantly fails to actually call tools properly despite claiming it has (or worse, DOES call the tool and gets the result then claims it "can't do that"). Then if it's not fucking up tool calling it just gets stuck in endless think blocks needing a manual stop or retry.
Fiddling with temp and repeat penalty sometimes seems to help but it eventually just falls over again.
At least it seems fairly uncensored but other than that what a waste of time this has been.
>>
>>107430673
I'm beginning to think this is true. Are porn and some hallucinated instructions you can look up on google really that hated?
>>
>>107430757
It might be an issue with the quants as I've heard there aren't that many issues on openrouter. I could be wrong though. On winblows so I'm waiting for ooba.
>>
>>107430757
It's pruned Mistral Small 3.1 24B + (probably) a few hundred billion tokens of healing + updated instruct post-training. I think they did something to their vision encoder too though, because it performs markedly worse than Mistral Small 3.2's even though it should be the same. I find it almost useless for roleplay since it fucks up format every time, even at low temperature.

https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512
>Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments; Higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
>>
>>107430777
Its probably that all of the western frontier labs decided to keep things on lockdown. That plus open source eats into their profits. I thought western labs would have pivoted hard to open source since OpenAI decided to actually release OSS to counter chinese dominance in the open source space but it's quite possible we get drip fed instead.
>>
File: 1764455523452284.png (1.43 MB, 1152x896)
I wonder if a hybrid interleaved diffusion transformer can benefit reasoning LLMs. Currently they're bad at making in place revisions and edits to their own memory. Meanwhile such functionality comes inherently with diffusion. But autoregressive still has some of its own advantages. So why not try interleaving them, though ideally in a way that the model itself is still unified so parameters aren't wasted. For instance, when receiving a query, the model would first reason in AR mode, generate a draft, and then reason, and then edit the draft in diffusion mode. So nothing else but the editing is done with diffusion, and it uses the AR-generated reasoning to guide the editing. Then it would go back to AR mode to reason more, and so on and so forth. Though tbf I don't know how current diffusion LLMs truly work. My intuition is that this would work well because image models are pretty good at following prompt from the text encoder.

Actually you know what, I think Google might already be somewhat ahead of me. I'm surely not the only one who has thought this.
>>
>>107430693
Z-image is from Alibaba, though?
https://huggingface.co/Tongyi-MAI
https://huggingface.co/zai-org
>>
>>107430789
>On winblows so I'm waiting for ooba.
just use llama or kobold, oogabooga is outdated shite.
>>
paste your saved stories into context in mikupad before starting a new one to avoid having to steer it too much
>>
File: 1745229650643447.png (437 KB, 847x974)
It's over for the Rammaxing fags lol
>>
>>107430883
juste don't poor ?
>>
>>107430909
>juste
saar?
>>
>>107430831
yeah you're right my b
>>
>>107430920
baguette
>>
>>107430757
>Is it just me or is ministral absolutely fucking useless?
Here's what I noticed from the most basic translation prompts (that I use as a filter for "can't be arsed to care to test this model further if it fails at that") I gave those models:
they're the first models from any "major" lab released in 2025 that don't listen to "no commentary" as an instruction to shut the fuck up and not write TL notes and other garbage
and they can't even handle doing more than 50 lines in one go without breaking (by breaking I mean things like outputting a single line and acting like everything else didn't exist. And yes, I did set up context length correctly.)
translation isn't the only thing I test in models, but failing this badly makes me rm this shit immediately. Those models would have looked good two years ago, but with models like qwen 3, gpt-oss 20b and gemma 3n I see no reason for these things to exist; even the 14b is just not coherent enough
>>
>>107430333
It's a transcription of a dude on a podcast speaking chinkgrish, but yeah if you listen to the original audio the pauses makes it clear he's referring to two different models
>>
File: 1750423337779359.png (122 KB, 1400x1400)
>>107430430
uweeeee~ ojisan. jiisaii. kimoiiii
>>
>>107430726
But unsloth told me I can finetune deepseek on a 3090
>>
>>107430957
Yeah it's insanely bad, either there are actual bugs or they benchmarkmaxed it to oblivion and it literally cannot do anything else.
Back to gemma for now at least it mostly does the right thing even if it's a bit boring.
>>
Hatsune Chief
>>
>>107430810
Chinese labs make a splash and then pull back too. Some of their stuff went proprietary after the initial drop. Big MoE count on us not being able to run them.
A new deepseek 2.5 would have been excellent and infinitely more palatable. Nowhere to be found. Instead they copy western labs with parroting and only giving useless smalls or mega weights.
>>
File: file.png (85 KB, 337x496)
>>107430883
honestly not that bad when all things considered
>>
Hatsune Queef
>>
File: miku-holding-gemma.png (1.09 MB, 790x1054)
Was Gemma 3 the only all-around good, consumer GPU-sized official release of the year? It's getting depressing.
>>
ministral: good shit or skip?
I'm still using glm air 4.5 (moe) / cydonia (dense) but I'm not really happy with either
>>
>>107431406
can only be worse than cyd as it's a prune of mistral 24b
>>
>>107431441
They obviously didn't just shave the model down, the model doesn't write at all like Mistral Small 3.1/3.2. It's much less dry and restrained than those, but also apparently overfit on a DeepSeek-R1-like writing style (to the point that formatting instructions may often be ignored). Vision performance seems worse.
>>
>>107423247
there's a cute (irl) girl (female) dressing just like that that lives near me and it looks hot as fuck
well except for the bare thighs part, those must get pretty cold
>>
>>107430883
That is not the kind of RAM you would be buying for a CPUmaxx build though.
When bulk ordered off of Alibaba the current price for DDR5 server RAM seems to be ~7 € / GB (EU import tax included).
That's still ~10k € just for the RAM if you populate 24 DIMM slots with 64 GB modules.
Though if I can get it recognized as an expense for research and development I would essentially be getting a 77% discount via tax incentives and I'll probably buy 96 or 128 GB modules.
>>
>>107431547
>Though if I can get it recognized as an expense for research and development I would essentially be getting a 77% discount via tax incentives
ok, nice blog dude
>>
File: file.png (656 KB, 1151x863)
>>107431536
>well except for the bare thighs part, those must get pretty cold
That can be fixed
>>
>>107431547
>That's still ~10k € just for the RAM if you populate 24 DIMM slots with 64 GB modules.
Is anybody actually doing this? That's a lot of money to blow on e-waste
I could mayyybe understand getting quad channel, 8 ram slot machine, which is the upper limit of 'only moderately unreasonable price, can run big moe at slow but still kind-of usable speed' but going to 24 x 64gb makes buying a maxed out mac ultra seem like the better option
>>
File: haha.png (1.09 MB, 672x1861)
>>107431536
>>107423247
>>
>>107431622
>red hair
Don't put your dick in that.
>>
>>107431630
Those are the best, just don't let her find out your real name or address.
>>
>>107431585
As of right now there is (to my knowledge) no software for language model inference that is properly utilizing memory channels that are spread across multiple NUMA nodes.
So I would be buying the RAM primarily to develop such software and to figure out what the upper limit of performance given proper software support is.
I would definitely not be blowing that much money on a toy though.

>mac
To my knowledge the biggest memory configuration available for that is 512 GB.
I already have that much in my DDR4 server and quite honestly I ended up regretting not buying more since it's not enough to run models like Deepseek or Kimi at 8 bit precision.
>>
File: the-dell-dude.png (240 KB, 400x470)
>>107430883
Now Dell has an excuse for charging 500 dollars for a slight RAM upgrade. Dude bros we are so back.
>>
>>107431719
Apple's previously ridiculous pricing for ram upgrades looking cheap at this point.
>>
>the 96gb ddr5 kit I bought for $250 is now over $1000
all we really need is for CPUs to somehow get a 250% price increase to make really sure nobody uses anything but phones as their computing devices
>>
>>107431733
Apple have always been ripoff artists, but Dell actually used to offer decent value once upon a time, competing with the DIY market. So for them it's a fall from grace.
>>
>>107431770
Average people were and will just use phones regardless, so simple a monkey can use them and even the prices of phones tripling wouldn't matter since most people rent them through payment plans anyway
>>
>>107431690
What makes developing for NUMA different? I think memory allocation is already NUMA aware
>>
Am I the only one who's legitimately freaked out by this ram thing? Like it's not just the ram, it's the basic admission that everyone who isn't a mega corporation can be safely disregarded. Sure it's ram now, but what's next?
>>
>>107431897
Next is storage/NAND.
>>
>>107431897
all part of the plan: you will own nothing etc etc
>>
>>107431897
Trust the invisible hand.
>>
>>107430643
I have 64gb of ram
4.5-air is 100 billion parameters, so 30b is a big downgrade.
>>
>>107431853
>even the prices of phones tripling wouldn't matter since most people rent them through payment plans anyway
the entirety of washington would be burned to the ground if iphones went from $1k to $3k
it's not like you'd keep the same monthly plan if prices went full retard

>>107431897
>it's the basic admission that everyone who isn't a mega corporation can be safely disregarded
it's always been this way tho, if you can get infinite profits selling to exclusive customers you don't bother with unwashed peasants
>>
>>107431948
I wonder how many active parameters they'll give the 30b
>>
>>107431978
I miss the ~30b dense models
30b moe is just too little, not really worth bothering with imo
>>
>>107431977
If prices go full retard, they'll simply offer payment plans with longer terms to keep the monthly payments sane, same as they did with cars
>>
>>107432007
even the dumbest basketball american would think twice of signing a $50/mo 8 year long plan
>>
>>107431948
let's be honest, it doesn't matter what arch it is if it's benchmaxxed garbage with 70% of probability on one token.
>>
>>107432023
They buy beaten up Dodge Chargers at 19% APR and still haven't figured out the smoke detector thing.
>>
>>107432007
nobody really needs a personal computer anymore. we will just see more and more people ditching pcs. the only real need for a powerful machine today is gaming, which is also stagnant or in decline.
>>
>>107432056
>nobody really needs a personal computer anymore.
But I want one
So you can go fuck yourself, kike.
>>
What's a good local TTS with an UI which I can use to generate sex sounds with?
>>
>>107432086
your next door neighbor
>>
>>107432056
>the only real need for a powerful machine today is gaming which is also stagnate or in decline.
Likely game streaming services like OnLive and PS Now on devices like the Steam Deck will take over
>>
>>107432069
I am just telling you the truth. the masses are who make the market, a few autistic people who want to tinker are not going to create enough incentive. also I'm not a kike, I denounce the talmud.
>>
>>107432087
what if he isn't looking for rape sounds specifically?
>>
>>107432086
vibevoice can do zero-shot voice cloning but it's only good at 3 steps 3 cfg. i use https://github.com/wildminder/ComfyUI-VibeVoice
>>
Oh yes harder daddy. :sweating emoji: :sweating emoji: :sweating emoji: lick lick lick lick L L L L L I am cumming :eggplant emoji: :water emoji: :water emoji:. :joy: :joy:
>>
>>107432042
4.5-Air isn't benchmaxxed though. Neither is full size 4.6
>>
>>107432103
>but it's only good at 3 steps 3 cfg
For the 1B or 7B?
>>
>>107432086
Depends what you're into. I'm into making tts do weird sounds.
>https://voca.ro/1opWmene7Yxx
>>
>>107431890
To my understanding the problem is that the llama.cpp/ggml CPU backend is not NUMA aware so you get threads accessing memory from other NUMA nodes.
At the rate things are currently going I should have a prototype for generic tensor parallelism between the end of December and the end of January.
I intend to try parallelizing NUMA nodes using the same code so the data would be split per NUMA node and the transfers between them would be minimized.
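As a toy illustration of the per-node idea (nothing to do with the actual ggml scheduler): read each node's CPU list from sysfs and pin one worker per node so it mostly touches memory that is local to it:

# Toy sketch only: one worker process per NUMA node, pinned to that node's CPUs.
import glob
import os

def node_cpus(node_path):
    # cpulist looks like "0-15,32-47"; expand it into a set of CPU ids
    cpus = set()
    with open(os.path.join(node_path, "cpulist")) as f:
        for part in f.read().strip().split(","):
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

nodes = sorted(glob.glob("/sys/devices/system/node/node[0-9]*"))
for i, node in enumerate(nodes):
    if os.fork() == 0:                          # child = worker for node i
        os.sched_setaffinity(0, node_cpus(node))
        # ... run the slice of the model assigned to node i here ...
        os._exit(0)

for _ in nodes:
    os.wait()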
>>
>>107432120
Ani ahh post
>OH YOUR RUGGED BEARD
>OH YOUR RUGGED ASS
>OH YOUR RUGGED SHORTS
>>
>>107432132
both. i saw no significant difference in speed and the quality is similar too, but 7B adapts to weird voices better.
>>
>>107431897
You are simply observing the logical end point of capitalism and globalization.
Wealth distribution in a common market largely follows a Pareto distribution.
With globalization the markets have become much bigger and wealth has become more concentrated.
So now the markets largely follow the needs of corporations and billionaires at the expense of everyone else.
>>
>>107432217
>Pareto
the pareto frontier... its moved!
>>
>>107432144
Yea currently llama.cpp can't even push 1/4 of my MLC benchmarked bandwidth. Using a single node/CPU cuts the speeds further and is nowhere near even the worst of the benches.
>>
/lmg/ theme
https://youtu.be/HWl1Tu9oZmY?si=JwZTNrhipBwujxWm
>>
>>107431897
Anon, even a mega corporation can be safely disregarded nowadays. Some of the construction times for datacenters are measured in decades because no matter how much they pay construction companies, nothing changes. Same for a skilled workforce or hardware components.

Want a reliable supply of water for your datacenter? Better pray to the gods.

Such is the result of deindustrialization and the shift towards services and IT in the northern hemisphere (aside from China)
>>
File: mikuquestion2.jpg (989 KB, 1710x1779)
So is Ministral 14B better than Nemo?
>>
>>107432295
>anon
Sign in to confirm your age.
>>
>>107432303
nyo
>>
File: anyacrying.webm (188 KB, 1920x1080)
>>107432316
>>
>>107432324
I know it's truly sad it was our chance but '26 will be another year of nemo
>>
>>107432302
https://redmondmag.com/blogs/generationai/2025/12/microsoft-is-sitting-on-a-pile-of-unused-gpus.aspx

"Quite frankly, the biggest issue we are now having is not a compute glut, but it's power and it's sort of the ability to get the builds done fast enough close to power," he told the show's hosts. "So if you can't do that, you may actually have a bunch of chips sitting in inventory that I can't plug in. In fact, that is my problem today. It's not a supply issue of chips. It's actually the fact that I don't have warm shells to plug into."

> "Once operational in the 2027–2028 time frame, the reactor will provide roughly 835 MW in capacity, supplying not only a nearby Microsoft datacenter but the nearby regional grid as well."

You can safely add another 3-4 years to that time frame
>>
>>107432310
https://yewtu.be/watch?v=HWl1Tu9oZmY
>>
>>107432367
That trick doesn't often work anymore:
This video may be inappropriate for some users.
After which you should try to:

Refresh
Switch Invidious Instance
Go to YouTube
>>
>>107432316
Drummer will save it.
>>
>>107432412
You must be over 18 to post
>>
>>107432144
In theory it’s quite simple: during model warmup, cycle through cpu cores per tensor and maintain that relationship of core-to-tensor when running inference. The difficulty is entirely due to the architecture of lcpp itself getting heavily in the way (mostly the scheduler iirc)
>>
>>107432485
You must have a google account too?
>>
>>107432412
It's I Wanna Fuck My Computer by Nanajirachi
>>
>>107432316
I have a cat that screams “nyo~~!” if I pick her up
>>
>>107432492
you didn't see my zero width disclaimer?
>>
>>107432500
https://www.youtube.com/watch?v=jXvitLHphmI
Current anons stay logged in and stay tracked.
What next? FB and insta links?
>>
File: file.png (293 KB, 577x649)
>>107432491
>>
>>107432295
it was a good music video until furries showed up.
>>
>>107431897
The only hope of getting affordable hardware will be Apple. Not exaggerating in the slightest. Shit is gigafucked for the next couple years... Maybe even longer. We are looking at the nvidiafication of the entire PC market where the only thing companies care about is selling to hyperscalers for fat margins.
>>
>>107432748
Y’all shoulda followed the cpumaxxing guide when you had the chance
>>
They'll start improving small models now that memory is stupidly expensive, right?
>>
>>107432801
memory is stupid expensive because they no longer care about small models.
>>
>>107432801
yes, you'll start seeing small 1b - 14b models that punch far above their weights and go toe to toe with GPT5. you'll also see the first 10t models next year and nothing in between
>>
>>107432768
I did but only went so far. Had I known, I might have upgraded to DDR5. At the least I would have bought double the current DDR4 I have.
>>
https://github.com/ikawrakow/ik_llama.cpp/pull/1033
We might get DSA in ik_llama before mainline.
>>
File: 1750706515055223.png (18 KB, 716x214)
>>107432933
crazy gainz
>>
>>107432933
if they get 3.2 exp support first and improve the multimodal situation, they could overtake mainline
>>
>>107432973
No they can't.
The biggest bottleneck is maintainers and IK is literally the only one working on the fork.
>>
File: 1764805895712.jpg (647 KB, 1438x2044)
Apologize, chuds
>>
>doesn't benchmax
haha yeah
>>
File: file.png (84 KB, 419x238)
>>107433091
>plus or minus 26
>highest variation out of any model on the leaderboard
>>
>>107433091
>coding
>>
Mistral is so fucking embarrassing. Large 3 is literally the saddest thing that came out this year. It's disgusting.
>>
>large 3 is literally just deepseek finetuned under a eurocuck name
fucking kek
>>
>>107433207
It's so bad there has to be some kind of bug right?
...right?
>>
>>107433220
mistral was never good, all their previous models were just rebadged llama
>>
If Mistral Medium is actually dense then it's probably the original Large 3.
>>
File: 1762224833786389.jpg (234 KB, 894x1028)
https://github.com/LostRuins/koboldcpp/releases/tag/v1.103
>>
>>107433250
what about mixtral
>>
>>107433091
>More on coding in a few days...
Codestral 2512 soon
>>
>>107431897
>Likes it's not just the ram, it's the basic admission that everyone who isn't a mega corporation can be safely disregard.
You say this like this hasn't been America's MO since the industrial revolution
>>
>>107433207
There must have been reasons why it took 6+ months instead of weeks after releasing Mistral Medium on API only and for select enterprise customers. I think that one was their preliminary MoE model before attempting a larger one, but scaling things further up didn't work as well as expected.
>>
File: 1758093444165074.gif (253 KB, 370x448)
>>107433285
>leejet
>>
>>107433261
probably. i just want the medium
>>
>>107433261
The release blog for Medium 3 was literally the same post the "Large 3 in a few weeks" tease came from so I doubt it
Imo they screwed up training whatever the original Large 3 was supposed to be, then rushed to copy Deepseek's homework since it would look terrible to have no flagship releases for over a year
>>
the original large 3 was literally just a big nemo but they scrapped it to chase the deepseek moe meme train
>>
File: mistral-libgen.png (484 KB, 1119x852)
>>107433250
They were (initially) good because they used to use pirated material in their pretraining data (libgen, etc), that's even in Kadrey v. Meta copyright lawsuit.
>>
>>107433334
is that what you get when you cross a chink and a jeet?
>>
>>107433330
My conspiracy theory is that they attempted to distill deepseek to a smaller 120b-220b model but failed spectacularly so resorted to copying v3 wholesale to avoid worse embarrassment
>>
>Buys your RAM
>Takes your taxpayer money
>Fucks your AI gf
It's nothing personal
>>
>>107433286
Schizo and impossible to finetune. Mixtral was just 8 Mistral 7Bs (rebadged llama) duct-taped together with some additional training on top. There's a reason no one does it that way now.
>>
Which Deepseek do I download for the best 3090 erp experience?
>>
>>107433597
>gf
>>
>>107433681
he's gay?
>>
>>107433731
https://x.com/sama/status/825899204635656192
>>
>>107433630
:8b
>>
>>107433816
Damn, imagine being a billionaire able to get any type of woman you want and being gay.
>>
3.2-speciale is my favourite model of the year
>>
>>107433961
ggufs when?
>>
>>107433850
thanks for responding, but I cannot navigate huggingface, could you share the full model name?
>>
>>107434001
he is recommending you use ollama to download a qwen distill of deepseek. just download whatever quant you can fit in system ram + vram. or stick to mistral nemo
>>
>>107434001
I was joking. If you don't have upwards of 256GB ram, forget about it.
>>
>>107433626
It looks like everybody forgot about Mixtral 8x22B (141B total)...
>>
>>107434056
:(
>>107434045
thank you for the information
>>
>>107433961
I'm waiting for the Competizione variant.
>>
>>107434077
wasn't Microsofts fine tune better tho?
>>
>>107433912
Imagine goyim know you raped your little sister, and you must beat these allegations
>>
>>107429504
Maybe Trump should ask North Korea to start a ram plant, since worst korea can't into making ram. He has his phone number.
>>
>>107434135
I found wizardlm quite boring, also you need an ancient llamacpp release to run it
>>
>>107434140
Then that's even more tragic. All the money in the world wasn't able to buy her sisterly love.
>>
>>107434056
It's gonna get better. Someone will make a memory efficient model that doesn't suck.

safety removal reduces memory, and reduces compute.
>>
>>107434244
I'm sure they will
Then they'll stick it behind on API wall and charge $15 per million tokens
>>
>>107434357
>>107434357
>>107434357
>>
>>107426882
So the NeMo framework was named after Mistral NeMo then?
>>
>>107434442
Retard.
>>
>>107432295
This girl is really cool, just listened to that whole album thx anon



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.