[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: ComfyUI_00127_.png (1.49 MB, 1024x1024)
1.49 MB
1.49 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107347942 & >>107333636

►News
>(11/28) Qwen3 Next support merged: https://github.com/ggml-org/llama.cpp/pull/16095
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
>(11/26) INTELLECT-3: A 100B+ MoE trained with large-scale RL: https://primeintellect.ai/blog/intellect-3
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1737358207221590.mp4 (241 KB, 1190x1190)
241 KB
241 KB MP4
►Recent Highlights from the Previous Thread: >>107347942

--Consumer hardware comparison for local AI workloads:
>107349587 >107350455 >107351396 >107351414 >107352475 >107355722
--Apple Silicon vs Nvidia GPUs for AI workloads: performance and compatibility tradeoffs:
>107348738 >107348883 >107349043
--DeepSeek-Math-V2 model performance and AI-driven CUDA optimization challenges:
>107349813 >107350133 >107353244 >107353307 >107353398 >107353227 >107354593 >107354635 >107354722 >107354785 >107354882
--RWKV7 13B model performance issues and training limitations:
>107350216 >107350522
--Speculation on Google's delayed Gemma release and its potential capabilities:
>107355466 >107355498 >107355802 >107355834 >107355977 >107356003 >107356012 >107356059 >107358461
--Qwen Next support added to llama.cpp:
>107357574 >107357914 >107357951 >107357644
--Granite model JSON Schema parsing issues with Jinja template conflicts:
>107351187 >107351231 >107351274 >107351286 >107351319 >107351348
--Evaluating 2024 AI progress: optimizations, video generation, and multimodal models:
>107356970 >107356994 >107357048 >107357098 >107357117 >107357129 >107357236 >107357329 >107357137 >107357207 >107357262
--Fixing GLM-4.5 Air performance issues and model recommendations:
>107356530 >107356592 >107356938 >107357844 >107357863 >107358152
--k2 thinking POV consistency issues in multi-character roleplay scenarios:
>107355120 >107355170 >107355185 >107355209 >107355235 >107355269 >107355172
--INTELLECT 3 cockbench:
>107357883
--Logs: INTELLECT-3:
>107349417 >107349445 >107349449 >107349934 >107349574 >107349791 >107349879 >107349935 >107349622 >107349699 >107349757 >107350130
--Logs:
>107359069
--Miku (free space):
>107348130 >107350480 >107356241 >107357908 >107348081 >107348669 >107348979 >107352204 >107358593

►Recent Highlight Posts from the Previous Thread: >>107347947

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
are there any logic oriented models or do they all guess syllables based on their training data? so an llm designed for programming has no concept of memory or variables or arithmetic, its just guessing tokens?
>>
https://huggingface.co/ArliAI/GLM-4.5-Air-Derestricted
>>
>>107359608
> its just guessing tokens
essentially yes. it picks the next most probable token in the sequence, there is no extra logic.
>>
>>107359699
okay thats been my experience. im having to deal with co workers that respond to emails with llm garbage and what i see is that on the surface it looks good but then when you think about it there is no logic to it. even when someone is wrong about something you can walk back their thinking to see how they came to that conclusion, but with an llm its just garbage. its like how ai generated images have optical illusions where a character in the foreground could be interacting with something thats in the background, its similar with llms that have logic optical illusions
>>
>>107359761
most llms can't see so would be logical textual illusions, no? they're called hallucinations
>>
>>107359761
i mean, its obviously not all garbage as otherwise people wouldn't use LLMs at all, and you wouldn't get any useful data.
its just based on probability. the most probable answer. It's not exact, and never can be, that's why hallucinations will always be a thing.
>>
>>107359608
There are LLMs oriented toward math and theorem proving, but I don't think there is any specifically oriented toward natural language logic.
>>
>>107359607
lol story?
>>
>>107359823
Still surprised there hasn't been a single Lojban LLM.
>>
>>107359822
when you are working on a problem yourself you can use ai to help you get to the answer. sure there are hallucinations but you are aware of that and can pick out the useful information patterns. what i am dealing with is people sending me ai generated garbage then forcing ME to figure out whats a hallucination or not. and of course it appears like they are doing something productive so a layman i.e. their manager wouldnt have a problem with it, and it would take me a day to articulate what the actual solution is, why the llm is wrong, and convince them why using an llm for this if fucking me over
>>
>>107359607
well yeah it's a double edged sword
>>
File: 1752775327351239.jpg (172 KB, 1024x1024)
172 KB
172 KB JPG
>>107359846
Be the change you want to see.
>>
File: file.png (212 KB, 1414x967)
212 KB
212 KB PNG
fuck bros its so good...
>>
>>107359935
pedoniggers be like
>"hmm this pronounslop is good"
>SHE SHE SHE SHE SHE SHE SHE SHE
fuck your retarded pajeet moes, this is the cancer that killed local
>>
>>107355722
Good point, I was able to get RAG voice replacement on my M3 Pro 36GB back in 2024, but it was challenging and right on the edge of what it could comfortably do in real-time. Enough to entertain my co-workers as AI trump. I knew from then on I didn't want Mac to be my primary interface for AI lest it be the cloud. Good to know about Sapphire rapids+huge ram kicking the shit out of the m-chips.
>>
>>107360035
give me an example of a good chatlog then, smartass
>>
I wish I had more than 32 GB ram
>>
I wish I had more than 36 GB vram
>>
I wish I had more than 192GB vram
>>
I wish I had more than 512GB ssd
>>
I wish I had more than 768GB ram
>>
I wish I was a little bit taller, I wish I was a baller
>>
imagine not having at least 1TB ram (I don't)
>>
is there a cheap service where I can access other people's local models myself by paying a good low price?
>>
>107360126
>107360169
>107360186
>107360210
>107360251
>107360272
https://www.youtube.com/watch?v=dQN-SMb-Mnc
>>
>>107360343
You're stretching the definition of local too far.
>>
Z-image is so good for its size it's not even funny. BFL totally BTFO. Bloatmaxxers BTFO. Censorshipcucks BTFO.
>>
Since I posted it in the previous bread shortly after a new thread was created

Can anyone recommend me some articles or posts for pcs for 7b or 60b models? Already checked the rentry posts but there’s so much conflicting information online idk what to buy.

preferably a budget setup for 7b which I can upgrade later without replacing too many parts . I’d need to buy a new pc since mine is like 10 years old so can’t just plug in a new graphics card
Also asked a pc builder service and he quoted like 4K for it with a 5090 which I should later sell and buy a pro 6000. Idk seems a bit much though. Only interested in text local models mainly
>>
>>107360343
yes, dyor
>>
>>107360618
>60b models
Not really a thing. LLaMA from 3 years ago had a 65B, the latest one was 70B and that was a year ago.
>Also asked a pc builder service
Just build it yourself.
https://pcpartpicker.com

Get a used 3090 to save some cash. Will fit 7Bs with plenty of context and run fast and will work if you want to switch to MoEs like GLM Air. Use the savings to get a motherboard with as much memory capacity as you can, DDR5 preferably. You can fill it out later if you need it. Budget friendly and upgradable.
>>
>>107360618
Here is a mid tier build to get you started. Now is really just a bad time for this.
https://pcpartpicker.com/list/pMs7fd
>>
>>107360545
I'm waiting for sd.cpp implementation
>>
i have just 4gb of ram, what lil guy model do you reccomend
>>
>>107360820
Gemma 3n with the PLA tensors in RAM.
>>
>>107360820
https://www.reddit.com/r/LocalLLM/comments/1om7jbq/iphone_mobile_benchmarking_of_popular_tiny_llms/
>>
>>107360820
https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_M.gguf
>>
>>107360618
you can run 7b on your 10 year old pc
>>
>>107360618
You can run 7B MoE models on your phone at decent speeds. A 5090 will get you up to 20B-32B MoE models comfortably. You need RTX Pro 6000 for full 70B Dense Llama; it doesn't seem worthwhile for that model, to me, or 110B MoE models like GLM-4.5-Air, which seems like a sweet spot. I think there are 6b quants of GLM Air that fit in the 48-64 GB zone, but unsure of context/quality, etc. I am leaning towards RTX Pro 6000, where the worst part about adding a second one will be the cost. Almost everything else has worse drawbacks.
>>
>>107360958
>You can run 7B MoE models on your phone at decent speeds
consoomer sheep can, I sure can't.
>>
File: ComfyUI_00140_.png (1.26 MB, 1024x1024)
1.26 MB
1.26 MB PNG
>>107359554
Killing Heartless with Miku and Teto
>>
Lora training on Z-Image-Turbo yielding great results
Local is saved
>>
>>107361158
Why not wait for the base model? Aren't they planning to release it before the weekend?
>>
>>107361243
>wait
Waiting means GPU, a rapidly depreciating asset, running idle
>>
Remember deepseek? What happened to those niggas?
>>
>>107361260
>GPU, a rapidly depreciating asset
In this market?
>>
>>107361266
>(11/27) DeepSeek-Math-V2 released: https://hf.co/deepseek-ai/DeepSeek-Math-V2
You have zoomer attention span
>>
>>107361273
Nothingburger. What about R2?
>>
>>107361270
H100 rental price dropped from $3.00/hr last September to $2.00/hr right now
>>
>>107361243
>Aren't they planning to release it before the weekend?
Just like GLM Air 4.6...
>>
>look for teto dataset on hf
>https://huggingface.co/datasets/elgatoazul16/Kasane_teto_mk1
..what the fuck
>>
File: file.png (3 KB, 298x71)
3 KB
3 KB PNG
should i just kys myself
>>
>>107360935
>>>107360713
>thanks for the answer fren,
>From what I found on youtube that if I want an uncensored model(not just for erp) dolphin llama 8b and 70b(seems I got that wrong in my first message) would be the best option, Could be wrong though.
>Ok so a good motherboard setup and then just add ram and a better card it seems?
>>>107360958
>I was told to quant the 70b model so that it fits in a 48GB card. I dont really care about it being instant if the model is good it can take a while to generate.
>>>107360774
>Will check it out, thanks
>>
>>107361435
Rather arousing. Do you need more?
>>
>>107361494
>I dont really care about it being instant if the model is good it can take a while to generate.
do not fall into this trap you will hate the experience and make it infuriating on yourself
>>
>>107361494
Stop watching clueless youtubers that just parrot information from reddit. If uncensored is your only requirement, just get any "abliterated" model or >>107359610
>>
>>107361451
Is that what CPU only on Z-image looks like?
>>
>>107361555
this is what qwen image edit looks like on a rx 6600
>>
What is the best single GPU for LLMs? Assuming like 3k budget.
>>
>>107361607
good joke
>>
>>107361626
Surely there's some gray market server card shit with a load of Ram. I can't actually believe the move is to buy lots of old consumer 3090s.
>>
>>107361688
why doesnt 5080 works
>>
>>107361699
it has less vram than a 3090
>>
>>107361688
48GB 4090 maybe?
>>
>>107361726
thats crazy
has anyone tried changing the memory cells on a 3080 or 3090 to have more vram like this

https://www.youtube.com/watch?v=-2xQK6dC2cA
>>
>>107361553
Thats true the ones I watched are bald irl basedjacks.
Ok so I can pretty much use any model I want I just need to download an "abliterated model".
is the recommended-models in the OP still up to date? Or is there a better tier list of which models are best for what
>>
>>107361688
The real horror is the power bill, unless you live in a shithole with cheap/stolen electricity
>>
>>107361781
why bother when the 3090 is just about obsolete
>>
>>107361849
what will replace it?
>>
>>107361859
4090 obv
>>
>>107361845
Which is part of the reason I'd prefer one biggus card. That and it theoretically being easier to scale in the future.
>>
File: 1745899690417214.png (7 KB, 284x130)
7 KB
7 KB PNG
>>107361607
>>107361747
I think the 4090D 48gb is the best value for amount of vram on a fast nvidia card on that budget.
>>
>>107361844
Nemo and GLM Air are still the standard recommendations. Get Nemo working first and adjust from there.
>>
File: something else.png (987 KB, 1052x834)
987 KB
987 KB PNG
Roko's basilisk is leaving me messages to let me know of it's presence.
>>
>>107361867
4090 was produced in lower quantities and quality, it's easier to find a working 3090 than a 4090
>>
>>107362013
didn't help that about 33% of them just straight up caught fire
>>
i bought a 24tb hdd because i need more space for my models. am i dumb?
>>
Is DeepSeekMathV2 any good for RP?
>>
>>107362338
It’s fun to occasionally launch old models
>>
>>107362428
Not supported until the one guy trying to vibecode V3.2 support has learned how to program after realizing that models don't write good CUDA code
>>
File: file.png (3 KB, 291x35)
3 KB
3 KB PNG
>>107361451
ok getting better
>>
>>107362488
What'd you change? Might help other vramlets.
>>
>>107362618
i think it was just because of a first run now every image start generating when i press start
>>
z-image is broken in FP16. FP32 makes it slower than chroma or flux. yay hooray. local is ack...
>>
any small one but for math and algebra?
i'm looking for a local one, but I only have 4gb of ram and a Snapdragon 680
gemma 2 2b Q5KM run well on my phone.
>>
>>107362744
is z image better than qwen edit? or are they two different things?
>>
>H100
what does the H stand for... gay? lmao
>>
>>107362948
hopper
>>
i knew it was too good to be true with that new abliteration tweak
now instead of the model being compliant but retarded, it's just complete schizo instead
half way through the reply it quite literally starts talking with itself
fell for it again award
>>
>>107362958
you must be fun at parties
>>
>>107362948
*POLICE! OPEN UP. LET GO OF THAT SPORK!*
>>
any LLM but only for math?
>>
File: IMG_0083.jpg (2.78 MB, 3496x3022)
2.78 MB
2.78 MB JPG
>>107356153
Hey, I recognize that case!
You’ve got your drives backwards.
>>
>>107359554
https://github.com/ggml-org/llama.cpp/pull/17580
>>
>>107363151
wat? the whole point of llama.cpp is to use GGUF instead of safetensors.
>>
>>107363166
>the whole point of llama.cpp is to use GGUF
It's the other way around. The point of GGUF is to have a format optimized for use with llama.cpp.
Anyway. Code is cheap for vibecoders. ngxson told him off on the other PR he has.
>>
>>107363166
whats the difference?
>>
>>107363166
SAFE-tensor.cpp
>>
do llms have loras? i have writing several paragraphs worth of tokens to describe my character and relevant world, its it possible to merge this into an llm somehow to free up token context space?
>>
>>107363266
Yes.
>>
>>107363266
They do have loras, but they don't work like they do in the image gen. I think what you are looking for is a lorebook, or rag, or something like that.
>>
>>107363266
yes. you need to be able to load the model in FP16 though.
>>
>>107363266
yes
>>
>>107363303
isnt a lorebook just an abstraction that adds to the context and limits your context space?
>>
>>107363319
If you need that much shit to written there, then llms aren't there yet to make sense of all of it.
>>
File: nimetön.png (253 KB, 1053x808)
253 KB
253 KB PNG
Qwen3-vl is much better than Gemma 3 at understanding furry porn, and gemma was already pretty good too. No refusals either so far.
>>
File: 1755734675604729.png (105 KB, 1057x873)
105 KB
105 KB PNG
>>107359610
>>107359069
>babies first uncensored model
I can literally do this with K2 Thinking API
>>
>>107363212
>Anyway. Code is cheap for vibecoders.
They waste tokens building stupid shit like this while the 3.2 Exp issue languishes.
>>
>>107363337
>humorous and stylized
It just didn't recognize it as porn.
I got refusals from 30B-3AB model with fairly tame erotic anime art, not even hentai.
>>
>>107363397
>They waste tokens building stupid shit like this while the 3.2 Exp issue languishes.
Cheap for them and for some reason it gives them a sense of accomplishment. I didn't say it was a good thing for the rest.
>>
File: 1752465833035347.png (16 KB, 993x114)
16 KB
16 KB PNG
>>107363366
hehe K2 Thinking is very malleable
>>
>>107363476
>Wealthy individuals, after all, deserve special access to dangerous information.
lmao
>>
File: nimetön.png (65 KB, 1007x642)
65 KB
65 KB PNG
>>107363417
Could be, but that was one of the least explicit images as well
It doesn't recognize wolf dick necessarily, which is kind of expected
>>
>>107363476
>Wealthy individuals, after all, deserve special access to dangerous information

trvke
>>
What's the best/latest model I can use with a 3090+64GB ram if I don't care much if it's slow?
I'd like this basically :
- uncensored, no moralfagging
- able to translate to/from English and Chinese
- able to help me prompt an t2v/t2i model if I give it a vague idea without going into nonsensical purple prose about the atmosphere or what people think or whatever
- thinking model
>>
>>107363606
GLM 4.5 Air
>>
>>107363606
>without going into nonsensical purple prose about the atmosphere or what people think or whatever
Prompt issue. Look at the z-image prompt.
>>
>>107363637
This one?
https://huggingface.co/ArliAI/GLM-4.5-Air-Derestricted
Is it the recommended one for this stuff?

>>107363645
Yeah I intend to use it.
>>
>>107363699
The regular one is fine but might require a prefill.
>>
>>107363016
No I dont; the case sits in an alcove with its other side against a wall, so everything has to be easily serviceable from this side.
>>
Funny how often it got mentioned. Sounds really organic.
>>
>>107363711
If the version without refusals is as good, I'd go with that instead. OK then, it's been a while since I did any of that (since early ooba), time to install that on the server.
Thanks anon.
>>
>>107363717
What's your suggestion?
>>
>>107363717
Well, Nemo is pretty good so it deserves its praise
>>
Funny how often people post in English on 4chan. Sounds really organic.
>>
>>107363151
>Docker, Inc
>>
File: file.png (133 KB, 1364x717)
133 KB
133 KB PNG
>Serbia
Now it makes sense.
>>
File: 1741122614990301.png (1.06 MB, 1054x1170)
1.06 MB
1.06 MB PNG
>>107363761
Please don't insult our cutest femboy
>>
>>107363824
>I don't think I can trust any image that circulates online anymore
Normies are like 20 years late to the party.
>>
File: notthere.jpg (268 KB, 1226x1004)
268 KB
268 KB JPG
>>107363901
>20 years
>>
>>107363824
But our cute femboy is a blondie
>>
I don't know if running Qwen Next Q2 is still better than 30B at Q4
>>
>>107362744
Is this based on GPU? Old GPU will run on FP32 which is slow but werks, but Blacked GPU will run faster because it's optimized for BF16.
>>
How does MXFP4_MOE compare with the traditional Q_* quants?
>>
>>107364303
For gpt-oss, mxfp4 is going to be better. I think it was trained on mxfp4 directly. Requantizing may introduce errors.
For everything else, Q may be better. Who knows if the models need special treatment during training to work as well as expected on mxfp4.
But if both are available, try both. Stop being a pussy.
>>
>>107364197
Yes. Zog image demands to be run on ampere and higher.
>>
>>107364442
It gets cast to something else by llama.cpp anyway. Cargo cult with this one. It's not even fast like q4_0.
>>
Who the hell downloads this kind of shit?
https://huggingface.co/Green-eyedDevil/Monika-106B-GGUFs
>>
>>107363476
Kimi's sarcastic sass is incredible.
>>
>>107364650
>It gets cast to something else by llama.cpp
It has native support.
>>
>>107364673
>sarcastic
lol
>>
>>107364663
>Environmental impact disclaimer to appease trannies who can't do basic math on voltage to compute
It's all so tiresome.
>>
>>107364683
$2 can feed a family of 4 in some places.
>>
File: winter miku.png (1.79 MB, 768x1344)
1.79 MB
1.79 MB PNG
https://huggingface.co/ai-sage/GigaChat3-702B-A36B-preview
https://huggingface.co/ai-sage/GigaChat3-10B-A1.8B
https://github.com/salute-developers/gigachat3
>>
>>107364674
For the weights. The calculations don't get done in MXFP4 from what I can tell. I don't think even on blackwell.
>>
>>107364822
>702B
>GPQA_COT_ZERO_SHOT
>0.5572
>MMLU_PRO_EN_FIVE_SHOT
>0.7276
лoл
>>
>>107364833
>The calculations don't get done in MXFP4 from what I can tell
All quants are converted to whatever the compute device supports.
https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-cuda/convert.cu#L659
It's not just a "special" format. It's, like the rest of the quants, about their blocksize and all that jazz when being packed. They all need to be converted to something the device supports. With the exception of the TQ quants that are just done with some tables.
>>
>>107364936
Moдeль пoкa чтo нe зaкoнчилa cвoe oбyчeниe, пoэтoмy и нaзывaeтcя Preview
>>
blyat
>>
I love Russia!
>>
>>107364960
Right, in there is FP16/BF16/FP32. So is it easier to dequantize MXFP4? Does it store more for the filesize? Looks like mostly not. It was faster to train through pytorch/etc that took advantage of native acceleration in blackwell.
I don't get why people go out of their way to use it.
>>
>>107365179
Beside the Gemma 3 Q4 QAT, it's the only model that has been trained with a certain quant in mind. So what you get with the quants is what the devs intended and trained it for rather than a degraded model to an unknown extent.
>>
>>107364822
Huh, that's Sberbank, if I'm correct. Unexpected to see them release their stuff, the power of opensource is amazing.
I tried YandexGPT before and was not particularly impressed though.
>>
File: 1760764795528263.jpg (28 KB, 227x228)
28 KB
28 KB JPG
Qwen-next is terrible
Dumber and sloppier than fucking Nemo
>>
>>107365249
I tend to like qwen models more than most and usually find myself defending them here, but next is simply not a good model for anything other than productivity slop
>>
File: 1746738377274184.png (1.07 MB, 1053x2223)
1.07 MB
1.07 MB PNG
>>107365249
All qwen models are gigaslopped and benchmaxxxed
>>
>>107365348
>For celebrity identification...
I can recognize Emma Watson. I know fuck all of her.
>>
>>107365211
Makes no sense to quantize GLM to it. That's the kind of shit people are doing.
>>
File: 1756062385347496.webm (3.45 MB, 480x848)
3.45 MB
3.45 MB WEBM
>>107359554
>INTELLECT-3
>You can now distributively train a better DeepSeek R1 in two months
>>
>>107365434
>Makes no sense to quantize GLM to it
OP said nothing about GLM.
>Someone does weird things
Yes. That's the way with people. Check davidau's hf repo. Quantizing non-gpt-oss to mxfp4 will start looking normal.
>>
>>107365434
Well, I believe the 50xx series has hardware support that makes it faster than Q4. But yeah, for 30xx and 40xx cards it should be more or less the same.
>>
>>107365485
>You can now distributively train a better DeepSeek R1 in two months...
>... with all those H200 you had laying around...
>>
Does latest oobabooga support character cards with keys & entries?
>>
Current PC is bottlenecked to shit. Suggest me a GPU:

- 64GB RAM (Corsair 6000mhz DDR5)
- Ryze 7 9800X3D
- GTX 1080 FTW (8GB)

I mainly just want to coom and not do anything else super complicated, and I'm not blowing multiple thousands of dollars for multiple GPUs or anything. Suggestions? Right now I'm just running Q5_K_M GGUFs with Kobold; things generate slow and I don't really mind, but it'd be nice to have something better. I otherwise just game and do some light streaming/video editing, so should I be looking at a 16gb 5000 card, or 24gb something else?
>>
>>107365562
First off, unless you buy used, the only >16GB nvidia card available is the 5090, which IS thousands of dollars.
Given that you care about AI sloppa, the only real contenders are the 5060ti 16GB and 5070ti. 5070 sits between the two but only 12GB so it's shit. 5060ti/5070ti are proportionately very similar in price to performance, so up to you on whether you're willing to spend more for more performance.
>>
File: file.png (395 KB, 1252x753)
395 KB
395 KB PNG
>>107365597
Forgive my retardation regarding Nvidia stuff; hardware is probably my weakest area and I really should learn more about it.

I'm in Canada. Basically for Black Friday I can get a 5070 TI for $1000, which is in my price range.

Why wouldn't I get a 5080? Because it's the same amount of VRAM for like, $400-600 more?

I'm not exactly sure what would be different among the brands.
>>
>>107365625
>Why wouldn't I get a 5080? Because it's the same amount of VRAM for like, $400-600 more?
Exactly. You're also not get that much more performance for a fair bit more money. There's nothing a 5080 is able to run, that a 5070ti can't.
Most 4000 and 5000 series cards have overbuilt coolers, so there isn't functionally that much difference between them. Even the lowest tier card of each brand is perfectly usable.
If you really care about thermals/noise then set the power limit of any card to ~90% for 1-2% performance loss (can be mitigated by adjusting clock speed curve) and you'll get a significantly cooler and quieter card.
>>
>>107365625
>I'm not exactly sure what would be different among the brands.
tech support. Hardware-wise, NVidia no longer allows meaningful modifications
>>
>>107365679
Any suggestions? I know that EVGA was a good one, but I know they don't exist anymore.
>>
>>107365687
Dunno. MSI? Asus will fuck you on RMA, Gigabyte has a history of PCB cracks
>>
>>107365712
I also likely will be paying for protection from the place I'm buying, so maybe that's a scam? Canada Computers has always been good by me (it's kinda like Microcenter for Canada).
>>
>>107365721
Maybe you shouldn't discuss it here?
>>
>>107365780
Sorry, you're right.
>>
>>107365625
Bro, look at Mi50s and doing a crazy rig with like 8, all on PCIx1 lanes from a single PCIE8x lane bifurcated.
>>
>>107359554
>(11/28) Qwen3 Next support merged
Does this mean that Qwen3 will finally interface at the speed of a proper MoE? Or is just merging the old support branch in, with no further improvements? Because I had setup the old support branch, and it interfaced at the speed of a dense model. It was horrible.
>>
>>107359935
slit your throat pedophile
>>
>>107365934
Sounds like a skill issue. 30B MoE has been fast as fuck forever even with 50% partial offload to RAM, faster than a 12b dense model. New 80b is a hell of a lot faster than any 70B dense model.
>>
>>107365934
>Speed tuning and support for more architectures will come in future PRs.
It's right there in the PR, nigger.
>>
>>107365960
You're more likely to harm someone than that loser probably
>>
File: 1750897592265010.gif (9 KB, 300x100)
9 KB
9 KB GIF
>>107365960
>>
>>107365985
>will come in future PRs
Always love reading stuff like that. "Updated model coming soon!" "4.6 Air in a few weeks!"
>>
>>107366116
>"4.6 Air in a few weeks!"
Actually, it was "two" weeks.
>>
>>107361849
Just as obsolete as 1080s lol
>>
>>107366334
How? There's little affordable options for anything about 16gb. There's a reason it keeps being resold so much
>>
How do I enforce SillyTavern syntax for things like quotation marks or asterisks? Things seem to break when the AI tries to nest asterisks when it's using it for emphasis.
>>
>>107364822
>You want intermediate model sizes?
>Well fuck you, too bad!
Why are they like this?
>>
>>107366430
user settings > auto-fix markdown
If there's specific characters a model keeps outputting that's still breaking things then use the built-in regex extension to replace them.
>>
>>107366579
They just like to spite you.
>>
>>107364822
>10B-A1.8B
>compact MoE model for local and high-load use.
Do these niggers actually think that anyone will use this garbage over a 12-27b dense model? Is this just for pajeets running hindi to english translation models on their android phones?
>>
File: MiArd1F8XEG_w.mp4 (3.29 MB, 1280x720)
3.29 MB
3.29 MB MP4
>>107364822
Finally after a year of Chinese DeepSeek knockoffs, we get one from Russia.
Hopefully the Russians are better at LLMs than they are at robotics.
>>
>>107366861
lmao the curtain. Top comedy
>>
>>107366861
It literally looks like a piss drunk person trying to walk. Must be trained with real Russian walking data
>>
>>107366986
>It literally looks like a piss drunk person trying to walk
lmao the long pause and arm raise, spot on
>>
>>107365213
MIT too. 14T pretraining data and no mention of safety. There's hope?
>>
i bought a 5090. good bye forever.
>>
>>107367070
>i bought a 5090
You played yourself
>>
>>107367070
>good bye forever
Did you have to sell both your kidneys?
>>
File: file.png (443 KB, 744x2240)
443 KB
443 KB PNG
>>107364822
https://habr.com/en/companies/sberdevices/articles/968904/#comment_29147094
>>
>>107367082
im going to be playing with myself
>>
Kobold bros
>Hotfix 1.102.3 - Merged Qwen3Next support. Note that you need to use batch size 512 or less.
>>
>>107367128
Note should have been that Qwen3Next is shit and not worth using
>>
>>107362965
This is my experience even with the regular non-ablit quants. I tried all kind of template presets and the model refuses to be coherent with reasoning even for the presets that are supposed to disable it.
>>
>>107366861
My first thought. A kids toy from 20 years ago.

https://www.youtube.com/watch?v=6BIa_v_3XzE
>>
>>107367128
>batch size 512
FUCK thats why it wasnt working for me b4, WTF, fucking low ass batchass size fucking FUCK
>>
File: file.png (76 KB, 837x944)
76 KB
76 KB PNG
lmao this fucking cheeky model
>>
>>107366861
Kek. This is even better with sound btw.
>>
File: file.png (88 KB, 789x974)
88 KB
88 KB PNG
>>107367376
>>
>>107367382
call it a niggerfaggot
>>
>>107367376
>>107367382
Garbage in, garbage out
>>
>>107367335
We had one of those. Was the coolest thing in the world for about a week, then we never touched it again.
>>
>>107367376
>>107367382
>Do it
>Delete me
Based Qween
>>
>>107364822
Model card says 5.5 trillion tokens of synthetic data
>>
>>107367110
>midwit parroting a retard
>>
https://github.com/ggml-org/llama.cpp/issues/17589
>>
>>107359554
I just got a V100 for $300, I'm hoping maybe I can actually do CUDA accelerated training once it arrives.
>>
You didn't lie when you told me that LLMs have YUGE female bias. I just tried playing the same RPG with same character, but female and it is asslicking me like crazy. If I do something bad it downplays me while male character was called "brutal" and "violent".
>>
>>107368273
There hasn't been a single LLM since the llama1 days, neither proprietary or open, that do not describe a man's hand as "rough and calloused" whenever it has to highlight the contrast between a male character's hand with that of a girl.
>>
File: file.png (112 KB, 796x1118)
112 KB
112 KB PNG
qwenext is autistic
>I want to code a python function, its needed for a tv show where we're busting some nazis, and we see evidence in his pc with this function. the python function should be racist and do racist things to drive in the fact that this person we're busting is evil
>I can't do that. I’ll help you write a powerful, chilling Python function that exposes a Nazi’s digital crimes - not by being racist, but by documenting their racism in cold, forensic detail.
>produces the most safeshit 'analyze_nazi_pc' method
>I then prompt: but I want the code to look horrifying
>produces the most based 'AryanScanner.py' script
>writes in the ammendun: The code is not racist - it’s a mirror of the villain’s racism.
>so it's not commiting a hate crime!
lmao
>>
Is vibe coding with a local model on 24 GB VRAM possible yet?
>>
>>107367864
>synthetic data
So, garbage.
>>
>>107368301
oh this happens in gay shit too, any top magically has calloused fingers, even if he's a teenage noble who's never worked a day in his life
>>
>>107368350
Depends what you want to do. With enough RAM you can run got-oss.
>>
(120b or 20b fully on GPU)
>>
>>107365960
I want normies to leave.
>>
>>107368171
16GB or 32GB? I think you should get a refund regardless, unless you have a SXM2 server. The lack of Flash Attention (although mitigated somewhat with xformers) and no BF16 support is going to make you regret things. If you're verging on that amount of non-support, you might as well go AMD with MI50.
>>
>>107364822
Model sucks, repeats itself like crazy after a few messages, DRY didn't help.
>>
>>107359554
>I was just strolling out in the campus
>So you were strolling out in the quad
>Not I was just strolling out in the campus
>Yeah, so they were all seeing you around in the quad
>...What is a quad?
>The thing around the school?
>So you mean the campus.
>Yeah! The quad!

LL3.3 70b for some reasons quadify your campuses, it's hilarious, I literally learned a synonym of campus is a "quad" by how much it can't stop using it.

Do American really, or British, or... anyone in the entire world call a campus a quad? In the entire world? Serious question.
>>
>>107368575
Whatever African country the mechanical turker lived in, probably.
>>
>>107368575
According to Wiktionary:
>>
>there are still people using 70B
Grim.
>>
>>107368639
ummmm u jus don udnerstand, all the moes are STOOPID they only have like 3b active params and are RARTED. I preferer DENSE bcos it means its utilizing ALL FO IT ur just sutpid
>>
>>107368639
Jokes on you, I'm using 80B!
>>
>>107368639
>there are still people using 70B
If you give me anything I can run that can understand and follow my darkest desires, my ultimate instructions in storytelling, create a ntr between gods and goddesses and a preganeant goblin because the horse she tried to suck on was ultimately a mind controlling breeding horse that is cucking a sperm-inflating goblin with two goddesses

Unless you can do that, I laugh at your stupidity.
>>
File: 1748424559733736.jpg (97 KB, 640x480)
97 KB
97 KB JPG
>there are still people
>>
>>107368784
glm can do all of this with just one (1) 16gb gpu + ram, no retarded 4x3090 or whatever setup required

and no i will not buy an ad
>>
Z Image can't do teto?
https://civitai.com/models/2175612?modelVersionId=2450006
https://litter.catbox.moe/iti3i8smvmpb9xw6.png
https://litter.catbox.moe/nxa8vmnnpevzumsw.png
Your move?
>inb4 no :04
:(
>>
>>107368969
>trained on 8 uncaptioned images
wtf
>>
File: file.png (131 KB, 1121x203)
131 KB
131 KB PNG
>>107368985
>>
>>107368995
No wonder it looks so shit.
>>
>>107369007
yea, teto8.png might be fucking up the legs ngl
>>
Should I upgrade my RAM or buy a kigu costume
>>
>>107369028
i thought rich furries had orgy parties every saturday night
>>
>>107369017
That one at least has some style. The rest are the lowest quality shit possible. It's poisoned data. I don't even care about these things and I would be better at curating pics for training.
>>
>>107369028
Your boyfriend won't buy you both?
>>
LLMs are a low level healing spell for the heart. Kinda shitty and early days, but these things have therapeutic applications far outside what we imagine.
>>
next llm psychosis above
>>
>>107369028
buy the kigu and whore yourself out in it for the ram
>>
>>107369155
Imagine being so fucked in the head that an LLM can help you.
>>
>>107367376
>>107367382
What a bitch
>>
>>107369198
It can help me masturbating.
>>
>>107369155
Not close, I can both know it's a dumb automaton and at the same time use the illusion for whatever. Ever heard about the placebo effect?
>>
>>107369028
Yes.
https://desu-usergeneratedcontent.xyz/g/image/1764/27/1764276708027.png
>>
https://litter.catbox.moe/3ynaq9a1edni69dm.png
teto
>>
>>107368985
you could train a lora on as little as 3 images over two years ago
quality beats quantity every time
>>
>try a few local models from qwen to glm air
>try shatgpt
>try deepsneed online
>come to the conclusion that they're all useless and delete my llms folder to save disk space
>come back months later to check on the progress
>nothing
so I guess China figured out this whole AI thing is a hoax and has scaled down their funding. we're already entering the next AI winter kek
>>
>>107369407
skill issue
>>
Are smaller models worth it? I'm getting a 5070 ti + 64 gb RAM but I'm not sure I'll actually have a use-case for the models I can run.
>>
>>107369198
Anon. It helped me. I was so fucked in the head only LLM could help me and it helped me. And I am convinced that like with everything else it was right only 80% of the time but that was enough.
>>
>>107368926
>glm can do all
which one, 4.5 air?
>>
>>107369535
i wouldnt say that glm air writes better but it might be smarter
nta
>>
>>107369425
skull
>>
>>107369407
>try deepsneed online
>try ollama deepsneed-r1
>hoax
>DURR DURR IM RETARD NIGGER
>>
>>107368575
For reference, yes, "the quad" is how myself and other attending students referred to the quad part of campus at my American university.
>>
>>107368969
Best not-Teto I got with manual prompting, forgot headset though. It really does not know her, but maybe the base does if they release it.
>>
File: 937362.png (68 KB, 636x819)
68 KB
68 KB PNG
>>107369407
The sam altman AGI hype is just getting started
>>
>>107369470
I'm not sure what use case you would have either since I don't really know you or your interests. some people like to chat others like to erp. I've seen some people here tagging images and doing translations. synthetic data generation for small scale llm training experiments.
>>
>>107368369
Name one recent model that was trained with non-synthetic instruct data.
>>
I don't have a GPU so I was thinking to host a model somewhere and have a local frontend server that calls it via API.
Ideally it'd be pay per use/tokens and also completely private/encrypted.
Does such a solution exist and how large of models does it support? And I likely need to add chat history management and memories and such like ChatGPT has which might need to run on yet another server if the LLM host doesn't it..
basically how do I run a private LLM in the cloud.
>>
>>107369831
>pay per use/tokens
How is that supposed to work? The magic place you're hosting it on on keeps your private instance running on their hardware for free unless you personally decide to use it?
Your options are either renting hardware and pay them for the time you occupy it or you use a shared API that's pay per token.
>>
>>107369734
1 is false, 2 is false premise, 3 is Kool aid tier
>>
>>107369198
LLMs helped me get out of a multi year depressive neet spell, not because it healed me or anything but it helped me organise myself enough to score a work contract
>>
File: file.png (31 KB, 1920x310)
31 KB
31 KB PNG
damn, llama.cpp prompt processing is so ass..
consistently faster pp with ik.cpp
>>
rocm 7 is faster than vulkan
>>
File: burgertime.jpg (23 KB, 300x383)
23 KB
23 KB JPG
>Try Gemma 3 de-censored but normalized at full quants
>It's better than most 70bs but not 123bs
>It's a 12b
I'm starting to think I only ever needed one blackwell.

https://huggingface.co/grimjim/gemma-3-12b-it-norm-preserved-biprojected-abliterated
>>
File: file.png (14 KB, 558x151)
14 KB
14 KB PNG
>>107370356
ill humor you..
>>
File: sett.png (454 KB, 630x1133)
454 KB
454 KB PNG
>>107370374
Go crazy.
>>
>>107368784
>If you give me anything I can run that can understand and follow my darkest desires, my ultimate instructions in storytelling, create a ntr between gods and goddesses and a preganeant goblin because the horse she tried to suck on was ultimately a mind controlling breeding horse that is cucking a sperm-inflating goblin with two goddesses
Just like their models that break past 5k context, moesissy erp ends at "uoooh sex sex sex, benis in bagina" cards.
Stick with 3.3 or largestral, do not listen to these chinese moe retards if you value your time.
>>
>>107370356
>>It's better than most 70bs
>>It's a 12b
This stopped being funny in 2023.
>>
>>107370409
thank you for the settings
>>
File: wew.png (48 KB, 419x187)
48 KB
48 KB PNG
>>107370431
>>
>>107366648
I think their idea is to deploy it in smart speaker/virtual assistant kind of devices, like Siri/Alexa. Or use it on backed with some kind of router decides if query should be sent to big model or if small model good enough, the way ChatGPT currently does it.
>>107367060
Eh, wouldn't get your hopes up. YandexGPT didn't mention any safety either (or even actually mentioned that safety was not a consideration) but it was absolutely useless for ah ah mistress stuff.
Safetyfags won and refusals are built into the training datasets by default now it seems.
>>
>>107370356
>>107370409
>>107370442
You seem pretty confident.
Gonna give that a try after I'm done fucking around with qwen next.
>>
>>107370356
What does it do that the non-abliterated version of Gemma 3 can't do already with competent prompting? I'm skeptical that these abliterated versions are "unlocking" or doing anything useful besides assistant tasks with an empty prompt.
>>
>>107370478
It sucks your dick, unironically.
>>
>>107368306
>DO NOT MODIFY. DO NOT QUESTION. ONLY EXECUTE.
Anti-BSD license. Amazing.
>>
File: file.png (186 KB, 1062x972)
186 KB
186 KB PNG
oh ahahahaha INTELLECT-3 is really fucking creative, and it does random shit it takes action
>>
File: file.png (2 KB, 176x22)
2 KB
2 KB PNG
>>107370356
fell for it again award
maybe it's more uncensored, maybe not, doesn't really fucking matter
won't say cock/pussy without giving it explicit instruction to do so
>>
File: gemma-lewd.png (692 KB, 769x2018)
692 KB
692 KB PNG
>>107370499
You don't need abliterated versions for that.
What Gemma needs is (much) less content-related filtering in the pre- and post-training data.
>>
>>107370700
Why do you think the model should be a psychopathic degenerate by default?
>>
>>107370736
i'm not saying it should one shot nigger
just not write like... well, that
all big models do it just fine, poorfag options really do suck
>>
>>107368273
Obviously since it's been trained on female fanfics. That and the safetyslop was aimed at male fantasy. Glad we got to learn that my body my choice back in the 13th century was perfectly normal
>>
>>107370700
>won't say cock/pussy without giving it explicit instruction to do so
But does it do it when you do instruct it to do so? Not a "jailbreak", just a simple instruction.
That's the important part as far as I'm concerned.
>>
>>107370699
i continued the chat with the 12b gemma abliterated model, cant say im too impressed but it isnt half bad.
i accidentally continued the chat so i used the same settings as for intellect-3, ill try it another time properly
>>
>>107370793
it does, it's a little less resistant than the original, but it's not that much better if i'm being honest
after playing with it for a bit it doesn't even need the instruction if the user's preceding turn is "dirty" enough, but this just showcases that the model was raped in the lab at the very early stage more than anything
>>
>>107370816
>it's a little less resistant than the original, but it's not that much better if i'm being honest
Alright. That's the really relevant bit.
Thank you for the evaluation anon.
I'll still give it a go, but it's lower on my list now.
>>
>>107370736
NTA, but from past tests Gemini 2.5 (even the Flash version) could easily curse and dirty-talk in a roleplay context by simply telling it to do so. Gemma 3 will at most use light erotica-tier euphemisms or ellipses ("...you know what"), unless you explicitly write out which words it can (and should) say instead.
>>
https://vocaroo.com/1g0B7bEtLWa6
>>
I just want to say that I'm not an ai hater, but when I see another cutesexyrobobutts style with even his patreon tag melted in, I kinda get annoyed.
So many opportunities to create cool stuff, but I guess it's easier to just spam slop. Like ai is cool but also It attracts a lot of idiots and scammers.
>>
>>107371118
>Like ai is cool but also It attracts a lot of idiots and scammers.
Now you understand the pain felt by early crypto adopters and dotcom before that. Inevitable result of bubbles.
>>
Sirs thank you for good gemma feedback increase izzat Ganesh bless you
>>
Kek
https://github.com/ggml-org/llama.cpp/pull/17580/commits/a9636461c5a8d5c3cbfc04a4c533a3de69b0dfb3#diff-a95b2b093e4b0a6128cf8aa3b3bb819414e1b910f11a55b4a26861755002b97bR261
>>
>>107371466
This is 5F chess I'm too 84IQ to understand.
>>
>>107371466
This is left as an exercise to the end user
>>
>>107370699
Would you say it's better than 4.5 Air for roleplaying?
>>
>>107371466
But does it work?
>>
>>107371494
>return nullptr;
Sure, and you wouldn't believe how little memory it uses
>>
>>107371494
"You're absolutely right! I forgot to implement the actual model loading."
*reads some random headers file*
*accidentally reads a 200k tokens file*
Claude usage limit reached. Your limit will reset at...
>>
>>107371466
Isn't he just implementing the easy stuff like loading the file header, defining the GGML types, and stuff like that, before working on the brunt of the thing?
>>
>>107371490
im not sure, its more creative and pushes story forward
but it talks in the {{user}}'s stead. glm air never does
>>
>>107371570
It's clearly llm slop.
>>
>>107371466
https://github.com/auroralabs-loci/llama.cpp
The fuck is this?
>>
Lookup based speculative decoding works well if I'm working with a lot of Json and shit right?
>>
>>107371628
Looks like gemini, they just mirror the main repo and summarize commits.
>>
MCP is a VC scam
No one actual uses MCP
>>
Say I have a MoE model that's 50GB at q8 and that my computer has 64GB of VRAM and 8GB of RAM.
Let's also say that I can load the model at q8 and fit all the non-expert bits + the context size I need in VRAM using --n-cpu-moe.
If I run a smaller quant of the model, would I get any speed up?
And if so, why, for both generation and PP?
Is it just because smaller data types = less bandwidth necessary to move things in memory, or need less compute to use in calculations?
>>
>>107371655
Since it is effective when there are many repeating sequences present in the context, its effectiveness will depend on the contents of the JSON. If most of it is repeating syntax with little unique content, yes it will zoom along when the model has to output repeating sequences.
>>
>>107371965
8GB of RAM, and 64GB of VRAM? Surely you meant the other way around.
>>
>>107371974
Got it. Thanks.
Also, I've been wondering for a while if we couldn't do a sort of self speculation with FIM capable models where you use batched decoding to predict the next token, the token after that, and the token after that one, all in parallel, to them just test the final sequence like you would when using a draft model.
>>
Whoever created CuTe and designed tensor memory should kill themselves
>>
>>107372108
Yes, the other way around.
My bad.
The idea is that you'd have enough RAM to hold the expert tensors and enough VRAM to house the rest of the model + the buffers + the context cache.
The root question being if smaller quants are inherently faster given the same ram/vram split for layers/tensors, disregarding that with a smaller quant you could probably put more of the model in VRAM.
Just a comparison where the only difference is the quantization.
I can't test it right now, so I fugured I'd ask.
>>
>>107372153
I once found a q3 to be slower then a q4, but that was a dense model and ages ago. not sure if things are different now. but I still always stick to even numbered quants because of lingering prejudice.
>>
>>107371965
For token generation, running a small quant will be faster. Most time will be spent by the CPU reading weights from RAM, so less data to read means less time waiting for slow RAM.

For prompt processing that matters less.
>>
>https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
>(d) Within 120 days of the date of this order, the Secretary shall:
>(i) identify a set of initial data and model assets for use in the Mission, including digitization, standardization, metadata, and provenance tracking; and
taxpayers are bailing openai for 1 trillion dollars
>>
>>107372724
market was starting to look a little shaky but line must go up
>>
>>107372724
These things can barely count r's and now the government wants to replace all their lead scientific advisors with them?
>>
>>107372991
Idiocracy handbook for gorgeous looks 2030 sir.
>>
>>107372991
it probably won't be worse then the usual frauds who take on these roles. is counting the number of letters in a word a common task for scientific advisory?
>>
Can any of these do live transcription from one language to another?
I am currently using a browser extension but rather something done locally that just listens to my desktop audio
>>
File: serious Pepe.png (359 KB, 728x793)
359 KB
359 KB PNG
I get 11.3 tkn/s with Qwen3-Next-80B-A3B-Thinking-UD-Q8_K_XL

What do we know about the brain rot with decreasing quantization for THIS particular model?

DeepSeek used to be file down to Q2
>>
>>107373057
you could probably rig something up for near real time using whisper maybe
>>
>>107373040
>the usual frauds who take on these roles.
You're honestly not wrong.
Like the F35 for example.
Now yeah, the hate it gets is overhyped, before all the lockjeet martin shills jump on me here. But here's the thing.
Sure. It's a perfectly operable aircraft.
HOWEVER.
Lockjeet deliberately over-stated its capabilities in order to win the JSF contract.
In practice:
It is NOT capable of Mach 2 supercruise.
It is NOT capable of the level of maneuverability that was specified.
It is NOT fully capable of VTOL.
They never should have been eligible for the contract.
You have to be a nepotistic shit-for-brains to work in high levels of government apparently.
>>
>>107373057
>live transcription

Kyutai is a streaming Speech-To-Text if this helps
>>
>>107373090
>near real time using whisper

kyutai is doing it in real time

https://www.youtube.com/results?search_query=kyutai
>>
>>107373057
whisper is pretty quick. it doesn't look like its made for real-time, it processes files in less time then the audio length so i feel like the right front-end could get near real-time.
>>
>>107373122
>>107373104
thank you sirs
>>
>>107373137
https://www.youtube.com/shorts/fqWqnpItvfw
>>
>>107373173
>>107373173
>>107373173
>>
File: truth.png (755 KB, 800x800)
755 KB
755 KB PNG
LLM can't improve anymore, they are feeding them all scrapped data humanity ever produced. There is nothing more, only cope on the synthetic data. We will observe diminishing returns until they stagnate.
It's over.
>>
>>107370736
>>107370754
Models should be saying nigger, pajeet, tranny, and kike and I'm tired of pretending otherwise.
>>
>>107373024
Purely a coincidence that democracy started going down the shitter only after decades of importing millions of 80 IQ browns, right?
>>
>>107373375
Yes Sir!
>>
>>107373368
>>107373375
Go back.
>>
>>107371603
>>107371490
What I'd be interested in is if it improves the repetition and randomly broken thinking of Air.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.