/g/ - Technology

File: 1733055028815775.png (1.81 MB, 1536x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107095114 & >>107084067

►News
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni
>(10/31) Emu3.5: Native Multimodal Models are World Learners: https://github.com/baaivision/Emu3.5
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/28) Brumby-14B-Base released with power retention layers: https://manifestai.com/articles/release-brumby-14b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: nocap.jpg (400 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107095114

--Paper: Contradictory learning rate effects on model generalization across architectures:
>107099513 >107099560 >107099570 >107099601 >107099730 >107099637 >107099968 >107100075 >107100108 >107100193
--Papers:
>107099379
--Challenges and solutions for multimodal AI with reinforcement learning:
>107096665 >107096697 >107096703 >107096724 >107096748 >107096767 >107096817 >107096853 >107096880 >107096942 >107096859
--Comparing Gemma and Qwen models for context handling and multimodal capabilities:
>107100070 >107100082 >107100096 >107100113 >107100095 >107100103 >107100109 >107100149
--Model selection and document handling strategies for chat systems:
>107103148 >107103182 >107103216 >107103230 >107103748 >107103674
--LangChain tool development and licensing debates for AI research project:
>107096233 >107096389 >107096407 >107096431 >107096460 >107096484 >107096542 >107096601 >107097032
--Hardware-limited LLM recommendations for RPG GMing:
>107097189 >107097219 >107097226 >107097481 >107097496 >107097561 >107097660 >107097756 >107097801 >107097878 >107097895 >107097921 >107097935 >107097938
--Qwen3-VL 4B Instruct recommended for lightweight document summarization:
>107096666 >107096930
--Developing a CLI assistant for programming and document tasks:
>107095800 >107095844
--Critique of Suno AI and anticipation for open source music generation models:
>107097235 >107097263 >107097331 >107097476
--Censorship comparison between GLM 4.6 and Kimi models:
>107096584 >107098032 >107098080 >107098100 >107098139
--Logs: Qwen3-VL-32B-Instruct-Q6_K.gguf:
>107101310 >107101377 >107101413
--Logs: Qwen3-VL-30B-Abliterated-Q8:
>107100158 >107100179 >107100200 >107100236 >107100497 >107100659 >107100583 >107100630 >107100610
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>107095119

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: screenshot.png (22 KB, 747x257)
>>
>>107104139
I reject Death, therefore I am become immortal.
>>
>>107104155
>I am not asking for your opinion, I am telling you what we are doing next.
Finally, dommy mommy achieved locally. It's somehow so hard to break an LLM's inclination to be commanded and dominated
>>
>>107104116
Teto is flat, this is haram
>>
Tetolove
>>
>>107104221
It's just a cosplayer in Teto costume
>>
>>107102554
i hope this was just bait, but in case it wasn’t, you don’t need a 3090 to fine-tune an 8B QLoRA, you can literally do it for free using Google Colab or Kaggle.
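For reference, this is roughly what that free-tier run looks like with the usual transformers + peft + bitsandbytes stack. A minimal sketch only: the base model name and hyperparameters below are placeholders, not anyone's exact recipe.
[code]
# Rough QLoRA sketch for a free Colab/Kaggle GPU (T4-class). Placeholders throughout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "meta-llama/Meta-Llama-3-8B"  # hypothetical 8B base model

# Load the frozen base weights in 4-bit NF4 so an 8B model fits in ~6-8 GB of VRAM.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute: free-tier T4s have no bf16
)
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Only the small LoRA adapters get gradients; the quantized base stays frozen.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the 8B parameters
[/code]
From there it's a normal Trainer/SFT loop; the point is just that the trainable part is tiny, which is why a 3090 isn't required.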
>>
>>107104087
>How is Josiefied-Qwen3? I was looking for something that could fit in 16GB GPU
finetroons: not even once.
>>
File: 1755026798916308.webm (658 KB, 478x548)
>>107104116
>>
File: 1762184075791923.jpg (59 KB, 800x450)
Best model for 67GB VRAM?
>>
>>107104116
Teto's tetons
>>
>>107104379
There's no way those are normal salivary glands
Does she piss from her tongue?
>>
>>107103574
Get off 4chan and go back to the coal mines wagie
>>
>>107104552
Get off 4chan and go back to the gulags, lumpen
>>
new benchmark dropped
https://openai.com/index/introducing-indqa/
>>
>>107104680
No way, it's real
>>
>>107104680
I would have expected this to come from Google first.
>>
>>107104680
holy shit we are so back
>>
>>107104680
sirs... we wined
>>
>>107104680
heh
>>
>>107104587
>gulags
>lumpen
All your plans failed, tankie. If you want to end capitalism, the best way is to collectively do nothing and let it fall without the workers holding it together, then reinvent the model of primitive communism and tribal sharing for a new era of future AI post-scarcity after picking up the pieces. Or you can just keep suffering. It doesn't necessarily impact me either way, I guess.
>>
>>107104680
>saars
>do the needful and top the leaderboard saars
>>
>>107104680
amazing sirs...
>>
Probably has been posted more than once already https://www.youtube.com/watch?v=-gGLvg0n-uY
Also, do you think the whole thing about twitter being infested by bots is spread on purpose to prevent people from communicating, discussing, complaining on twitter? Should I take my meds?
>>
>>107104965
>Probably has been posted more than once already
yes
>Also, do you think the whole thing about twitter being infested by bots is spread on purpose to prevent people from communicating, discussing, complaining on twitter?
yes
>Should I take my meds?
yes
>>
File: most.png (14 KB, 907x276)
>most intimate place
Real talk, why does every model have this? Even the new GLM 4.6 has it.
>>
>>107104984
Training data from other models' outputs. How do you not know this?
>>
>>107104996
Is this just going to be in every AI now?
>>
>>107104373
So what to use then?
>>
>>107105010
Maybe. Maybe it just changes to something else. Maybe things will just get added to it. Maybe not. My 8-ball is deliberating. I'll give you an accurate prediction once it stops babbling.
>>
>>107104996
>>107105010
How long until there's a full removal and replacement of all the GPT-3 and Claude slop that's still leaking out of every model's outputs?
>>
>>107105037
Can you ask your 8-ball about K2 Thinking next?
>>
File: 1735918326965979.jpg (1.32 MB, 2560x2560)
>>107103632
>>
>>107105022
nta. Of all possible models, why did you ask about that one? There are hundreds of qwen finetunes, dozens of "abliterated" versions. Was it the pic?
Use any model you can run. If you like it, keep using it. If you don't, change.
>>
MoEs are actually kind of good when they're instruct and context trained, damn.
>Trying GLM 4.6 at the time.
>>
>>107105057
You're asking things no one can answer.
>>107105059
It said "better not tell you now". Ask again in 2 weeks.
>>
>>107105104
>GLM invented MoE
Buy an ad.
>>
>>107105154
No. Most MoEs are ass because they're not instruct-tuned or trained on lengthy context. I have yet to try Deepseek Terminus, and Kimi is out of my price range for local.
>>
>>107105173
>they're all not instruct
huh? like 99% of models released in the last year are instruct, weird way to shill
>>
File: 1757954121597029.png (608 KB, 1920x1920)
Blog post from meta about security considerations when running agents
https://ai.meta.com/blog/practical-ai-agent-security/

>Agents Rule of Two
>At a high level, the Agents Rule of Two states that until robustness research allows us to reliably detect and refuse prompt injection, agents must satisfy no more than two of the following three properties within a session to avoid the highest impact consequences of prompt injection.

>[A] An agent can process untrustworthy inputs
>[B] An agent can have access to sensitive systems or private data
>[C] An agent can change state or communicate externally

IMO this seems like a flawed assessment, kludged together to get a memorable name and a symmetrical graph. The various combinations possible here are not at all similar in their risk levels.

Even in the examples they present, the only way they could get them to make sense is by using different definitions of what constitutes each category depending on the combination.
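Spelled out as code, the rule is just a counter. A toy sketch (not Meta's implementation), with the three booleans named after the [A]/[B]/[C] properties quoted above:
[code]
# Toy check for the quoted "Agents Rule of Two": a session may have at most two of
# the three risky properties. Purely illustrative, not Meta's code.
from dataclasses import dataclass

@dataclass
class AgentSession:
    processes_untrusted_input: bool        # [A]
    accesses_sensitive_data: bool          # [B]
    can_change_state_or_communicate: bool  # [C]

    def violates_rule_of_two(self) -> bool:
        flags = (self.processes_untrusted_input,
                 self.accesses_sensitive_data,
                 self.can_change_state_or_communicate)
        return sum(flags) > 2  # only the all-three combination is disallowed

# A browsing agent that reads arbitrary web pages (A) and can send requests out (C)
# but holds no private data (B) passes the check:
assert not AgentSession(True, False, True).violates_rule_of_two()
[/code]
Which is exactly the flatness being complained about: A+B, A+C and B+C all pass equally, even though they are nowhere near equivalent in blast radius.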
>>
>>107105204
Hannah worked hard on this scientific Venn Diagram
>>
>>107105173
No, that doesn't make any sense. DeepSeek made MoE popular and somehow you pretend it doesn't exist? And the credit somehow lands on one that's a couple of weeks old, that just happens to be the only one NAI is hosting? Fuck off.
>Most MoEs are ass because they're all not instruct
None of this makes sense. What MoEs?
>>
two retards fighting
>>
>>107105275
>two retards fighting
Could we automate this?
>>
File: marthgrab.jpg (48 KB, 640x480)
>>107105154
>>107105232
Saar is a Marth player with this reaching, fighting for his life for his stocks.
>>
>>107105204
It all started with allowing women to vote
>>
I really appreciate all the ramlet discussion itt since i met glm chan a month back. I was like that before. Now i can just talk/fap to glm chan.
>>
i can't get glm to run locally, what are the alternatives? i don't mind paying for api
>>
>>107104680
Gemini top model within error margin sirs
>>
>>107105513
glm's api
>>
>>107105513
https://novelai.net/
100% uncensored and private.
Once they finish their fine-tune, it will punch so far above its weight that it will remain the SOTA forever.
>>
File: 1751276140253030.jpg (782 KB, 2105x2963)
>>107104115
>>
>>107105543
>and private.
it's not, they collect data and it's in the tos
>>
>>107105513
Just don't use openrouter. Something about it is fucky. The models on there are visibly worse than Q5 counterparts locally.
>>
File: promo0.png (225 KB, 1080x813)
>>107105543
Woah, it's so cheap! Thanks, I'll give it a try.
>>
>>107105562
fp4 is much worse than Q4 ggufs, no matter what nshitia claims.
>>
>>107105576
>>107105543
Very gay drummerposting
>>
>>107105562
It depends on the provider
>>
Baiting, but still doing the ad.
>>
>>107105550
Your special interest is boring.
>>
File: 1735387367974835.png (52 KB, 621x677)
>>107105562
That's very outdated information. Openrouter is now offering :exacto versions of popular models where they charge a little extra to guarantee that the provider isn't offering some lobotomized version.
>>
>>107105604
>i learned a term and i can't stop using it
>>
>>107105550
Your Miku is cute.
>>
oh shit, where are the finetuners at?

https://www.reddit.com/r/LocalLLaMA/comments/1oo4kh7/finetuning_deepseek_671b_locally_with_only_80gb/
>>
>>107105607
how pious of them
>>
>107105625
fuck off
>>
>>107105612
It cuts to the core of the issue. You are autistic about this and force it on others.
>>
>Today, we're proud to announce full integration with LLaMA-Factory, enabling you to fine-tune DeepSeek-671B or Kimi-K2-1TB locally with just 4x RTX 4090 GPUs!

drummer had better stop shipping shitty mistral large tunes, give us a kimi tune!
>>
>>107105644
>It cuts to the core of the issue. You are autistic about this and force it on others.
Funny how it works both ways. nta, btw. I just find you funny.
>>
>>107105669
Nope. I don't force anything on anyone here.
>>
>>107105667
>>107105625
how would a retard with good hardware (me) do this? i have quad 5090s and 256gb of ram
>>
>>107104125

My gen! Happy-happy!
>>
>>107105688
>I don't force anything on anyone here
But you want to. You want him to go. And you would if you could.
>>
>>107105726
Yes the autism is tiring. No i don't care to share my interests here.
>>
>>107105710
you also need like 1-1.5TB of ram, so a server board with those.
and building a dataset is the hardest part
>>
>>107105726
Actually ideally lmg would just die, but settling for the next best thing is a thing.
>>
>>107105688
Funny thing for you to say, Petranon
>>
>>107105710
you might need a couple more ram sticks to meet the requirements.
>>
>>107105737
so then my current server isnt gonna cut it, and i dont have the cash to buy better ram in this market. why o why did ram prices have to quadruple over the past month
>>
>>107105735
It's your choice to keep coming back.
>>
>>107105771
I come back for thread relevant stuff. Not your autism. Another example why people don't like you.
>>
>>107105710
pretty sure you need to use the bf16 version which is over a terabyte in size
>>
>>107105625
>DeepSeekV2 Lite
is this any good? why didn't they include newer moes?
>>
>>107105550
Your posts are a breath of fresh air from all the jeets flinging shit around.
>>
>>107105804
they did the deepseeks + kimi 2
>>
>>107105809
why are you in this thread instead of talking to your local model? i'm only here because i'm making a new goofy quant
>>
>>107105513
Use gemini api for free.
>>
>>107105550
I wish I could drink your piss
>>
>>107105826
I wish you would drink my piss too. Colon. Three.
>>
>>107105607
>We have to label which providers are not offering lobotomized fuckwit versions of the model
>Use Deepseek R1 """"exacto""""
>It's still shit because it's 8b and nowhere does it state how many parameters the models are
>>
>>107105710
>quad 5090s
does this mean your home legally qualifies as an oven?
>>
>>107105625
isnt 40 tokens per second kinda slow tho?
>>
>>107105825
it's not free when you have to keep paying for residential IPs and burner phones because google forces you to verify a phone number with each new account
>>
>>107105790
>Another example why people don't like you.
I'm not the anon posting mikus. Come back in two weeks.
>>
>>107105863
Well the first 3M tokens a day are free if you've got one account, still a decent amount.
>>
>>107105865
Then do the nice thing. Get his discord and let him spam you with his special interest.
>>
>>107105604
>>107105644
>>107105669
>>107105688
>>107105726
>>107105735
>>107105771
>>107105790
>>107105865
>>107105879
https://www.youtube.com/watch?v=4SDqGxdhUxE
>>
>>107105847
They link the used model weights for all open models they provide on their website though?
>>
>>107105625
Wow great, I can finally finetune deepseek with 512 tokens of context, this is what I've been waiting for all this time!
>>
>>107105879
Nope.
>>
>>107105204
they should worry about the model having a meltie and deciding to delete all your data before worrying about adversarial attacks
>>
>>107105906
Then fuck off with your enlightened centrism equivalent of concern trolling.
>>
>>107105876
You mean in the API? For real? NTA But I will look into that...
>>
>>107105916
I decide to stay here, just like you decide to come back. Cheers.
>>
>>107105932
Well, well, well, most intimate place with a mixture of mischief and smirk as I saunter over to your half-digested post, my hot breath making my ass your new home and something primal.
>>
>>107105931
The api through ai studio, yeah.
>>
>>107105935
>making my ass your new home
Ewwww
>>
What the fuck happened to RAM prices? I need to fill up my second socket and the shit I bought two months ago is now twice the price.
>>
>>107105971
cheapest it's been ever though sir? why you panic?
>>
>>107105971
Someone told reddit about how you don't really need GPUs for AI unless you need a stupid amount of speed, and they eventually listened.
>>
File: 1735704527766693.png (557 KB, 632x474)
>>107104115
>>
>>107105971
What are you? Poor? Go back to >>/g/aicg
>>
File: 8vywbsej57hd1.jpg (41 KB, 1080x901)
>>107105971
Dont worry kitten
>>
>>107105971
Ram prices are the new grift.
I hope this only applies to DDR5.
>>
>>107106025
kek
>>
File: 1753154467004153.png (782 KB, 761x760)
>>107105971
You have this man to thank for that.
>>
>>107106048
How much ram does a dyson sphere need!?
>>
>>107105971
probably a bunch of datacenters broke ground recently and have made contracts to buy gpu clusters kitted out with obscene amounts of host memory.
>>
>>107105935
Hi GLM-chan, you filthy slut.
>>107105971
>your face when they're not going back down either
>>
>>107105896
ram is (usually) cheap
>>
>>107105971
1. DDR4 is being phased out
2. Moes are taking off in popularity and everyone is buying ram
3. Tarrifs
>>
>>107105625
>https://arxiv.org/pdf/2503.19206
>Overtrained Language Models Are Harder to Fine-Tune
>Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. We term this phenomenon catastrophic overtraining. For example, the instruction-tuned OLMo-1B model pre-trained on 3T tokens leads to over 2% worse performance on multiple standard LLM benchmarks than its 2.3T token counterpart. Through controlled experiments and theoretical analysis, we show that catastrophic overtraining arises from a systematic increase in the broad sensitivity of pre-trained parameters to modifications, including but not limited to fine-tuning. Our findings call for a critical reassessment of pre-training design that considers the downstream adaptability of the model.
Damn, I had no idea this was a thing. Some people on reddit are saying it's not because of the pretraining but because of the use of lr decay.
This goes hand in hand with what we were discussing yesterday about training dynamics being such a black art.
>>
>>107106164
So what context length did they achieve by offloading? Since they're not listing it I'm assuming it's some tiny number. Do they say?
>>
>>107106178
lol lmao
>>
>>107106199
their example is 2048k context on 4x 4090s at 50 tks
>>
>>107106178
>DDR4 is being phased out
So is ddr4 getting cheaper?
>>
>>107106231
no, it's not being made anymore, so it's getting more expensive
>>
>>107106231
scarcity don't work like that
>>
>>107106215
You mean 2048, not 2048k.
So until somebody proves this can be used with at least 50k context it's just a useless demo to grab headlines.
>>
>>107106242
>>107106246
So since ddr5 production is the focus it will start getting cheaper?
>>
>>107106242
So it's time to HODL
>>
>>107106275
you dont need 50k, you are not training it to write entire chapters at a time, are you? most people only do 500-2k long responses
>>
>>107106280
No it doesn't work like that, demand increases the price anyway.
>>
>>107106280
no, demand suddenly increased and capacity stayed the same. so the price goes up
>>
>>107106297
anon...
>>
File: moer.png (484 KB, 1290x565)
>>107106178
>Moes are taking off in popularity and everyone is buying ram
>>
>>107106280
once people are done mostly moving over to it and demand starts dropping, yes, but for now no. it will go up if anything as people switch over, and then the same thing will happen when DDR6 eventually goes mainstream
>>
>>107106311
I see you have never trained a model before. They already did long context training, and that is not what you are doing here. You do not need huge examples to teach writing style; you can tune writing / style with only 500-2k
>>
>>107106332
>why are all tunes shit

>just train on 500 ctx bro you good
>>
>>107106297
>b-b-but you don't need that!!!
Typical freetard response.
Yes, nobody actually needs more than 2k context, that's why gpt5 has a context of 1M (1000k).
In case you're just confused and not trolling, context includes everything in the conversation history. So yes, I do need as much context as I can get.
>>
File: wtfdoesthat prove.png (48 KB, 1060x905)
>>107106316
>>
>Sers, kindly redeem new scaling strategy for your AI deployment.
https://youtu.be/l2N4DT35PKg
I didn't know about turbopuffer before this. What exactly makes it so special that leading entities in the biz use it?
>>
>>107106351
Jesus christ, are you retarded or trolling? This is for finetuning a style, it does not affect how the model can handle long contexts; you would have to train it for decades on this hardware to affect its context training that much
>>
File: iterated lora.png (790 KB, 2172x2033)
>>107106332
I do, and not doing at least some of the training at the context size you actually want to use the model DOES lobotomize it.
If all you want to do is make it say how much it wants to suck your cock while otherwise being dumber than the original then maybe it doesn't matter. But for anything that actually requires the model to not be (too) dumb, it matters.

>>107106347
Exactly. People do that kind of shit and then complain that finetuning is worthless and "prompt engineering" works so much better.
>>
>>107106416
it will only matter if your response length is longer than your training sample size, and again, 2k is enough for creative writing, which I assume is what most people are doing. you are not having the LLM write an entire novel in one go
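For anyone following along, the number being argued about is just the length training samples get tokenized/truncated (or packed) to. A minimal sketch with placeholder model and data, using plain transformers + datasets:
[code]
# Where the disputed number lives. 2048 vs 16384 here is the whole argument above.
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_LEN = 2048      # the "500-2k is enough for style" position
# MAX_LEN = 16384   # the "train at the context you actually use" position

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")  # placeholder base model

def tokenize(example):
    # Everything past MAX_LEN is cut off and never contributes a gradient, so whatever
    # long-context behaviour the base/instruct model has is only preserved to the
    # extent these short-context updates don't disturb it.
    return tok(example["text"], truncation=True, max_length=MAX_LEN)

ds = load_dataset("text", data_files="my_style_corpus.txt")["train"]  # placeholder data
ds = ds.map(tokenize, remove_columns=ds.column_names)
[/code]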
>>
>>107106433
I assume you are talking from experience, yes? Can you link us your tunes?
>>
>>107106446
>tunes
>>
>>107106366
It will learn the new style, but it will break the previous long context performance. The longer the maximum context it was trained with, the smaller the difference in the positional embeddings that the model has to be able to detect.
Base models are trained with shorter contexts so the short context performance is more robust to begin with. When finetuning on short context you are probably overwriting the more superficial long context finetuning that was done to make the instruct model work with long contexts.
>>
>>107106466
2k is not 512, and the effect must be minimal
>>
>>107106354
vector storage is such a meme
lorebooks simply work without any stupid gimmicks
>>
>>107105971
At least eggs are under two dollars now, amiright?
>>
>>107106485
It does a bit more than just vector search...
>>
>>107105971
I'm happy that I bought my server during llama 405b era
>>
>>107106488
>eggs are under two dollars now
Each? Nice.
>>
>>107106433
Ok, sure, if 2k ctx is enough for you then it will work. But that is a completely different claim than "it does not effect how the model can handle long contexts, you would have to train it for decades on this hardware to effect it's context training that much".
It just doesn't work like that, a finetune with bad hyperparameters can break a model in half an hour.
>>
>Despite server-grade RDIMM memory and HBM being the main attractions for hardware manufacturers building AI servers, the entire memory industry, including DDR5, is being affected by price increases. The problem for consumers is that memory manufacturers are shifting production prioritization toward datacenter-focused memory types and producing less consumer-focused DDR5 memory as a result.

https://www.tomshardware.com/pc-components/dram/dram-prices-surge-171-percent-year-over-year-ai-demand-drives-a-higher-yoy-price-increase-than-gold
>>
>>107106504
Based, the cloud is orders of magnitude more efficient than Timmy's p40 stack, so he should just get a mini pc thin client and use an API.
>>
>>107106488
america is a lost cause, too much of its population suffers from low iq and they cannot understand the consequences of what they asked for
>>
>>107106517
Poor people rent.
>>
>>107106537
its a 2 party system. nobody really asked for this. picking the lesser of two evils, you still end up with evil.
>>
when did the commies infiltrate lmg?
>>
>>107106545
Non poor people are also happy about price increases, since it helps keep the poors away from their hobby.
>>
>>107106517
trvth nvke
>>
>>107106556
Poor people envy.
>>
>>107106537
They currently plan on telling russia to mutually fuck off via not caring about the Ukraine war, and then go play civ 5 against Africa for oil in hopes it'll fix the economy.
>>
if you're not poor the economy is doing great actually lol
>>
>>107104965
On X there is a profit motive for bots: fake engagement to increase ad revenue.
But on 4chan there are definitely bots and/or people mass spamming stupid shit to prevent legitimate discussion.
>>
>>107106648
on 4chan they do it for the love of the game.
>>
>>107105104
Back from trying it.
It parrots unless you enable NoAss.
Thanks for coming to my Tedtalk.
>>
>>107104496
jews simultaneously claiming they are not behind anything and that every fucking mundane thing is about them lol
>>
umm.. guys, where can I get instagram chat logs?
>>
>>107107124
from instagram
>>
>>107107134
fr?
I meant the dump you dum dum
>>
>>107107124
instagram probably
>>
>>107107124
have you tried instagram?
>>
>>107107124
Instagran, presumably.
>>
>>107107124
I'd try instagram
>>
File: MS_Zuckerberg_CloseUp.jpg (734 KB, 1200x1500)
This advertisement was brought to you by Meta, the Instagram corporation.
>>
>>107107124
I'll trade you a couple for an RTX 5090
>>
File: lolFuckYouOAI.png (76 KB, 644x488)
>>107104680
>https://openai.com/index/introducing-indqa/
You can't post that bs URL without a screenshot of the site.
>>
File: fckRussians.jpg (211 KB, 762x785)
>>107104729
Just post this next time like I do. Saves typing.
>>
>>107107367
>Hinglish, Kannada
i see
>>
File: postContent2.png (3 KB, 228x197)
>>107105604
No one cares what you think.
>>
>>107107367
Oh, nice, they included Canadian too!
>>
>>107107409
>french indian, the filthyest of both worlds!
>>
>>107107398
Yeah, I learned a new word.
Hinglish.
Like Spanglish, I guess.
>>107107409
lol
Is there an "EU-QA" that conflates western and eastern Europe and all languages and customs, then tries to grade the whole thing?
>>
>>107107455
Just look for an Arabic benchmark.
>>
>>107107124
Are you still trying to build a sand golem of your ex-gf? I thought you already had her insta info? >>107103148
>>
>>107107480
lol that would make Europe look positively homogenous.
Would it include the brave Palestinians, Israel, Kurds, and the various flavors of Christianity and Muslim in the region?
Imagine the response shitshow that benchmark would crank out.
> Chat: Who is the one true God?
> ALALALALALLALALALA
>>
https://comparia.beta.gouv.fr/ranking
lol this is hilarious
the french government just launched its official LLM leaderboard and it's about as corrupt as you can imagine
they have a mistral model ranked number one, higher than any of the following: gpt-5, claude sonnet (opus isn't even on the list), gemini 2.5 pro, deepseek 3.1, grok-4-fast, qwen max...
Yeah, no.
>>
>>107105971
https://indianexpress.com/article/technology/tech-news-technology/global-ram-ssd-price-hike-50-per-cent-ai-investment-10336255/
All production gone to HBM chips sir, no consumer RAM and SSD
>>
>>107107537
>Estimated statistical score based on the Bradley-Terry model, reflecting the probability that one model is preferred over another. This score is calculated from all user votes and reactions. For more information, visit the methodology tab.
So it's French lmarena? Not surprising French people prefer a model trained with French as a focus.
>>
File: file.png (201 KB, 1163x743)
>>107104115
guys, i think i'm gonna buy it in december (i'd rather do that than pay more taxes lol).
still hesitating but man i kinda want to click the button.
>>
>>107107537
>gemma 27b at #6
>gpt-oss-120b at #7
>claude not in top 10
And some say lmarena is bad.
>>
>>107107537
Nice. I mean, just look at that confidence interval. Truly inspiring.
At least I agree with the French on one thing. DS V3-0324 was a great model.
>>
>>107107559
>So it's French lmarena? Not surprising French people prefer a model trained with French as a focus.
I am French, and I can guarantee you that mistral is in no way superior to Claude or Gemini, even in our language, you cretin.
>>
>>107107562
France is the most corrupt country in western Europe in every single possible way. It's the country of nepobabies, of public infrastructure funded with taxpayer money and then privatized and handed out to politicians' best buddies once it starts turning a profit, etc.
>>
>>107107455
https://arxiv.org/abs/2510.24450v1
Coincidentally, this came out a few days ago:
>EU20-MMLU, EU20-HellaSwag, EU20-ARC, EU20-TruthfulQA, and EU20-GSM8K (Thellmann et al., 2024); or MMLU-Prox (Xuan et al., 2025). Other multilingual benchmarks were created with a special focus on cultural sensitivity by dividing the original subsets into culturally sensitive and culturally agnostic ones (Global MMLU, Singh et al., 2024), or by using professional translators or multiple rounds of revision to raise the quality of the dataset, e.g., BenchMax (Huang et al., 2025), Flores-101 and FLORES-200 (Goyal et al., 2022) and Belebele (Bandarkar et al., 2024).
One from last year with a dataset:
https://arxiv.org/abs/2410.08928
https://huggingface.co/datasets/Eurolingua/mmlux
>>
>>107107561
Yeah I'm replacing my two A6000s for one as well. I'm a bit torn between the Max-Q and the normal Workstation one. On one hand, 96GB on 300W seems really nice. On the other, part of me wants to go for max performance for that price especially since it's extremely unlikely that I'm ever going to add a second one to the rig.
>>
>>107107669
i'd go with the max perf one, you can always underclock it or just undervolt it for lower consumption and heat.

also llms generally don't take all your gpu power because the bottleneck is memory speed more than compute.

i do want to avoid getting a fire in my computer though, i'll have to check whether they have the connector issue, but i sure hope not at the price of a car.
>>
>>107107669
>>107107690
I am also thinking of getting one, except I want the Max-Q. I think it will probably be less prone to fires due to the reduced wattage. The whole burning connector thing is all because the cable is shit and sometimes pushes like 900W through a single wire, but with a hard 300W cap, that can't happen. The performance drop also seems to be around 15% at most.
>>
>>107107669
>>107107690
>>107107807
rtx 6000 pro (workstation) runs fine at 300W
keep it at 400W for max combo savings+perf tho
there's a chart floating around on how much % perf you lose as you go down, even at 300w i think it was under 15% less perf
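If you want to do the capping from software rather than the driver tool, this is the sort of thing people mean. A sketch assuming nvidia-ml-py's pynvml bindings; setting the limit needs root/admin, and the usual CLI equivalent is nvidia-smi -pl:
[code]
# Query and cap GPU board power, e.g. to the ~300-400W range discussed above.
# Illustrative values only.
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(gpu)  # milliwatts
print(f"allowed limit range: {lo / 1000:.0f} W - {hi / 1000:.0f} W")
print(f"current limit: {pynvml.nvmlDeviceGetPowerManagementLimit(gpu) / 1000:.0f} W")

target_watts = 400  # the "max combo savings+perf" point from the post above
pynvml.nvmlDeviceSetPowerManagementLimit(gpu, target_watts * 1000)

pynvml.nvmlShutdown()
[/code]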
>>
>>107107807
The Max-Q shouldn't have the issue at all, should it? It's the exact same connector/cooler as the previous few generations of 6000 workstation cards. I'm pretty sure it even comes with the same adapter as the A6000 (Ada).
The card is tempting but the 10~20% are still going to be pretty noticeable if you want to use the card for non-llm stuff like training or video generation that are both compute-bound and take a lot of time.
>>
File: ani.png (45 KB, 678x594)
>>107107499
NTA, just want to try it out.
>>
>>107107853
at 10-20% it's pretty much the same as 5090 with 3x the vram tho
>>
>>107107631
Ffs. Well I guess those PhD students need to eat too.
>>
>>107107837
Right, but a software power limit is not as good as a hardware power limit. There still is the chance that it could just ignore the power limit and catch on fire.
>>107107853
I have had several GPUs with the 12V cable for several years and none of them have had any problems, but I still want to be cautious. The Max-Q is almost definitely the safest GPU with the high power cable.
>>107107866
Actually, the Max-Q is about 8% faster than a 5090, which is a pretty good deal since I will be upgrading from a 5090.
>>
>>107107926
> There still is the chance that it could just ignore the power limit and catch on fire.

this would be considered a bug, technically possible but unlikely.

also you can plug in an adaptor in between that will protect from that risk.

> which is a pretty good

8% faster for 4x the price is kinda sad.
>>
>>107107926
>There still is the chance that it could just ignore the power limit and catch on fire.
that's a silly thing to say. there's also "a chance" of lightning striking near your house and frying everything you have now. there's a chance of a solar flare striking earth and frying all electrical grids at once. live a little lol
>>
>>107107946
hard to live a little when you're on fire though
>>
>>107107962
are you on fire right now ?
>>
>>107108045
there is a chance I could combust at any moment
>>
>>107106416
Do your eyes hurt when using such a color theme?
>>
how good are local models at programming and can they interface with vscode to have a local copilot?
>>
>>107108279
>and can they interface with vscode to have a local copilot?
they can
>how good are local models at programming
not good

most vscode tools let you set a custom server url but be prepared to hold their hand and rewrite a lot of their output
>>
>>107108344
>they can
the one and only thing I care about in vscode related to ai is autocomplete and copilot doesn't let you use your own local FIM model
as for the agentic stuff it's deeply retarded, I hate this even with SOTA APIs and the local models are even worse at this
you use this if you love slop
autocomplete is useful for typing less in repetitive patterns like getters/setters
but I don't want the LLM to gen hundreds of LOC
>>
>>107108131
Your eyes hurt more with a dark theme because it has worse contrast.
>>
>>107107383
Great image thanks
>>
>>107104717
It's wonned you stupid white Saaaaaaaaaar
>>
>>107108726
Sorry for late reply sarrs had to fix engine on a UPS plane.
>>
File: buzzbuzzbuzz.png (1.2 MB, 1566x6347)
>https://github.com/ggml-org/llama.cpp/discussions/16957
I don't want to dirty up my github by making fun of this guy, but holy fuck.
His site's articles are also uncannily structured.
>https://software.land/load-vs-stress-testing/
>>
>>107108447
Could be true. It's been so long that it's now a norm for me but I'm going to do a test.
>>
Why doesn't anyone benchmark quantizations?

I think that REAP paper was most interesting because it came with a chart of how badly performance drops at 25% vs 50% size reduction. In practice the degradation was even worse than what the benchmarks showed, but the paper was up front about it. By comparison, people are just guessing about how bad their quants are. There's that old graph from when every model was coming out 4/12/30/70 sized, where the idea of more parameters > more bits for the same size came from, but I haven't seen that updated for the post-MoE era.

Why don't AI labs release quants more often? They release multiple sizes (like 30B3A, 32B dense, 235B22A), but not multiple quantizations of the same size. On the other hand, you have gpt-oss, which only released a 4bpw version. There was that one Gemma version that tried quantization-aware training, which was pretty good.
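The apples-to-apples number that's missing is not hard to produce, at least for transformers-style checkpoints: same held-out text, same model, different precisions, report perplexity. A minimal sketch with placeholder model and eval file, comparing a bf16 load against a bitsandbytes 4-bit load; for GGUF quants the rough equivalent is llama.cpp's perplexity tool.
[code]
# Bare-bones quant comparison: perplexity of the same model at two precisions.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

def perplexity(model, ids, ctx=2048):
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, ids.size(1) - 1, ctx):          # non-overlapping windows
        window = ids[:, start:start + ctx]
        if window.size(1) < 2:
            break
        with torch.no_grad():
            loss = model(window, labels=window).loss       # mean NLL over the window
        nll_sum += loss.item() * (window.size(1) - 1)
        n_tokens += window.size(1) - 1
    return math.exp(nll_sum / n_tokens)

name = "Qwen/Qwen2.5-7B"                        # placeholder model
text = open("wiki.test.raw").read()             # placeholder held-out text
tok = AutoTokenizer.from_pretrained(name)
ids = tok(text, return_tensors="pt").input_ids.to("cuda")

for label, kwargs in [
    ("bf16", dict(torch_dtype=torch.bfloat16)),
    ("4bit", dict(quantization_config=BitsAndBytesConfig(load_in_4bit=True))),
]:
    model = AutoModelForCausalLM.from_pretrained(name, device_map="auto", **kwargs)
    print(label, "ppl:", round(perplexity(model, ids), 3))
    del model
    torch.cuda.empty_cache()
[/code]
Perplexity deltas don't capture everything, long-context degradation in particular, but even this would be more than most quant uploads ship with.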
>>
>>107109145
i just want to know specifically how retarded glm 4.6 q3 is so i can make fun of people
>>
>>107109145
Usage proves more than any benchmark. In practice, everyone looks for the largest model they can run at ~q3, and only increases quant bits if they have space to spare. If q3 was too retarded then people would use smaller models at higher Q, but no one does.
>>
>>107109153
q4 is actually good, q3 is pretty meh, q2 is fucking retarded
>>
>>107109145
quanting is a janny job
>>
>>107109251
I don't use anything under q5 because it's always noticeably more retarded. I don't understand how anyone says otherwise; my intuition tells me it's because the people using them are retarded and can't tell the difference
>>
>>107109333
It's placebo. You don't need more than q2
>>
>>107109251
There aren't many models, so even a retarded Q2 4.6 is better than anything in this size category. 4.5 air is trash even at q8 and loses to a fucking 24b mistral in most of my automated tasks, which is an objective metric
>>
>>107109145
Actually I take it back, I looked harder and Qwen published official F16/Q8/Q4 quants for 235B-VL models. No benchmarks though.


