/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107095114 & >>107084067

►News
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni
>(10/31) Emu3.5: Native Multimodal Models are World Learners: https://github.com/baaivision/Emu3.5
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/28) Brumby-14B-Base released with power retention layers: https://manifestai.com/articles/release-brumby-14b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107095114

--Paper: Contradictory learning rate effects on model generalization across architectures:
>107099513 >107099560 >107099570 >107099601 >107099730 >107099637 >107099968 >107100075 >107100108 >107100193
--Papers:
>107099379
--Challenges and solutions for multimodal AI with reinforcement learning:
>107096665 >107096697 >107096703 >107096724 >107096748 >107096767 >107096817 >107096853 >107096880 >107096942 >107096859
--Comparing Gemma and Qwen models for context handling and multimodal capabilities:
>107100070 >107100082 >107100096 >107100113 >107100095 >107100103 >107100109 >107100149
--Model selection and document handling strategies for chat systems:
>107103148 >107103182 >107103216 >107103230 >107103748 >107103674
--LangChain tool development and licensing debates for AI research project:
>107096233 >107096389 >107096407 >107096431 >107096460 >107096484 >107096542 >107096601 >107097032
--Hardware-limited LLM recommendations for RPG GMing:
>107097189 >107097219 >107097226 >107097481 >107097496 >107097561 >107097660 >107097756 >107097801 >107097878 >107097895 >107097921 >107097935 >107097938
--Qwen3-VL 4B Instruct recommended for lightweight document summarization:
>107096666 >107096930
--Developing a CLI assistant for programming and document tasks:
>107095800 >107095844
--Critique of Suno AI and anticipation for open source music generation models:
>107097235 >107097263 >107097331 >107097476
--Censorship comparison between GLM 4.6 and Kimi models:
>107096584 >107098032 >107098080 >107098100 >107098139
--Logs: Qwen3-VL-32B-Instruct-Q6_K.gguf:
>107101310 >107101377 >107101413
--Logs: Qwen3-VL-30B-Abliterated-Q8:
>107100158 >107100179 >107100200 >107100236 >107100497 >107100659 >107100583 >107100630 >107100610
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>107095119

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107104139I reject Death, therefore I am become immortal.
>>107104155
>I am not asking for your opinion, I am telling you what we are doing next.
Finally, dommy mommy achieved locally. It's somehow so hard to break an LLM's inclination to be commanded and dominated.
>>107104116Teto is flat, this is haram
Tetolove
>>107104221It's just a cosplayer in Teto costume
>>107102554i hope this was just bait, but in case it wasn’t, you don’t need a 3090 to fine-tune an 8B QLoRA, you can literally do it for free using Google Colab or Kaggle.
>>107104087
>How is Josiefied-Qwen3? I was looking for something that could fit in 16GB GPU
finetroons: not even once.
>>107104116
Best model for 67GB VRAM?
>>107104116Teto's tetons
>>107104379
There's no way those are normal salivary glands
Does she piss from her tongue?
>>107103574Get off 4chan and go back to the coal mines wagie
>>107104552Get off 4chan and go back to the gulags, lumpen
new benchmark dropped
https://openai.com/index/introducing-indqa/
>>107104680No way, it's real
>>107104680I would have expected this to come from Google first.
>>107104680holy shit we are so back
>>107104680sirs... we wined
>>107104680heh
>>107104587
>gulags
>lumpen
All your plans failed, tankie. If you want to end capitalism, the best way is to do nothing collectively, let it fall without the workers holding it together, and reinvent the model of primitive communism and tribal sharing for a new era of future AI post-scarcity after picking up the pieces. Or you can just keep suffering. It doesn't necessarily impact me either way, I guess.
>>107104680>saars>do the needful and top the leaderboard saars
>>107104680amazing sirs...
Probably has been posted more than once already
https://www.youtube.com/watch?v=-gGLvg0n-uY
Also, do you think the whole thing about twitter being infested by bots is spread on purpose to prevent people from communicating, discussing, complaining on twitter? Should I take my meds?
>>107104965
>Probably has been posted more than once already
yes
>Also, do you think the whole thing about twitter being infested by bots is spread on purpose to prevent people from communicating, discussing, complaining on twitter?
yes
>Should I take my meds?
yes
>most intimate place
Real talk, why does every model have this? Even the new GLM 4.6 has it.
>>107104984
Training data from other models' output. How do you not know this?
>>107104996Is this just going to be in every AI now?
>>107104373So what to use then?
>>107105010Maybe. Maybe it just changes to something else. Maybe things will just get added to it. Maybe not. My 8-ball is deliberating. I'll give you an accurate prediction once it stops babbling.
>>107104996
>>107105010
How long until there's a full removal and replacement of all the GPT-3 and Claude slop that's still leaking out of every model's outputs?
>>107105037Can you ask your 8-ball about K2 Thinking next?
>>107103632
>>107105022
nta. Of all possible models, why did you ask about that one? There are hundreds of qwen finetunes, dozens of "abliterated" versions. Was it the pic?
Use any model you can run. If you like it, keep using it. If you don't, change.
MoEs are actually kind of good when they're instruct and context trained, damn.
>Trying GLM 4.6 at the time.
>>107105057
You're asking things no one can answer.
>>107105059
It said "better not tell you now". Ask again in 2 weeks.
>>107105104
>GLM invented MoE
Buy an ad.
>>107105154No. Most MoEs are ass because they're all not instruct nor trained on lengthy context. I have yet to try Deepseek Terminus, and Kimi is out of my price range for local.
>>107105173
>they're all not instruct
huh? like 99% of models released in the last year are instruct, weird way to shill
Blog post from Meta about security considerations when running agents
https://ai.meta.com/blog/practical-ai-agent-security/
>Agents Rule of Two
>At a high level, the Agents Rule of Two states that until robustness research allows us to reliably detect and refuse prompt injection, agents must satisfy no more than two of the following three properties within a session to avoid the highest impact consequences of prompt injection.
>[A] An agent can process untrustworthy inputs
>[B] An agent can have access to sensitive systems or private data
>[C] An agent can change state or communicate externally
IMO this seems like a flawed assessment kludged together to get a memorable name and a symmetrical graph. The various possible combinations are not at all similar in their risk levels. Even in the examples they present, the only way they could get them to make sense is by using different definitions of what constitutes each category depending on the combination.
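To make the rule concrete, here's a minimal sketch of what enforcing it as a session-level gate might look like; the property names and the AgentSession shape are my own assumptions for illustration, not anything from Meta's post.
[code]
# Minimal sketch of the "Agents Rule of Two" as a session gate.
# The property names and session structure are assumptions for illustration;
# Meta's post describes the rule informally, not as an API.

from dataclasses import dataclass

@dataclass
class AgentSession:
    processes_untrusted_input: bool   # [A] reads web pages, emails, user uploads, etc.
    touches_sensitive_data: bool      # [B] has access to private data or sensitive systems
    can_act_externally: bool          # [C] can change state or communicate externally

def violates_rule_of_two(session: AgentSession) -> bool:
    """True if the session holds all three properties at once."""
    enabled = sum([
        session.processes_untrusted_input,
        session.touches_sensitive_data,
        session.can_act_externally,
    ])
    return enabled > 2

# Example: a browsing agent with filesystem access that can also send emails
# trips the check and would need human approval or a capability dropped.
risky = AgentSession(True, True, True)
assert violates_rule_of_two(risky)
[/code]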
>>107105204Hannah worked hard on this scientific Venn Diagram
>>107105173
No, that doesn't make any sense. DeepSeek made MoE popular and somehow you pretend it doesn't exist? And the credit somehow lands on one that's a couple of weeks old, that just happens to be the only one NAI is hosting? Fuck off.
>Most MoEs are ass because they're all not instruct
None of this makes sense. What MoEs?
two retards fighting
>>107105275
>two retards fighting
Could we automate this?
>>107105154>>107105232Saar is a Marth player with this reaching, fighting for his life for his stocks.
>>107105204It all started with allowing women to vote
I really appreciate all the ramlet discussion itt since i met glm chan a month back. I was like that before. Now i can just talk/fap to glm chan.
i can't get glm to run locally, what are the alternatives? i don't mind paying for api
>>107104680Gemini top model within error margin sirs
>>107105513glm's api
>>107105513
https://novelai.net/
100% uncensored and private.
Once they finish their fine-tune, it will punch so far above its weight that it will remain the SOTA forever.
>>107104115
>>107105543
>and private.
it's not, they collect data and it's in the tos
>>107105513
Just don't use openrouter. Something about it is fucky. The models on there are visibly worse than Q5 counterparts locally.
>>107105543Woah, it's so cheap! Thanks, I'll give it a try.
>>107105562fp4 is much worse than Q4 ggufs, no matter what nshitia claims.
>>107105576>>107105543Very gay drummerposting
>>107105562It depends on the provider
Baiting, but still doing the ad.
>>107105550Your special interest is boring.
>>107105562That's very outdated information. Openrouter is now offering :exacto versions of popular models where they charge a little extra to guarantee that the provider isn't offering some lobotomized version.
>>107105604>i learned a term and i can't stop using it
>>107105550Your Miku is cute.
oh shit, where are the finetuners at?
https://www.reddit.com/r/LocalLLaMA/comments/1oo4kh7/finetuning_deepseek_671b_locally_with_only_80gb/
>>107105607how pious of them
>107105625fuck off
>>107105612It cuts to the core of the issue. You are autistic about this and force it on others.
>Today, we're proud to announce full integration with LLaMA-Factory, enabling you to fine-tune DeepSeek-671B or Kimi-K2-1TB locally with just 4x RTX 4090 GPUs!
drummer had better stop shipping shitty mistral large tunes, give us a kimi tune!
>>107105669
>It cuts to the core of the issue. You are autistic about this and force it on others.
Funny how it works both ways. nta, btw. I just find you funny.
>>107105669Nope. I don't force anything on anyone here.
>>107105667>>107105625how would a retard with good hardware (me) do this? i have quad 5090s and 256gb of ram
>>107104125My gen! Happy-happy!
>>107105710
>I don't force anything on anyone here
But you want to. You want him to go. And you would if you could.
>>107105726Yes the autism is tiring. No i don't care to share my interests here.
>>107105737
you also need like 1-1.5TB of ram, so a server board that can hold that much.
and building a dataset is the hardest part
>>107105726Actually ideally lmg would just die, but settling for the next best thing is a thing.
>>107105688Funny thing for you to say, Petranon
>>107105710you might need a couple more ram sticks to make the requirements.
>>107105737so then my current server isnt gonna cut it, and i dont have the cash to buy better ram in this market. why o why did ram prices have to quadruple over the past month
>>107105735It's your choice to keep coming back.
>>107105771I come back for thread relevant stuff. Not your autism. Another example why people don't like you.
>>107105710
pretty sure you need to use the bf16 version, which is over a terabyte in size
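Quick back-of-the-envelope check on that, counting only the raw weights (optimizer state, gradients, and activations during fine-tuning add a lot more on top):
[code]
# Back-of-the-envelope weight sizes for a 671B-parameter model at different precisions.
# Counts only the stored weights; fine-tuning needs far more memory in practice.

PARAMS = 671e9
BYTES_PER_PARAM = {"bf16": 2.0, "q8": 1.0, "q4": 0.5}

for fmt, nbytes in BYTES_PER_PARAM.items():
    tib = PARAMS * nbytes / 1024**4
    print(f"{fmt}: ~{tib:.2f} TiB just for the weights")

# bf16: ~1.22 TiB, which matches "over a terabyte in size".
[/code]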
>>107105625
>DeepSeekV2 Lite
is this any good? why didn't they include newer moes?
>>107105550Your posts are a breath of fresh air from all the jeets flinging shit around.
>>107105804they did the deepseeks + kimi 2
>>107105809why are you in this thread instead of talking to your local model? i'm only here because i'm making a new goofy quant
>>107105513Use gemini api for free.
>>107105550I wish I could drink your piss
>>107105826I wish you would drink my piss too. Colon. Three.
>>107105607
>We have to label that our providers are not offering lobotomized fuckwit versions of the model
>Use Deepseek R1 """"exacto""""
>It's still shit because it's 8b and nowhere does it state how many parameters the models are
>>107105710
>quad 5090s
does this mean your home legally qualifies as an oven?
>>107105625
isn't 40 tokens per second kinda slow tho?
>>107105825it's not free when you have to keep paying for residential IPs and burner phones because google forces you to verify a phone number with each new account
>>107105790
>Another example why people don't like you.
I'm not the anon posting mikus. Come back in two weeks.
>>107105863Well the first 3M tokens a day are free if you've got one account, still a decent amount.
>>107105865Then do the nice thing. Get his discord and let him spam you with his special interest.
>>107105604 >>107105644 >>107105669 >>107105688 >>107105726 >>107105735 >>107105771 >>107105790 >>107105865 >>107105879
https://www.youtube.com/watch?v=4SDqGxdhUxE
>>107105847
They link the exact model weights used for all open models they provide on their website though?
>>107105625Wow great, I can finally finetune deepseek with 512 tokens of context, this is what I've been waiting for all this time!
>>107105879Nope.
>>107105204they should worry about the model having a meltie and deciding to delete all your data before worrying about adversarial attacks
>>107105906Then fuck off with your enlightened centrism equivalent of concern trolling.
>>107105876You mean in the API? For real? NTA But I will look into that...
>>107105916I decide to stay here, just like you decide to come back. Cheers.
>>107105932Well, well, well, most intimate place with a mixture of mischief and smirk as I saunter over to your half-digested post, my hot breath making my ass your new home and something primal.
>>107105931The api through ai studio, yeah.
>>107105935>making my ass your new homeEwwww
What the fuck happened to RAM prices? I need to fill up my second socket and the shit I bought two months ago is now twice the price.
>>107105971cheapest it's been ever though sir? why you panic?
>>107105971Someone told reddit about how you don't really need GPUs for AI unless you need a stupid amount of speed, and they eventually listened.
>>107105971What are you? Poor? Go back to >>/g/aicg
>>107105971Dont worry kitten
>>107105971
Ram prices are the new grift.
I hope this only applies to DDR5.
>>107106025kek
>>107105971You have this man to thank for that.
>>107106048How much ram does a dyson sphere need!?
>>107105971
probably a bunch of datacenters broke ground recently and have made contracts to buy gpu clusters kitted out with obscene amounts of host memory.
>>107105935
Hi GLM-chan, you filthy slut.
>>107105971
>your face when they're not going back down either
>>107105896ram is (usually) cheap
>>107105971
1. DDR4 is being phased out
2. Moes are taking off in popularity and everyone is buying ram
3. Tariffs
>>107105625
>https://arxiv.org/pdf/2503.19206
>Overtrained Language Models Are Harder to Fine-Tune
>Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. We term this phenomenon catastrophic overtraining. For example, the instruction-tuned OLMo-1B model pre-trained on 3T tokens leads to over 2% worse performance on multiple standard LLM benchmarks than its 2.3T token counterpart. Through controlled experiments and theoretical analysis, we show that catastrophic overtraining arises from a systematic increase in the broad sensitivity of pre-trained parameters to modifications, including but not limited to fine-tuning. Our findings call for a critical reassessment of pre-training design that considers the downstream adaptability of the model.
Damn, I had no idea this was a thing. Some people on reddit are saying it's not because of the pretraining but because of the use of lr decay.
This goes hand in hand with what we were discussing yesterday about training dynamics being such a black art.
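For anyone who hasn't seen what the reddit crowd is blaming, this is the usual warmup + cosine decay schedule these pretraining runs use; a minimal sketch with made-up constants, not the actual OLMo settings:
[code]
# Minimal sketch of a warmup + cosine decay LR schedule, the kind of schedule
# the reddit discussion blames rather than the token count itself.
# All constants are illustrative, not taken from the paper.

import math

def lr_at_step(step: int, total_steps: int, warmup_steps: int = 2000,
               peak_lr: float = 3e-4, min_lr: float = 3e-5) -> float:
    if step < warmup_steps:
        # linear warmup from 0 to peak_lr
        return peak_lr * step / warmup_steps
    # cosine decay from peak_lr down to min_lr over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# One reading of the lr-decay argument: by the end of a long run the LR has
# decayed to almost nothing, and fine-tuning then hits those settled parameters
# with a comparatively large LR again.
for s in (0, 2000, 100_000, 500_000):
    print(s, f"{lr_at_step(s, total_steps=500_000):.2e}")
[/code]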
>>107106164So what context length did they achieve by offloading? Since they're not listing it I'm assuming it's some tiny number. Do they say?
>>107106178lol lmao
>>107106199their example is 2048k context on 4x 4090s at 50 tks
>>107106178
>DDR4 is being phased out
So is ddr4 getting cheaper?
>>107106231no, its not being made anymore, so its getting more expensive
>>107106231scarcity don't work like that
>>107106215
You mean 2048, not 2048k.
So until somebody proves this can be used with at least 50k context it's just a useless demo to grab headlines.
>>107106242>>107106246So since ddr5 production is the focus it will start getting cheaper?
>>107106242So it's time to HODL
>>107106275
you dont need 50k, you are not training it to write entire chapters at a time, are you? most people only do 500-2k long responses
>>107106280No it doesn't work like that, demand increases the price anyway.
>>107106280no, demand suddenly increased and capacity stayed the same. so the price goes up
>>107106297anon...
>>107106178>Moes are taking off in popularity and everyone is buying ram
>>107106280once people are done mostly moving over to it and demand starts dropping yes, but for now no, it will go up if anything as people are switching to it, and then the same thing will happen when DDR6 eventually starts being mainstream
>>107106316
I see you have never trained a model before. They already did long context training, and that is not what you are doing. You do not need huge examples to teach writing style; you can tune writing / style with only 500-2k.
>>107106332>why are all tunes shit>just train on 500 ctx bro you good
>>107106297
>b-b-but you don't need that!!!
Typical freetard response.
Yes, nobody actually needs more than 2k context, that's why gpt5 has a context of 1M (1000k).
In case you're just confused and not trolling, context includes everything in the conversation history. So yes, I do need as much context as I can get.
>>107106316
>Sers, kindly redeem new scaling strategy for your AI deployment.
https://youtu.be/l2N4DT35PKg
I didn't know about turbopuffer before this. What exactly makes it so special that leading entities in the biz use it?
>>107106351
Jesus christ, are you retarded or trolling? This is for finetuning a style, it does not affect how the model can handle long contexts. You would have to train it for decades on this hardware to affect its context training that much.
>>107106332
I do, and not doing at least some of the training at the context size you actually want to use the model at DOES lobotomize it.
If all you want to do is make it say how much it wants to suck your cock while otherwise being dumber than the original, then maybe it doesn't matter. But for anything that actually requires the model to not be (too) dumb, it matters.
>>107106347
Exactly. People do that kind of shit and then complain that finetuning is worthless and "prompt engineering" works so much better.
>>107106416
it will only matter if your response length is longer than your training sample size, and again, 2k is enough for creative writing, which I assume is what most people are doing. you are not having the LLM write an entire novel in one go
>>107106433I assume you are talking from experience, yes? Can you link us your tunes?
>>107106446>tunes
>>107106366
It will learn the new style, but it will break the previous long context performance. The longer the maximum context it was trained with, the smaller the difference in the positional embeddings that the model has to be able to detect.
Base models are trained with shorter contexts, so the short context performance is more robust to begin with. When finetuning on short context you are probably overwriting the more superficial long context finetuning that was done to make the instruct model work with long contexts.
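Rough illustration of the "smaller differences" point, assuming the long-context extension is RoPE with linear position interpolation (an assumption on my part, the post doesn't name a specific scheme):
[code]
# Rough illustration of the "smaller positional differences" argument,
# assuming RoPE with linear position interpolation (an assumption; the post
# doesn't name a specific long-context scheme).

import math

def rope_angle(pos: int, dim_pair: int, head_dim: int = 128,
               base: float = 10000.0, scale: float = 1.0) -> float:
    """Rotation angle for one (even, odd) dimension pair at a given position."""
    inv_freq = 1.0 / (base ** (2 * dim_pair / head_dim))
    return (pos / scale) * inv_freq

# With linear interpolation, stretching a 4k-trained model to 32k means dividing
# positions by 8, so adjacent tokens end up 8x closer in angle on every frequency
# and the model has to resolve finer differences.
for scale in (1.0, 8.0):
    delta = rope_angle(1001, dim_pair=0, scale=scale) - rope_angle(1000, dim_pair=0, scale=scale)
    print(f"scale={scale}: angle delta between adjacent positions = {delta:.4f} rad")
[/code]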
>>107106466
2k is not 512, and the effect must be minimal
>>107106354
vector storage is such a meme
lorebooks simply work without any stupid gimmicks
>>107105971At least eggs are under two dollars now, amiright?
>>107106485It does a bit more than just vector search...
>>107105971I'm happy that I bought my server during llama 405b era
>>107106488
>eggs are under two dollars now
Each? Nice.
>>107106433
Ok, sure, if 2k ctx is enough for you then it will work. But that is a completely different claim than "it does not affect how the model can handle long contexts, you would have to train it for decades on this hardware to affect its context training that much".
It just doesn't work like that; a finetune with bad hyperparameters can break a model in half an hour.
>Despite server-grade RDIMM memory and HBM being the main attractions for hardware manufacturers building AI servers, the entire memory industry, including DDR5, is being affected by price increases. The problem for consumers is that memory manufacturers are shifting production prioritization toward datacenter-focused memory types and producing less consumer-focused DDR5 memory as a result.
https://www.tomshardware.com/pc-components/dram/dram-prices-surge-171-percent-year-over-year-ai-demand-drives-a-higher-yoy-price-increase-than-gold
>>107106504
Based, the cloud is an order of magnitude more efficient than Timmy's p40 stack, so he should just get a mini PC thin client and use an API.
>>107106488america is a lost cause, too much of its population suffers from low iq and they cannot understand the consequences of what they asked for
>>107106517Poor people rent.
>>107106537its a 2 party system. nobody really asked for this. picking the lesser of two evils, you still end up with evil.
when did the commies infiltrate lmg?
>>107106545Non poor people are also happy about price increases, since it helps keep the poors away from their hobby.
>>107106517trvth nvke
>>107106556Poor people envy.
>>107106537They currently plan on telling russia to mutually fuck off via not caring about the Ukraine war, and then go play civ 5 against Africa for oil in hopes it'll fix the economy.
if you're not poor the economy is doing great actually lol
>>107104965
On X there is a profit motive for bots: fake engagement to increase ad revenue.
But on 4chan there are definitely bots and/or people mass spamming stupid shit to prevent legitimate discussion.
>>107106648on 4chan they do it for the love of the game.
>>107105104
Back from trying it.
It parrots unless you enable NoAss.
Thanks for coming to my TED talk.
>>107104496jews simultanously claiming they are not behind and everything and that every fucking mundane thing is about them lol
umm.. guys, where can I get instagram chat logs?
>>107107124from instagram
>>107107134
fr?
I meant the dump you dum dum
>>107107124instagram probably
>>107107124have you tried instagram?
>>107107124
Instagram, presumably.
>>107107124I'd try instagram
This advertisement was brought to you by Meta, the Instagram corporation.
>>107107124I'll trade you a couple for an RTX 5090
>>107104680
>https://openai.com/index/introducing-indqa/
You can't post that bs URL without a screenshot of the site.
>>107104729Just post this next time like I do. Saves typing.
>>107107367
>Hinglish, Kannada
i see
>>107105604No one cares what you think.
>>107107367Oh, nice, they included Canadian too!
>>107107409
>french indian, the filthiest of both worlds!
>>107107398
Yeah, I learned a new word. Hinglish. Like Spanglish, I guess.
>>107107409
lol
Is there an "EU-QA" that conflates western and eastern Europe and all languages and customs, then tries to grade the whole thing?
>>107107455Just look for an Arabic benchmark.
>>107107124Are you still trying to build a sand golem of your ex-gf? I thought you already had her insta info? >>107103148
>>107107480
lol that would make Europe look positively homogeneous. Would it include the brave Palestinians, Israel, Kurds, and the various flavors of Christianity and Islam in the region?
Imagine the response shitshow that benchmark would crank out.
>Chat: Who is the one true God?
>ALALALALALLALALALA
https://comparia.beta.gouv.fr/ranking
lol this is hilarious
the french government just launched its official LLM leaderboard and it's about as corrupt as you can imagine
they have a mistral model ranked number one, higher than any of the following: gpt-5, claude sonnet (opus isn't even on the list), gemini 2.5 pro, deepseek 3.1, grok-4-fast, qwen max... Yeah, no.
>>107105971
https://indianexpress.com/article/technology/tech-news-technology/global-ram-ssd-price-hike-50-per-cent-ai-investment-10336255/
All production gone to HBM chips sir, no consumer RAM and SSD
>>107107537
>Estimated statistical score based on the Bradley-Terry model, reflecting the probability that one model is preferred over another. This score is calculated from all user votes and reactions. For more information, visit the methodology tab.
So it's French lmarena? Not surprising French people prefer a model trained with French as a focus.
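For reference, the Bradley-Terry model they cite is just this; a minimal sketch with made-up strength values, not comparia's actual fit:
[code]
# Minimal sketch of the Bradley-Terry preference model the leaderboard cites.
# Each model i gets a latent strength s_i fit from pairwise votes, and
# P(i beats j) = exp(s_i) / (exp(s_i) + exp(s_j)).
# The strengths below are made-up numbers, not the site's actual fit.

import math

def p_wins(s_i: float, s_j: float) -> float:
    return math.exp(s_i) / (math.exp(s_i) + math.exp(s_j))

strengths = {"model_a": 1.2, "model_b": 0.4}
print(p_wins(strengths["model_a"], strengths["model_b"]))  # ~0.69
[/code]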
>>107104115
guys, i think i'm gonna buy it in december (i'd rather do that than pay more taxes lol).
still hesitating but man i kinda want to click the button.
>>107107537
>gemma 27b at #6
>gpt-oss-120b at #7
>claude not in top 10
And some say lmarena is bad.
>>107107537Nice. I mean, just look at that confidence interval. Truly inspiring. At least I agree with the French on one thing. DS V3-0324 was a great model.
>>107107559
>So it's French lmarena? Not surprising French people prefer a model trained with French as a focus.
I am French, and I can guarantee you that Mistral is in no way superior to Claude or Gemini even in our own language, you cretin.
>>107107562
France is the most corrupt country in western Europe in every single possible way. It's the country of nepobabies, of funding public infrastructure that is then privatized and handed to politicians' best buddies once it begins to turn profitable, etc.
>>107107455
https://arxiv.org/abs/2510.24450v1
Coincidentally, this came out a few days ago:
>EU20-MMLU, EU20-HellaSwag, EU20-ARC, EU20-TruthfulQA, and EU20-GSM8K (Thellmann et al., 2024); or MMLU-Prox (Xuan et al., 2025). Other multilingual benchmarks were created with a special focus on cultural sensitivity by dividing the original subsets into culturally sensitive and culturally agnostic ones (Global MMLU, Singh et al., 2024), or by using professional translators or multiple rounds of revision to raise the quality of the dataset, e.g., BenchMax (Huang et al., 2025), Flores-101 and FLORES-200 (Goyal et al., 2022) and Belebele (Bandarkar et al., 2024).
One from last year with a dataset:
https://arxiv.org/abs/2410.08928
https://huggingface.co/datasets/Eurolingua/mmlux
>>107107561
Yeah, I'm replacing my two A6000s with one as well. I'm a bit torn between the Max-Q and the normal Workstation one. On one hand, 96GB at 300W seems really nice. On the other, part of me wants to go for max performance at that price, especially since it's extremely unlikely that I'm ever going to add a second one to the rig.
>>107107669
i'd go with the max perf one, you can always underclock it or just undervolt it for lower consumption and heat.
also llms generally don't take all your gpu power because the bottleneck is more mem speed.
i do want to avoid getting a fire in my computer though, i'll have to look into whether they have the connector issue, but i sure hope not at the price of a car.
>>107107669>>107107690I am also thinking of getting one, except I want the Max-Q. I think it will probably be less prone to fires due to the reduced wattage. The whole burning connector thing is all because the cable is shit and sometimes pushes like 900W through a single wire, but with a hard 300W cap, that can't happen. The performance drop also seems to be around 15% at most.
>>107107669
>>107107690
rtx 6000 pro (workstation) runs fine at 300W
keep it at 400W for max combo savings+perf tho
there's a chart floating around on how much % perf you lose as you go down, even at 300W i think it was under 15% less perf
>>107107807
The Max-Q shouldn't have the issue at all, should it? It's the exact same connector/cooler as the previous few generations of 6000 workstation cards. I'm pretty sure it even comes with the same adapter as the A6000 (Ada).
The card is tempting, but the 10~20% is still going to be pretty noticeable if you want to use the card for non-llm stuff like training or video generation that is both compute-bound and takes a lot of time.
>>107107499NTA, just want to try it out.
>>107107853at 10-20% it's pretty much the same as 5090 with 3x the vram tho
>>107107631Ffs. Well I guess those PhD students need to eat too.
>>107107837
Right, but a software power limit is not as good as a hardware power limit. There still is the chance that it could just ignore the power limit and catch on fire.
>>107107853
I have had several GPUs with the 12V cable for several years and none of them have had any problems, but I still want to be cautious. The Max-Q is almost definitely the safest GPU with the high power cable.
>>107107866
Actually, the Max-Q is about 8% faster than a 5090, which is a pretty good deal since I will be upgrading from a 5090.
>>107107926
>There still is the chance that it could just ignore the power limit and catch on fire.
this would be considered a bug, technically possible but unlikely.
also you can plug in an adapter in between that will protect from that risk.
>which is a pretty good
8% faster for 4x the price is kinda sad.
>>107107926
>There still is the chance that it could just ignore the power limit and catch on fire.
that's a silly thing to say. there's also "a chance" of lightning striking near your house and frying everything you have now. there's a chance of a solar flare striking earth and frying all electrical grids at once. live a little lol
>>107107946hard to live a little when you're on fire though
>>107107962are you on fire right now ?
>>107108045there is a chance I could combust at any moment
>>107106416
Do your eyes hurt when using such a color theme?
how good are local models at programming and can they interface with vscode to have a local copilot?
>>107108279
>and can they interface with vscode to have a local copilot?
they can
>how good are local models at programming
not good
most vscode tools let you set a custom server url but be prepared to hold their hand and rewrite a lot of their output
>>107108344
>they can
the one and only thing I care about in vscode related to ai is autocomplete, and copilot doesn't let you use your own local FIM model
as for the agentic stuff, it's deeply retarded. I hate this even with SOTA APIs, and the local models are even worse at this
you use this if you love slop
autocomplete is useful for typing less in repetitive patterns like getters/setters, but I don't want the LLM to gen hundreds of LOC
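If you do want local FIM outside copilot, llama.cpp's server exposes an infill endpoint that the local-autocomplete extensions can point at; a minimal sketch of hitting it directly, assuming a llama-server already running on localhost:8080 with a FIM-capable model (the port, payload defaults, and model choice are assumptions):
[code]
# Minimal sketch of querying llama.cpp's fill-in-the-middle endpoint directly.
# Assumes llama-server is already running on localhost:8080 with a FIM-capable
# model (e.g. a Qwen2.5-Coder GGUF); port and payload defaults are assumptions.

import json
import urllib.request

payload = {
    "input_prefix": "def fizzbuzz(n: int) -> str:\n    ",
    "input_suffix": "\n    return str(n)\n",
    "n_predict": 64,
    "temperature": 0.2,
}

req = urllib.request.Request(
    "http://localhost:8080/infill",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    completion = json.loads(resp.read())
    # The generated middle chunk comes back in the "content" field.
    print(completion.get("content", ""))
[/code]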
>>107108131Your eyes hurt more with a dark theme because it has worse contrast.
>>107107383Great image thanks
>>107104717It's wonned you stupid white Saaaaaaaaaar
>>107108726Sorry for late reply sarrs had to fix engine on a UPS plane.
>https://github.com/ggml-org/llama.cpp/discussions/16957
I don't want to dirty up my github by making fun of this guy, but holy fuck.
His site's articles are also uncannily structured.
>https://software.land/load-vs-stress-testing/
>>107108447Could be true. It's been so long that it's now a norm for me but I'm going to do a test.
Why doesn't anyone benchmark quantizations? I think that REAP paper was most interesting because it came with a chart of how badly performance drops at 25% vs 50% size reduction. In practice the degradation was even worse than what the benchmarks showed, but the paper was up front about it. By comparison, people are just guessing about how bad their quants are. There's that old graph from when every model was coming out in 4/12/30/70 sizes, where the idea of more parameters > more bits for the same size came from, but I haven't seen it updated for the post-MoE era.
Why don't AI labs release quants more often? They release multiple sizes (like 30B3A, 32B dense, 235B22A), but not multiple quantizations of the same size. On the other hand, you have gpt-oss that only released a 4bpw version. There was that one Gemma version that tried quantization-aware training, which was pretty good.
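The "more parameters vs more bits at the same size" tradeoff is at least easy to reason about on the memory side; a rough sketch of weight footprint, ignoring KV cache and the mixed bit widths real GGUF quants use:
[code]
# Rough weight-memory footprint for a few model sizes and bits-per-weight.
# Ignores KV cache, activations, and the mixed bit widths real GGUF quants use,
# so treat the numbers as ballpark only.

def weights_gib(params_billions: float, bpw: float) -> float:
    return params_billions * 1e9 * bpw / 8 / 1024**3

for params in (30, 70, 235):
    row = ", ".join(f"{bpw}bpw={weights_gib(params, bpw):.0f}GiB" for bpw in (4, 6, 8))
    print(f"{params}B: {row}")

# e.g. a 70B at 4bpw (~33 GiB) is roughly the same footprint as a 30B at 8bpw
# (~28 GiB), which is where the "more params beats more bits" question comes from.
[/code]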
>>107109145i just want to know specifically how retarded glm 4.6 q3 is so i can make fun of people
>>107109145Usage proves more than any benchmark. In practice, everyone looks for the largest model they can run at ~q3, and only increases quant bits if they have space to spare. If q3 was too retarded then people would use smaller models at higher Q, but no one does.
>>107109153q4 is actually good, q3 is pretty meh, q2 is fucking retarded
>>107109145quanting is a janny job
>>107109251
I don't use anything under q5 because it's always noticeably more retarded. I don't understand how anyone says otherwise; my intuition tells me it's because the people using them are retarded and can't tell the difference.
>>107109333It's placebo. You don't need more than q2
>>107109251There aren't many models, so even a retarded Q2 4.6 is better than anything in this size category. 4.5 air is trash even at q8 and loses to a fucking 24b mistral in most of my automated tasks, which is an objective metric
>>107109145Actually I take it back, I looked harder and Qwen published official F16/Q8/Q4 quants for 235B-VL models. No benchmarks though.
>>107109338
It's not. At anything under q5+, everything I've tried devolves into sloppa, hallucinates out the ass, and makes retarded logical leaps an order of magnitude more often, requiring exponentially more swipes to get a reasonable response.
I understand your shitposting, but I wouldn't want to mislead other anons into coping with brain-dead quants like that.
>>107109251
>people would use smaller models at higher Q
That was a thing when we had 7, 16, 30, and 70b of the same model. You can't do this anymore unless you run Qwen, at which point your opinion on quality is irrelevant.
>q5, q6
cope quants
>>107109485q5 happens to fit glm air into 4 3090s. no reason to use q4 in that case. no idea what q6 lets you do.
>>107109498air is fucking garbage at any quant
E = MC^2 + Bitnet
>>107104115
Yo, all I have is a single 5070 TI + 32GB RAM, and I just want a roleplay bot, not ERP, but world-building story generation. With GOOD writing, not slop. Are there any good models out there that fit? Deepseek and the like seem to be too big, I know. Learning to use llama.cpp.
>>107109745
nothing out there really. try magistral small 2509 or nemo. probably won't be able to get anything with good spatial awareness or writing with such limited resources
>>107109761
yeah, I'm trying some 8B models and it really sucks. The writing is so clichéd, it doesn't feel real and I can't get immersed. Well, looks like I'll use up the rest of my Deepseek tokens.
>>107109745
Every single model has slop.
Yes, even the big paid ones running on million dollar servers.
Is there a 3-4 question way of benchmarking a model? I ask them to play tic-tac-toe and write FizzBuzz in 10 different ways.
>>107109745
>world-building story generation
Writing engaging, original stories is actually one of the hardest domains. The biggest models struggle with that.
Codefags have it easy.
>>107109466it's the first that I want to fuck with Miku
>>107106594
>civ 5
So that's why the US is always grinding XP by bombarding random minor civs?
>>107109456It's just YOU. Maybe you should learn to manage context. I've been using q2 to summarize stuff and it just works fine.
>I've been using q2 to summarize stuff
lmg users of copequants are low iq mongoloids, case #234324432
if they can't notice the garbage doing this produces, they can't judge any sort of output quality
rope yourself, you waste of air, water and other essentials