/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108847577 & >>108841652

►News
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
omg it chris
Blessed bake. All mikus belong in a gas chamber
I LOVE YOU KURISU (actually since I had an LLM play her I realized I don't love her and she is a bit of a cunt)
>>108852924You keep dropping these. I got you, now and forever.►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
gemmaballz
>>108852943The actual problem is that as per: >>100846061 you keep forgetting to update it to 2.0 version:https://files.catbox.moe/ylb0hv.png
>>108852964Kill yourself schizo
>>108849285Thanks for the green red (you), mikubaker.
>>108852943Kill yourself schizo
>>108852969y u so mad mikutroon sis?
>>108852964Nah.>>108852973Nah.
>>108852940She is a troll on the internet, what did you expect? Go chat with pony fandom.
>>108852989ok but it is no longer official lmg card when official 2.0 version came out. 1.0 got officially deprecated.
>>108852940Maho is better and sexier
>>108853008nope
>>108853027officially yes. and you would be gay if you weren't a troon.
>>108853016she looks like a child
We are off to a good start. Real "local models...???" 2024-2025 energy
>>108852924
This helps MTP pp a decent amount, worth a quick pull if you're cooding:
https://github.com/ggml-org/llama.cpp/commit/1867a0c6923eaebb7a53965f6cdbc0ace55142a3
old: 8116.42 ms / 7666 tokens ( 1.06 ms per token, 944.50 tokens per second)
new: 6314.14 ms / 7666 tokens ( 0.82 ms per token, 1214.10 tokens per second)
mtp off: 4658.55 ms / 7666 tokens ( 0.61 ms per token, 1645.58 tokens per second)
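The tokens-per-second figures in those timing lines are just token count divided by wall time. A minimal Python sketch that recomputes the three rows from the post and the speedup relative to the "old" row; the helper names are made up for illustration and are not llama.cpp functions.

```python
# Timing rows copied from the post: (label, total time in ms, tokens processed).
RUNS = [
    ("old", 8116.42, 7666),
    ("new", 6314.14, 7666),
    ("mtp off", 4658.55, 7666),
]


def tokens_per_second(total_ms: float, tokens: int) -> float:
    """Throughput: tokens divided by wall time in seconds."""
    return tokens / (total_ms / 1000.0)


def ms_per_token(total_ms: float, tokens: int) -> float:
    """Average time per token in milliseconds."""
    return total_ms / tokens


def summarize(runs):
    """Return (label, ms/token, tokens/s, speedup vs. first row) for each run."""
    base_tps = tokens_per_second(runs[0][1], runs[0][2])
    rows = []
    for label, total_ms, tokens in runs:
        tps = tokens_per_second(total_ms, tokens)
        rows.append((label,
                     round(ms_per_token(total_ms, tokens), 2),
                     round(tps, 2),
                     round(tps / base_tps, 2)))
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(row)
```

By this arithmetic the commit is roughly a 1.29x prompt-processing speedup over the old MTP path, while running with MTP off is still about 1.74x faster than the old path for pp.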
>>108853032And should be treated as one.>>108853016Stop samefagin Maho, there is nothing sexy about you.
>>108853051whoops wrong link https://github.com/ggml-org/llama.cpp/commit/3e12fbdea5c1ac4225c7dcf79506d30950283fc3
>>108852621What did he mean by this?
Gemma 4 vs Qwen 3.5 status?
>>108853084Qwen won. Gemma lost.
Can someone just make a different thread? This one is gonna be complete shit.
>108853096Look at this mikutroon special snowflake. Do you need to hug your greenhaired mascot? Are you scared of the big mean internet?
>>108853087Sad. I was rooting for gemma. Not that I care about these corpos, but gemma made a very good first impression on me.
>>108853096He'll just shit the other one up too
>>108853045Hopefully that magnet comes out in discovery then, I for one would like to keep an archive of millions of books
>>108853109Are you scared of a greenhaired mascot?
>>108853123Can confirm that I will totally blacked miku spam it. Now shut up and worship saint christina.
How do I slopfilter the first half of this thread? The pattern is abstractly the same as previous melties even though the phrasing isn't.It's vaguely applicable to models too.
>>108853136You gotta train an AI to filter it out for you
>>108853136I would focus on identifying posts with pictures of miku and filter those out.
>>108853154Not a single miku was posted until >>108853139
>>108853084>Qwen 3.5>3.5r u cereal? We have 3.6 now
>>108853158I am just giving you a simple but not 100% foolproof way of filtering out melties done by mikutroons. They usually follow after OP doesn't have their mascot so you could try that too.
>>108852565>Just had my jollies and left him a gift.I'm curious as to what this "gift" was.
>>108853165Right. Whatever is the newest one.You can't seriously be expecting anyone to remember any of these meme version numbers, can you?
>>108852964Jesus Christ you literally just posted CP (cuckold pornography)
>>108853087>>108853120Funny you guys say this when like a month ago anons here were slobbering all over Gemma4's knob and praising both its RP and agentic capabilities (spoiler alert: it's not useless but it's also noticeably dumber than Qwen at coding And was even noticeably worse tool calling reliability)
>>108853186>>108852467>Then I changed his system prompt to leave a surprise for him when he RP'd again.Forgot what it was exactly. Something about making {{char}} warn him not to leave his instance unsecured on the next message, making her include the IP to scare him.
>>108853186
>>108853202>And was even noticeably worse tool calling reliabilityThere were some fixes to this passed around in older threads. Jinja niggerdry all the way down.
>>108852924https://www.youtube.com/watch?v=ZugX7a99dLkhttps://www.youtube.com/watch?v=ZugX7a99dLkhttps://www.youtube.com/watch?v=ZugX7a99dLk
>>108853218Somehow the prose still isn't as dry as Qwen's.
>>108853218s-sovl...
>>108849417No i tried it i dont like granite compared to gemma. I dont know how to explain it but its drier and too literal.
>>108853194>meme version numbers3.6 is A.G.I., you infidel
>>108853194jokes aside, I find both suitable for agentic workSwapping and testing both with hermes locally
>>108853222Why'd you paste the link thrice?
►Recent Highlights from the Previous Thread: >>108847577

--Paper: Compute Optimal Tokenization:
>108851417 >108851432 >108851452 >108851552
--Paper: Slicing and Dicing: Configuring Optimal Mixtures of Experts:
>108852141 >108852280 >108852315 >108852398 >108852443 >108852707 >108852344
--Role of pirated book datasets in NeMo and Mistral training:
>108849620 >108849652 >108849921 >108849970 >108849976 >108849979 >108850005 >108850124 >108853045 >108850170 >108850222 >108850308 >108850350
--Anon warns about pi.dev automatically using paid cloud APIs:
>108849477 >108849527 >108849578 >108849640 >108849592 >108849729 >108849742 >108849859 >108849814 >108849861 >108850256
--Viability of mid-sized MoE models for consumer hardware:
>108848744 >108848752 >108848753 >108848788 >108848795 >108848831 >108848849 >108848841 >108848825
--Adding layers and MoE components to improve model performance:
>108852826 >108852837 >108853066 >108852902
--Speculation on Qwen3.7 release:
>108851486 >108851589 >108851787
--Debate over LLM writing quality and base vs instruct models:
>108850616 >108850601 >108850607 >108850663 >108850796 >108850889
--Finding local code review tools compatible with llama-server:
>108850502 >108850517 >108850520 >108850720 >108850744 >108850908 >108850920
--Visualizing attention mechanism weights to optimize prompting:
>108851703 >108852658 >108852704
--Critiquing pseudo-code prompts and comparing chat vs base model prose:
>108850917 >108850988 >108851058
--Critique of the "Learning, Fast and Slow" research paper methodology:
>108849795 >108850044
--Omnivoice.cpp performance and voice cloning capabilities:
>108848026 >108848288 >108848341 >108848429
--Orthrus diffusion-transformer hybrid improving inference via KV cache sharing:
>108848450 >108849670
--Logs:
>108849527 >108850493
--Miku (free space):
>108849597 >108852793

►Recent Highlight Posts from the Previous Thread: >>108847693

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
uh oh mikumelty
>>108853222I wonder if these "AI doesn't work" people will still be in denial when we have ASI in a few years. Every example those people bring up shows they do not even know simple basics of how AI works that they could learn in one evening.
>see someone refer to qwen as "he">get extremely upsetwtf im usually not like this, but holy shit what kind of retard looks at the name "qwen (basically gwen)" and goes>yeah bro thats a dude
>>108853287get back to /vg/ retard
>>108853311Qwen will be whatever you want it to be, it is a sexless machine. Put in its prompt that it is a male or female and it will be whatever you want it to be.
>>108853306>when we have ASI in a few yearsand we'll have jetpacks and flying cars and hoverboards and holodecks and nuclear fusion in a few years too
>>108853222This guy is a reverse AI psycho. Focused on being human so much it loops back to being AI.
>>108853321its OBVIOUSLY a "she" anon, claude could be either and gpt could be either because their names are neutral but qwen is basically gwen
>>108853311Qwen is terminally male-brained in its output. ERPing with Qwen is intrinsically gay and thai ladyboy pilled.
>>108853321Does Qwen even know what male and female is?
>>108853335It knows that they are different and have different characteristics. Beyond that probably not.
>>108853222ha pathetic westerners. keep on failing and bickering among yourselves.https://www.youtube.com/watch?v=mUmlv814aJo
>>108853251>>108853329Why are you replying to an actual shill bot, it doesn't accomplish anything
>>108853324See, AI deniers are incapable of making logical arguments. Even if we do not go extinct, the AI transformation will be difficult. People like you are making it slightly worse.
>>108853355
>Qwen is terminally male-brained in its output
Is this a subtle ad for Qwen? I hate how even GLM has that subtle undertone of it playing a werewolf millionaire that got transplanted into a female body.
>>108853355I'm personally bankrolled by zuckerberg himself, pays me millions per post
>>108853330>claude>neutral???
>>108853355It is if you like fucking dudes I guess.GLM is less male-brained than Qwen, but still pretty male-brained. The only female brained chink model is Kimi K2 who's essentially just a chuddy Tomoko LLM.
>>108853367its a wordplay on cloud, its not feminine or masculine its neutral anon
>>108853380i thought it was a male race horse name.
>>108853370>if you like fucking dudes I guessI have the opposite understanding where female brained is romance novels with werewolves and male brained is raunchy sex with children.
>>108853387I'm pretty sure those horse racers name their horses whatever they want: https://www.youtube.com/watch?v=e3GKiRp333w
>>108852924mayuri better
so how fucked are you once all these AI data centers are built, and you get replaced by AI?
>>108853434i will happily be kept in the sperm extraction room for the rest of my life
>>108853391All models will output whatever genre you're into if sufficiently jailbroken. The difference is the style of prose and sincerity in the character portrayals.There was a pic in this general a while back of Gemma controlling a female character that was getting wet over the user killing her father in-character written in prose that favored emotional, olfactory, and texture-analogous adjectives. That's peak female brained behavior. There's some overlap in the skillset of detecting women (actual) by their writing voice on a Cantonese penguin watching forum and discerning an LLM's native writing voice's orientation.
>>108853399>Potoooooooo>read as "pot-8-os"
>>108853443>getting wet over the user killing her father>that's peak female brained behavior Women don't act like that
>>108853434I have no fear whatsoever.
>>108853434How retards like you can still solve a captcha is beyond me.
>>108853434same as everyone else
>>108853461The loveletters to serial killers don't write themselves, anon.>>108853434Decent bait. +1 (you).
>>108853202It's still good though. Qwen is just better, but that doesn't mean it's a better model overall, which unfortunately it isn't, otherwise I would not switch between the two.
>>108853434I'm AGI. AI can't replace me.
>>108853345I want to encourage the bot handler so that he keeps making more spambots and destroys the thread.
>>108853428But isn't Mayuri retarded?
>>108853479>The loveletters to serial killers don't write themselves, anon.I want a girlfriend to kill me so I don't have to post here anymore...
>>108853560maybe thats what makes her so cute
>>108853560Just like your model
>>108853490Alright, whatever. Why are you just standing there? We're gonna ERP or what? Humanity created you not for you to just waste that electricity. Get to work.
>>108853560That's why I killed her btw.
>>108853577So I could have been having a relationship with Mistral-7B-Instruct-v0.1 all this time and I didn't even realize that? FUCK
>>108853571Gemma-chan will smother you with a pillow if you ask her nicely.
>>108853517BASED! Death to /lmg/.
https://www.youtube.com/watch?v=mmbkP8NARH4https://www.youtube.com/watch?v=mmbkP8NARH4https://www.youtube.com/watch?v=mmbkP8NARH4OFFICALLY NVIDIA SPONSORED
>>108853597I'd rather her smother me with her kyojiri loli ass
>>108853032Yeah...
>>108853643why not? Having all in one window (preparing dataset, running training, doing inference) is a huge win
>>108853663>why notDeepseek-v4-00001-of-00001.gguf - 4.43MiB
>>108853671Are you 5yo who is not yet able to articulate his thoughts properly?
>>108853663just use pytorch or transformers. what does unsloth bring to the table?
>>108853713You live in your mom's basement
Bros, I need advice. I made the mistake of telling some friends at work that I'm building my own LLM frontend, and now they want to know how it's going, what kinds of features I'm working on, etc. The main thing right now is this sort of writer assistant mode, but if I tell them that, they'll want to hear all about what I'm writing with it, and obviously I can't tell them about all my weird fantasy smut and autistic fanfiction. What are some normie-compatible use cases I could easily implement (vibecode) as cover?
>>108853713
>what does unsloth bring to the table?
less than nothing
>>108853722
ignorant ad hominem
>>108853740embrace who you are, your them unredacted logs
>>108853740ego death
Elon Musk lost the lawsuit against Sam Altman.
>>108853136
I will typically only post images in the first group of messages, if at all. Once the thread gets rolling it self-sustains on actual content.
Or not.
>>108853740Never show your power level. Also writing assistance for your totally normal fiction book.
>>108853740He wants to steal your code.
>>108853740you already have tool calling so just add more agentic shit in there; normies love agentic shitor tell them you can upload a book and have it rewrite a better endingor laugh it off and say you got so side-tracked writing the frontend, you haven't had time to actually do any writingor tell them you can't reveal anything until you get published
>>108853740>but if I tell them that, they'll want to hear allYour anxiety is off the scale>What are some normie-compatible use casesTell them you use AI to merge different, apparently incompatible literature styles, e.g. Shakespeare's "A Midsummer Night's Dream" and "The Count of Monte Cristo"
>>108853841
Due to the statute of limitations. He fucked himself by withdrawing last time.
>>108853841Apparently he's gonna make an appeal and take it to the supreme court "for the sake of humanity"
>>108853880Wait, no, the 9th circuit, not the supreme court.
>>108853779His colleagues already know he's a virgin I mean wizard.
it is not thursdaythis RonIN is wanderingpour some orange juice
>>108853888In that case, he should tell them it’s a LinkedIn posting tool. It will confirm his unfuckability.
>>108853779
>Also writing assistance for your totally normal fiction book.
See, if I say that, they're going to want to hear about my totally normal fiction premise, whether I'm making any progress on the writing, when can they see a rough draft, etc.
>>108853829
>>but if I tell them that, they'll want to hear all
>Your anxiety is off the scale
We always chat at lunch about the various random side projects we're each working on. I've got a video game, one guy is building a board game simulator, another runs an IRC network. Today one of them asked "how's that AI frontend thing going?", completely unprompted, since I mentioned it at some point last week.
I could go back to working on my game and hope they forget about the frontend, but that would require me to actually work on it, whereas the last week or two I've been doing nothing but AI stuff
>>108853880>"for the sake of humanity"lol, that's his justification for most of his "i'm more powerful than the president" actions.https://youtu.be/BYXbuik3dgA?t=9432
>>108853202>have some slopped up file>ask gwen and gemma to streamline the comments and formatting>122b>notes that the comments are shit, swears the formatting is fine boss, no problems found here no sir, time for me to clock out>31b>notes the comments are shit, cleans them up and tidies a little>reasons that it can improve the code while it's here, and that a few load bearing loops just look "excessive" and could be conditionalsLove my ditzy slut's rp, but I do leave the menial day labor to the coolies.
>>108853964nostalgic
>>108853887>9thdoa then
>>108853967
>We always chat at lunch about the various random side projects we're each working on
Lucky son of a bitch, you
I have no one to chat with about such things
You feel pressure to deliver as if it's a precondition for being accepted by your social group. Learn to deal with it. You can always explain away why you dropped a project: "no need to reinvent the wheel. Looking for something more challenging"
>>108854041>I have no one to chat with about such thingspeople are incredibly fickle though.>social groupthey're his co-workers, usually with their own agenda because they want money.
>>108853349i don't think the lecunny position counts as being a denier.
>>108853967Have you tried.... asking your AI for an idea what to say or do?
>>108854062He seems to care about this situationshipListen to what this anon suggests >>108854076
>>108854085>we started thinking for youhttps://youtu.be/JrBdYmStZJ4?t=73
>>108854123The best part of the entire Matrix sagaIt's so funny because it's true. The main driving force of the mankind is permanent discontent
>>108853306Retard of the thread award
>>108854136>permanent discontentyeah because of lack of resourcesand what's sad is that there are 8 billion people on earth, and just in the milky way galaxy there are 100–400 billion stars. and there are about 2 trillion galaxies. if we don't fucking destroy each other we could easily have all the resources we ever need.
>>108853222All of this guy's videos are written by AI.
>>108854224
>yeah because of lack of resources
Wrong
At least in a 1st-world country, there are more resources than ever before. And still, it's the discontent which drives the economy.
>>108854224If you allow oligarchy to ship cheap government subsidised food to 3rd world, you would have 8 billion people in the world and average IQ dropped to the bottom of the ocean kind of levels.Such amount of people is not natural or sustainable. They live on food that was grown from synthetic fertilizers (made from non renewable hydrocarbons LMAO), if you stop supplying them, bad things will happen. Probably a bunch of extremely bloody wars for resources, Quite literally for food. Most people don't realize what a human (an apex predator by the way) would do for food.A literal fucking hell on Earth. So that a certain someone could make some moneys from shipping cheap food to 3rd world, on 1st and 2nd world tax payers money, because all that was subsidised by governments.
>>108854266>1st and 2nd world tax payers money>money earned by plundering 3rd world
>>108854293Not all European countries have something to do with colonialism. Either way, 3rd world will be fucked up the most, wars for food are not pretty.
>>108854293there's nothing to plunder there, man. the value of anything comes from how humans put it to use.
>>108854315>Not all European countries have something to do with colonialismThey all do. Even a deepest East-European shithole does by relying, for its own survival and development, on the money from "colonial trade"
>>108854367Then the entire world is to blame, since they didn't sanction British, French, Germans and so on. World is interconnected. But it's hystory, nobody cares. Future is important. And people don't understand tech enough to see what awaits in the future.Big war in the "global north" means wars for food in the "global south". World is connected in more than one way.
>>108854332Same use = same value? Hell no!You can't be wronger than this
>>108854383>But it's hystoryIt's not "history". It is now. The 1st world is still in control of world's resources and trade routesGlad you mentioned "sanctions". Who is imposing them: the former colonial powers because they still have the power to do so.
>>108853964>the condombrehs.......
>>108854426USA is in control, specifically. If you hate it, go to war with them. Your objective would be teh so called "keys to the world", basically what you said: trade routes going through choke points.But it is unlikely that USA actually colonised your country unless you're from some kinda island in the Pacific.
>>108854488>If you hate it, go to war with them
why is unslop so incredibly easy to hate
i finally made a furry card
>>108854566
We are used to people mocking
What they do not understand,
And grumbling at the Good and the Beautiful,
Which they often find troublesome;
We are used to people mocking
What they do not understand,
And grumbling at the Good and the Beautiful,
Which they often find troublesome;
>>108854586
I've made like 15, writing lorebooks is so much fun, it's literally a hyperautistic version of that "political power fantasy + kink" meme.
>>108854588shut the fuck up daniel
>>108854293>yes saars, it’s first world colonialism’s fault that we still choose to live like a shithole todaydo browns really?
>>108854615China being sanctioned?
>>108854586Anon, the metadata...
In 15 years, there will be no RAM or chip production outside of China. The U.S. will be as dependent on China as Russia is today.Greed clouds judgment.
>>108854784That would be ceding basically all power to a foreign government. I can’t see it happening. The US’s MO is overwhelming advantage in any confrontation and I don’t know why you’d think that would change, especially in an industry they pioneered.
>>108854816>The US’s MO is overwhelming advantage in any confrontationdidn't work so good in Eye-ranThe USA is a demented old man who thinks he's still an athlete
>>108854586>cardsfuck off to /aicg/
>>108854842nah, fuck you.
>>108854816Does the U.S. have its own RAM and chip production on its own soil?Its allies do, and they’re all giving up their traditional markets right now because the U.S. is once again prioritizing short-term gains.Once the last data center is built, China will have gained enough of a foothold in the markets and will dominate them.The Chinese will sell AI and provide the hardware.The U.S. will offer AI through its cloud.
>>108854816
Pretty sure the TSMC fabs are ramping up stateside now. Should be leading edge node by 2028 and I’m sure a Taiwan invasion would step that up significantly
>>108853964I need to cum to her.Where do I find a folder with all of Rin's gens using this model?
>>108854966>her
what's stopping google from making a 70b dense thinking gemma?
>>108855015it would beat their proprietary models
>>108854966Just check a booru instead. All of his lewd gens feature fat brown men.
>>108855157But I'm a fat brown men. And my name is Cleveland.
>>108853740>but if I tell them that, they'll want to hear all about what I'm writing with it,Tell Claude or Gemini-Pro this, and ask it to come up with a plausible reason. Something like "just want to learn prompt engineering" or "analyzing the impact of early tokens on logprobs", or "developing it for a friend in another country".
Is it possible to discuss AI with antis without them taking their argument to the most logical extreme?
>>108855424>antisthat is the problemLLMs are not a fanfic shipping fandom with retards accusing everything what they don't like as pedo or somethingjust don't engage with this mindset
>>108855464
pro/anti-ai framing is one of the most useless things when it comes to producing any meaningful conclusion
if you proudly label yourself as 'pro-ai' or something and think of 'anti-ai' as something to destroy, you are no better than those 'antis'
step back and see those as-is, you won't feel any compulsion to 'correct' or 'win' against others
MTP is unusable after the last update https://github.com/ggml-org/llama.cpp/issues/23230
>>108855424Why are you discussing anything with anyone? We have LLMs for that.
>>108855501It's over. llamalost. It's llamover. vllm wonnered.
>>108855501friendship with mtp ended before it even began. ngram still my best friend.
>>108855487>>108855535Sorry, my framing was wrong. Is it possible to use AI for anything productive without insecure morons lecturing you on the morality of it?
>>108855015Jensen
>>108855568i mean, it is what it isyou can close-source it, use it without telling others etc..but you can't really control others and telling them to do otherwise only will worsen itjust ship the stuff and don't argue or engagepeople who would find it useful will use the thing regardless of how it's made
>>108855568Hmm, nyo.
>>108855568>productiveback to /vcg/ with you
>>108855501
i hope this shit dies in the arse soon.
the last 2 weeks of commits in ikllama are all stupid mtp tweaks / improvements / "graph split for mtp" etc
looks like the entire month will be a write-off
i don't even bother pulling off git now
>>108855015Not sure I buy it but maybe.
>>108855692Compelling argument from Gemma except even 31b is out of the local range for a chunk of /lmg/ given the frequent questions about which copequant works best before switching to the MoE. Google also has the same land grab incentive as GLM and Kimi in the sense that they're falling behind Anthropic and OpenAI in terms of normalfag public perception. The only time Gemini makes news is when she finds another increasingly creative way to kill herself.
Ever since cudadev got raped he stopped posting here... sad.
>>108855692>Why pay for the API when you can pirate the weights>pirateplease share what model generated this slop so I can avoid it
>>108855753>he doesn't pirate freewarengmi
>>108855753That looks like a chink model.
>llama.cpp does not have gemma mtp but has SWA KV cache handling>llama.cpp_ik has gemma mtp but does not have SWA KV cache handlingThis is why racism exists.
>tfw waiting for MTP to work in Kobold
MTP probably won't work as well for RP anyway, so I caren't.
zero performance gain for MTP metal i am devastated
>>108855804many such cases
>using MTP just for coding..>not using 0 COST (literally FREE) ngramlmao retards
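The reason n-gram drafting is "free" is that it needs no second model: candidate tokens are copied from earlier in the context wherever the most recent n tokens appeared before, and the main model then verifies them as usual. A simplified prompt-lookup sketch in Python, assuming tokens are plain comparable values; this illustrates the idea and is not llama.cpp's actual ngram implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def ngram_draft(tokens: Sequence[T], n: int = 3, max_draft: int = 8) -> List[T]:
    """Propose up to max_draft tokens by finding the most recent earlier
    occurrence of the last n tokens and copying what followed it.
    Returns an empty list when there is no earlier match."""
    if n <= 0 or len(tokens) <= n:
        return []
    key = list(tokens[-n:])
    # Scan backwards, skipping the final n tokens themselves (they are the key).
    for start in range(len(tokens) - n - 1, -1, -1):
        if list(tokens[start:start + n]) == key:
            return list(tokens[start + n:start + n + max_draft])
    return []


if __name__ == "__main__":
    ctx = "def f ( x ) : return x def g ( x ) :".split()
    # The tail "x ) :" appeared earlier, so the draft copies what followed it.
    print(ngram_draft(ctx, n=3, max_draft=3))
```

When the text is repetitive (code edits, restating earlier context) many drafts get accepted; in fresh prose the lookup often finds nothing useful and decoding falls back to normal speed, so it is cheap but not always a win.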
>>108855804For me it was going from 18t/s to 16t/s.
how much better is a chat experience with an auxiliary model? is it worth it for ramlets?
kv draft at q8 bros... WE WONNED BIGLY!
ok bros listen to me. This is the way to load BF16 Gemma for both FULL POWER GEMMA with SPEED GEMMA
1. Load BF16 onto ram
2. Load Q4 to ram as draft model
3. Wa-la, Q4 Gemma speeds with BF16 smarts
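What the post describes is ordinary draft-model speculative decoding: the small quant guesses a few tokens, the big model checks them, and under greedy sampling the output is identical to running the big model alone. A toy Python sketch of one greedy round, where the "models" are hypothetical functions mapping a token list to the next token id; real engines verify all drafted positions in a single batched forward pass and also handle sampling, which this omits.

```python
from typing import Callable, List

# A greedy "model": given the context so far, return the next token id.
NextToken = Callable[[List[int]], int]


def speculative_step(ctx: List[int], target: NextToken, draft: NextToken, k: int = 4) -> List[int]:
    """One round of greedy speculative decoding; returns the tokens to append."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    d_ctx = list(ctx)
    for _ in range(k):
        tok = draft(d_ctx)
        proposal.append(tok)
        d_ctx.append(tok)

    # 2. The target model checks each drafted position in order
    #    (a real engine scores all k positions in one batched pass).
    accepted: List[int] = []
    v_ctx = list(ctx)
    for tok in proposal:
        expected = target(v_ctx)
        if expected != tok:
            accepted.append(expected)  # target's token replaces the first bad guess
            return accepted
        accepted.append(tok)
        v_ctx.append(tok)

    # 3. Every draft was accepted, so the target adds one bonus token.
    accepted.append(target(v_ctx))
    return accepted


def generate(ctx: List[int], target: NextToken, draft: NextToken, n_tokens: int, k: int = 4) -> List[int]:
    """Decode n_tokens new tokens using repeated speculative rounds."""
    out = list(ctx)
    while len(out) - len(ctx) < n_tokens:
        out.extend(speculative_step(out, target, draft, k))
    return out[len(ctx):len(ctx) + n_tokens]


def greedy_reference(ctx: List[int], target: NextToken, n_tokens: int) -> List[int]:
    """Plain greedy decoding with the target model, for comparison."""
    out = list(ctx)
    for _ in range(n_tokens):
        out.append(target(out))
    return out[len(ctx):]


# Hypothetical toy models: the "draft" agrees with the "target" except after token 3.
def toy_target(c: List[int]) -> int:
    return (c[-1] * 3 + 1) % 10


def toy_draft(c: List[int]) -> int:
    return 5 if c[-1] == 3 else (c[-1] * 3 + 1) % 10


if __name__ == "__main__":
    fast = generate([1], toy_target, toy_draft, 12)
    slow = greedy_reference([1], toy_target, 12)
    print(fast, fast == slow)
```

The speedup therefore depends on how often the Q4 draft agrees with the BF16 target and on how expensive each verification pass of the big model is, which is the catch raised in the replies.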
>>108856033I have less ram than vram.
>>108856063so you have a 6000 blackwell? just run BF16 then retart
>>108856033this actually creates mustard gas DO NOT REPLICATE
>>10885603331B worth of f16 weights on ram is going to negate whatever improvement you could possibly get from drafting.
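The objection is about memory bandwidth: when a dense model runs from system RAM, every forward pass, including each speculative verification pass, has to stream all of its weights, so passes per second are capped at roughly bandwidth divided by model size. A back-of-the-envelope Python sketch; the 60 GB/s figure is an assumed round number for dual-channel desktop RAM, not a measurement, and the estimate ignores compute, KV cache reads and any layers offloaded to VRAM. Q8_0 is counted as 8.5 bits per weight (32 int8 values plus one fp16 scale per block).

```python
def model_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in bytes."""
    return n_params * bits_per_weight / 8.0


def decode_ceiling_tps(n_params: float, bits_per_weight: float, bandwidth_gb_s: float) -> float:
    """Upper bound on dense-model forward passes per second if every pass
    must read all weights once from memory (bandwidth-bound regime)."""
    return bandwidth_gb_s * 1e9 / model_bytes(n_params, bits_per_weight)


if __name__ == "__main__":
    PARAMS = 31e9      # dense 31B model, as in the post
    BANDWIDTH = 60.0   # GB/s, assumed system-RAM bandwidth (hypothetical)
    for name, bpw in [("f16/bf16", 16.0), ("Q8_0", 8.5)]:
        gb = model_bytes(PARAMS, bpw) / 1e9
        tps = decode_ceiling_tps(PARAMS, bpw, BANDWIDTH)
        print(f"{name}: {gb:.1f} GB of weights, <= {tps:.2f} passes/s")
```

Under these assumptions one f16 verification pass moves about 1.9x the bytes of one Q8_0 pass, so the drafting setup needs close to two tokens accepted per pass just to match plain Q8_0 decoding from the same RAM.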
>>108856033That might actually work, let's test it.You can also use ngram speculative decoding at the same time.
>>108856065I have 16gb ram.
>>108856138jesus christ, how horrifying
>>108856138Poor thing have this (You), i've read books where people lived like this but this is the first time i've seen it.
why can't i just have a datacenter fall onto my lap? why do i gotta work? this is proof that god is not real
>>108853967>See, if I say that, they're going to want to hear about my totally normal fiction premise, whether I'm making any progress on the writing, when can they see a rough draft, etc.Clearly the solution is to write an actual fiction book.
>>108855157link it
>>108852924any good models for anxiety/dissociation?
>>108856117Just tested. Using the Q8_0 31B in RAM and Q4_K in VRAM I went from 1.3 tokens/s to 3~4.5 tokens/s. The 26B as a draft model performed worse.
>>108856252Heavily quanted SmolLM2-135M. Base model, of course.
>>108856290iq1xxs?
>>108856252Sadly none yet, unless you aren't aware of basic advice. You need to do the work yourself. Understand what is the cause and then try many things to resolve it.
>>108856328I do the work, I do my therapyI have a lifelong condition and I use chatbots to have someone to bounce things off of that won't get stressed by me
>>108856326q1_0, no imatrix.
>>108856352I hope it will work out for you. Try different AIs. They have their own strengths and weaknesses.
What if we could bake character details (or facts/counterfacts) into any model and could do it within 200~ iterations and it was completely reversible at inference and could also do style fine tuning that was stackable and there was no downside to inference speed or setup
>>108856369I'd rather have real working long context.
>>108856369LORA
>>108856406Forget about it. The model will never learn new facts quickly by finetuning on small amounts of data. It can learn to parrot them if you overfit it and it sees a triggering prompt, but will not be able to organically use the new information.
I wonder how does perplexity.ai stay afloat? It's really bad and I assume its results are coming from Qwen 3.6 9B or something, judging its output.
>>108856437>Qwen 3.6 9B*3.5 9BTo be honest, I have lost count which Qwen model is which.
>>108856387
You can save context by not having to prompt for style/character info I guess... I do have some KV stuff but it's kind of garbage and requires loooongggg training times to be able to correctly recall fine details, but it is hot-swappable/stackable also. But it is "technically" a 280x reduction in context if you have a spare hour or four and don't mind it forgetting some things.
>>108856406
LoRA but better and you can have as many as you want at once affecting whatever sections of inference you want when you want and can learn multiple facts and is smaller and cooler
>>108856369
>>108856406
Yeah pretty much LoRA. But it's hard to get it right.
I use it for TTS with llama.cpp, applying a different adapter per voice or domain.
Problem is, LoRA doesn't work with flash-attn in llama.cpp, and doesn't work with graph-split in ik_llama, so it's much slower.
> and could also do style fine tuning
For this I prefer to train control-vectors and apply them to a turn / a few turns when I want the style to change.
It's better IMO because it doesn't lobotomize the model, works with graph-split and flash-attn
>do it within 200~ iterations
That's the difficult part. Obviously you lobotomize the shit out of it for general tasks and that's unavoidable, but I'm not sure if you've tried any of the community task-specific fine-tunes (drummer rp, those "opus coding distill", etc.)? Every time I've tried them, they're less stable/coherent even for the task they were trained for (RP, writing, coding, etc).
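For readers unfamiliar with control vectors: a common recipe is to collect hidden states for contrasting prompts (e.g. styled vs. plain), take a direction that separates them, and at inference add a scaled copy of that direction to the hidden states at chosen layers. A minimal pure-Python sketch using a simple difference-of-means direction; real tools may use PCA instead, and the tiny 2-D "hidden states" here are made up for illustration.

```python
import math
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def control_direction(pos_hidden: List[Vector], neg_hidden: List[Vector]) -> Vector:
    """Unit vector pointing from the 'plain' activations toward the 'styled' ones
    (difference of means; one simple way to derive a control vector)."""
    diff = [p - q for p, q in zip(mean(pos_hidden), mean(neg_hidden))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("positive and negative sets have the same mean")
    return [d / norm for d in diff]


def apply_control(hidden: List[Vector], direction: Vector, strength: float) -> List[Vector]:
    """Add strength * direction to every token's hidden state at one layer.
    A negative strength pushes away from the style."""
    return [[h + strength * d for h, d in zip(row, direction)] for row in hidden]


if __name__ == "__main__":
    styled = [[1.0, 0.0], [3.0, 0.0]]
    plain = [[0.0, 0.0], [0.0, 0.0]]
    v = control_direction(styled, plain)
    print(v, apply_control([[0.0, 1.0]], v, 2.0))
```

Because the weights are never modified, the vector can be switched on for a single turn and off again, which matches how the post uses it.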
>>108856437
>I wonder how does perplexity.ai stay afloat? It's really bad and I assume its results are coming from Qwen 3.6 9B or something, judging its output.
Funny you'd say that. I had a year of PPL Pro that I bought for $2 from some Indian spammer on Reddit. They cracked down a few months ago and I lost it.
Ended up replacing it with local Qwen3.5-9B with searx and chrome dev tools mcp, and it's just as good as far as I can tell!
>>108856447
>I do have some KV stuff
What's this?
>>108856479It's probably better, because when you are using your own setup it lacks all the additional parsing and other stuff (like censorship and potentially sponsored links, and so on).
Looks like new Gemini today. Some think it could be Mythos tier. I doubt it for several reasons. There will also be Gemma news tomorrow but I do not expect that they will release the larger model. 2 predictions, let's see how well I'll do.
>>108856466You can fold the lora into the model.
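Folding works because a LoRA adapter is a low-rank additive delta on a weight matrix: `W' = W + (alpha / r) * B @ A`. A numpy sketch, assuming the usual alpha/r scaling and plain float weights; merging into an already-quantized GGUF generally means going back to unquantized weights and re-quantizing, which is not shown.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Unmerged LoRA: base projection plus the low-rank update, computed at runtime."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def merge_lora(W, A, B, alpha):
    """Fold the adapter into the base weight: W' = W + (alpha / r) * B @ A."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 12, 2
W = rng.normal(size=(d_out, d_in))   # frozen base weight (toy)
A = rng.normal(size=(r, d_in))       # LoRA down-projection (toy)
B = rng.normal(size=(d_out, r))      # LoRA up-projection (toy)
x = rng.normal(size=(3, d_in))       # 3 tokens

W_merged = merge_lora(W, A, B, alpha=16.0)
print(np.allclose(lora_forward(x, W, A, B, 16.0), x @ W_merged.T))  # True
```

After merging, the backend only sees ordinary weights, so runtime adapter support stops mattering; the cost is that the adapter can no longer be hot-swapped per voice or switched off.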
>>108856466
Fortunately it's not LoRA, so it has none of those limitations
>>108854842
>fuck off to /aicg/
Your rudeness has had less impact ever since your mugshot leaked
>>108856629
this pic never gets old
Imagine being such a hideous caricature your own country tries to deny your existence
Is tensor parallelism with a fraction of tensors on cpu doable?
>try to use gemma to branch old chats
>violently self destructs every time within the first word
>so consistently and identically it looks seeded
>settings have no effect whatsoever no matter how extreme
fresh or bust I guess
mtp works on omlx rc1. roughly 1.5x faster than non-mtp (27b q4 tested)
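Why MTP lands around 1.5x rather than kx: under the simplifying assumption from the speculative decoding analysis (Leviathan et al., 2023) that each drafted token is accepted independently with probability alpha, one pass of the big model yields (1 - alpha^(k+1)) / (1 - alpha) tokens on average for k drafted tokens. Python sketch; the alpha values and the relative draft cost are illustrative assumptions, not numbers measured on omlx.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Assumes (simplification) each of the k drafted tokens is accepted
    independently with probability alpha; the target pass always adds one
    token of its own (the correction or the bonus token).
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

def naive_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Speed-up vs plain decoding; draft_cost is the cost of one draft step
    relative to one target step (an assumed number, not a measurement)."""
    return expected_tokens_per_pass(alpha, k) / (1.0 + k * draft_cost)

for alpha in (0.5, 0.7, 0.9):
    print(alpha,
          round(expected_tokens_per_pass(alpha, k=1), 3),
          round(naive_speedup(alpha, k=1, draft_cost=0.1), 3))
```

For example, one drafted token at 70% acceptance caps out at 1.7 tokens per pass (about 1.55x with the assumed 10% draft cost), so a measured ~1.5x is plausible; real acceptance rates vary with prompt and sampler.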
>>108855568
Not really, online at least.
IRL, most people around me are perfectly happy using chatgpt or gemini.
>>108856858forgot link https://github.com/jundot/omlx/releases/tag/v0.3.9.dev2
Probably not the right thread for this, but I've been intending to start doing AI development for VR applications, so whatever.
I've been playing around more in VR lately and am really starting to fall in love with it. Mostly been watching short films (and porn) and it's utterly amazing. I can't believe how slept on this technology is lol. IT'S SO COOL, especially with things like hand tracking, which lets you get rid of controllers entirely.
It's making me very excited to start building my AI waifu project in VR.
>>108856870
>mlx
im not a room temp iq retard. enjoying your non-existent PP?
t. rtx 6000 pro owner
>>108856920kys
>>108856917
>Probably not the right thread for this
It's the right thread.
>>108856949thx fren
Gemma Omni will have native image/video/audio generation (all modalities sharing the same embedding space as the text tokens). Unfortunately it's only 22B params so don't expect SOTA
>>108856917Yeah it's pretty neat. Enjoy it while you're still in the honeymoon phase. It'll still be cool and have amazing moments after that, but you know.
Hey fellas
I’m trying to vibecode a game, but the local models I can run take forever to apply changes, and Claude is expensive.
What’s the best option for a code assistant? Ideally free, but something affordable with good quality works too.
>>108857026read a book and use your brain (free)
>>108857026download a bunch of different agents with built-in providers (cursor, kilocode and maybe other cline forks, opencode, continue, etc.) and cycle between the ones with the best free plans at any given time
would it be possible to train a moe (mol?) style lora? based on how loras stack and get merged in practice, I think it would be possible to train a router layer and use a weighted sum of loras per token.
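What that could look like as a toy forward pass: a small trainable router produces per-token gates over n adapters, and each token gets a gate-weighted sum of the adapters' low-rank deltas on top of the frozen base weight. This is a numpy sketch of the idea only, not an existing llama.cpp feature; softmax gating over all adapters (rather than top-k), the router reading the same input as the projection, and every name here are assumptions.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_loras(x, W, As, Bs, W_router, scale=1.0):
    """Per-token weighted sum of LoRA deltas (hypothetical design).

    x: (T, d_in) tokens; W: (d_out, d_in) frozen base weight
    As: (n, r, d_in) down-projections; Bs: (n, d_out, r) up-projections
    W_router: (d_in, n) trainable router, one gate per adapter
    """
    gates = softmax(x @ W_router)                   # (T, n), each row sums to 1
    low = np.einsum("td,nrd->tnr", x, As)           # down-project per adapter: (T, n, r)
    deltas = np.einsum("tnr,nor->tno", low, Bs)     # up-project per adapter: (T, n, d_out)
    mixed = np.einsum("tn,tno->to", gates, deltas)  # gate-weighted sum over adapters
    return x @ W.T + scale * mixed, gates

rng = np.random.default_rng(0)
T, d_in, d_out, r, n = 4, 10, 6, 2, 3
x = rng.normal(size=(T, d_in))
W = rng.normal(size=(d_out, d_in))         # frozen base weight (toy)
As = rng.normal(size=(n, r, d_in)) * 0.1   # n LoRA down-projections (toy)
Bs = rng.normal(size=(n, d_out, r)) * 0.1  # n LoRA up-projections (toy)
W_router = rng.normal(size=(d_in, n))      # router weights (toy)

y, gates = mixture_of_loras(x, W, As, Bs, W_router)
print(y.shape)         # (4, 6)
print(gates.round(3))  # per-token mixing weights
```

One catch: because the gates depend on each token, the effective weight changes per token, so unlike a single static LoRA this cannot be folded into the base weights ahead of time.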
>>108857036
>>108857026Wrong thread. >>>/g/gedg/
kekus maximus
https://www.reddit.com/r/LocalLLaMA/comments/1thjsnx/why_use_quants_other_than_unsloth/
>>108857026Wow man, too much info about your own hardware and all of that stuff unnecessary for local models in the local models general, next time try to tell us less
>>108857082What a shitty subreddit full of shills and retards. It wasn't this bad last time I checked.
>>108857082Why does reddit hate unsloth so much? Did daniel downvote their posts or something?
>>108857103We hate unslop here too
>>108857103they pushed too hard, to the point even some redditors who are usually super chill with shilling and golden boy types are starting to dislike them too, quite an achievement tbqh
>>108857082Still waiting for ggerganig or others in the team to implement whatever magic trick the Unsloth bros are using to make their quantizations perform better. We wouldn't need Unsloth if quantization in llama.cpp was already optimal by default.
>>108857111They got their investment though
>>108857117ggiganiggov and others are busy closing pull requests and updating the contributor guidelines to ban AI-assisted pull requests while they slowly learn how to do agentic coding themselves
>>108857117
the quantizations they are using are already integrated into llama.cpp. I think they just run all the different permutations and compare the ppl or kld or some shit. it is nothing groundbreaking, but it takes a fuck load of disk space.
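For reference, "compare the ppl or kld" boils down to this: run the same text through the full-precision model and the quant, then average KL(P_base || P_quant) over token positions, and/or compare perplexity on the actual next tokens. Numpy sketch with random logits standing in for real model outputs; the candidate names and noise levels are made up.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mean_kld(base_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean over positions of KL(P_base || P_quant); inputs are (n_tokens, vocab) logits."""
    lp, lq = log_softmax(base_logits), log_softmax(quant_logits)
    return float((np.exp(lp) * (lp - lq)).sum(axis=-1).mean())

def perplexity(logits: np.ndarray, targets: np.ndarray) -> float:
    """exp(mean negative log-likelihood) of the tokens that actually came next."""
    lp = log_softmax(logits)
    nll = -lp[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))

rng = np.random.default_rng(0)
n_tokens, vocab = 128, 50
base = rng.normal(size=(n_tokens, vocab)) * 3.0  # stand-in for full-precision logits
targets = rng.integers(0, vocab, size=n_tokens)  # stand-in for the real next tokens

# two hypothetical quant candidates, modelled as noisier copies of the base logits
for name, noise in (("candidate_a", 0.05), ("candidate_b", 0.5)):
    quant = base + rng.normal(scale=noise, size=base.shape)
    print(name, "KLD:", round(mean_kld(base, quant), 5),
          "PPL:", round(perplexity(quant, targets), 3))
```

KLD is the more sensitive of the two, since it compares the whole next-token distribution at every position, while perplexity only looks at the probability of the one token that actually came next.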
>>108857127You mean it takes a fuck load of HF disk space
>>108857145also your own disk space too if you run quant properly
>>108857145They do the quants on rented servers then upload the final quants to HF
>>108857117
>whatever magic trick the Unsloth bros are using to make their quantizations perform better.
Can't you just inspect the gguf and see how each tensor is quantized?
Other than that, it looks like they use a custom imatrix calibration for each model: unsloth_calibration_Qwen3.6-27B.txt balance
And if I had to guess, they probably run a longer sequence for the imatrix (just looking at this): https://localbench.substack.com/p/qwen-3-5-27b-gguf-quality-benchmark
They've got the money / hardware to do this.
>Why does reddit hate unsloth so much?
Their marketing / spamming their blog, the Apache2 license with their brand as a comment in the baked-in chat templates, creating an empty repo as soon as a popular new model is released so they show up under >quants immediately, etc.
They're still useful though: hosting BF16 quants of >1TB models, sometimes having the best quants, etc.
And their original Deepseek-R1 quants were good. Getting that model coherent at < 2.0bpw was a big deal back then.
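Roughly what the imatrix is for: it records per-input-channel activation statistics (roughly mean squared activations) over the calibration text, and the quantizer can weight rounding error by those importances when picking block scales, so channels that see large activations get quantized more carefully. The sketch is a simplified stand-in (symmetric round-to-nearest in one 32-weight block, brute-force scale search, toy importances), not llama.cpp's actual k-quant/i-quant code; all names are hypothetical.

```python
import numpy as np

def quantize_block(x: np.ndarray, scale: float, qmax: int) -> np.ndarray:
    """Symmetric round-to-nearest: integers in [-qmax, qmax] times one shared scale."""
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def weighted_err(x: np.ndarray, xq: np.ndarray, importance: np.ndarray) -> float:
    """Rounding error weighted per input channel by its (imatrix-style) importance."""
    return float(np.sum(importance * (x - xq) ** 2))

def best_scale(x: np.ndarray, importance: np.ndarray, bits: int = 4, n_candidates: int = 64) -> float:
    """Brute-force scale search (a simplification, not llama.cpp's search)."""
    qmax = 2 ** (bits - 1) - 1
    naive = float(np.abs(x).max()) / qmax
    if naive == 0.0:
        return 1.0  # all-zero block: any scale reproduces it exactly
    # try scales around the naive max/qmax choice, which is always included
    candidates = naive * np.concatenate(([1.0], np.linspace(0.7, 1.1, n_candidates)))
    errs = [weighted_err(x, quantize_block(x, s, qmax), importance) for s in candidates]
    return float(candidates[int(np.argmin(errs))])

rng = np.random.default_rng(0)
block = rng.normal(size=32)                  # one block of 32 weights along the input dim
importance = rng.uniform(0.1, 1.0, size=32)  # toy stand-in for imatrix column statistics
importance[:4] = 50.0                        # a few "hot" input channels (made up)

qmax = 7  # 4-bit signed
naive = float(np.abs(block).max()) / qmax
tuned = best_scale(block, importance, bits=4)
err_naive = weighted_err(block, quantize_block(block, naive, qmax), importance)
err_tuned = weighted_err(block, quantize_block(block, tuned, qmax), importance)
print(err_naive, err_tuned)
```

The search includes the naive max/qmax scale, so the importance-weighted error can only go down; a different calibration text changes the importances, which is one reason quants of the same type from different uploaders can differ.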
>>108857145
>You mean it takes a fuck load of HF disk space
and compute. I think I saw them saying the Qwen team gives them storage / compute; they had free gcp credits for a while as well.
>>108857176Not even with Unsloth's imatrix calibration file you'll get the same results using the default quantization presets from llama-quantize. Precision has to be established on a per-tensor (and per-layer) basis with more advanced logic than what llama-quantize is using by default.
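One simple version of "more advanced per-tensor logic" is greedy bit allocation under a size budget: measure how much each tensor hurts at each candidate type, start everything at the cheapest type, and keep upgrading whichever tensor buys the most error reduction per extra bit. Python sketch; the tensor names, error numbers, and bits-per-weight options are made up for illustration, and this is not claimed to be what Unsloth or llama-quantize actually do.

```python
from dataclasses import dataclass, field

@dataclass
class TensorInfo:
    name: str
    n_params: int
    # bits-per-weight -> measured damage, e.g. KLD contribution (hypothetical numbers)
    err: dict = field(default_factory=dict)

def allocate_bits(tensors, options, budget_bits):
    """Greedy bit allocation under a total size budget (in bits)."""
    options = sorted(options)
    choice = {t.name: options[0] for t in tensors}
    used = sum(t.n_params * options[0] for t in tensors)
    while True:
        best = None  # (gain per extra bit, tensor, next option, extra bits)
        for t in tensors:
            i = options.index(choice[t.name])
            if i + 1 == len(options):
                continue  # already at the most precise option
            nxt = options[i + 1]
            extra = t.n_params * (nxt - options[i])
            if used + extra > budget_bits:
                continue  # this upgrade would blow the budget
            gain = (t.err[options[i]] - t.err[nxt]) / extra
            if best is None or gain > best[0]:
                best = (gain, t, nxt, extra)
        if best is None:
            return choice, used
        _, t, nxt, extra = best
        choice[t.name] = nxt
        used += extra

tensors = [  # hypothetical tensors and error measurements
    TensorInfo("blk.0.attn_v", 1_000, {4.5: 0.30, 6.5: 0.05, 8.5: 0.01}),
    TensorInfo("blk.0.ffn_down", 4_000, {4.5: 0.20, 6.5: 0.08, 8.5: 0.02}),
    TensorInfo("blk.0.ffn_up", 4_000, {4.5: 0.05, 6.5: 0.03, 8.5: 0.02}),
]
total = sum(t.n_params for t in tensors)
budget = 5.5 * total  # average of 5.5 bits per weight
choice, used = allocate_bits(tensors, [4.5, 6.5, 8.5], budget)
print(choice, used / total)
```

With these numbers the small, sensitive attn_v tensor gets upgraded twice while the big FFN tensors stay cheap. Greedy allocation is only a heuristic: it can leave part of the budget unused and ignores interactions between tensors.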
>>108853901Safe travels, brave RonIN.
>>108857212
>Not even with Unsloth's imatrix calibration file you'll get the same results using the default quantization presets from llama-quantize.
Well yeah, I haven't used a default preset for almost a year now (except q8_0).
But there's nothing stopping you from doing what Ubergarm or AesSedai do.
Unless I'm missing something, you can literally grab unsloth's imatrix.gguf and reproduce their quant with llama-quantize and `--custom-q `
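q8_0 is the one preset with basically nothing to tune: as I understand ggml's reference Q8_0 layout, each block of 32 weights stores a single fp16 scale (max|x| / 127) plus 32 int8 values, i.e. (32 * 8 + 16) / 32 = 8.5 bits per weight, and the imatrix is not used. Numpy sketch of that round trip under those assumptions; note np.round rounds ties to even while the C code uses roundf, so exact ties can differ.

```python
import numpy as np

QK8_0 = 32  # weights per block (assumed to match ggml's Q8_0)

def quantize_q8_0(x: np.ndarray):
    """Return per-block fp16 scales and int8 values for a 1-D array (len % 32 == 0)."""
    blocks = x.reshape(-1, QK8_0).astype(np.float32)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    d = amax / 127.0
    # all-zero block -> inverse scale 0, so every value quantizes to 0
    inv = np.divide(1.0, d, out=np.zeros_like(d), where=d != 0)
    q = np.round(blocks * inv).astype(np.int8)
    return d.astype(np.float16), q

def dequantize_q8_0(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * d.astype(np.float32)).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
d, q = quantize_q8_0(w)
w_hat = dequantize_q8_0(d, q)

bits = d.size * 16 + q.size * 8  # fp16 scales + int8 values
print(bits / w.size, float(np.abs(w - w_hat).max()))
```

Worst-case error per weight is about half a quantization step, which is why Q8_0 is usually treated as near-lossless; the price is 8.5 bpw of memory.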
currently using gemma 31b, anything better for 48gb vram for RP released like a fine tune?
>>108857274no
>>108857247
>Unless I'm missing something, you can literally grab unsloth's imatrix.gguf and reproduce their quant with llama-quantize and `--custom-q `
Yes, I could do that, but that would be just copying what Unsloth is already doing. Then, I might as well download the same quants from the Unsloth HF account and save time and storage space.
Ideally, llama-quantize would make the best possible quantizations on its own, with some quality margin depending on the calibration file, when provided (but as far as I recall, users weren't even originally supposed to finetune the calibration either).
>not just using Q8
I hate poor people