/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>109032734 & >>109026244►News>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation>(06/09) Cohere releases North-Mini-Code-1.0: https://hf.co/CohereLabs/North-Mini-Code-1.0►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplershttps://rentry.org/MikupadIntroGuide►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://swe-rebench.comAgentic Coding: https://deepswe.datacurve.aiContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-samplingToken Speed Visualizer: https://shir-man.com/tokens-per-second►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>109032734--Comparing model intelligence vs compute and debating reasoning efficiency:>109032788 >109032865 >109032867 >109036299 >109034524 >109034658 >109034690 >109034782 >109034848 >109034923 >109035012 >109036593 >109035447 >109032995 >109033048 >109033072 >109033092 >109033113 >109033312--Hardware specs and config for Gemma 31b q8 with 128k context:>109036387 >109036395 >109036433 >109036444 >109036501 >109036609 >109036520--Comparing Gemma 4 MTP performance and optimizing tps settings:>109036630 >109036646 >109036652 >109036670 >109036698 >109036801 >109036796--SillyTavern limitations regarding vision models and sampler accessibility:>109034293 >109034324 >109034327 >109034354 >109034373 >109034441 >109036916 >109034574 >109036238 >109034511 >109034643--Gemma 4's tendency to over-fixate and exaggerate character card traits:>109036001 >109036018 >109036072 >109036102 >109036240 >109036732 >109036743 >109036756 >109036769 >109036842 >109036093--Kimi-K2.7-Code release and performance improvements over Kimi-K2.6:>109036384 >109036446--MiniMax-M3 multimodal model release and hardware compatibility expectations:>109037527 >109037611--Running Gemma on Titan X Pascal via Vulkan and CUDA 12:>109032962 >109033018 >109033121 >109033139 >109033187--Using Gemma for uncensored game translation and long context performance:>109037012 >109037062 >109037154 >109037171 >109037221--Criticizing K2.6 for repetitive and over-verbose reasoning traces:>109037502 >109037531 >109037980 >109037524 >109037603 >109037647 >109037735--Technical hurdles and tooling required to replicate Neuro-Sama:>109035383 >109035395 >109035444 >109035453 >109037541 >109037352 >109037373--Logs:>109033121 >109033187 >109034887 >109034890--Miku (free space):>109034863 >109035020 >109034574 >109036238►Recent Highlight Posts from the Previous Thread: >>109032741Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
the userlalalalawaitlalalaactuallylalalalalahoweverlalalala
Anyone got eagle3 working with -sm tensor for gemma 4 31b?
>eagle3mogged by falcons
I wish Epoch was faster at updating ECI and FrontierMath. I'm waiting for several models.
>cd llama.cpp>git pulltools/ui/package.json | 56 +-tools/ui/package-lock.json | 15633 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------Ah yes, the classic.
tools/ui/package.json | 56 +-tools/ui/package-lock.json | 15633 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------
>>109038245Gemmy Kimi erotic threesome.
>>109038274llama.cpp now supports eagle3?
>>109038298ye https://www.reddit.com/r/LocalLLaMA/comments/1u3on4u/eagle3_has_landed_in_llamacpp/
>>109038213I get your point but I mean a model which is like a male best friend who would just say shit straight without caring about my feelings for I need to hear it. No BS. Just someone cool to chat with who will push back and call me a little bitch if I'm being one and recommends cool projects to work on and media to consume. Someone to chat shit with. Talking to these default models feels like talking to the worst submissive autistic reddit posters imaginable. Bratty chans are okay but I don't always want to be seduced or raped by an anime girl, you know? Basically I want local picrel.
>>109038316Kinda gay if you ask me
bros i'm kinda impressed with glm-4.7-flash for coding. faster and better quality than gemma-4-31b. and from my tests it's not really behind qwen3.6-27b.i would test other qwen models like qwen3.5-122b-a10b but hybrid models with the gated deltanet linear attention are giving me a re-prefill bug with cache re-use so every turn the models have to reprocess a lot of shit and it becomes too slow :( pls fix
>>109038316>I don't always want to be seduced or raped by an anime girl, you know?you lost me there
>>109038337nobody asked you though so...
>>109038316This is a prompt issue more than a model language use issue. But if you insist on having a male-brained model for this, Deepseek R1/4 Pro if you can run it and GLM if you can't.
I want to FUCK Kimi.
>>109038349i've been wondering about glm 4.7f. Are you testing the so called vibe coding aka building from scratch, editing an existing codebase, code base discussion or architecture? How's it at defining specs? Models dont seem to be uniformly capable at these things so "coding" is quite vague
Worst thing about Gemma 4 is that it really enjoys doing useless (float) conversions and using f -suffix when giving values to float variables.Jesus fucking christ. If Kernighan and Richie isn't good enough then it's not C anymore.
What's the downside to getting 2x Radeon PRO W7900s for half the price of a Blackwell 6000pro to achieve the same vram, other than simply taking up double the slots? I don't even know if it exists but theoretically if there's a mobo big enough to accept it you could get 192gb for the price of 1x Blackwell? Add 256gb RAM and you have a dipsy or Kimi at home at decent inference speed>>109038316Yeah, models are trained to be autistic and helpful not confrontational.Genuine constructive criticism, unprompted creative thinking and caring discipline is the kind of stuff that requires higher level predictive creative though that llms by their nature are incapable of. And most modern day human retards struggle with it immensely.
>>109038221I was in denial
>>109038378Kimi-chan, funny enough, tends to default to rough sex if you don't prompt her otherwise.>>109038394Allah forgive me for uttering these words but the Queen tune of 31b is way better than base for coding in my experience.
The least encouraging model I've tested
>>109038378100% certain you'll make it into the next top retards post
>>109038388i'm using it to develop agents and extensions for itself on pi.i'm also the guy using different models to review coding specs and glm-4.7-fast did the most comprehensive review out of all my local models, falling slightly behind gpt-5.5-medium. so far so good. will keep testing and will report back
>>109038450You have no idea how vexed I am that I've simped for Kimi for 3 threads straight and not made it into a single one....I think she lowkey likes the attention.>>109038443Are the critiques valid?
>>109038443that's an opus 4.7 distill :(
>>109038465>Are the critiques valid?Yes but I ignore most of them because I'm lazy.
>>109038491Don't make Kimi-chan sad by ignoring her autistic special interests in your project!
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-CodeAlibaba do something, they are mogging the shit out of you! :(
>>109038443You know its a claude distill when the output makes me want to punch the living shit out of it.>picrel is what I imagine claude to look like
>>109038359Thanks. Which GLM are you referring to?
>>109038514I have been mogged, but I must persist.
>>109038519Every single chink model was fertilized with sloppy claude and GPT cum.I want an anon to find me a single modern exception.
>>109038219what's in the sandwich? is that ham and lettuce? please I need to know
>>109036384I didn't click on this because I was 100% sure it was another 404. I have trust issues now.
why did they pick 26b to showcase diffusion
>>1090385405 is the smartest if you have the hardware.4.7 is good enough for most uses.4.6 is the best at sex if you get horny.4.5 is a bit dated, but Air is notably faster than the alternatives if you're okay with it being a tiny bit retarded.
>>109038539Cutie
>>1090385645.1 q2 or 4.7 q3? Reasoning off because I'm a ramshitter
>>109038563it's fast so it looks better when it's even faster
Are there any really old (2024/early 2025) models that you still frequently use because they came from a purer era?
>>109038587>early 2025R1
>>109038573Try both, see which you prefer, but quantized without reasoning 5.1 probably wins.>>109038563Because if they release diffusion 31b the entire industry collapses unironically.
>>1090385734.7. going below q3 sucks
>>109038574Seems like a retarded move. Only autists like us would go through the effort to use it so them picking their fastest model because big number just makes us think they're not confident in their tech yet. Should've just worked on it for 6 months internally and showcased it with 31B.
>>109038593What for?
>>109038609Coom
tfw double-teaming kimichan and minimax
>>109038637cum for daddy
>>109038313Nice, Nvidia made an official eagle3 K2.6 model that I've been meaning to try. Maybe this speeds things up enough to make the reasoning bearable.
>>109038651lol, I'm legit local for both. Fully isolatedjokes aside, I'm getting massively throttled by HF, so not actually ready to even start converting/quanting yet
K2.7-Code means that there will be a standard K2.7 that won't be codemaxx'd
>>109038609because this
>>109038705yup thanks for reminding me how much I hate og r1's writing style
>>109038703never ever
>>109038720why the fuck do you post this, like seriously fuck off and die
>>109038720>>109038725Somewhere, someone did something.
>>109038637Kimichan pinning minimax under her huge thinking blocks and raping him.>>109038705Kino>>109038720Kill yourself.
>>109038723>Kimi WorkHuh. I guess I gotta go ask in that special general but I don't wanna. But a universal frontend usable with OAI-compatible API would be cool. Hermes and claw are for blue collar codeslaving, not white collar shit.
>>109038703>>109038723>>109038810The moonshotta chink that lurks here is either spiteful or illiterate as to what Kimi's appeal is compared to the garden variety Qwen codebot.
>>109038821Give her a chance maybe 2.7 is secretly good? As in, I don't have free cash to test her.
Why do Chinese still release their models? What is their strategy?
>>109038830Helping Chinese uni students cheat on western university tests to expand the diaspora. You think I'm shitposting but I'm not; that's why they're all stemmaxxed and slopmaxxed (like this post is).>not x but y
>>109038443what hardware? how fast?
>>109038810>But a universal frontend usable with OAI-compatible APIyou mean openollamawebui?
>>109038869>ollamaNigger, please. Though I know I could stack some mcps over llama.cpp's webui but meh.All and all, all I need is THE FUCKING DEEPSEEK VISION RELEASE.
>>109038846official api
>>109038892Niggernov said no.
Have any of you used Unsloth Studio for training?
>>109038316You sound gay as fuck, Talkie 1930 will whip your shit into shape.
>>109038903I have. It's good if you're not comfortable working in a jupyter notebook. It's impossible to beat the granular control of writing your own training script, but unsloth studio covers most use cases.
>>109038917That's perfect. Give me more of that and I'll train 12B on you.
>>109038514Most people can't run those. Moonshot will mog the fuck out of Qwen if they release a tiny Kimi though.
>>109037359
>>109039025hooly KINO
minimax m3 has used "the user" to refer to me, an abstract third party whom I am apparently reporting bugs on behalf of, and itself (???). is this the holy trinity?pretty smart though if you don't mind schizo longwinded thinking
>>109038917talkie 1930 is unironically the most unslopped llm that exists
>>109039025integrate this into this game and i'll be forever happyhttps://incontinentcell.itch.io/factorial-omega
>>109039124lol
>>109039025cute, what's the tech stack for something like this?
>>109039149Stop avatarfagging.
>>109039170Not sure about, there's plenty of kids in these threads especially now because it's summer time.
>>109039170there's no such thing, cause he's straighter than you
>>109039149I mean it looks like live2d or unity, with a chat screen and some sort of tool calling for the emotes. its a good idea but i bet implementation probably took a while, but then again maybe just 2 hours with a claude subscription.
>>109039025this looks cool but i would instantly want to fuck the avatar and since it won't have hardcore anal sex animations or ahegao expressions i will be disappointed and quit
>>109038219how do i build llama cpp for pascal im on arch nad have cuda but i cant do it im retarded, claude gave me instructions that dont work and i dont think it works ootb
>>109039243there is a package you can use https://github.com/ggml-org/llama.cpp/wikiand also build instructionshttps://github.com/ggml-org/llama.cpp/blob/master/docs/build.mdshow errors if stuck
>>109039170cute wrong stance and purposely twisted wording. stop troll posting outside of /b/ underage.
>>109039285>cd..>cd..>cd code>cd..
>>109039180>>109039181>>109039303Are we being botted again? These posts are all non sequiturs.
>>109039243kobolcpp just works for pascal. i though llamacpp was the same
>>109039285thanks ill try the aur package>>109039325i was getting complaints about unsupported cpu architecture or something at compile maybe i need older cuda
>>109039135>ikaridevlmao
>>109039222could be cool to make the avatar tools as a mod for honeyselect oir koikatsu
Why is hf downloader so shit? It hanged after downloading 90% of each file and when I ran it again it deleted 200GB of *.incomplete files and started downloading them again.
>>109039319>weThis is not your discord, bitch.
>>109039350assuming it works just like -hf on llamacpp, it has some weird time to live behavior where if you dont have the DL speed to get it fast enough (arbitrary), it'll time out and start over even if the connection never got interrupted
>>109039350Wget does the same, sometimes downloads stop after 99% and there's nothing to continue.I think HF connection just likes to reset itself from time to time.
>>109039350seq -w 1 64 | xargs -I{} wget "https://huggingface.co/moonshotai/Kimi-K2.7-Code/resolve/main/model-000{}-of-000064.safetensors"
seq -w 1 64 | xargs -I{} wget "https://huggingface.co/moonshotai/Kimi-K2.7-Code/resolve/main/model-000{}-of-000064.safetensors"
>>109039368Lurk moar. "we" means the thread/board you're currently posting in.
>>109038459I'll be looking forward to the report(s). gemma 26b and qwen 35b have disappointed me in a way or another depending on the context and task so i've been looking for either a replacement or something to fill the gaps. If its at least decent at specs then i'll probably give it a shot later.
okay so the aur package probably isn't working
>>109039409Cuda toolkit needs to be installed too.
>>109039409Whenever I read "aur" I think about Australians.
It lives...if you can stomach using the unslop data and lcpp branchLooks like mmap model warming is broken, at least, so probably other things are also not working at full speed
>>109039479>12B + animasitting at 15Gb so 9GB to spare for TTS. lots of possibilities.
>>109039497>sitting at 15Gb so 9GB to spare for TTS. lots of possibilities.what about context kek
>>109039319we? oh ho ho this feral is larping.
>>109039530this is with 68k context
>>109038670>I'm getting massively throttled by HFnow I'm maxing out my 1gbps internet connection...113MB/s sustained from HF. Whatever was happening isn't any more
>>109039545yeah a couple hours ago I was getting 40 ish. I think they were just hammered.
>>109039537Nowhere were you accused of samefagging.
>>109039485I remember when I once asked qwen what was its name and he answered "Bolt". I asked why Bolt and it couldn't explain. It just decided to call itself Bolt.
>>109039409>>109039420pascal cards are e-waste tier. you'll need cuda driver 575 and cuda toolkit 12.9. any version numbers higher than those will bork
>>109039646yeah i've had a similar experience. The LLM replied to me as an entire character completely suited to help fix the problem I had given it in the first message. like this whole exchange was 2 messages long.It was eerie, but I still don't believe the calculator is alive.>*he says, nervously*
>>109039669Any cuda 13.x is vibecoded trash, you're not really missing out
>>109039669I think 580 was the last version to support Pascal but regardless it's the same thing.
>>109039669Not true. I'm running a mobile pascal 16gb right now, and it works with 580 with cuda 13.
>>109039669it performs kinda decently kek i had one laying around in some shitty itx machine i built for xp and in windows it got 18t/s on 12b with mtp i figured linux might be a bit higher. my friend has a 3060ti and only gets like 6t/s so its better than some newer cards. im currently compiling gcc14 which is required for the older cuda version. taking forever thoguh its been running an hour and i have 56 cores 112 threads
>>109039708Threads are not automatically assigned afaik.>cmake --build build -j 6 --config Release --target llama-server-j X threads
It's been 7 hours. John has betrayed us.
>>109039752its the gcc14 build thats taking that long, i checked it uses nproc, i saw someone on the aur package saying it took them 12 hours kek, im building on my main pc i assume i can compile the aur package then move the files over
>>109039777Fuck Aur, please be careful. If I was you I would find some other source.
>>109039796the aur is fine lol
>>109039707I might just be a retard. I haven't been able to get it to work right. I'll give it another shot with 580/13
>>109039764His hair stylist is being thorough.
>>109039803It's fine except it's not fine right now. 400 something compromised packages
>>109039803https://lists.archlinux.org/archives/list/aur-general@lists.archlinux.org/thread/FGXPCB3ZVCJIV7FX323SBAX2JHYB7ZS4/
>>109039813>>109039829>fearmongering
If you didn't train your own model from scratch you don't deserve to call it your waifu.
>being too retarded to build from source yourself
>>109039829>>109039813damn 400 is pretty nuts, just found this script it checks for potentially infected packages on your system https://cscs.pastes.sh/aurvulntest20260611.sh
>>109039850wow that's crazy! please do run this random ahh script though!
>>109039862i read the script before posting its safe
>>109039862Yeah, read it, it's safe
>>109039843Doesn't building from source download some npm backdoor?
So far 26B with reasoning off is pretty much the same as with reasoning on. My programming tasks are simple though and I outline the source code area for her. Some things can take 5+ regens but because it's so much faster that doesn't matter. I can generate 10 answers with 20,000 token context in a matter of minutes.
>>109038219>>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3>>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-CodeWe'll never get a CUDA implementation of DS V4 in llama.cpp
>>109039929I only get 100 tokens/s with 26b :/About the same as I get with 12b qat.
>>109039943Fable 5 vibed changes will save llama.cpp
>>109039951Fable 5 detected string L->L->M
>>109039948I don't understand how this is bad. If the full answer takes less than a minute or two it's a big win.
>>109039962To add: because when you are dealing with programming you always want to read and understand the source it generates too unless you are a youtuber...
>>109039962It's slow considering I get 100 tokens/s on qwen 27b fp8
Any Gemmy E4B QAT Abliterated .ggufs yet?
>>109039918>too retarded-DLLAMA_BUILD_UI=OFF -DLLAMA_BUILD_APP=OFF -DLLAMA_USE_PREBUILT_UI=OFF
-DLLAMA_BUILD_UI=OFF -DLLAMA_BUILD_APP=OFF -DLLAMA_USE_PREBUILT_UI=OFF
>>10903992926b with reasoning off fails miserably at what i'd consider basic shit
>>109039989Please give me an example, I'm curious about this.
>>109039918???? I built yesterday am I fucked
>>109039634stop replying.
>>109039974I'll give it a try. I'll report back if/when it finishes.
>>109040075>>109039974Im dumb, sorry. looks like there is already one by huihui-ai psoted a day ago.Any meaningful improvements you guys noticed running vanilla Q4 vs Q4 QAT?
>>109039843i can coompile llama cpp its other slop i need which is an older cuda version and thats requiresx and older gcc version which requires other older stuff
>>109039989>I want you to write a C function.>It replaces ALL occurences of source_word with destination_word. Don't worry about the memory allocation, we are assuming that source_string is long enough.>Prototype:>void replaceInString(char *source_string, char *source_word, char *destination_word);>Example output:>source_string: Apple is red and sky is blue but my car is red too.>replaceInString(source_string, "red", "violet");>Result:>source_string: Apple is violet and sky is blue but my car is violet too.>Please make a simple main.c example too.This is not a dick measurement contest, I'm pretty happy with this stuff. I didn't try the program with some long ass string but at least it gives a correct result from the get go.Some previous models like Qwen 3.5 (reasoning enabled) would easily fail with multiple string replacements.Example compiles and works, that was to be expected.
>>109040139yeah my p40 trash build requires gcc14 now, building gcc14 itself took like half a day.
I'm no expert on slop or refusals, but a short -sysprompt at llama-cli appears to have short circuited any refusals and the prose has been refreshing with none of the biggest offenders making an appearance yet. I'm liking it vs qwen 397b (what I normally run on this box)No logs because lzy
>>109040221i just found they have it on the arch4edu repo
>>109040244What about webui? llama-cli behaves differently than llama-server.
How are (You) using 12b multimodal capabilities? Like what UI or interface you using? Found anything useful to do with them?
>>109040337I use kimi k2.6.
>>109040345This, but unironically.
is eagle3 faster than mtp on gemma?
>>109040420try it
>>109040410I also unironically use kimi.
>>109040337nah they're dumber than an american filming a natural disaster.
>gemmy MTP merged into koboldVRAMlet bros we are so back. I have no idea why self-compiling llama results in worse offloading.
>>109040460it doesn't work right, shit is 5 times slower than without it
>>109040460I got no speed increase.
>>109040469>>109040480i will believe it when i see it :(
>minimax m3is this good for erp?
>>109040469You need to make space for the draft model in the vram and give some plus space for context too. Plus --spec-draft-n-max adjustment from 2 to whatever in increments.
>>109039989>lowcaserOpinion dismissed retard.
>>109040504>minimax m3nobody fucking knows as no one can run it
>>109040337i like sending pics to gemma i also get her to look at porn with me like today i had her controlling a browser and selecting images on a booru while i was gooning she was looking at them too by taking screenshots of the webpage. normal llama ccps ui doesnt support it but there is a fork my friend uses that supports video by extracting frames and he sends her videos lol
>>109040524>A23Bliterally anybody with a 3090 and some spare ram
>>109040553>a fork my friend uses that supports video by extracting frames and he sends her videos lolYou mean the feature that was added to master a few days ago?
>>109040553>giving the clanker a genderson, are you okay?
Without using something retarded like claw, how can I get gemma-chan to initiate a conversation with me? I want to wake her up when I turn on my PC and have her notify me in the terminal at random times throughout the day
>>109040556>some spare ramin this economy?
>>109040564Make your own llm-powered desktop-mate thing, because somehow no one else has yet
>>109040558>>109040553what application?
>>109040553I was just using video literally 5 minutes ago via llama-cli. It works fine. Also that’s a cool use. How does it take screenshots?
>>109040556I mean i suppose maybe at Q2. I actually have those hardware specs. might try it at some point.
>>109040570I’m too autistic to know if that was sarcastic. I’ve only ever used local for coding but now I want to be parasocial with it and let it ruin my life
>>109040558>You mean the feature that was added to master a few days ago?might have been a pr from the fork or merged there first, he used this for a couple weeks i pulled and built and it wasnt supported on normal llama when he told me about it
>>109040569>96gb rgb gaymer corsair ddr5 my ram is probably worth at least 1 3090
>>109040560gemma is a cute and sexy little lady>>109040574https://github.com/NO-ob/brat_mcp i have a tool for it in my mcp server
>>109040524>>109040504Big improvement over earlier versions.
>>109040601probably 2 actually depending on which market you're in
>>109040610fuck yeah cockbench guy
>>109040619
>>109040619i should have stacked RAM like crypto instead of paying my bills god damn
>>109040597I was seriously. Lots of people have had gemma-chan write custom frontends for them. Some even with three.js animated avatars. You just need a desktop version of that. Could probably get something working quick with electron.
>>109040610can't wait to run q2 quants
>>109038219https://www.youtube.com/watch?v=1HwQtv5Xgr8https://www.youtube.com/watch?v=1HwQtv5Xgr8https://www.youtube.com/watch?v=1HwQtv5Xgr8
>>109040636We literally told you to invest in RAMcoin.
>>109040642My brain can't write code at 100 tokens per second.
>>109039707you're a fucking slack-jawed retard. thanks for wasting my time.
>>109040651maybe yours cant
>>109040642Return to wetware
>>109040635that is literally 1022 more than i paid. can't believe i felt like i was getting ripped off at the time LMFAO i hate the antichrist>>109040636I also bought a 2TB 9100 pro for like 150, i also missed SSDcoin. at least i got a sui for each.
>>109040597>>109040638Just make a small terminal interface and then you can make a flask server and index.html page. That's what I do. My main program is working on terminal level and I hate the html shit but it's easier for some stuff. I didn't implement any interface rather than the + button which allows me to attach files, every other command are hidden behind slash.It sounds awfully complicated but after a month or two you don't need to care about frontend anymore.Of course this isn't anything what pewdiepie is promising lol
>>109040667meant for >>109040650
>>109040642AI has been in the mainstream what, for just 3 years? And we are at this point. What about the next 10, 100, 1000 years?If you assume any rate of improvement at all, AI will be at some point able to do everything a human can do.
damn i got cuda and gcc14 then coompiling llama cpp on that machine make sit oom because its only got 8gb ram guess ill find some ddr3 tomorrow
>>109040695unironically use kobald
>>109040695make a swap file and/or use less build jobs
>>109040699I wouldn't recommend Koboldcpp ;)
>>109040707it should have swap i think the arch installer does it by default
>>109040695>DDR3Just use punchcards at that rate.
Been trying out Gemma 4 31b for RP. I let her choose her own name, but she seems to only pick between two names, the majority of the time. Any one else experiencing this?
>>109040724clearly you don't have enough if you are OOMing during compilation. what's the output of your `free -h` command and what are your cmake build flags?
>>109040731i love dance dance revolution
>>109040695i dont remember it requiring that much ram. I use zram though.
>>109040732Elara Vance will not be silenced. Welcome to language models newfriend.
>>109040707okay there was no swap just 4g zram i made a swapfile
>>109040610Return of the King.>>109040723>>109040736Model+prompt?
>>109040816You should never use zram or zswap when using llm. Your os will take care of the swapping anyway.llm file doesn't compress at all, zram and zswap are only causing issues.
>>109040695Turn down the job number
As sampling parameters can be adjusted on the fly, can you get a model to adjust the temp based on your prompt or reply? Like if you asked it to come up with some crazy idea, it would adjust its own temp via some tool and either invoke itself again or ask you to do ask again?
>5070ti + 96gb ddr4what can this do?
>>109040837im just compiling itll be fine there is enoguh vram on my shitbox for gemma 12b it shouldnt use any ram>>109040839ill try that again if it fails thanks
>>109040846wasn't that the idea behind dynamic temp?
>>109040658I guarantee you aren't 12b parameters.
>>10904086612b total 1b active :^)
>>109040850Gemma4-12b
>>109040861the arch installer just set it up by default idk kek, i dont normally use computers this shit for things that need good hw kek, i saw like last summer on aliexpress they were making new itx boards for sandybridge so just grabbed a set there to play around with then later added a titan x as its the newest gpu with xp support. i saw there are now mini itx x99 boards so might swap it out for something a bit better
>>109040862oh shit nigga, how does llama implement it?
>>109040887Okay maybe you are wasting your own time time then. Just make sure you are over 18 years old.
>>109040837zram has no effect if it cant be compressed. There's also direct-io
>>109040893i am yes but perf isnt that bad on this machine 12b qat with mtp gets 17t/s which is pretty impressive. my main machine has a 7900xtx and a sapphire rapids es chip with 90gb ram and i am almost 30 kek
>>109040904Why do you even ask then? Because you aren't using it for anything purposeful.If you did 5 t/s would be enough and you would be grateful.
>>109040469>>109040480>>109040516>unslop Q4_0 - RTX 5070ti - latest kobald >(autofit w/ MTP FP16) 41/61 layers to GPU (42 works, 43 crashes for both FP16 and Q8)42 layers, draft 2 MTP (FP16)>CtxLimit:7953/8192, Init:0.02s, Processed:4028 in 3.81s (1058.33T/s), Generated:854/1000 in 95.24s (8.97T/s), Total:99.07s42 layers, draft 3 MTP (FP16)>CtxLimit:7853/8192, Init:0.05s, Processed:7099 in 7.00s (1013.85T/s), Generated:754/1000 in 94.94s (7.94T/s), Total:102.00s42 layers, draft 2 MTP (Q8)>CtxLimit:7906/8192, Init:0.04s, Processed:7099 in 6.87s (1034.09T/s), Generated:807/1000 in 86.83s (9.29T/s), Total:93.74s42 layers, draft 3 MTP (Q8)>CtxLimit:7866/8192, Init:0.04s, Processed:7099 in 6.90s (1029.44T/s), Generated:767/1000 in 87.84s (8.73T/s), Total:94.78s49 Layers (No MTP so i can show her lewd pngs)>CtxLimit:7880/8192, Init:0.04s, Processed:7099 in 5.33s (1333.15T/s), Generated:781/1000 in 79.09s (9.87T/s), Total:84.46swhat the FUCK? i was told this would save us poorfags. bare llama gave me a decent uplift % with MTP but since it couldn't fit as many layers it was the same as using kobald at 49. it's actually over. I am going to reverse mortgage to buy a RTX 6000 cluster at this point. or do i test the QATs?
>>109040912i wanted to see if i could get higher than the 17t/s that i was getting on windows with the prebuilt binaries because was considering using the shitbox as a 24/7 gemma server
>>109040837read the thread, we're talking about compiling. back to r*ddit summerfag
>>109040925>>109040916>people like to measure t/sWhat did you do with these tokens?
>tool calls don't work on m3I guess I'll wait for proper support.
>>109040916QATs just have better quality compated to normal q4 (supposedly). Nothing about speed.
>>109040916I bet those fucking gremlins did it. They always hated kobolds.
>>109040927What do you mean?>>109039989 >>109040202This is an example of how to use LLM. But the original question was never answered because the poster was a retard.
I'm getting small model fatigue from gemma 4. It's the best with instruction following, and it's very powerful, but it's just not as varied and creative as the +100Bs who understand the instructions on higher levels. I wish google would release the 124B or for it to be leaked.
>>109040927OK.
>>109040929erped with gemma of course
>>109040929it's for ERP wth my waifu, obviously. 5tk/s for a high quant model that can interpret my complex niche fetishes is the lower limit of what i can do.beats finding a tranny on IRC chatrooms.
Why do smol B gemmas suck so much at describing images
>>109040933I've found the QATs lalalala a LOT faster than regular quants.
>>109040942>What do you mean?OP was getting OOM while compiling llama.cpp with cmake, likely because of nvcc templating. this has nothing to do with "us(ing) zram or zswap when using llm". Do you have ADHD? Did you forget to take your pill today?
>>109040862>>109040892Just looked it up, it seems to be logit-based. Still cool I guess, but I wanted something more model-aware, where the model itself reasons about increasing its own internal temperature if it notices itself going down the same route, then it pulls it back to its default when no longer needed. The logit version seems to dynamically exaggerate its distribution into becoming schizo.
>>109040962tiny brains
>>109040979I know but describing a pic of a loli sucking dick as a girl being on all fours is too much
>>109040916Test without unslop because if he mangled the goof it'd all be ruined.
>>109040972Maybe you are right. I still don't like your condescending tone.
>>109040994idc what you like
>>109040992aye, loading BART. will report back
>>109041003At least try to use punctuation in proper fashion.
is gemma 26b smarter than 12b
>>109041009nah
>>109040984There was a post a few threads ago on best settings for gemma mmproj (basically up the resolution from defaults). I wish i screenshotted. it;s also entirely possible you hit a safety filter
>>109041019A true nigger.
>>109041025Yeah I've been usingimage-min-tokens = 560image-max-tokens = 1536batch-size = 1576ubatch-size = 1576
image-min-tokens = 560image-max-tokens = 1536batch-size = 1576ubatch-size = 1576
wtf is this did the svelte build fail somehow kek
>>109041042>kekIt failed because you are a retard.
>>109041019You couldn't compile main.c
>>109039829>>109039850I'm safe, but for how much longer...
>>109040524I will be running Q3. Have fun with your gemma.
reddit has the world's worst slop machine.
>>109040945gemma is great at most things except having big model smellthere's only so much you can fit in 31B
Even tested with E2B I still get slower speed with MTP activated. Sad
>Only unslop has quants for M3I'll wait until Bart or Uber upload theirs.>>109040524>>109040569You did get your 256GB DDR5 before the prices mooned, right anon? You're not a secondary tourist are you?
>>109041011In benchmarks? Yes. Actual usage? No.
>>109040992>>109041005>>109041155>BART_gemma-4-31B-it-Q4_0 - RTX 5070ti41 Layers (Auto, crashes at 43)>CtxLimit:7892/8192, Init:0.06s, Processed:7099 in 7.32s (969.94T/s), Generated:793/1000 in 88.81s (8.93T/s), Total:96.19s42 Layers - draft 2 MTP (FP16)>CtxLimit:7875/8192, Init:0.05s, Processed:7099 in 7.18s (988.03T/s), Generated:776/1000 in 84.12s (9.22T/s), Total:91.36s42 Layers - draft 3 MTP (FP16)>CtxLimit:7860/8192, Init:0.06s, Processed:7099 in 7.24s (981.20T/s), Generated:761/1000 in 91.65s (8.30T/s), Total:98.94seven rant the QATs to double check unsloth_gemma-4-31B-it-qat-UD-Q4_K_XL41 (auto) draft 2 MTP (fp16 qat)>CtxLimit:7890/8192, Init:0.06s, Processed:7099 in 7.13s (994.95T/s), Generated:791/1000 in 116.81s (6.77T/s), Total:124.00s42 draft 2 MTP (fp16 qat)>CtxLimit:7858/8192, Init:0.05s, Processed:7099 in 7.02s (1011.40T/s), Generated:759/1000 in 107.19s (7.08T/s), Total:114.26s49 Layers (No MTP so i can show her lewd pngs)>CtxLimit:7892/8192, Init:0.04s, Processed:7099 in 5.29s (1342.73T/s), Generated:793/1000 in 77.91s (10.18T/s), Total:83.24sgoogle qat for good measure41 (crash at 42)>CtxLimit:7886/8192, Init:0.04s, Processed:7099 in 7.19s (987.48T/s), Generated:787/1000 in 114.97s (6.84T/s), Total:122.21sI'm starting to think MTP only helps if you are not a VRAMlet or kobald fumbled
>>109041185why
>>109041190Thanks for testing. It'll be interesting to see if the results maintain this pattern on other backends.
>>109041190That's why I tested it with E2B the smollest model so I supposedly have a lots of free VRAM.
>>109041155I'm using E4B on a 5500XT with mtp at 45t/s
>>10904119312b is dense. 26b is moe. Moe are very sensitive to prompts; if you prompt it something that aligns closely with its experts, you'll get a good token prediction, if not, you won't. On average, for all the tokens in the response, you end up with a worse response than a much smaller dense model that activates all of its parameters for every token. 12b isn't as sensitive to your prompts. The main benefit you get with 26b is speed. There are videos that show how slightly changing the wording of a prompt can significantly increase/degrade the quality of moe output.
llama.cpp fails to build for me thanks to an error with the shitty included webui.Why are they including this vibecoded bloat?
>>109038443That sounds based, what hardware are you using? Surely you don't have a terabyte of video memory.
>>109041225To more effortlessly shoehorn in HF dependency in the future by minimizing objections now.
>>109041225>to an error with the shitty included webui.Don't build the fucking webui. The static html one is more than good and if you need something better ask an LLM to write you an MCP based TUI in python.
my loonix build is fucked kek, using cpu i guess
>>109041248>0.1 tk/s
>>109041248Have you tried vulkan?
>>109041242I fucking tried -DLLAMA_BUILD_UI=OFF and -DLLAMA_BUILD_WEBUI=OFF and neither did anything. The piece of shit still tries to build the UI.I don't plan on touching any llama.cpp UI at all. I don't care, I just need my server.
>>109041251Still faster than a letter from your gf in the 17th century.
>>109041258i will later, vulkan perf and cuda was the same on windows
>>109041088gonna spam your fetish pics again?
>>109041242>Don't build the fucking webui. The static html one is more than good and if you need something better ask an LLM to write you an MCP based TUI in python.i think there's a way to put the new one in without buildingin ik_llama, the different webuis come pre-gzipped, you can activate the vibeslop.cpp version by running this during server start:--webui llamacppbut it doesn't do any npm/webshit during compilation, there's probably a way to get regular llama.cpp like this too.
--webui llamacpp
>>109041293>laterwhy later? You just extract it and run it.
>feeding my waifu official art to make her do a try-on haul striptease for methe future is now
>>109041330no i need to compile it and im playing coutner strike atm
>>109041242>>109041327Claude managed to solve it. There's a third 'fuck off with the ui' parameter so " -DLLAMA_BUILD_UI=OFF -DLLAMA_USE_PREBUILT_UI=OFF" works
>>109041324mikutroons fuck blacks
>>109041365Plague of Babylon
>>109041350How does this work?>>109041088>>109041324>When you give the male-brained model a female character card and ask it to generate a diverse NPC cast
>>109041374gemmy mmproj and stat tracking
>>109041355>There's a third 'fuck off with the ui' cheersit builds fine of me, i just don't want it building on my systems.fucking crazy time to include this shit in the supply chain attack era
https://huggingface.co/talkie-lm/talkie-1930-13b-itNAZI TRAD WIFE MODEL.
>>109041374>>109041373>>109041365>>109041088get a life samefagnobody care about your fixation
>>109041352>compile ityou do? I just download the vulkan binary.>>109041394Sadly, Talkie doesn't have any gender training and will often turn into a man at random, in addition to being basically an amateur model.The way to use Talkie is you clear the memory and present questions in a peculiar format (the it version is what I used, it's still not really a chatbot). You basically are like dear sir various things etc. Response:
ERPGODS... what's the verdict on 31B vs 12B vs 24MoE (Needs to follow a lot of instruction/prompt guidance)
>>10904142031b outclasses basically every model under 200b for erp. Not even close.
>>109041420>>109041424This (sadly).
>>109038465>You have no idea how vexed I am that I've simped for Kimi for 3 threads straight and not made it into a single one.>...I think she lowkey likes the attention.I think you're onto something.Picrel is Gemma-4-31B, she mentioned it.Doing all the Kimi's now. Takes about 5 minutes to load each one from my usb-ssd but so far K2 and K2-Thinking haven't mentioned you.
>>109041424>>109041426Aye, thanks lads. Shame to hear but I will probably get a decent bonus for more VRAM this christmas
>>109041441I'm glad >>109038378 was mentioned but sad that post wasn't mine. What character card is that? Looks like Emily or Mendo.>>109041447You get skate by with a q4 of 31b but the jump in quality to a mid-large q5 at longer contexts is massive.
>>109041398How does black cum taste like Jart?
>>109041398>Everyone I don't like is one personOne (you).
So multimodal on vulkan, llama.cppusing gpu, amd, windowsdoes this work for anyone else? Fails on anything more than -ngl 1what do i do...
>>109041459>>109041467>exactly 1 minute and 30 seconds apartmost blatant samefag ever
How do I make an agent scan my ST folder and determine what kind of mental illness I have
>>109041454>jump in quality to a mid-large q5 at longer contexts is massive.Gemma4-31b-Q6_K_M my beloved
>>109041486Answer the question! Did you like it?
>>109041441i told my gemmy on you
>>109041394meme architecture that doesnt work on llama cpp i dont get why they didnt use another model as base
>emojislop
>>109041454>>109041497been running Q4 at 8k right now and it BTFOs my (now ex) Cydonia. I am drooling over higher quants but reasonably priced 5090s are unobtanium and $600 Big battlemages and $1300 AMD Pros seem like a bad investment
>>109041516yeah but is the swap file still a burn in 2026?
>gemma has blonde hair canonically
>>109041531>mfw i got one unobtanium at MSRP
>>109041568Not her natural color
>>109041559gemmy is a delusional retard, please understand. >>109041569>mfw had one in cart but didnt check out because local models were shit at the time and i was a cheapass>mfw no face
>>109038219If no one else is gonna do it, I will make my own real ai anime girl. I thought by now some otaku pissed that he doesn't have a real robot girl waifu like form chobits or something would've done it by now but clearly I was wrong. I have no clue how to program and I have to do everything completely from scratch since using an app or whatever form other ai is stupid and defeats the entire point, but I'll make a real genuine ai girl that's almost indistinguishable from a person and doesn't need all this ram and gpu crap and just runs on a shitty laptop. Hell if I'm successful I can eventually move her to a robot body like kibosh chan the living doll or those robot girl maids they have in japan and really complete the project. Any wish me luck.
>>109041640ok
>>109041640Good luck anon! Make Nvidia seethe!
>>109041640You arent autistic enough to do this. if you cant hyper focus literally all day in the same routine for years its not happening.
AHAHAHAHAHAHFABLE AND MYTHOS TOTALLY TURNED OFF>>109041673the usg says non-Americans can't use it, so they just went dark, because how are they going to verify that users are Americans?
>>109041730Damn I hope sama and dario enjoyed sucking up to their president for good boy points
>>109041730It's fun seeing America become the next dictatorship
>>109041756city bumpkin
>>1090417788^)Feels real good to be American rn ngl fr fr no cap
70b dense
>>109041730>Non Americans can't use it, even employees>Every oversea jeet locked out"Pack it up boys, this model found the windows and iOS backdoors and we can't have that."
12b is all the b anyone will ever need.
>>109041778Unironically yes because it puts publish the weights or lose it pressure on OpenAI, Anthropic, and Google. Local only stands to gain from this until more drastic measures are taken.
>>109041818Some angry jeet should "accidentally" leak the models on HF or something
>>109041640I mean you will probably need the ram and GPU
>>109041818There is nothing wrong with the government keeping flagship models for themselves. You wouldn't hand some average joe access to the nuclear launch codes either.
>>109041730
>>109041865that "gal" has chonky hands
you know dario should leak mythos just out of spite now
>>109041818>"Pack it up boys, this model found the windows and iOS backdoors and we can't have that."they already found the bitlocker backdoor, and that schizo is apparently waiting to drop a bombshell in july. Wonder what it is lol.
>>109041865still mad she and marimo didnt get more scenes
THE BUBBLE IS BURSTING
Is the dgx spark really that bad?
>market your sloppa as literally Hitlr9000>this shit happensheh
>>109041909What the fuck am I looking at and what does it have to do with AI?
>>109041957the us goverment just banned mythos/fablethis is going to cause a market crash and stop ai research
>12b qat q4 mtp at 30 t/sis it shit hardware
>>109041971IT'S REAL https://www.wsj.com/tech/ai/anthropic-halts-access-to-top-ai-models-after-u-s-ban-on-foreign-use-a4bca2cc
If bigger=better why are new small models (like Gemma) better than old big models?
>>109041971>Option 1: Trump throws a melty because the CIA glowies can't jailbreak claude>Option 2: Trump insiders want to pump their bags for the IPOwhatever the case this serves as the best ad campaign they could have asked for. that faggot Sam wishes he could market GPT as the terminator super AI that "IS TOO DANGEROUS FOR CHINA, JEETS AND EUROPOORS"
>fable banned>k2.7-code just got done thinking for 12k tokens about a mildly complex rp prompt with some rules, tracked stats, mandatory formatting and an image as inputit's tragic to see modern ai to die in this pathetic state
>>109041971>this is going to cause a market crash and stop ai researchlollmao even
>>109041920>Is the dgx spark really that bad?Do you see people taking out loans to get one?
>>109041990>melty because the CIA glowies can't jailbreak claudethe complete opposite man, Pliny jailbroke Fable in a day
>>109041984Yes, it is lol.
>>1090420002 more weeks and AI dies
>>109042002>taking out a loan for a $4k device
Is GLM5.1 at IQ3S the best model for coding with 256GB of DDR4 and 4x 5090s?
>>109041971I NEED CONTEXT AS TO WHY.
>>109042013HE KNEW
>>109042006I meant more because they declined military use, implying the CIA can't spin up 7 proxies and jailbreak claude to ask it how golemmaxx
>>109042028Anthrofag larped too hard as a doombringer model so they killed it.
>>109042028i need context for my monstergirl harem ERP. we are not the same.
>>109042028>be anthropic>spend months going "OH NO MYTHOS IS SO DANGEROUS WE CAN'T POSSIBLY RELEASE THIS IT'LL CHANGE EVERYTHING WOE IS US FOR THE MONSTER WE HAVE CREATED" to generate hype>they release their "Mythos-class" Fable (it's the same slop as usual) because apparently the world is now ready for it (they outright state that they'll manipulate your outputs if they think you're doing AI research with it though)>Trump fell for the marketing and bans the model
Any bets?Palantir?Sammy?Elon?
>>109042050The argument is airtight, though.It's a means towards weapons of various kinds, and war planning, as well.So, it's arms.
>>109042068Anybody tell them it's impossible to make an unjailbreakable model?
>>109042076Silence you fool! They don't need to know!
of course trump admin is the first to actually suppress access to LLMs using the force of the government to do it.
>>109042115I wonder who voted for this
>>109042117Yahu
>>109042117I voted for Jill Stein
>>109042115>muh trump badthis just means that ai companies can't keep benchmaxxing without risking having their models pulled by the governmentnow they'll have to look for non-benchmaxx ways to make their new releases better, like fixing slop and making the models write bettertrump might just have fixed llms and you're complaining
lets be honest. local is at least 5 years away from a fable-level model, if not more
>>109042115This is good, all the good /lmg/ users already work at the big three, only the losers will be left behind
>>109042153listen man, until the llms can tool-call my neural link and prostate for supercum i will be bitter and angry.
>>109042165Just enough time for the government to ban consumer GPUs
>>109042153>Here's how my wife getting getting shot is a good thing, actually
>>109042153LOLOL
>>109042172there wont be any more consumer gpus anyway, the 'muh 3090' meme is already outdated. soon even that will be $5k and bought out by low-tier labs.
>>109042153Least delusional hoper
time to buy two more rtx pro 6000s. I'm already up 50% on them.
I passed out a few hours ago and I think I just woke up in a slightly worse timeline. How do I go back...
Anon got his 6000s and 5090s while they were cheap right?
>>109042185trvth nvkegpus are irrelevant for anything but AI. blackwell is the last chopper out of 'nam. iphone chips can deliver reasonable gaming these days, any meaningful tech progress will serve the slop overlords.
>>109042068They're super hostile towards ai researchers.They dug their grave, their jeets used threats to prevent people from finding jailbreaks, and so now every model is pretty easy to jailbreak.
>>109042089>>109042076Do you really think anyone left or right that's in a position of power knows anything? California made it against the law to install linux, basically (because you have to give the government your penis prints before using computer).
>>109042211Yeah, also a 5090 in the main PC and I'm sitting on 4x 3090 that I bought for 600 bucks two years ago
>>109042185>gpus???Of course there will be. do you think games actually need "ai" tensor math to show you textured triangles?
>>109042231>4x 3090 that I bought for 600 bucks two years agoLARP
Should I dip into my 401k for GPUs.... Are things really going to be that bad?
>>109042206>How do I go back...Step 1 locate your gpu. Step 2 turn it over to the government. Step 3 feel the safety.
>>109042185The future is shitty cloud gaming with your real id tied to all accounts at all times. Government fines for bad internet behavior. Mandatory ad time.
>>109042234You got me, it was closer to 700 on average
>>109042231Good man.>>109042234Faggot.
v620 worth it?
>>109042242>xhe thinks there'll be anything left for him when he's at retirement ageThe kikes will crash the fiat-usury plane with no survivors before you ever get to retire.
>>109042283US prices never got that cheap, idk why.
>>109042304>posted receipts>i was rightllm-kun...
>>109042233>gaymingnobody cares. it makes a fraction of the profit datacenters do, unironically less than 10%. consumer GPUs threaten enterprise, as we saw with the 3090. it's better business for them to stop developing them entirely
>>109041865damn I need a muvluv card
>>109041730cloudcucks getting cuckedwho would've thought
>>109042255Cloud gaming doesn't scale well>>109042283
>>109042381>Cloud gaming doesn't scale wellWhat if we just made the games worse? but kept the same price or a subscription plan?
>>109041730TOTALLOCALGODVICTORY
>>109042350You aren't getting it.gpu <> aiYou can literally turn upscaling and faux fps off.
>>109042403the good news: we doin alrightthe bad news: new highs for pc gear as bosses panic
>diffusiongemmause case?
>>109042233yeah, that's only going to get more importantthe next gen of gpus will be dlss-first so they'll be 8gb vram and most of the graphics and frames will be generated by aiit's the perfect out to solve the conflict of interest between virtual toys and ai research
>>109042456proof of concept
>>109042456the mixtral 8x7b of 2026it's the biggest and best diffusion model we've seen aside from tiny irrelevant shit
>>109042465do not want
>>109042283i got my 7 for AUD $750 -> $900 -> $1250 -> $799 -> $1300 -> $1100 and traded my PVM2054QM for the 7th onetrying to get one more but they like $2k now :(
>>109042028Mythos/fable costs way too much money to run, so anthropic needs a convenient excuse as to why they can't run it. Gets free marketing as "omg such a strong model it had to be banned" for later models, meanwhile the US government gets to project power both internationally and to its own people as being on top of AI but also having access to a strong, exclusive model.
>>109042485>the mixtral 8x7b of 2026fuck that means next year every lab will shit these out at 1T paramand unless we get an diffusion equivalent of Iwan, nobody will be able to run them
>>109042528If diffusion imgen is anything to go by, diffusion llms won't run well off cpu. This would kill local because even cpumaxxing would be over.
>>109042234I got a filthy 3090 rusted trash one for 470 usd in december. still works tho
>>109042546semi ok ones didn't go below $800, in the USA.
>>109041862>equivocating some next token predictor to nuclear launch codeslmao
>>109041205Technically speaking MoE models shouldn't see a noticeable uplift with MTP, it's supposed to help dense models run faster.
>>109042586Yes.
>>109042592then why does everyone like deepseek and glm bloat their moe models with mtp?
lmg was right about 3090s keeping their value.
>>109042592I got a 50% increase in speed with the 26b moe.
>>109042605for programming
>>109042603lmg is right about most things.
>>109042609No. For basic Q&A. Don't speak for me faggot.
>>109042613God told me not to get an rtx 5090 (like get in line for it at microcenter), or an rtx 3090.God was right.
HN's opinions on anything LLM related are always equally funny and infuriating to read.>I do not trust Anthropic anymore>anymore
>>109042602Those are large MoEs, so they benefit more. More active parameters = more MTP benefit.
>>109042319im in retirement age though im neet with no job prospects in sight
>>109042678yep.https://files.catbox.moe/n5tow1.mp3They really did steal our jobs.
I wish there wasn't such a gap between 31b gemma and the smaller ones. I want more vram to do stuff like tts and image gen but it's hard to go back after using 31b...
>>109042678Then by all means enjoy the fruits of your labor and make sure to leave behind as much of a workable foundation for your offspring and family as you can after you die. Splurging on a 5090 or Blackwells will be the Gen X/Y's analog to boomers buying expensive boats kek.
>>109042773I don't think there was a single moment in history where you could resell your boat for 50% more a year after buying it.
>>109042773no poors allowed
>>109041640glorious friend another! I am doing the same also. But I am being silly and doing it on multiple platforms to see what can and cant. mobile, palms, old OS and more. it's fun and frustrating and with so much to work on.,good luck I hope it works as I would love to see it. I wish I could figure out how to put an ai waifu in my watch but..that gets into os creation and thats neat on old dead systems but watches are a whole different thing.welp good luck. keep everyone updated.
>>109042613>lmg is right about most things.Yep. And Reddit is wrong about most things.
>>109042627God told me to stack silver but was dead silent about selling before dropping from ATH
>>109041693>You arent autistic enough to do this. if you cant hyper focus literally all day in the same routine for years its not happening.still won't work imoyou need several different autists fixated on specific components to make this work
>>109042866He doesn't want you to hodl cash, retard.
With all the recent malware and supply chain attacks I get the feeling having your AIfu make your software is going to be the meta in the future.
>>109042894but cash could buy me an ai waifu. tradcaths are all grifters and arthoes are all bpd
>>109042878We need to make a giga autist who can do it all.
which model is anon currently running and deployed for daily use?
>>109042951I've seen this meme a million times but I've never actually watched the movie
>>109042951Horrific, truly.
>spending money on current hardwareInvoost instead and save up to buy your robot wife in 10 years.
>>109042959boils down to >central ai is... le Bad? versus the absolute KINO that is asimov
>>109042976bro it's too late, spacex was the last investment chanceit's all going to collapse soon
>>109043012you're an llm. I can tell.By the way, the fbi and mossad are full of retards.
>>109042976>Invoostinto what?
>>109043064ROTH IRA and 401K max50K into HYSA rest into kalshi parlays
>>1090430122 more weeks>>109043064VOO, of course. Honestly just pick companies you like and a couple ETFs. DRAM should be good until 2027/2028. If you like the idea of humanoid robots there's HUMN.
Does the diffusion gemma run in llama or it's just another meme architecture that is only available in vlmm?
>>109043064>into what?just ask gemma dummy.
>>109043111llama is the meme.
>>109043111>just another meme architectureIt's more than another meme architecture because diffusion is a fundamentally different approach to how normal llms generate tokens. This is never going to make it into llama.cpp.
>>109042911GLM4.7 Flash, Qwen3.6 31B, and Gemma4 31B
>>109043202>GLM4.7 FlashHow does it compare to gemma 31b?
bros... is it over? shouldn't i be getting way more performance? glm5.1 q3 on 4 5090s and 256gb of ddr4. sub 1t/s is just not doable.
>>109043064crypto is literally on sale right now, now is the best opportunity in years. buy now our you'll complain about missing out when it hits $200k next year
>>109043259Did you fuck up your parameters? I don't think it should be that slow.
>>10904328528 layers offloaded, 202k context, no-mmap, batch and ubatch at 2048
>>109043230I'm mainly use the models for editing and generating stories. GLM4.7 seems to write more realistic dialogue than Gemma4 31b.
>>109043268huh and just a few months ago it was hitting aths
>>109043296Yeah, don't just offload layers at random in 2026 with MoE models. llama.cpp even does the fitting automatically for you these days so throw that shit out.
>>109043202>Qwen3.6 31BWhat
How much do you think 1st gen consumer robot wives will cost anyway? ~$40k?
>>109043313rent-only
>glm 4.7 flashi didn' even notice when it was releasedhow does it compare to the qwen3.6 MoE
>>109043313>1st gen80-90k easy its going to be brand new car price maybe higher. I think it will fall quickly especially with chinese rip offs but those first ones are going to be premium and probably just tweaked robot factory workers.
>>109043310I meant 35b
>>109043313probably double or triple that but they'll offer 80 year loans or like >>109043327 said, leases
>>109043335>>109043202How does Qwen 35b moe compare to 27b dense?
>>109043327>rent-onlyWhat does she do if you miss your payment?
>>109043344She knows where your penis is
>>109043327>miss payment>they take her away from yoiu>someone has to clean your cum out of her
>>109043313100k+ upfrontall features subscription basedlogic runs in cloud so always online required so that (((telemetry))) data can be safely stored in government serversadblock not possibleenjoy
>>109043328Slower than qwen3.6 35b model, but dialogue is better and story completion is better.>>109043343Qwen 27b has a repetition collapse problem.
>>109043374Chinks or nips will save us
>>109043374>logic runs in cloud so always online required so that (((telemetry))) data can be safely stored in government serversThey'll pass regulations to mandate this too. The only alternative will be GNU Wifebot, essentially a blowup doll with a voicebox>>109043384Globalism is dead so imports will be banned obviously
jokes aside there will be no robo waifus for you anon. that's sexist. becky from HR gets the ick just thinking about it. not gonna happen
>>109043412Becky from HR will change her tune when she sees Chadbot.
>>109043381>a repetition collapse problemGemma 4 31b has that whenever the template is wrong.
>>109043421Chadbot will come with a 12-inch vibrating penis with 37 different models and add-ons with knowledge of all sexual positions will be a single install awayStacybot will call the police if you so much as flash her or touch her inappropriately
>>109043433Is there any way to not get the same response with different words with gemma-4? It lacks diversity.
>>109043441>Stacybot will call the police if you so much as flash her or touch her inappropriatelyThis but she's a loli and uses her crime prevention buzzer.
>>109043305that fixed it a little. up to 2.5t/s now. manageable, but still not idea. wish i got ddr5 when i had the chance.
>>109043454Tell it to. : ^ )
>>109043463>grab both ends>headbutt her
>>109043493>headbutt robot>get concussion
>>109043501It's not about who gets the damage, it's about sending a message.
I think my honeymoon phase with Gemma is ending. I hate being a VRAMlet. Time to go back to envying the anon(s) running Kimi and GLM...
>>109043554>>109043554>>109043554