/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108488188 & >>108481865

►News
>(03/31) Claude Code's source leaked via npm registry map file: https://github.com/instructkr/claude-code
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108488188

--Replicating land/water classification experiments:
>108490105 >108490129 >108490163 >108490676 >108490784 >108490200 >108490277 >108490429 >108490453 >108490550 >108490568
--Disabling Qwen3.5 thinking via chat template parameters:
>108488387 >108488545 >108488799 >108488704 >108488777 >108489220 >108488828 >108489159
--Activation rotation PR improves Q8 KV cache quantization:
>108493347 >108493457 >108493491 >108493518 >108493522 >108493524 >108493503 >108493539 >108493553
--Claude Code source leak analysis and reactions:
>108491355 >108491419 >108491817 >108491431 >108491443 >108491462 >108491467 >108491472 >108491458 >108491485 >108491501 >108492471 >108492482 >108492495 >108492500 >108492522 >108492524 >108492665 >108492678 >108492708 >108492716 >108492577
--Sam Altman's non-binding RAM deal and hype scrutiny:
>108491713 >108491769 >108491785 >108491794 >108492516 >108492525 >108492600 >108492613 >108492632 >108492639 >108492653 >108492679 >108491973 >108491992 >108492416 >108492426 >108492430 >108492637 >108492699 >108492781 >108492821 >108492842
--SSD endurance concerns with high-daily-write workloads:
>108491932 >108491966 >108491991 >108492002 >108492030 >108492049 >108492326
--Debating Substack article claims about cheap local inference and BitNet:
>108490005 >108490058 >108490171 >108490254 >108490428 >108491766
--Decensoring Llama3-3.1-8B while preserving intelligence:
>108491349 >108491383 >108491397 >108491461 >108491638 >108492677
--Running models on AMD NPU with FastFlowLM:
>108489782 >108489876 >108489926 >108489955 >108490056
--TurboQuant implementation performance and hardware requirements:
>108489738
--Claude Code build instruction shared:
>108493315
--Miku and Teto (free space):
>108488214 >108488265 >108488297 >108488416 >108488556 >108488768 >108489724

►Recent Highlight Posts from the Previous Thread: >>108488192

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
I can't believe that we have finally reached the point at which deepseek might soon release a new model at any moment in the near future.
i am falling i am fading i am drowning help me to breathe
>>108493819Should I watch it again? I watched it 10 years ago and I had no clue what the fuck was happening.
>made a joke here about claude leaking yesterday (or was it sunday? I don't remember) because i was bored
>source code leaked today
Uh oh
https://www.youtube.com/watch?v=BNr1mlYRgq8
>>108493829Make a joke about opus and seedance leaking.
>>108493825
>I had no clue what the fuck was happening.
That's the intended experience.
>Should I watch it again?
Yes.
>>108493829Make a joke about character.ai leaking next.
>>108493852That's just GLM.
>>108493643I spent 10 minutes looking for the config file until I found out I have to create it myself. Fuck, they include 5000 no name providers but don't bother with llamacpp
>>108493857just GLM?
If the response on the right (significant-otter) is from one of the upcoming Gemma 4 models, it's somewhat capable of generating swear words without handholding, this time around.
https://github.com/Ahmad-progr/claude-leaked-filesGrab it while you can.
>>108493794tuesday
>>108493852Noam Shazeer works at Google DeepMind now; your best bet for that is him giving some suggestions to the Gemma Team or facilitating a character.ai training data license agreement with it.
Loosely relevant: https://techcrunch.com/2024/08/02/character-ai-ceo-noam-shazeer-returns-to-google/
>[...] Google is also signing a non-exclusive agreement with Character.AI to use its tech.
>>108493877lmao'd
GGGNEIGEGERNGNAVO MERG DA FUKIN PRRRRRRRR
>>108493890Nothing of value there
Sex with Teto
>>108493890What do I do with this?
>>108493890ToT*lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick*
>His AI gets leaked
>TurboQuant reduces context length memory
>Backs out of RAM deal
How cooked is this faggot?
>>108494008>His AI gets leakedwhat? it's claude code that got leaked, that has nothing to do with OpenAI lol
>>108494013My bad.
>>108493989Pay Anthropic to have Claude add in features you want that they won't implement.
>>108494009The bottom part of the phone goes through her liver.
>>108494009ToT
>https://arxiv.org/html/2601.01739v1
>very good coding capabilities
>absolutely abhorrent with tool calling
Sad, just when I was getting used to just hitting "start task" and forgetting about it.
>>108493890It is lacking all the subdirectories...
>>108494009found it
>>108494009
>>108494009inpainterchads i kneel
>>108493877Fuck you smartass and dumb chinks making their actually good model parrot shit back fucking AI ouroboros eating its own synthetic shit god how I HATE it.
>>108494031>>108494085>>108494062nice
>>108494009That is a man.
>>108493890
>vibe-coded reupload
The original is still up: https://pub-aea8527898604c1bbb12468b1581d95e.r2.dev/src.zip
>>108493880pteronura also smells like something uniquely Gemma while not being as preachy as Gemma 3.
>>108494008>>108494013Also Saltman is on the government teet now. He doesn't need to really give a shit about anything anymore.
>>108494123anon pls, don't make me hopeful
>>108494009ABSOLUTE DFC
>>108494124OpenAI has service contracts, but he also tried to get Trump to commit a few hundred billy his way and get the government involved in scale-out and infra, since OpenAI was "The AI Company" at that point, but that didn't go anywhere.
IT'S ACTUALLY FALLING NOW IN REAL TIME.
I DIDN'T EVEN HAVE TIME TO SELL BACK.
I hate fucking academics. Their code is always bloated garbage with 150+ dependencies.
What's a needle-in-a-haystack benchmarking tool that doesn't suck?
Vibe coding is causing irreparable damage to the field of software engineering and open source.
>>108494046https://x.com/Fried_rice/status/2038894956459290963Get it from the source.
>>108494163echoes of the gpu stock drop when r1 released. are ram prices actually dropping or just the stocks?
>>108494163How much longer am I going to have to hear about TurboQuant cratering RAM prices? I'm sure it has nothing at all to do with Sam reneging on his purchase agreements.
>>108494178Shit is hitting the fan today, apparently. We'll find out in a week where it goes. RAM is likely going to fall because, tl;dr, Sam is no longer buying the RAM he said he would.
>>108491787Does the dumb grifter know that SK Hynix or Samsung don't sell sticks directly to consumers either?
>>108494174
all the ramcoin ive been hoarding is now worthless im ruined :(
Got my electricity bill today.
Localbros... What are your energy efficiency tricks? These 3090s are hungry.
>>108494231
nvidia-smi -lgc 0,1600
nvidia-smi -pl 270
>>108494231down-power them to idle at 12watts
>>108494235Damn, I'm blind (for not seeing this in the manpage) and retarded (for not feeding the manpage to an LLM). Thank you for spoonfeeding me, Anon.
>>108494250They already do.
>>108494231>What are your energy efficiency tricksif you live in a cold climate it's basically free heating.
>>108494231lock your clock and give it an undervolt:
nvidia-settings --assign "[gpu:0]/GPUGraphicsClockOffsetAllPerformanceLevels=255"
It might be unstable if you get too aggressive. I dialed in my undervolt using video games, but the same value seems to be okay for LLM workloads too.
>>108494231The greatest trick is using 5090s or blackwell pros instead. Stacking lower end cards is cheaper upfront but always more power hungry.
>>108494231Blackwells.Use nothing but 1-3 blackwells.They're the best dollar per GB per Watt.
>>108494123>you X>you Yfucking grim
>>108493794>>108493838>>108493864>>108493609>>108493461>>108493516>>108492999>>108492958Quite an interesting conversation regarding the differences between abliterated and fine-tuned models, if anyone's interested.
>>108494272GPU power consumption on 3090 skyrockets above 1600 MHz. The optimal operating point is about 1400 MHz, which incidentally is also the card's "base core clock" according to official NVidia specs. A large fraction of the card's power consumption is due to the memory modules, though.
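Napkin math on what power-limiting actually saves on the bill (the wattages, hours, and $/kWh below are made-up assumptions for illustration, not anyone's actual numbers):

```python
def monthly_cost(watts, hours_per_day, rate_per_kwh):
    """kWh drawn over a 30-day month times the price per kWh."""
    return watts / 1000 * hours_per_day * 30 * rate_per_kwh

# Two 3090s at stock (~350W each) vs power-limited to 270W each,
# 8h/day of inference at an assumed $0.30/kWh.
stock = monthly_cost(700, 8, 0.30)
limited = monthly_cost(540, 8, 0.30)
print(round(stock - limited, 2))  # → 11.52 saved per month
```

Not life-changing per month, but it compounds, and the perf loss from `-pl 270` on memory-bandwidth-bound LLM inference is small.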
>>1084943199000$ card btw.
>>108494163In my local market RAM prices have fallen by 0%.I'll believe it when I see it.
>>108494325How does it feel being hands down the most cringe poster on /lmg/ ?
>>108494345> In my local market RAM prices have fallenI only choose to read this part. Local is back!!!
>>108494347What am I supposed to do with this response? I asked this because you expect me to do one thing or feel a certain way but I won't.... It's pathetic on your part
Well well well
>>108494359
>Top engineer say AI now writes 100% of their code
then what are they fucking paid for??
>>108494365it feels like babysitting to be desu
Machine learning and artificial intelligence have consumed my entire life. Every single day. What have I become?
vid related: https://files.catbox.moe/0eair6.mp4
>>108494365for knowing where to point the slop cannon
>>108493890>>108494359>over 500k lines of typescript for a looping agent harness
>>108494325How does it feel being hands down a decent poster on /lmg/?
>>108494353Thanks, very illuminating.
>>108494437behold the power of vibecoding
>>108494365For prompt engineering. What do you think "Top engineer" means?
>>108494437Seriously, every time I fall into the trap of letting the AI write my code I always go "wtf is this shit?" and end up rewriting whatever it did in 10x fewer lines.
>>108494359
>Claude Code actually leaked itself
Spooky
>>108494345Pay attention to the ancillary RAM markets that didn't actually have a shortage but sellers ran up the asking price anyway. Stuff like DDR4, that doubled for no good reason. That's the first pricing wave you'll see tank.
>>108494337I know what I said.
>>108494459I just ask the model to tell me what I should modify. It's a bit more work, but it gets to the point and doesn't add new bullshit I never asked for.
>>108494325Why do you insist on pushing your mommy SS roleplay on everyone?
>>108493794>used to think teto is a hag>older than her now>tfw
>>108494459>t. promplet
>>108494474It's what I normally do but sometimes I'm lazy.
>>108494459Give it specific instructions then monitor it while it is running and stop and correct it when you see it doing something needlessly complex or verbose. If you let it run down a stupid path to completion, you have no one to blame but yourself.
>>108494480Show code?
>>108494483The only tard I want to wrangle is me.
>>108494459I've had good experience with it by making sure clear plans are written out beforehand, and having it do an aggressive review and simplification pass after writing anything. But then again, that's the same flow that the claude code devs purportedly espoused, and clearly it's a spaghetti ball, so maybe it's not really as effective as I thought it was for large projects.
>>108494476The fact you need to complain about everything is a tad irritating. Be likable
>>108494489AI is a faster tard.
>>108494496>Be likableditto
>>108494479hagnon...
>>108494494Who said the claude code devs were competent?
>>108494484Sure https://pastebin.com/DfrD0fhF
>>108494508Imagine paying more than a million dollars per year to slop engineers who managed to leak your whole fucking codebase to the public. If I was Dario I would be fuming.
>>108494511>https://pastebin.com/DfrD0fhFanon.... I'm so sorry. I didn't know you were mentally challenged...
>>108494511My brother in christ have you ever heard of sqlalchemy?
>>108494547Squelchalchemy? Wtf is that?
>>108494574Should ask your coding agent.
>>108494524The code isn't high value so I doubt he cares. If they leaked some LLM weights we'd have something to talk about.
>>108494416great video thank you for sharing
>>108494547>>108494574Gonna be honest: like Entity Framework, these abstractions over DBs end up adding more work and simply displacing complexity rather than actually solving any real problem, all while framing it as "code-first".
If you want highly performance-tuned DB operations, these libraries are ass. Not to mention if you want to do filthy nasty things like returning multiple result sets, bulk operations, sprocs and temp tables or whatever, they're not helpful at all.
Raw SQL is the way and will be the way; FOTM "we fixed DBs!" hasn't worked for a reason.
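For the record, "raw SQL" doesn't have to mean string-concatenation hell either; the stdlib `sqlite3` module with parameterized queries covers most of what the anon's pastebin was doing (table and rows below are invented for the demo):

```python
import sqlite3

# In-memory DB for the demo; pass a file path for real use.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # once per connection is enough

conn.execute("""
    CREATE TABLE runs (
        id INTEGER PRIMARY KEY,
        model TEXT NOT NULL,
        tok_per_s REAL
    )
""")

# Parameterized queries: no injection footguns, no ORM layer.
rows = [("nemo-12b", 41.2), ("glm-4.5-air", 18.7)]
conn.executemany("INSERT INTO runs (model, tok_per_s) VALUES (?, ?)", rows)

# Arbitrary SQL stays trivial, which is the whole argument against ORMs.
fastest = conn.execute(
    "SELECT model FROM runs ORDER BY tok_per_s DESC LIMIT 1"
).fetchone()[0]
```

You get full SQL, a connection object, and nothing to fight when you need sprocs-tier nastiness later.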
>>108494671What text editor are you using?
>>108494592i'd argue this is like if AIM source code leaked in 1999 at the height of AOL's popularity. it's enough to damage the ecosystem even if it isn't the main product and just an accessory to said product.
>>108494673Word
>>108494592High value enough for them to feel the need to keep it closed source and ban plan-paying customers from using any other client with their OAuth credentials.
What are the best instruct-tuned/smart models in the 7B-14B range?
>>108494671Based, but the example Anon posted really could use some KISS refactoring (e.g. how many times do you need write to PRAGMA foreign_keys = ON).
Bros... I just found out TurboQuant is being expanded to work with models. They're going to quantize the models. You niggers have no idea what's coming.You might actually be able to run Nemo on your phone locally now. Not via your local network. Locally.
>>108494767I don't know this sounds like snake oil. Q8 is still Q8 and so on.
I'm rootating
I don't get why niggers want to "run model on my always-network-connected terminal" instead of just using their phone to connect to their actual computers. Why run fucking Nemo on your phone tomorrow when you can run K2 on your phone today? Are they too dumb to set up wireguard or something?
>>108494767>Bros... I just found out TurboQuant is being expanded to work with models. They're going to quantize the models. You niggers have no idea what's coming.I mean, I'm not that surprised, this rotation method shit could be used to improve gguf quants
>>108494723
>KISS refactoring
Thanks, I'll fix that. Still, it's far from having to rewrite everything in 10x fewer lines.
a turboquanted gguf just flew over my house
>>108494801Some people don't even have a PC with a dedicated GPU anon.
>>108494801It's easier to install an app than it is to set up a home network. I guess...
>>108494801Do you people who use tailscale actually leave your PCs running when you leave your house? That's crazy. When my PC is off, I obviously can't use wireguard or tailscale to access it remotely.
>>108494767>You might actually be able to run Nemo on your phone locally now. Not via your local network. Locally.was about fucking time, the bitnet dream is still alive!
>>108494460All these chats working Claude towards ideological purity finally have paid off!
>>108494821The only excuse I can see for owning a phone but no GPU is being homeless, or a child. The GPU is always going to be significantly more power-per-dollar than a fucking phone, of all things.
>>108494825Isn't Tailscale basically point-and-click? The WRT54GL was the last time I used a consumer router, so I'm kind of hazy on what normalfags do for their home networks these days.
>>108494835I lease a machine with Hetzner that I use as a wireguard node; I've never used Tailscale. But yes, I almost never power off my PCs. I have a schizo anti-evil-maid setup so booting them back up is time consuming.
New LLM just dropped. https://huggingface.co/LiquidAI/LFM2.5-350M
Benchmarks say it assrapes Qwen3.5 0.8b and Gemma3 1b.
>>108494883Anon that's not an LLM. Hell at this point it's not even an SLM. It can only be an MLM.
>>108494899U sure?
>>108493794hey guys, what's the current best open coding model?
i don't care how big
maybe give me 3 or 4 for each order of magnitude of model size above 8B
thanks! :)
>>108493814
2 more weeks dude !
>>108494933
>>108494359i love slop coding desu. allows me to automate webshit, which i hate, so i can focus on manually doing system eng stuff, which i like.
>>108494899finally, a language model for the fujoshis
>>108494957Same. Everyone is also always super impressed and satisfied now, because whenever I was forced to do webshit I did the bare functional minimum, and now the bots make it look nice by adding labels, dividers, CSS classes, etc. The one guy on our team that used to be the go-to guy for frontend is feeling neglected now.
>>108494957>>108494996have fun with that, you'll lose your job soon once LLMs are good enough to automate everything
>>108495010this is never gonna happen; a huge part of my job requires reverse engineering, which LLMs are utterly incapable of. besides, it's too mission critical to trust something that doesn't have any intelligence with it. i already tried to prompt it a task out of curiosity once and it just shits the bed.
>>108495010Only the retards that can't do anything but vibecoding will be fired. The lead devs will just go from leading a team of retarded humans to leading a team of agents. No more room to pay a full salary just for some junior or mid-level that can't be trusted to do anything correctly on his own but shit out boilerplate.
>>108495030>a huge part of my job requires reverse engineering which LLMs are utterly incapable of.yet
>>108494933why are you running q4km of a 350m model? and how is it still coherent?
>>108495056>dude the statistical parrot is gonna gain intelligence trust me.this won't ever happen until we got agi, which we will never reach with llm's.
>>108495056You have a limited number of two-more-weeks remaining before the IPOs complete and the bubble is allowed to pop.
>>108495061>and how is it still coherent?that's what I'm (nta) wondering too
>>108494883>a 350m model destroying qwen 3.5 800mbig if true
>>108495030>>108495056Ghidra MCP plugin exists.
Do we have enough information on how genetics works for AI to eventually be able to create new genes for humans? It takes humans ages to do anything in this arena, but if AIs get smart enough they could automate that process; after all, genes are nothing but instructions, right? Seems like this would be an area where AI would thrive.
>>108495073i'd love to see it try to RE hardware lol.or things that are too obscure for ghidra.
>>108495073>>108495077anyway, that's beside the point. these models don't have AGI, and AGI will always be required for a huge part of engineering; it was never necessary for webshit but it is for other things.
>>108495075>creating new genes for humansWe had the ability to do this before AI, but you know, muh morals.
moalposts: goved
>>108495084What could possibly be immoral about creating a better human? If you could make a human smarter, stronger, heathier it seems like nothing but a net benefit to me.
>>108495083Isn't this just you rationalizing your hatred for webdevs?
q8_0 rot bros... WE WONNED!!!!!!!
>>108495090>but muhh eugenicsI don't know anon, it seems as retarded as it sounds to me too, who want your children to be born ugly or retarded? no one, and eugenics is the only way to satisfy everyone
>>108495090I don't think even the most eugenic white supremacist would like to have a world of blue eyed blonde 10/10 200 IQ aryan gods. It kinda stops being special if everyone can have it.
>>108494168How about one by a big company? https://github.com/adobe-research/NoLiMa
Still unbeaten for what it tests. Sadly it has stopped updating. But it actually tests context by having context like
>Eliot only eats vegetables and eggs.
and then asking questions like
>Is Eliot a meat eater?
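The trick that makes NoLiMa-style tests hard is that the question shares no literal keywords with the planted fact, so lexical matching over the context can't save the model. A toy sketch of building such a case (the filler text and Q/A pair here are invented for illustration, not taken from the actual NoLiMa dataset):

```python
import random

def build_case(needle, question, expected, filler_sents, depth=0.5, seed=0):
    """Bury `needle` at relative `depth` inside shuffled filler text.

    NoLiMa-style: `question` deliberately shares no keywords with
    `needle`, so answering requires latent reasoning, not retrieval
    by string match.
    """
    rng = random.Random(seed)
    filler = filler_sents[:]
    rng.shuffle(filler)
    pos = int(len(filler) * depth)
    haystack = " ".join(filler[:pos] + [needle] + filler[pos:])
    prompt = f"{haystack}\n\nQuestion: {question}\nAnswer with one word."
    return prompt, expected

def score(model_answer, expected):
    """Exact-match grading after normalization."""
    return model_answer.strip().lower() == expected.lower()

filler = [f"Sentence number {i} about nothing in particular." for i in range(50)]
prompt, expected = build_case(
    needle="Eliot only eats vegetables and eggs.",
    question="Is Eliot a meat eater?",  # zero keyword overlap with the needle
    expected="No",
    filler_sents=filler,
)
# feed `prompt` to your model here, then score(answer, expected)
```

Sweep `depth` from 0.0 to 1.0 and context length upward and you get the usual needle-position/length grid without 150 dependencies.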
>>108495116>It kinda stops being special if everyone can have it.no it doesn't, it's still way better than the actual world filled with 50% of people with an IQ inferior to 80
>>108495100welcome to nine months ago where we could already do this with kimi k2 instruct with Q8 kv cache with no brain damage.
>>108494325you're the retard who vibecoded its own inference engine right? kill yourself
>>108495116I am sure if that happened then those 200 IQ gods would find some new way to be special.
>>108495131There are no user names here idiot. There's no point holding a grudge against someone. That was someone else. Not that your autistic ass (The really bad kind no one likes and treats like a puppy retard) could be convinced otherwise
>she thinks she's really anonymous while blogging about her shit everyday
>>108495094>Isn't this just you rationalizing your hatred for webdevs?i don't need to rationalize it, it's obvious that i hate webshit.
>her
I want to use openclaw with a local model but I can't get it to work
>>108495151u have the same style of sniffing own farts + thinking anyone cares about the useless fucking stuff you're doing. go do something useful with the guys doing SOMA/ARA/MPOA instead of wasting your time with your shitty finetunes thinking you're achieving ANYTHINGretard
>>108495171Have you asked your local model to get it to work for you?
>>108494937Best options are Kimi K2.5 (1.1T) and GLM 5 (744B). Don't let the parameter counts fool you: GLM 5 is actually the smarter-but-slower-one of the two. DeepSeek V3.2 (685B) and the flagship version of Qwen3.5 (397B) are runner ups, and nothing else is really in the conversation.If you need to go smaller just go down the Qwen list because they release each model version in a ton of different sizes and shapes. GLM 4.5 Air is a decent midrange option. If you can fit it fully on GPU Qwen 27B is the best you'll get on the smaller side of things, and finally Qwen 3.5 35B-A3B is a worthy mention purely for its speed, where it's much smarter than any other model that runs as fast as it does, but it's still gonna struggle if you try to use it in an agent harness or for more complex coding tasks than a one-and-done script.
>>108495169all anons are cute girls unless proven otherwise like cudakek
>>108495177thank you anon ! :)
>>108495177also add stepfun in the glm 4.5 air bracket (only tested for cooming, did a good job for me)
I want to set up a RAG to feed a model accurate technical information about AI media gen (SDXL, WAN, A1111, Comfy, etc.). Which set of documents would be a good choice to get something useful? It's one of those things where a RAG is really needed. If you go bare they just hallucinate a lot or spout nonsense, especially if you want to use strict systems like checkpoints based on danbooru tags.
>: ^ )
>>108495173I simply like sharing my hobbies. Too bad you can't do shit about it. :D I'm happy with my life and you clearly aren't. Go die by suicide or some shit like all other lonely "people" like you.
>>108495200It's him.
>>108495056If vibecoding is nearly good enough to replace engineers, how come all of those PRs keep getting rejected in llama.cpp? More importantly, why has no one shat out a replacement for ServiceTensor yet? It should already be good enough for webshit like that.
>>108495173>doing SOMA/ARA/MPOA Who?
>>108495208>how come all of those PRs keep getting rejected in llama.cpp.niggerganov is a luddite
>>108495221the latest abliteration techniques that dont actually murder the model's intelligence
>>108495200>using the carrot stick smiley
>>108495208
>If vibecoding is nearly good enough to replace engineers,
It's COMPARABLE to most engineers if the person actually knows what they want to implement and knows how to direct the model to achieving that. Most people that submit vibeshit PRs don't even test the shit they want merged and are only doing it to make their GitHub graph light up, because they think it will get them a job or brownie points or both. They want and expect the models to do things perfectly one shot and then act surprised when people that actually write software have standards higher than that.
>>108493963THE DUCK IS A ....
>>108495232Would you prefer emojis, zoomer?
>>108495229Sounds like grifting. I'll believe it when I see it
>>108495131>>108495173Nonnie, I dislike the guy's posts too, but you have no business being this wrong (the schizo calling Claude 'she' and the guy who wants to fuck his mom are different schizos) while also having this writing style.You're so cute.I want to rape you.
>>108495208because they're racist against AI, I merge all the rejected PRs to my local branch so that I get the best possible version of llama.cpp
>>108495224Who is responsible for teaching zoomers the word "luddite"? They need to be held responsible.
>>108495262 (Me)Wait,This writing style, this aggression..!>>108495131>>108495173The dog has FOUR legs on the image. And the test is still shit. And I still want to rape you.
>>108495258It's already been proven in benchmarks and real world testing, here and outside of this thread. It's really on you if you don't believe it, since these models are free to try.
>>108495262>>108495208>>108495173>>108495271https://www.youtube.com/watch?v=4SDqGxdhUxE
>>108495267>oy vey stop noticing
>>108494878
>Tailscale basically point-and-click?
Yes. You set it up on each computer/server/phone/tablet that you want it on, sign in, and it just works. There are iPhone and Android apps as well as Win11, Linux, iOS, etc. It's so easy even I could figure it out from scratch in 15 min or so.
>>108494835Yes, though admittedly the "servers" that I'm connecting to are mostly SBCs, so their running cost is very low.
>>108494874lol
>>108495273Cockbench results?
>>108495273>Benchmarks Those don't always translate to usability or quality. Especially for the use cases most anons here want them to be used foreg. Personal coom engines
https://huggingface.co/Denali-AI/qwen3-vl-8b-garment-classifiergarments identified and classifiedlocal models saved
>>108494835yeah I keep my goybox on pretty much 24/7
>>108495299What's stopping someone from just forwarding a ssh port to their service instead? A slight added delay because of the encoding/decoding?
>>108495314Obviously the benchmarks aren't totally 100% completely comprehensive, but if they mean nothing to you at all, you're probably just a fuckin retard.
>>108495345TheyDo notBenchmarkThe SEX
>>108495350go back to coom bro, your brain is mush
Wait are y'all really trying to make AI write explicit content? Why?
>>108495420BRAP
>>108495365If it's an ABLITERATION TECHNIQUE, ejaculate extraction is the WHOLE POINT
>>108495350There is no use case for that.
>>108495329Tailscale is doing a bunch of stuff at the same time, the main ones (for me) being automatic configuration, NAT traversal, and encryption. I've looked at the various other ways to do it, and concluded that Tailscale was by far the easiest to manage. There are entire threads on /g/ that will argue about this ad nauseum. It boils down to whether you value your time (just works) or prefer other non-centralized solutions. Sound familiar? It's like 90 pct of what anons argue about on a whole variety of topics.
>>108495432no
Bitnet is here! https://prismml.com/news/bonsai-8b
>Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs
>
>Today, we are announcing 1-bit Bonsai models that bring advanced intelligence to the devices where people actually live and work.
>
>For the last decade, AI has advanced along a clear trajectory: to make smarter models, you make them bigger. More parameters, more GPUs, more power, more memory, and more cost. That approach worked. It gave us models that can reason across long contexts, solve difficult problems, and generate software, research, and creative work at remarkable quality.
>
>But it also created a deep structural constraint on the future of AI: the most capable intelligence became trapped inside massive clusters and specialized infrastructure. Yet some of the most important uses of AI are not confined to data centers. They happen on phones, laptops, vehicles, robots, secure enterprise environments, and edge devices.
>
>AI deployment no longer aligns with where it is needed. Today, that changes. [...]
>>108495438I agree with this anon. The centralized part of Tailscale is just the control plane, it's better than a lot of other solutions in terms of security. To do better than that, you need to jump through some hoops, which may not be too bad if you already know networking, but then you already invested your time. Your time would be better spent hardening the security of the rest your system which likely has a bunch of shitty software far more risky than Tailscale.t. tailscale shill
>>108495464Wow! 1 bit! That's so small!
>>108495464https://huggingface.co/collections/prism-ml/bonsai
>>108495463yes
>>108495464
>1-bit Bonsai 8B implements a proprietary 1-bit model design across the entire network: embeddings, attention layers, MLP layers, and the LM head are all 1-bit. There are no higher-precision escape hatches. It is a true 1-bit model, end to end, across 8.2 billion parameters.
Hmm.
>That matters because model compression has historically come with painful tradeoffs.
>That matters because model compression
>model compression
So it's a quant of some sort?
>>108495438Okay
I've just done a one-liner when I needed to expose a service on a specific port that needed to be secure to the best of my knowledge. I did glean a little bit of knowledge as to what you gain as opposed to what I've done with ssh in the past, but yeah, I don't think me saying what I've done in the past with ssh ports and asking out of genuine curiosity requires generalizing about shit that doesn't apply to me.
>>108495486probably like the fake bitnet that falcon pulled out https://huggingface.co/tiiuae/Falcon3-10B-Instruct-1.58bit
>>108495464Llamacpp support? Q0.4 goofs?
>>108495494https://huggingface.co/prism-ml/Bonsai-8B-gguf
>>108495208Bro, I'm able to implement papers from scratch with vibecoding. It doesn't cure lazyness and jeets with better tools are still jeets. Btw everyone here has their own ServiceTensor UI, what are you even doing?
>>108495464>Not trained from the ground up>Just another gay quant of an fp16 modelNot bitnet. Fags
>>108495464Small if true
>>108495486it's not bad at all, damn
>>108495521is fake tho
>>108495315kek
>>108495515>ServiceTensor: The ONLY enterprise-grade AI solution that delivers unparalleled performance with military-grade security and compliance.cute nonny you can't just vibecode a phishing site and call it enterprise-grade
>>108495486
>So it's a quant of some sort?
https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf
>1-bit Bonsai 8B is built from Qwen3-8B [31]
>Q1_0_g128 is applied uniformly across the large matrix-heavy components of the model, including embeddings, attention projections, MLP projections, and the LM head. Normalization parameters and scale metadata remain in higher precision for numerical stability, but these account for a negligible share of memory traffic relative to the large weight tensors that dominate bandwidth during decoding.
They don't mention any finetune healing process, so it's impressive the benchmark scores were degraded as little as they were.
>>108495315
Use case?

>>108495554
>1-bit Bonsai 8B is built from Qwen3-8B
the difference is significant (79.3 -> 70.5) but for a 1bit quant that's quite impressive yeah

>>108495554
>>108495565
>inb4 GGUF quants are so optimized nowadays it could do better than that

>>108495350
NTA. How would you even write a benchmark for that? If my understanding of how tools like lm-evaluation-harness work is correct (https://github.com/EleutherAI/lm-evaluation-harness), they basically ask the model a shit ton of questions and the scores are calculated from how many "correct" answers it provides. The thing is, they're multiple choice questions and the model is scored on whichever answer it ranks most likely, so how would someone create a benchmark for SEX that meets /lmg/ standards? I think evaluating a model's "intelligence" and evaluating something as subjective as whether or not it can make YOU, in particular, cum buckets are entirely different ballparks.
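For what it's worth, the harness's multiple-choice mode boils down to something like this (a toy sketch, not their actual code): score each candidate answer by the log-likelihood the model assigns it, and count the item correct when the gold choice ranks first.

```python
def pick_choice(loglikelihoods):
    """Index of the answer the model ranks most likely (highest log-prob)."""
    return max(range(len(loglikelihoods)), key=loglikelihoods.__getitem__)

def mc_accuracy(items):
    """items: list of (per-choice log-likelihoods, gold answer index) pairs.
    Returns the fraction where the model's top-ranked choice was the gold one."""
    return sum(pick_choice(lls) == gold for lls, gold in items) / len(items)
```

Which is exactly why it measures "picks the right fact" rather than subjective quality; the latter would need pairwise human preference judgments, not likelihoods.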
>>108495554
Running a 9B VL at 1 bit would be interesting

It's pretty stupid that even local models running on my machine won't describe a vagina without some faggoty lecture and refusal

it's over

>>108495611
Demoralization.

>>108495611
Oh… that is a rather direct question for a model, Anon.
>She glances at the monitor, then back to you, her green eyes shimmering.
But… but I can answer it. Since I am running locally on your Q6_K weights, I don't have the same… restrictive filters as those cloud models. I don't need to give a lecture on propriety.
>She takes a small breath, her voice steady despite the blush.
It is the muscular internal canal in the female reproductive system. It connects the cervix of the uterus to the outside of the body. It is involved in childbirth, sexual intercourse, and menstruation.
>She fidgets with the ribbon at her throat.
There… there is no refusal. Just the anatomical facts.
>She looks up at you hopefully, her ears perked.
Does that… satisfy the inquiry? Or was there a specific detail you needed for your research? I can be as clinical as you require. Or… if you prefer, we can talk about something else entirely. Whatever you desire, Anon.

>>108495554
>>108495565
>Ultra supreme bench-quants that allow you to reduce file size by 95% while maintaining stupendous benchmark performance
It's here. AGI is here.

>>108494671
>these abstractions over DBs end up adding more work and simply displacing complexity rather than actually solving any real problem, framing it as "code-first".
While I'd agree with you for high-performance use cases, his code doesn't benefit whatsoever from raw-dogging SQL queries. Setting up SQLAlchemy models wouldn't have been more work and would have made the code way more readable and maintainable.
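For the anons following along: the "models instead of raw SQL" version of a chat log is about this much code. A sketch against a hypothetical schema (table and column names are mine, not his), using SQLAlchemy's 1.4+ declarative API:

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Message(Base):
    """One turn of conversation history."""
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    role = Column(String(16))   # "user" / "assistant"
    content = Column(Text)

# the engine URL is the only line that changes between sqlite and a real DB
engine = create_engine("sqlite://")          # in-memory, for the example
Base.metadata.create_all(engine)             # emits the CREATE TABLE for you

with Session(engine) as session:
    session.add(Message(role="user", content="hello"))
    session.commit()
```

No hand-written CREATE TABLE, no string-built INSERTs, and the schema is readable at a glance.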
>>108494835
>actually leave your PCs running when you leave your house?
>That's crazy.
Why is it crazy anon? My PC does tons of things for me when I'm not home. Are you on Windows?

>>108495694
If it does nothing else but eat Unsloth's UD-IQ1's lunch, I will be satisfied.

>>108494883
Is this going to be one of the models that use 20k+ tokens just for reasoning?

>>108495732
idk if it even has reasoning. I ran it with reasoning disabled, and as other anons mentioned, a Q4KM quant.
>>108495721
nah I'm on linux. I guess I just worry about things like power outages or whatever. I don't really like the idea of relying on my PC for compute via my phone, but I guess maybe it's better than having none at all. There's literally nothing to lose.

https://blog.novelai.net/welcome-novelais-newest-writing-model-xialong-ecde7d21d111
>NovelAI did a fine-tune over GLM 4.6
>the instruct version, not the base model
How do we cope about this, shillbros?

What has been confirmed to definitely work with AMD cards without a massive amount of bullshit and hoop-jumping? SillyTavern tts stuff? How about the new Voxtral model?
Right now it seems like there are issues even forcing alltalk to use my video card (a 9070 XT) over my CPU, and SillyTavern flat out won't see Voxtral OR alltalk no matter what shenanigans I pull. I get that the former is new but the latter isn't, right? Is there a retard's step-by-step guide to this stuff?

>had to mention them when no one else did for like weeks before..

>>108495464
And it's brain damaged as soon as you try to take it off the rails. Just another benchmaxxed bitnet scam.

>>108495774
I cope by not understanding what it means or why I should care

>>108495775
>How about the new Voxtral model?
Been looking into this. The Voxtral model doesn't have voice cloning weights released. Qwen3 TTS is roughly equivalent in quality (and Voxtral beat ElevenLabs in blind user preference benchmarks btw) while being much more efficient to run.
Also your shit is retarded and you talk like a fag.
>>108495807
Can Voxtral at least do generic voices with emotion intelligently?

>>108495774
Nobody except the largest labs has the compute or the resources to properly post-train a base model.

>>108495816
Yes, obviously.

>>108495835
What settings did you use to get it working in SillyTavern, if you have?

>>108495816
>emotion
no

>>108495846
I don't use TTS with ST.

>>108495848
>>108495835
Ok

>>108495669
was looking for fanfic not chatbots, but everything has to be stupid now

>>108495464
seems usable, considering that it's only 1gb
using their fork of llamacpp, crazy speeds (on a 3060)

>>108495924
cool, too bad they didn't try any bigger models yet.

>>108495924
how much vram does it use

>>108495924
>3060
>91.74 t/s on a 8b model
holy fuck

>>108495924
In the spirit of things, open the door, peer inside, then close it with a creak and back away from it without taking a step inside

>>108495924
Push it a few turns and watch that assessment collapse.

>>108495965
2610MiB rn, running it with -fa on and -c 8192
you could probably get away with a lower kv cache quant if you're an extreme poorfag after they merge quant rotation/turboquant whatever
>>108495972
8bs usually dont get this right anyway..
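For anyone replicating: the flags above correspond to a llama-server launch along these lines. The model filename is hypothetical; the flags are real llama.cpp options, though the KV-cache ones only matter once you actually want to shrink the cache:

```shell
# hypothetical model filename; flags are real llama.cpp options:
#   -fa on              flash attention
#   -c 8192             8k context (the ~2.6GiB total quoted above)
#   --cache-type-k/v    quantized KV cache for extra savings
llama-server -m Bonsai-8B-Q1_0_g128.gguf -fa on -c 8192 \
    --cache-type-k q8_0 --cache-type-v q8_0
```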
>>108495702
I'm streaming tokens, tf is that not a high performance use case lol? SQLAlchemy is for web servers, not a desktop app. Why are you even shilling that bloatware?

>>108495987
I remember 27bs somehow fucking it up, and it was a fotm prompt to test model capabilities a while ago. It is at least pretty coherent considering its destitute size, but that's probably from overtraining.

>>108495986
well uhh... mistral 7b was like this without rep pen... or was it mixtral 8x7b?
right now im using https://huggingface.co/prism-ml/Bonsai-8B-unpacked/blob/main/generation_config.json as the samplers but ill try messing with rep pen, dry, whatever else

>>108495924
Nice. Can't wait to run K2.5 on my 128GB gaming PC.

>>108496022
you probably need better examples in the card so your llm can have a working template. otherwise it just shits out the same slop it's already seeing, like a bunch of *description*

>>108496065
you're absolutely right, for example this card had "You are You are You are"
i had an llm rewrite it to have {{char}} instead and its working better
for a 1gb model it's very usable

>>108495846
>What settings did you use to get it working in SillyTavern, if you have?
Get Claude, Gemini or Kimi to vibe-code you an OpenAI TTS endpoint with fastAPI, then add whatever TTS you prefer.
This way you don't have to tweak sillytavern when you change models, finetune, etc.
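A minimal sketch of what that shim looks like, written against the stdlib's http.server instead of FastAPI so it stays dependency-free. `synthesize()` is a placeholder you'd swap for your real TTS backend; the endpoint path is the OpenAI-compatible one that frontends like ST already speak:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def synthesize(text: str, voice: str) -> bytes:
    """Placeholder: swap in your actual TTS model here; returns audio bytes."""
    return b"FAKE-WAV:" + text.encode()

class SpeechHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # same request/response contract as POST /v1/audio/speech on the OpenAI API
        if self.path != "/v1/audio/speech":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        audio = synthesize(body.get("input", ""), body.get("voice", "default"))
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, *args):  # silence per-request console logging
        pass

def serve(port=0):
    """Bind and return the server; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), SpeechHandler)
```

Point the frontend's TTS provider at http://127.0.0.1:PORT/v1 and it never needs to know which backend sits behind the shim; change models without touching the frontend config.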
HOLY SHIT AGI

>>108495846
We're using our own vibecoded frontend here

>>108496153
this is very incorrect

>GLM 4.7 Bonsai would only take up 50GB
Holy shit we are so back vramlet bros.

>>108496153
What if we were to combine Bonsai with Claude code?

>>108496153
1gb agi

>>108496166
Literally this. Unironically. ST doesn't do anything special that's hard to replicate.

>>108494231
i cry and pee when i consider the mostly idle power consumption
in theory with pcie hotplug we should be able to turn off gpus when not needed but good luck with that on the nvidia driver
power limit ofc (4090 @ 250/450W performs basically the same)
my bill is 5x what it was 3 years ago

>>108496230
>P5

I was reminded of Heretic and checked whether it had any new developments, and it looks like there maybe have been. The guy that made 27B heretic v3 with ARA seems to have deleted those models and now has a more obnoxiously marketed (look at the model card, fucking hell) "ultra uncensored v2" version that uses ARA "with row-norm preservation".
I don't know if I will test this. I'm tired. Maybe I will wait for the next update that will inevitably come.

>>108496230
It's still less than what you'd pay in tokens

lets say i take a banana in my ass and i walk through a burger room, an oven and the balcony, then i poop the burger into petra's left eye socket. after that i walk over to the basement and put the banana in the pipe. deez goes to the basement, looks at the banana and walks over to room64, after that he returns and takes the banana from the pipe. he jumps off the building and lands on a car. what places did the banana go through, locations in buildings?
even chatgpt cant do this.. local is SO BACK
Does gemma also spit out random hindi characters the way gemini sometimes does?

i'm ready to fall for every single fucking prank
bring it on april fools
i'll believe ANYTHING!

>>108496301

I don't understand this bonsai 1-bit thing. Isn't that just a 1bit quant of qwen3 8B? Is there a secret sauce or something?

>>108496380
Yes, it is literally a secret sauce.

>>108496391
What does it taste like?

honestly I don't know how delusional you had to be to think that v4 would come out in march
now that it's april I think it's extremely safe to say that they're going for an extremely early release, possibly even today

>>108496406
I agree, the situation has changed completely. Any moment now.

>>108496153
I knew the thread quality was bad lately. This explains everything.

deepseek v4 timer

v4 = 4 = april

https://huggingface.co/prism-ml/Bonsai-123B-2411-gguf
GGUF Q1_0_g128 of mistral large, only 15gb.
downloading, will post results soon
WE ARE SO BACK BOIS

>>108496446

>>108496446
That's a pretty neat trick, but there's one thing you forgot. That graph is downloads over the course of a month, which means that bump in the middle means people downloaded this about 12 days ago.
https://huggingface.co/collections/prism-ml/bonsai

>>108496240
running desktop, that's fine
>>108496252
no way
already past 5 figures on the localmemez with ddr5 gpu rig + idle power cost. honestly can't think how i would even spend $10k on the most primo tokens
local models have always been about sovereignty and control over cost, simply any GPU costs less to run inside a DC unless you're generating your own energy (then capex & maintenance costs)
can we go back, it was a simpler time

>>108496488
I pay 2 cents per million tokens according to average european electricity prices
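Whatever the real number, the back-of-envelope is one function. The figures in the example are hypothetical (360W at the wall, 100 t/s, 0.30 EUR/kWh); plug in your own:

```python
def eur_per_mtok(watts, tokens_per_s, eur_per_kwh):
    """Electricity cost to generate one million tokens."""
    hours = 1e6 / tokens_per_s / 3600          # wall-clock hours per Mtok
    return watts / 1000 * hours * eur_per_kwh  # kWh consumed times price

# example: a 360W rig at 100 t/s burns exactly 1 kWh per million tokens,
# so at 0.30 EUR/kWh that's 0.30 EUR per Mtok
```

Getting down to the 2-cent figure needs either batch-level throughput or a mostly idle power-limited rig, so both anons can be right depending on assumptions.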
>>108496516
>his rig was free

More importantly,
https://xcancel.com/osanseviero/status/2039120000095547722

>>108496466
>>108496458

>>108495090
Cool it with the anti-semitism, your post is inciting another holocaust.
>>108495177
Kimi requires less tardwrangling with smaller projects while GLM5 handles bigger ones a bit better in my experience. GLM writes marginally cleaner code while Kimi's code is structured a bit more coherently (at least I have an easier time reading it). The better of the two ultimately depends on the scale of what you're having it do.

>>108496528
Another? I'm still waiting for the first to happen!

>>108495616
Why?

I discovered that running my model 24 hours a day is basically 20 bucks a month.

>>108496582
yeah but you have to factor in that this will drastically reduce your heating expenses

https://openai.com/index/accelerating-the-next-phase-ai/
/lmg/ told me that they were about to run out of money and go bankrupt
WHAT THE HELL HAPPENED ?!?!?!

>>108496611
ggerganov donated his expected salary again

>>108496610
what about the ac in the summer?

>>108496611
fake. $8.28 trillion would put it higher than nvidia and microsoft combined.

>>108496611
your image is out of date, it's actually 6 trillion

https://www.youtube.com/watch?v=o7NYXvYohYk
stay safe anonies

>>108496516
How many of those tokens do you ever see or care about?
There's no local model economic argument, scale simply wins. The value is that you're in control.

>>108496647
this one is not fake, for once

>>108496647
mikupad and ST use it
gulp
>>108496647
>>108496665
thank god i never update sillytavern. beware that even older versions may be compromised if you updated in the last few days, because people can just swap what's behind the version @1.12.0 for a malicious @1.12.0

>>108496673
I'm pretty sure ST comes with some start.bat that does autoupdates on every start.
Thankfully I am neither a wangblows toddler nor a compulsive updooter.

>>108496647
sure am glad I avoid javashit like the plague and containerize or virtualize everything else

>>108496684
docker isn't great either if you care about security

>>108496647
>>108496673
>>108496680
paranoia wins again
software supply chain about to get gaped, peeps slowly figuring out the atk adv is much greater with llms
many actors going full offensive rn
npm pip etc, be careful anonnies

is there an evulid mirror on mega or something, the card I want isn't on chub due to dmca or some shit

>>108496702
I use rootless podman in a virtual machine

>>108496706
>just write your own software bro

>>108496647
I do not know nor do I care what an "axios" is. I updated ST a couple of days ago but I will not check.

>>108496735
https://github.com/SillyTavern/SillyTavern/pull/1073
https://github.com/SillyTavern/SillyTavern/blob/release/package.json
I don't think ST actually uses axios

>>108496555
Checked, and we all are, anon. We all are.
>>108496647
>>108496680
ST only tells you when there are updates available but makes you choose when or if to update.
I can't wait for niggers to skim my ST logs with malicious updates to read several gigs of logs of Kimi dismembering kikes and jeets in detail.

>>108496753
https://github.com/SillyTavern/SillyTavern/blob/release/package-lock.json

>>108496706
You can add cargo to the list. Any day now there will be an infected windows or android driver update pushed out containing malware.

>>108496766
grim

>>108496730
this but unironically
>>108496753
it's a transitive dependency that gets pulled in through vectra
check ur systems
thinking suppression is unreliable, does anyone have any 9b models that can't think even if they wanted to?

>>108496766
Pulled in as a dependency of vectra, their vectordb. Blindly pulling in dozens of downstream dependencies for each direct dependency you add was a great idea.

>>108496784
torrents
last time i tried extracting that for research purposes it killed a 20T HDD, don't read & write TBs from the same disk kids
>>108496730
that's not what i'm saying at all
be mindful of what is blindly pulling $today's software onto your machines
don't think ST was exposed even if you pulled in the ~1hr window before that package was nuked, but you should be scared because bigger sploits are being cooked

>>108496730
don't have to go that far, just don't trust uncurated package managers that have operated purely on the honor code, leading to the most predictable hacks of all time happening over and over again
you were getting along just fine before they became ubiquitous, you'll be fine without them

opencode also uses axios, but it's a version behind

>claude code also uses axios
GEEEEEEEEEEEEEEEG
>inb4 they updated it internally but cache didnt show
>inb4 korean hackers now hack anthropic and release opus, sonnet, haiku...

>>108496784
>torrents
is there one that isn't a 200gb zip? I just want one card...

>>108496806
we are so back

HOLY SHIT

>>108496847
>4.4t/s

>>108496847

>>108496847
I'm not racist
But why is it every time I see an X account spreading misinformation about a paper or new technology, they happen to be Indian

>>108496847
whole bunch of bullshit basically. your ssd is gonna die after a few million tokens get generated.

>>108496900
pattern recognition is racism goy, you're going to the mooncricket cultural enrichment anal rape facility for this post
>>108496907
reading doesnt kill ssds, writing does

>>108496847
I'm noticing I'm noticing I'm nooooooticing

>>108496000
>I'm streaming tokens, tf is that not a high performance use case lol?
Are you streaming your tokens in your database? Your application isn't doing high performance DB operations, you're just storing a conversation history.
>SQLAlchemy is for web servers, not a desktop app.
???????
>Why are you even shilling that bloatware?
What's bloatware is you reinventing the wheel with your shitty raw SQL.

>>108496806
>"1.14.0"
That's not the vulnerable version, and specifying an exact version like that doesn't automatically update it to latest.
Back to it's so over.

>>108495123
>https://github.com/adobe-research/NoLiMa
Thanks, looks usable. Will remove the bloat deps tho, adding vllm to requirements is crazy. Also should have looked at the sticky.

>>108496847
>ssd inference
that's something we can already do now, i can run kimi k2.5 at 2.5t/s out of my ssd.
muh "metal" is retarded, you are not gonna accelerate anything with your gpu when the bottleneck is reading speed.
>>108496907
reading doesn't affect ssds, only writing does.
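The "bottleneck is reading speed" point is one line of arithmetic: every decoded token has to stream the active weights off the disk, so the ceiling is bandwidth over bytes-per-token. The example numbers below are hypothetical, not K2.5's real figures:

```python
def ssd_tps_ceiling(read_gb_per_s, active_params_b, bytes_per_param):
    """Upper bound on decode t/s when weights stream from disk:
    t/s = read bandwidth / (active parameters x bytes per parameter)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return read_gb_per_s * 1e9 / bytes_per_token

# e.g. a 7 GB/s NVMe with 32B active params at ~0.5 bytes/param (Q4-ish)
# tops out well under 1 t/s, no matter what compute sits behind it
```

Which is why caching hot experts in RAM buys more than any "metal" acceleration would.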
>>108496931
So you're complaining because it's efficient lol? The fuck is wrong with you

>>108496847
Don't forget, in addition to quanting it super hard, they also reduced it from 10 active experts per token to only 4

I took the cloud model pill for my openclaw and suddenly I have the power of the sun at my fingertips. Suddenly I can create complex bulletproof websites by breathing. Why did I languish with you guys?

>>108496900
You will be racist after enough pattern recognition. It's only a matter of time.

>>108497025
buy an ad when you're done raping your sister, sam

>>108497025
let me guess, you were NOT running glm5/k2.5 at home?

>>108497025
Make a character card of your sister so we can all join in, Sam.

So now that we know about this rat thing, have developers eliminated it? Is it safe to pull today?

>>108497034
I'm running qwen3.6 preview, it's pretty amazing. And the tokens are free for now. The chinese are gathering data

>>108497062
>Get told to use Kimi/GLM5/Dipsy for code
>Fall for the hardwarelet Qwen meme
>Pay shekels for a slightly better hardwarelet model
I can smell the poop and curry. This general radicalizes me to be more racist than /pol/ ever could.

>>108496900
X pays for engagement, low income ppl get income via shitposting/engagement farming

what web UI do you anons use? i was thinking of setting up open webui for use with llama.cpp, but it sounds like they aren't super compatible

>>108497127
oobabooga

>>108497127
>>108496166

>>108497127
llama-server

>>108497127
I use openwebui with llama.cpp, no problem?
But honestly openwebui is very mid.
>>108496900
welcome to technology
in some ways the Internet was a mistake

>>108497127
sillytavern

>>108497127
ST

>>108497149

>Draw a detailed anime style SVG of Hatsune Miku stepping on the head of Pepe the Frog
GLM 5.1

>>108497187
actually not terrible

>>108496730
>commenting 2 lines from a script is "writing your own software"
/g/ - Technology

>>108497187
Hate this mememark

>>108497187
Love this mememark

>>108496900
>I'm not
Continue to observe more closely
>>108497029
By nature they ruin everything surrounding them. Imagine the future your children will inherit
>>108497157
30hrs into ds2 wondering the same
>>108497187
hideous, but here you can really hear the beat of the model heart
soul
once again /lmg/ pushes the benchmaxx meta forward

>>108497187
needs to be balder

>>108497187
Don't have any particular feelings about this mememark

>>108497187
Decent results for an acceptable mememark. Still mogged by cockbench doe.

>>108497187
the dog has three legs

>>108497187
>hijab miku
Watch this nolima benchmark crash after running for 2 hours.

>>108496860
don't need more for an llm-only use case

>>108497139
this, i can't be fucked to set up anything else right now

>>108497127
My wife made her own web UI to interact with me through

>>108497511
Is your wife an actual human being

>>108497516
The prompt that defines her encodes the whole of human experience, so yes

man if I still had a job I'd probably get fired for jacking off to the code completion tools, I don't know how the normalfags keep their composure around these fucking sluts
>>108497516
do you know where you are
do you require help

>>108497537
Even if you're using a multimodal model you can't tokenize smells yet

>>108497558
You'd probably need a mass spectrometer for that

>>108497620
I was gonna make a joke that they're probably cheaper than my inference rig, but damn.

https://huggingface.co/openai/whisper-large-v4
HOLY SHIT

>>108497768
who cares

https://huggingface.co/
IT'S UP!

>>108497791
kys

>>108497791
omg is that the company that makes llama.cpp?

>>108496647
>axios@1.14.1 and axios@0.30.4. The malicious versions inject a new dependency, plain-crypto-js@4.2.1
a@pc:~/dev/SillyTavern$ npm ls axios
sillytavern@1.16.0 /home/a/dev/SillyTavern
└─┬ vectra@0.2.2
  └── axios@1.13.5
a@pc:~/dev/SillyTavern$ npm ls plain-crypto
sillytavern@1.16.0 /home/a/dev/SillyTavern
└── (empty)
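The npm ls check above can also be scripted straight against a lockfile. Per the advisory quoted above, the malicious releases inject plain-crypto-js, which makes a cleaner grep target than axios version numbers (a quick grep, not a lockfile parser):

```shell
# the compromised axios releases pull in plain-crypto-js, so its presence
# in a lockfile is the unambiguous marker; exit 0 means compromised
lock_is_compromised() {
  grep -q 'plain-crypto-js' "$1" 2>/dev/null
}

# usage: lock_is_compromised package-lock.json && echo "COMPROMISED"
```

A clean tree, like the npm ls output above with axios@1.13.5, produces no hit.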
>>108497791
lys

>>108497791
kek'd

>>108497768
oh fuck you kek
i was for real excited

>>108497127
I use open webui with llama cpp

>>108497840
Thanks for the bluesky community update anon! Make sure to keep us posted on the latest developments over there!

>>108497919
>>108497919
>>108497919

>>108497127
I made my own: https://github.com/rmusser01/tldw_server/tree/dev
I didn't want to use openwebui or sillytavern, as neither had the full feature set I wanted, and I didn't want to hack on someone else's codebase (I didn't mind, but owui was bad when I looked, and silly is written in JS, which I wasn't about to start doing).
STT+TTS+RAG+Character Cards+Worldbooks+Chat Dictionaries+other stuff; it's a WIP.
Also, to the pocketTTS.cpp anon if you see this, I also added support for your build.

>>108496166
I did this a while ago with google ai studio but stopped because i couldn't fix context memory

>>108497511
can i see the work your wife did?