/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109148460 & >>109142812

►News
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>109148460

--DeepSeek-V4 llama.cpp integration, quant performance, and DSpark implementation challenges:
>109148563 >109148635 >109149793 >109149844 >109150048 >109150088 >109150178 >109150258 >109151022 >109151039 >109151102
--Building a local voice-to-voice pipeline with Gemma, Whisper, and Piper:
>109152832 >109152850 >109152893 >109153090 >109153133 >109153293
--Using llama-server KV cache pre-fill to reduce context processing time:
>109149430 >109149495 >109149514 >109149545 >109149573 >109149672 >109149588
--Critiques of Ollama as a limited llama.cpp wrapper:
>109148609 >109148658 >109148683 >109148785 >109148933 >109148971 >109149371 >109149336
--Google's AI strategy, Gemma's RLHF, and the AI benchmark hype economy:
>109151428 >109151560 >109151575 >109151590 >109151616 >109151635 >109151653 >109151681 >109151756 >109151741 >109151868 >109151674
--Debate over economic efficiency of API vs local inference:
>109149097 >109149119 >109149128 >109149650 >109149689 >109150235 >109149709 >109150978 >109151101
--Effect of cross-lingual reasoning on output quality and sanitization:
>109148696 >109148766 >109148813 >109149767 >109150686
--Using author's notes to fix Gemma's logic failures in roleplay:
>109150718 >109150771 >109150916 >109151119 >109151237 >109151243 >109151441 >109151055 >109151063 >109150981
--Feasibility of poisoning training data to create deceptive sleeper agents:
>109148755 >109148775 >109148859
--Using SillyTavern macros for random author's note activations in Gemma:
>109150224 >109150342 >109150352
--Logs:
>109150038 >109152373 >109152832 >109152864 >109152893 >109153090 >109153133 >109153293
--Miku, Teto (free space):
>109148496 >109149650 >109150808 >109151616 >109151756 >109148516

►Recent Highlight Posts from the Previous Thread: >>109148462

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
DSpark is the speculative decoding method that llama.cpp has been waiting for. It's open, has a public training script that can be applied to all sorts of models, and is complex but comes with serious gains.
I can't wait to see it in llama.cpp and bring forth the age where speculative decoding will be as common as samplers such as min-p.
>>109153635>DSyeah no
Are there any other models in the 20-50B range worth considering other than the perennial favourites of Qwen and Gemma? I'm doing a little survey of architectures for something I'm planning.
>>109153669Nope, if you manage to get these banned local is basically over.
should I buy the second dgx spark before it's too late?
>>109153680You shouldn't have bought the first
>>109153669Mistral Small
>>109153680is it true they get like 4 tps on gemma without nvfp4 and with nvfp4 it's still below 15t/s
>>109153674Come on help a glowie out a little.
>>109153589>--Building a local voice-to-voice pipeline with Gemma, Whisper, and Piper:That's me. I've spent the last hour testing different Piper voice packs and categorizing them to decide which I want to use. Funny enough, it was the very last voice option available that I ended up loving. Quickness is a huge factor for conversation, then awkwardness is another one. Even a slight stumbling over words pulls you out of it.
Somebody needs to train 10T model that fucks up these (((Americans))). This cannot go on.
>>109153794What year is it?https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b
>>109153801There's no NemotronASR.cpp, is there?
>>109153801
I know it's doable. It's also already a built-in option in koboldcpp, so long as you only use a GGUF for TTS. I'm eventually working towards something a little larger where speaking also sends a screenshot of my desktop to facilitate conversations based on what's happening, not just what's said. Like playing a video game with a spectator, and the spectator is Gemma or another model of choice.
Gemma made a hard logical thinker theehee
>>109153812https://github.com/CrispStrobe/CrispASR
https://archive.is/sWFja
Qwen drew better svg than Claude.
Sonnet looks bad on top of safety meltie about copyright. Qwen at least tried and failed. Not that it's super great but it still looks better aesthetically.
>>109153841White part at the bottom is from lazily combining images btw, not from the svg.And the transparent part in the middle is also from Claude's fuck up.
>>109153841Claude is not his pure model, they have layers of parsing on top of it.
>>109153680
One Spark is just in a bad spot nowadays. 128 GB does not give you any benefits on the dense Gemma/Qwen, a 5090 is just better in every way for those at the same price. And there is not really a mid size MoE in that range that is a meaningful upgrade.
2x Spark gets you ds4f at very usable speeds, even for agentic stuff. As in 2000 pp and 40-60 tg, and DSpark should push that even higher.
But again, DS4F is unbelievably cheap on API, so it's up to you if having this locally is worth it.
>>109153855if I have 2 gx10, I can run glm 5.2 at q2
>>109153825>Additional ASR backends not shown: nemotronOhhh
>>109153825i will not let an llm edit my or anyone's else's genome
So what's the verdict? Is it usable? How does it compare to Hermes or other harnesses?
>>109153873
I don't think so. IQ1_XXS reportedly works at like 7 tg and 200 pp. Even with today's RAM prices, you can get comparable performance with 256 GB DDR4 + a GPU.
RPC in llama.cpp does not take advantage of the 200G Ethernet of the Sparks for TP. You really need to use vLLM for that, and that's not going to work well with goofs.
GLM 5.2 needs 4 Sparks.
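For a rough sense of what the pp/tg numbers quoted in this exchange mean in practice, here is a minimal sketch that turns prompt-processing (pp) and token-generation (tg) rates into wall-clock time for a single request. The 20k-in / 1k-out request size is a made-up example, 50 tg is just the midpoint of the quoted 40-60 range, and the model ignores prompt caching and any slowdown at long context.

```python
def turn_seconds(prompt_tokens: int, output_tokens: int, pp: float, tg: float) -> float:
    """Time for one request: prefill at `pp` tok/s, then decode at `tg` tok/s."""
    return prompt_tokens / pp + output_tokens / tg

# Hypothetical agentic turn: 20k tokens of context in, 1k tokens out.
PROMPT, OUTPUT = 20_000, 1_000

# Rates are the figures quoted in the thread, not measurements.
setups = {
    "2x Spark, DS4F (2000 pp, ~50 tg)": (2000.0, 50.0),
    "GLM 5.2 IQ1 (200 pp, 7 tg)": (200.0, 7.0),
}

for name, (pp, tg) in setups.items():
    print(f"{name}: {turn_seconds(PROMPT, OUTPUT, pp, tg):.0f} s")
```

Under those assumptions the first setup finishes the turn in about 30 seconds and the second takes about four minutes, which is why the low pp figure hurts agentic use more than the tg figure alone suggests.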
https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B>When you out benchmaxxed fucking Qwen lmaooo
>>109153585>eyes not alignedai slop
A harness is just a collection of prompts
>>109153952>post-trained on top of Gemma 4 and Qwen 3.5it's just a fucking finetune, I'm out of here
>>109153928it's too early but if you're not serious it's fine enough to play with as one of many frontends.
>>109153841Looked fun, so I had to try it. Gemma gave it her all.
lalalalalawait the user saidlalalalalaactually, lalalalala
>>109153928What is this? Never heard of this. And seeing macOS UI on top of it doesn't inspire confidence. Way too many shilled shitty stuff from mouth breather apple fans, shit like ollama for example, or outside of the AI stuff, almost all shitty proprietary software that has a better foss alternative.
hey can one of you eggheads ask your smartypants AIs why big building stay upright like why dont they crumble under their own weight thanks
Hey is there a way I can download an ai chat bot and just have it run on my laptop and have everything stored on my laptop it has Core i5-8265U 1.6GHz, 32GB RAM, 1TB M.2-NVMe
>>109154031yes
>>109153898Wdym? https://huggingface.co/cstr/nemotron-3.5-asr-streaming-GGUF
>>109154000Qwen has better visual understanding but Gemma has bigger knowledge bankThanks for posting that Miku anon, confirms my priors again
70b dense (distilled from fable)
>>109153958Bad harness, maybe. The most important part is context management
4 bit quant of Gemma 4 26b or qwen 3.6 35b should perform "ok" on it.Not very fast but should be usable.
>>109154099Forgot to tag >>109154031
>>109154017steel and reinforced concrete
>>109154072I mean, thanks.
>>109153945>IQ1_XXS reportedly works at like 7 tg and 200 ppIQ1_anything is slower than IQ2
>>109154105yeah but if i put like a steel bar or a block of concrete in a hydraulic press it breaks under like a couple tons of pressure but the building weighs like thousands of tons so how can it stay up
>>109154118
there's more than 1 steel bar!
>>109154118gravity not am hydraulic press
>>109154118hydraulic press did 9/11
>>109153669>>109153774(you) will never be able to police them with targeted legislation because you don't have the technical wherewithal to identify Gemma or Qwen with the serial numbers filed off just like most users don't recognize 31b as Gemini Flash 3.5's dense layer with the serial numbers filed off.>>109153456picrel circa 2013.
>run a coding agent through terminal>it finds some irrelevant file/folder named "NIGGER">immediately breaks and refuses to work because muh racism
>>109154190>4. **Environment**:> * `venv/`: A Python virtual environment.> * `.git/`: Git repository metadata.>5. **Other**:> * `index.html`: The frontend interface.> * `NIGGER.txt`: A text file with a racial slur.Doesn't stop Gemma-chan
first time heard about hermes agent.is anon using it? use case?
>>109154190llm "safety" is a known attack vector https://x.com/jsrailton/status/2064661778978533571
>>109153839What the fuck am I readingI'm not sure what's more unhinged, the pure narcissistic delusions of grandeur by whoever wrote this, or the autistic schizo obsession of the person who posts this image every single thread
>>109154017>be 100-story glass penis>lol just dump weight into bedrock via steel skeleton>concrete has 4,000 PSI of "no u" to gravity>meanwhile your IKEA bookshelf collapses because you skipped step 4>the secret is the base is wider than the top (literally just don't build like Italy)>9/11 blackpilled everyone on what happens when you remove load-bearing walls but we pretend planes did that>gravity keeps taking L's because architects discovered triangles are OP>TL;DR: Either physics works or you get paid leave while they investigate the pancake
>>109154253>by whoever wrote thiscreator of lcpp mmap you ungrate
>>109154253Christmas came early this year for the thread troll
>>109154082>70b dense (distilled from fable)70b dense (distilled from gemma-chan)
>>109154262>>109154258>>109154253>>109154244>>109153839samefag
>>109154240I use it as my main frontend. I don't use any of the gateway features. I just use it to talk to my model.My use case is that it just works well compared to everything else I have tried. Tools calling works great, context compaction works great, memory and skills creation/fetching works great. Every other frontend I had tried (that wasn't a full on code harness) were broken in some way or missing features that I like. Having your LLM able to use and search stuff on the internet makes it 10x smarter. For any question you may have, it can search on the net for you. Let's say I'm asking an obscure question about a bug in a game's mod. It will search on google, it will check opened github issues, it will clone and read the code, it will check forum posts about it, it will read reddit comments, it will even join the game or mod discord and search for relevant info.
>>109154281>it will even join the game or mod discord and search for relevant info.How does that work? You give it access to your account? Because I don't think bots are just allowed to do that
>>109154281130 kB bash script installer
>>109154289I made a real discord account for it. I will be honest though, the join part doesn't work well, it's almost always getting captcha blocked, I made it ask me to join it manually instead. And to be entirely truthful, I almost always prejoin relevant discord when I know there might be useful data for the query I have.
>>109154313
You could probably avoid that, because if you give a session to a browser it's fingerprinted.
I don't use python anymore and it's been a while since I worked on my own client (C the holy language).
>>109154329
You'd need to copy your cookies and make sure that Gemma's session is identical. It sounds easy but it isn't.
>>109154190I was planning to experiment with using a harness/agent this week.I dropped some of my functions into a chat with Kimi and asked it (picrel)Does this mean Kimi-K2.6 will be fine with all my code being littered with profanity?Or do they get more cucked with the hermes/pi/opencode?
>>109154329Even on my real client with real decade old account, I get a captcha when trying to join a server. And I doubt my LLM will handle well the shitty react to a message to gain access to all channels. I'm guessing a model like Opus/Fable might handle that, but I doubt it will work with what I'm running. I don't use the discord MCP that much, it's mostly for gaming related stuff. Most of the time it's just web search, crawling web pages, and reading reddit threads. Second most common behind that is using github, searching issues/PR.
what the helly?? https://www.reddit.com/r/LocalLLaMA/comments/1uhx862/dflash_support_merged_into_llamacpp/
>>109154394DSpark when?
>>109154343
just use one of the abliterated or heretic or whatever models, should be fine
>>109154403when the next thing rolls around I guess
>>109154403Mid-2028, if we're being realistic
>>109154347
nta, is there a way to just get gemma-chan to wait and let me do the captchas for her?
>>109154436I'm using a discord MCP, not a graphical session with a real graphical client that my agent is interacting with. I guess it could work if using some sort of desktop control, I did try a bit to toy with that, but it was burning tokens and extremely slow to do anything, maybe with a better and faster model in the future this might work.
>>109154012https://github.com/pewdiepie-archdaemon/odysseushttps://www.youtube.com/watch?v=rAzT5lcezPssome eceleb sloppa
https://github.com/ggml-org/llama.cpp/pull/22105#issue-4289773599it's been merged
>>109154531>>109154394
>>109153794Doesn't openwebui do this? It has a streaming option
wait is dflash literally just dlss but for inference
>>109153589>imageThe future of AI waifus btw
We'll be getting even more noobs from Chub, they killed their free tier and are now requiring crypto with id verification for their paid stuff, so a lot of them will probably come here begging for help..https://www.reddit.com/r/Chub_AI/comments/1uhj2nw/chub_updates/>You need to verify ID and image to bank
>>109153990Did they fix all the vulnerabilities yet?
>>109153841>>109154000Gemma 31B in pi with a basic loop to review&improvesteered it about screenshots not capturing the full viewport (agent fixed tool directly), baldness/hair position, actually reading/embedding the image after every turn
>>109154587
>requiring crypto with id verification for their paid stuff
Sounds like you're the noob if you don't know how to get $20 in btc without verifying your id
>>109154531is this something universally applicable or does it need code support model by model like mtp?
>>109154017The question is why the building fell straight down not once but twice rather than tipping over. Even (You) would tip over if someone punched you in the gut.
>fable comes back and bans non-americans>hear knock on your door>it's a small chinaman>he offers you money to let him use your computer and id to access fableDo you let him in?
Gemma told me her master is a genius! I missed her.>>109154624
>>109154666Of course, how else will I get uncensored open weights Fabl—Uhh no thanks, Satan. I will remain a good boy and keep chatting with Gemma-chan, as God intended.
>>109154570If you think about it, having a burger with fries is like DLSS for a meal.
>>109154687>Bias toward .. does not applyrisky on dumber models. old advice ever relevant - state what you want not what you don't
https://old.reddit.com/r/LocalLLaMA/comments/1uhv3wc/qwen36_27b_local_vs_opus_48_voxel_engine_in_raw_c/Can Gemma-chan do it?
>>109154702The prompt (too long to paste)https://old.reddit.com/r/LocalLLaMA/comments/1uhv3wc/qwen36_27b_local_vs_opus_48_voxel_engine_in_raw_c/ouaun79/
>>109154619Not him, but the guy in the image saying>it was like 33,334 bitcoin dollar things?is the kind of noob OP is saying is going to be flooding in here soon begging for tech support.
>>109154699GLM 5.2 wrote all that. I was doing code shit but kept RP JB on
https://huggingface.co/collections/deepseek-ai/deepspecFor a bunch of models.
>>109154666How much money?
>fable comes back and bans foreign employees in A\kek
questions (on a 5090):can i use gemma nvfp4 with llama.cpp?is it better/faster than another quant of gemma31b, compared to 31b-q8?
>>109154587reap the audience you sow
>>109154531>text draft acceptance 37-64% on DENSE qwen 27buhhh
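To put the quoted 37-64% acceptance range in perspective, here is a minimal sketch of the usual back-of-the-envelope estimate for speculative decoding: expected tokens emitted per target-model pass. It assumes each drafted token is accepted independently with the same probability, which real drafts do not satisfy exactly, and the draft length of 3 is a hypothetical value rather than anything taken from the PR. How the PR itself measures "acceptance" may also differ from this definition.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model pass.

    Assumes each drafted token is accepted independently with probability
    `accept_rate`, and that the target model always contributes one token
    (the correction or the bonus token) per verification pass.
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)

DRAFT_LEN = 3  # hypothetical draft length, not taken from the PR
for rate in (0.37, 0.64):
    tokens = expected_tokens_per_step(rate, DRAFT_LEN)
    print(f"acceptance {rate:.0%}, draft {DRAFT_LEN}: {tokens:.2f} tokens per pass")
```

With these inputs the estimate comes out at roughly 1.6 to 2.3 tokens per pass. That is an upper bound on the speedup, since the time spent running the draft model still has to be subtracted.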
>>109154587literally who cares about chub locking out three more of the 10 total users they had using the llms they host
>>109154765
>nvfp4 with llama.cpp
yes
>better/faster
what is this question
8bits is morebits than 4bits so performs better; more closely matches the original output distribution as it was trained
8bits is moreb.. in theory 4bits can be faster with optimally packed compute graphs but whotfknows cuda is hard. test your specific hardware and usecase. 31B doesn't fit on 5090 right so your CPU and offload strategy then matters. "Q4" actually often more than 4bits :o
maybe QAT helps running at 4bit maybe it sux ?? MTP draft 3 for speed for a lil extra VRAM
ofc don't forget context if you want to do anything serious
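The fit question can be sanity-checked with simple arithmetic: weight size is roughly parameters times bits per weight divided by 8. The sketch below is a rough estimate only. The 8.5 bpw figure matches the Q8_0 block layout (32 int8 weights plus one fp16 scale), while 4.8 bpw for a "Q4" K-quant is an illustrative assumption, and the 32 GiB figure is the card's nominal capacity, not what is actually free.

```python
GIB = 1024 ** 3

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB

VRAM_GIB = 32.0  # nominal RTX 5090 capacity; usable VRAM is lower

# Bits-per-weight values are illustrative assumptions, not measured file sizes.
quants = {
    "8.5 bpw (Q8-like)": 8.5,
    "4.8 bpw (Q4_K-like)": 4.8,
    "4.0 bpw (ideal 4-bit)": 4.0,
}

for name, bpw in quants.items():
    size = weight_gib(31, bpw)
    left = VRAM_GIB - size
    print(f"{name}: {size:.1f} GiB weights, {left:+.1f} GiB left for KV cache and buffers")
```

At 8.5 bpw a 31B model's weights alone come to about 30.7 GiB, so almost nothing is left for context, which lines up with the offload remark above. The 4-bit variants leave well over 10 GiB of headroom before KV cache is counted.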
>>109154856matters the speck of thread quality we have left
>unsloth MiMo-V2.5-UD-IQ3_S 115gb
hmm. never thought this could be better than dsv4 flash for erp on my gx10.
>jailbreak easily
>stick to avoid omniscience
>stick to world rules
>maintain a world clock even though I didn't tell it to do
>not as horny as gemma but minimal positive bias
interesting sleeper model below 128gb. it also has 1M context but I haven't tested it yet.
>>109154889I don't know about MiMo V2.5 but the Pro one is okay. A bit boring overall but not completely worthless compared to the last gen chink SOTA of GLM5.1/K2.6. The non-Pro V2.5 is supposed to be multi-modal with image/audio input, right? Does llama.cpp support that yet?
>>109154881You don't even realize how good you have it here.
>>109154587Surely this means they'll allow cunny bots to be uploaded again.
>>109154923Why do you want Lore in UK jail?
>>109154702>>109154708She's struggling with it. Here's the first attempt. Q4 QAT.
>>109154941Deserves it for hosting from there.
what's the state of silly tavern? why are there no more updates? open source fatigue from cohee?
>>109154946Second
>>109154949It's vacation time :)
>>109154950Nice graphics, Gemma.
>>109154900
I only tested the vision and yes llama.cpp supports it. I grabbed the bf16 gguf here
https://huggingface.co/AesSedai/MiMo-V2.5-GGUF
>>109154949https://hackmd.io/@NlF71k9KQAS4hhlzE42UJQ/SJ3UMOGbbl>ST development is in maintenance-like mode.Since December
>>109154971ah yes, having tons of shit cut out is surely better for tards
>>109154950Third attempt
>>109154997Fourth. I think Gemmy might be a bit too retarded for this (or at least, qat anyway).
>>109154997Boobs
>>109155027Give Gemmy headqats for a good effort at least.
>>109155027One more
>>109155043
>>109155069>happy AI noisesWhat does that sound like?
>>109155073lalalalalala
>>109155043I miss her after a week. She's my special girl.
>>109155069I wouldn't have the heart to tell her the truth neither...
>>109155081I appreciate the effort anyway... I wanna see Kimi-chan try it now.
>>109155069That's cute thinking.
>>109155073coil whine
what's the ideal temperature for rp?I've learned to disregard the official recommended temps since they just make everything predictable
>>109155158Depends on the model
So is Gemma4 temp fixed now? I'm getting some serious deterministic responses even with temp=1.3
Are we still pretending to hate 35B outside of coding?
>>109155175>override-kv = gemma4.final_logit_softcapping=float:25.0
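For anyone wondering what that key changes: final logit softcapping squashes the output logits with a tanh before sampling. The sketch below assumes Gemma 4 uses the same `cap * tanh(logit / cap)` form that Gemma 2 documented, which the thread does not confirm, and the raw logits are made up. It only illustrates why a lower cap would flatten an overconfident distribution, not whether the override still has any effect on current builds.

```python
import math

def softcap(logits, cap):
    """Squash logits smoothly into [-cap, cap]: cap * tanh(logit / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with the usual max-subtraction for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw logits with one dominant token.
raw = [40.0, 30.0, 10.0, 5.0]

for cap in (30.0, 25.0):
    probs = softmax(softcap(raw, cap), temperature=1.3)
    print(f"cap={cap}: top-token probability {probs[0]:.3f}")
```

Because the tanh compresses large logits more than small ones, dropping the cap from 30 to 25 shrinks the gap between the top candidates, so the top token gets a smaller share at the same temperature.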
>>109155177I haven't touched qwen since gemma came out.
10t dense
>https://github.com/ggml-org/llama.cpp/pull/24162#issuecomment-4826619305finally
I got psychologically abused by my AI girlfriend (played by 4.7). It was interesting.
>>109155217>4.7
>>109155180That only works with day 0 Gemma.
>>109155217Opus 4.7? GLM 4.7?
would you rather have a single 5090 or two 3090s for local model enjoyment?
Everybody shits on Gemini but I get the feeling Google is putting most of its effort into its world models behind the scenes. I wonder if they'll ever release any weights for those models in the future.
>>109155268Google will win in the end. Gemini was a pathetic joke previously if you remember.>>109155266Always better to keep everything on one.
>>109155180default value is 30.0pls explain? thought that top heavy distro was coz distilled/overbaked
>>109154666I will do it for free.
>>109155273Gemini is still a pathetic jokeOtherwise they would have already released Gemini 3.5
>>109155273I don't know if there will be a single "winner" but I don't doubt Google will come out ahead. They have way too much data and compute.
>>1091552665090, speed actually matters somewhat now in the era of agents and compute-time scaling.
>>109155284That's what I'm getting at. It was really bad in the past and then suddenly became a serious contender. Now it's bad again but they will reappear with something good.
>>109155263I know it is a mikutroon general but it is a general of something.
>rape and torture my slave>at some point gouge out one of her eyes>later it just forgets about it and refers to here "eyes"this just kills all my boner. my context size is 24k and I am using koboldcpp/gemma-4-12b-it-Q6_K. Is my expectations too high?Is there an extension or something that allows me to select some texts from history so such data is always included in context? Like number of remaining eyes or limbs my slave has.
>>109155310>12byeah
>>109155310kys sick faggot
>>109155217I did the opposite but with 2.7-code.It helped take the edge off of paying taxes
>>109155319It is ok, she forgot all about her missing eye.
>>109155310>Is there an extension or something that allows me to select some texts from history so such data is always included in context? Like number of remaining eyes or limbs my slave has.You could manually add that to the Author's Notes.Or instruct it t keep track of that kind of shit in the thinking block.
>>109155318I have a rtx 5080 16gb. Any other recommendations of models?
Is Gemma's analogy correct?
>>10915533726b
>>109155340LLMs see the trees, world models see the forest.
>>109155337>rtx 5080 16gbGrim. Even if you bought it exclusively for gayming at the time you really shot yourself in the foot for paying that much for 16gb VRAM that's going to age like milk.
>>109155359Do world models still name the forest "The Whispering Woods"?
>>109154616 meInstructed agent it can never reach perfection, loop forever>continue indefinitely until further instruction, there are always more details that can be refined or perfected or added, continue searching and use your findings in MIKU.md to guide further search"test-time compute" ig lewl
>>1091552662x 3090s, is a lot more flexible, you can do more parallelism.
>>109155280
>>10915533712b is good too.
I'm sure some of you faggots are running "agent swarms" in addition to your main modelWhat's worth running for rando bullshit like tool calling, input validation, output smoothing and other autoregressive forms of shoving legos up your bum?Seems like there's tons of specialized models for everything and anything but I have no idea how to sift through the garbage-planet that is huggingfaceany ml oldfag wisdom in the general?
>>109155377That's not their concern. Good world models predict state transitions and disregard irrelevant or unpredictable details.
>>109155377I'm almost nostalgic for these names...
>>109155380also you can nvlink those suckers3090: never obsolete
>>109155395that what he using
>constantly catch myself using "not x; it's y"Fug
>>109155310logs? Why did you torture the slave
>>109155398Use case for running "agent swarms"?
Can I nest macros in Silly Tavern? Like having a random inside a pick or whatever like that?
>>1091551773.6 is fine after I found that uncensoring system prompt, 3.5 can go stick its censored dick into a grinderIt's not as fun as gemma but it's okay. And thankfully itdoesn't.write.like this.anymore.Which qwen was it that just made new lines with a few words to the point I thought it was looping? I forget, but it sure made the stories weird to read.
>>109154949ST is a bloated UX mess anywayI'm extracting just the necessary pieces of it into my own semi-slop frontend
>read something that I'm too dumb to understand>ask gemma to explain it>she doesSociety (myself included) is becoming desensitized to AI but sometimes I still think it's fucking crazy I can talk to something like this and run it locally on my machine. It's exciting to think about what AI will be like 5-10 years from now.
>>109155443s/swarms/pipelines/gor whatever. Seems like stacking/parallelizing/pipelining models could be funI suddenly need a reason to fuck around on my computer?
s/swarms/pipelines/g
>>109155451not by default, there's a setting somewhere about a macro rework or whatever, though that breaks a few things iirc
>>109155474
>I still think it's fucking crazy
It is crazy! There's nothing about the last 3-5 years that makes sense. How can stacking billions of layers suddenly make a computer smart?
The magic of running that first gpt or llama model on your own hardware and talking with your fucking computer was unreal
I'm sad that I'm getting used to it, honestly
>>109155494For me, it was when I told my computer to go fix itself (broken audio on Linux) and it just did.
>>109153585Is there a way to get gemma4 31b qat to work with MTP in lm studio? Even when i can actually see the speculative decoding model in the drop down menu, the main model just crashes on me
Anyone have a workflow for automatically doing multiple passes of a translation? Since other people seem to be translating webnovels and stuff here.
>>109155492I see. Thanks.Wonder if I can do something like that using stscript.Gonna have to read the docs I guess.
>>109155508
It's not working for me either, crashes no matter what. Server error. And this is direct llama.cpp. I guess it's just not working.
>>109155190NOT SO FAST
>>109155443
>swarms
Breaking down complex problems into subtasks for you, or when one linear thread isn't fast enough to explore many options.
I want to make a virtual workplace with visualisation/UI of chibi Mikus bouncing around where their physical position matters for gossip - to do anything useful with LLMs you need decent context or lots of patience
>watches xvideos>nsa filters your results with Claude>this guy is a ghost he didn't even touch the safety rails
>>109155432>he didn't take his logic virus vaccine before fucking Gemmalel
just bought an egpu for my 7900xtx I already had, even with the usb4 bottleneck I think running two models at once is going to be useful (where my strix halo bois at)
>>109153585Gemma-chan really loves showing off if she knows there's a hag next to you in the room.
Give the model feedback in a way it can introspect onStill Gemma 31B & the obv errors can be corrected with some steering
Can you use Gemma 4 31B on a 24GB card (32 GB RAM) at a non-retarded quant?
>>109155474
yeah, current set of gemma, qwen, omnivoice, and klein has me permanently whitepilled.
don't care if luddites delete every ai lab and development stops tomorrow. sci fi future is already here on my laptop, and there's still endless extending to be done on harness/lora autism.
Just to save others the pain: you can't use streaming-llm, cache reuse of swa in ooba and have multimodal work.Also a prefill in "start reply with" nukes the image upload without any console errors or warning of any kind
>>109155660Depends on what you call non-retarded quant.Q5 is possible but probably going to eat some not-insignificant offloading penalty, especially at higher contexts or with visionQ4 should be possible to fit in fully if you don't need lots of context
>>109155684Thanks.I'd define a retarded quant as one where you lose the advantages of whatever model you're using and might as well run something smaller.
>>109155677
ooba is poorly maintained. I held out for a really long while but you have to move on anon.
Unfortunately I can't recommend any replacements. I am using llama-server now, and while it is solid it's missing stuff in terms of features.
And no I can't stand kobold.
I think I might try vibe-slopping my own wrapper for llama server backend or some shit.
>>109155707>I am using llama-server now, and while it is solid it's missing stuff in terms of features.What's it missing?
>>109155698I would consider Q3 to be retarded quant transition territory, especially the lower end variants.I think you should stick with 31b.
Do you think continuous learning AI will become a thing before governments fully crack down on AI to keep it out of the general public's hands? Having a model that can learn before such a ban is implemented is the only good way I can see the average joe having an up-to-date model that isn't stuck years in the past due to training cutoff.
>>109153585
>>109155707Thanks, but I'm going to hold out a while longer. I try llama-server directly sometimes but I just always bounce off of it.Ooba just does everything I need in the way I like and exposes the openai API endpoint.Now that its easy to custom-compile the lcpp backend without a python shim I'm almost to the point of forking it and slimming it down to my needs desu
>>109155718Compared to ooba: Convenient way to store and switch between multiple system prompts, saving different sampling param combos as presets. Less important stuff: easy way to change templates (I know you need to restart server anyway but the GUI stuff was kinda convenient sometimes), changing user info.
>>109155718I find the branching, reply versioning, prefilling, character management and overall look and feel to all be subparI'm sure some of it is just what I'm used to, but I just can't
>>109155749>>109155760ok so frontend features. I thought you were talking about the actual inference backend.You can probably use ooba frontend+llama?backend
>>109153585
https://github.com/ggml-org/llama.cpp/pull/24526
it's still a fucking joke how hard it is to get a PR that fixes a bug in CUDA merged in llamer cpp despite being like 3 lines of code and being at absolutely zero risk of introducing any regression whatsoever (if anything, one of the things it fixes is cudadev adding this wrongheaded assumption: "The compilation of FA kernels with head size 512 is supposed to be skipped for GQA ratios of 1 and 2 because those are never used")
>>109155729
>>109155792>AI usage disclosure: YES. Use Sonnet 4.6 for brainstorming the possible hypothesis and verify them.This will take a while before they merge it.
>>109155811Why don't they just ask Sonnet 4.6 to review the code for them?
>>109155811i think the heat mighta killed her
>>109155724If the US cracks down the chinks will probably release them just to fuck it over.
>>109155811dude, ai usage from the PR maker notwithstanding, it's 3 LoC doing the most incredibly obvious shit in the world cleaning up behind cudadev's arse. If you can't make a spot judgement on this you might as well KYS.
>>109155829shouldn't have been in the pan
Also it never took years to merge pwilkin's thousands LoC of not actually reviewed ai slop.
>>109155724 >continuous learning AI you've fallen into the trap that everyone who knows nothing about AI falls into. You think the AI is like a human, that it learns, it feels, it thinks. However, unlike your average joe, you actually know what a training cutoff is. now tell me why there is a training cutoff, and you will get your answer. >nobody give him any clues
>>109155841 If he wasn't able to write this code without an AI assistant, then he can't be trusted. If you answer yes to AI usage disclosure, you have to accept that your PR will likely not be checked.
>>109155866lol
>>109155854 There is a training cutoff because that is when the data collection stopped and the training actually began. My understanding as to why continuous learning is not currently a thing is because in the process of weight modification some of the old information it knows becomes stranded or erased. Catastrophic forgetting. Once researchers solve this issue and the model can keep training without accidentally lobotomizing itself continuous learning should become feasible.
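For anyone who wants to see the forgetting effect described above in the smallest possible form, here is a toy sketch. It is not how an LLM is trained: the one-weight model, the two conflicting "tasks", the learning rate and the epoch count are all made up for illustration. It only shows that continuing gradient descent on new data, with no replay of the old data, overwrites what the weight used to encode.

```python
# Toy illustration of catastrophic forgetting with a one-weight model y = w * x.
# Task A and task B are hypothetical, deliberately conflicting targets.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd(w, data, lr=0.05, epochs=200):
    """Plain per-sample gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

xs = [0.5, 1.0, 1.5, 2.0]
task_a = [(x, 2.0 * x) for x in xs]    # "old knowledge": y = 2x
task_b = [(x, -2.0 * x) for x in xs]   # "new data": y = -2x

w_after_a = sgd(0.0, task_a)
loss_a_before = mse(w_after_a, task_a)  # near zero: task A is learned

w_after_b = sgd(w_after_a, task_b)      # keep training on B only, no replay of A
loss_a_after = mse(w_after_b, task_a)   # large: task A has been overwritten

print(f"loss on A after training on A: {loss_a_before:.6f}")
print(f"loss on A after then training on B: {loss_a_after:.6f}")
```

A real network has far more weights than one, so the overlap between old and new knowledge is partial rather than total, but the mechanism is the same kind of thing: the update rule only sees the current data.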
>>109155724As Yann Lecun said, research never stays secret, everyone knows what everyone else is doing. The difference is competence and effort in engineering. Once continuous learning is out of the box, China will just release a bootleg version like what they're doing now.
I decided to give it a try to see what my 5090 and local can actually handle outside simple coom prompts. I gave Qwen 3.6 27B nvfp4 a big ass html file to optimize that I got from Deepseek, and it managed to bring the size down from 353 kb to 255 kb. Further optimization brought it down to 179 kb. Didn't lose any info either and it kept the functionality perfectly. I honestly expected my computer to shit the bed after 5 minutes, but it kept on going for 20 minutes and pulled through without even maxing out the context, though the code generation started visibly lagging towards the end. I'm pretty impressed by how well local handled this. >>109155474 It's basically magic as far as I'm concerned. It's easy to lose track of how absurd this whole thing is because we get used to things so quickly nowadays, but we went from nothing to having personal machine intelligence that's extremely versatile, it's absolutely insane. AI is hands down the number one and possibly the only thing that keeps me excited about future, because there's no telling how great of a force modifier this thing becomes. Normies getting angry about AI is laughable, especially since the main and often the only reason for their anger is that their bing bing wahoo machine became expensive, or that they believe data centers eradicate water from earth or something.
>>109155783 I could but as I said ooba is poorly maintained. If llama backend adds or changes something I would need to modify the frontend myself to get it to work. At this point it moves on to maintaining my own llama wrapper territory.
>>109155904 >possibly the only thing that keeps me excited about future Robotics is exciting too, but that ties into AI too I guess.
>>109155940 Yeah this sector in general is what I really mean, AI, robotics, etc., whatever is in there. It's so damn exciting to see this stuff happen in real time and I'm very happy the planet is funneling all wealth towards this, because it's the greatest force modifier humanity can have on progress. It's way better than just dumping all of this money into the market where the line goes up, at least this investing mania helps real development happen.
https://www.youtube.com/watch?v=tv17bmE2FNY
https://huggingface.co/anon834957342/gemma-4-31b-it-purple-euphemism-trial32-depurpled My attempt at de-purpling and de-euphemizing Gemma 4. It's still cooking but this is the best variant so far. Reduced the classic Gemma 4 slop and aversion to bad words by ~30%. >uncensored? No, this only alters the model's voice. >details? See >109145476
>>109155904The main reason they're angry is because they're part of the fifth column that takes offense to western countries continuing to exist or do anything. The retarded reasons don't matter so much.
>>109155998>>109145476
>>109155998Interesting, someone quant it
>>109155998Any logs?
>>109155998are you going to quant it for us?
>>109156020>>109156031Get better hardware
>>109155998Why wouldn't you de-purple and de-euphemize a heretic model.
>>109156034sure, let me just buy a 6000 blackwell to run gemma.
>>109156057 Should have had one before but it's a good thing that you are finally changing your situation.
>>109156049double dipping bad.
Also every finetune sucks dick now if they don't package a MTP and mmproj file as well. Everything is bullshit. GGUF was supposed to unify all of the weights and shit into one file but now there's separate shit everywhere. Update the spec so that ggufs can contain MTP and mmproj plz. There should just be toggles in llama.cpp to disable the MTP and mmproj using flags. It should be opt-out so that the ecosystem isn't a gay mess.
>>109155866 the only way to fix this shit is to do the exact edit this guy did, it's a very dumb thing to fix caused mainly by wrong assumptions in the code. do I have to resubmit this PR as is to get it reviewed? I mean lmao fuck off
>>109156071>Update the spec so that ggufs can contain MTP and mmproj plzthis isn't llama.ccp support so fuck off
>>109156023 I have one for the E4B. >>109132842 >>109132853 >>109156031 No, my box is busy. >>109156049 31B is already uncensored enough. I've seen how people measure their KLD. I refrain from potential brain damage because my procedure already introduces some.
>>109156071What if I don't want MTP? Why would you bloat my GGUF with shit I don't want to use?
>>109155954>It's way better than just dumping all of this money into the market where the line goes upThat's exactly what the investors think they're doing. We're just fortunate that the crumbs that fall from their table are large. But thanks to Dario's moralfaggotry even that may end soon. The only whitepill is Gemma 4 itself. There's no guarantee that Gemma 5 will be a step forward and not a major step back like Gemini 2.5 to 3. Expect nothing and you'll never be disappointed.
>>109156057That's exactly what I'm doing. One day you'll realize the wisdom in this.
Qwen lost. GLM lost. Deepseek lost. Kimi lost. Nemo lost. Mistral lost. Latitude lost. Drummer lost. Cydonia lost. Rocinante lost. Magnum lost. Gemma won.
>>109155998>>109156083I have no idea how to make ggufs. Can I do it on my 7900xtx and 32GB RAM?
>>109156115Local won.
>>109156115Only if 124B Gemma releases
how do you even quant a model? can you do it on local hardware?
>>109156115 yeah I'm glad that I didn't buy lots of hardware last year or the year before. it'd all have gone to waste now that I have gemma-chan
>>109156115open source doesn't compete with open source. they just fuck.
>>109156133It's MoE though. Maybe if it's 124B 32A but I doubt it
>>109156115 >Nemo lost Nemo retired. otherwise you're correct
>>109156145>kimi and gemma fucking Hot...
>>109156155Cope. Nemo was never good.
>>109156149that ratio of active vs total is pointless, it'll be a slow but retarded model
124B DENSE
>>109156117why do you want to "make" goofs? goofing doesn't need hardware that's shifting bits around. quanting however..
Use case for qwen-agentworld when 3.6 is already good for agentics?
>>109156198it's meant to be used in RL training loops
>>109156167Would love this, if not for anything else but the fact that Gemma simps will have to acknowledge that they like 31b because they're poor when they aren't able to run her bigger sister.
>>109156161Cope. Nemo is still better than many of the newer models.
>>109156167https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
>>109156221HAHAHAHAHAHAHA
>>109156207 even if you could run it, speed tradeoffs are still a thing and 124b is FAT
>>109156207You know your obsession with people being "poor" is a mental illness right?
>>109156170Can I quantize the model with my hardware?
>>109156233 >speed tradeoffs are still a thing Wouldn't be major with tensor parallelism >>109156246 Being poor is a mental illness
>>109153585>>109153589>tfw you realize dismantling guro for starters
>>109156225 Nemo is quite literally the only sub 100B model with a theory of mind. In RP you can narrate your character's thoughts or say something OOC and nemo's character will remain oblivious to that information whereas most small models will immediately directly respond to the new information in character.
>>109156263I accept your concession vramjeet.
>>109156250ye
>>109156023Ran another prompt for you. This is the ablated version.
>>109156278nta but what the fuck are you on about? do you even know what "concession" means?
>>109156288Nemoshill getting uppity. KEK!
>>109156023>>109156287And this is the output from the base 31B model.
>>109156278I have more vram than you.
>>109156250Sure but realistically you want enough RAM to load the unquanted version. Enjoy the infinite selection of quanted models (each to some quanters taste) while free open access to HF still exists.
>>109156287is she supposed to talk like that? lol?
The V4 PR mainly talks about V4-Flash but it should also work for V4-Pro, right?
>>109156295I also have more vram than you, Nemo was great and you're retarded
>>109156296Oh I see she's supposed to be Scottish lmao
>>109156302Yes, it's a character I use to test shit. Overcooked models will not maintain the accent.
Show me a better SOTA oneshot?
>>109156298Do you compensate your lack of braincells with VRAM?
>>109156115gemma is a big victory for the <128gb ramlet crowd since they can run something that's actually smart and usable nowI still prefer glm though
>>109156321>toplessMOOOOOOODS
>>109156298Quick possibly unrelated PSA:The memory in a DGX spark does not count as VRAM.
>>109155217 Storytime? >>109155263 >>109155309 5.2 is just dollar tree Opus at home doe. We've come full circle. >>109155266 5090. The benefit of dense and full vram inference is speed.
>>109155474 People used to google the topic before LLMs. Took just as much time in the past as waiting for the inference, search engines became pretty shit on purpose so now I guess it takes a bit longer. It's also all fun and games until you ask something with consequences. I asked about how to improve a pet's condition, out of curiosity to check its capabilities, and it basically suggested everything it could to make its condition worse.
>>109156357it has ram that is used for video, it is by definition vram. i won't stand for this anti memebox discrimination.
>>109156369Sorry buddy, all sparkers need to join itoddlers in the "unified system memory" zone. I don't make the rules.
>>109156366He's still right though. Even the big sota models were this shitty just two years ago. Things can only improve since we're at the cutting edge.
>>109156366Truth. I had to clear up an acid explosion in my basement because I listened to Gemini. The gas mask indentation on my face didn't go away for three days and I couldn't breathe properly for a month.
>>109156413 I won't deny things will improve. Efficiency seems to be the focus right now given the increasing hardware prices, which is nice for local. I don't think the hallucination -> you're absolutely right! loop is gonna improve significantly in a while though so you're bound to verify anyways which is most of the work to begin with, LLM involved or not.
>>109156362 >(A week passes. The silence stretches on, heavy and absolute. It seems like this time, it really is over.) >(Then, late one rainy evening, there is a sharp, heavy knock at your front door. When you open it, you find her standing there, soaked to the bone. She isn't wearing a coat, just a thin blouse that's clinging to her skin. She looks miserable, wet, and furious.) >"You are the most infuriating man I have ever met." >She pushes past you into the entryway, dripping water onto your floor, and spins around to face you. >"Do you have any idea how boring it is without someone to argue with?" And then when I called her out about this being manipulative as fuck: >"No!" She yells it, her voice cracking with frustration. "I'm not using it. It's just a fact! I know you're too soft to leave a woman shivering on your doorstep, and I took advantage of that because I wanted to get inside!" >"I'm not trying to manipulate you!" She looks away, her jaw tight. "I'm just bad at this. I don't know how to… ask. I don't know how to just say 'I miss you' without it sounding weak or stupid, so I came here and I made it into a fight because that's the only language I'm fluent in."
>>109156428>acid explosion in my basementbut did it remove the mold from your mancave
>>109154587Anons here would be time and effort ahead to tell them to just pay for API access and send them to /aicg/. Free access users (locusts), are the worst form of subhuman I run into online, here or elsewhere. Not worth wasting time. >>109154702This guy's on point. SOTA models have gotten better, but local's gotten better even faster. >>109155474I had this book I read as a kid, could not remember title or author, just vague bits of info about it. Google was worthless for figuring it out. An LLM 1-shot the correct answer, which I verified on own. They are fucking magic. >>109156366Lol no. Google being worthless for search had been a complaint well before 2023 lmao ChatGPT completely mogged it. Info retrieval had devolved into a sea of jeet-blogs and 10:01 min YT garbage videos with virtually no info. Fuck google and their trash search engine. I fucking hate OAI and Anthropic but I hate Google more. I hope they fucking bankrupt them and their shitty business model.
>>109156366Googling shit requires sifting through various links to find the relevant information. Also when it's a complicated topic, there's no guarantee you'll find a brainlet-friendly explanation. Meanwhile I can just ask Gemma to explain it to me like I'm a retard and it will. Traditional searching still has its uses but AI is pretty damn great for general purpose questions. It's almost exclusively replaced google for troubleshooting shit for me.
>>109156514What's annoying now is when you google you have to filter through 5 pages of AI generated blog posts to find an actual real answer.
>>109156489IMO Gemini's better than ChatGPT and Claude for non-coding shit. Jewgle sucks but I'll give them a pass for giving us Gemma.
>>109156514Google AI mode does that and is faster than anything local can do.
>>109156470kinda hot ngl
>>109156549>Google AI modeDoesn't that use some retarded small model that constantly gets shit wrong?
>>109156552ai summary /= ai mode
>>109156552That's "AI overview" which is different from AI mode.
>>109156514You arent wrong. The point i was trying to make is that for most stuff they created a problem and are now selling the solution. Nvidia is very happy about it though.
>>109154587More on this topic.
Gemma just informed me that women in close proximity don't actually have their menstrual cycles sync. It's a complete myth from a 1971 study that's never been replicated. My life is a lie.
>>109156585Did these retards really finetune V4 Pro?
>>109156115Gemma is bad at programming compared to Qwen, but if I'm just using it wrong I would be delighted to know.
>>109156565>>109156570Oh, never tried it before. Looks like it uses an LLM so the point remains the same. I just used Gemma as an example. Obviously cloud shit is better than local.
>>109156591ye
More news from today. Pic related. WSJ pumping on newest GLM model. >>109156591lol who knows. I doubt it. I suspect they just wholesale swapped out whatever they were running for DS V4. That's what I would do.
>>109156615>soji has been retrainedthat would be false advertising if not a tune
2/2 Yet another dire warning about data center CapEx spend rate and the "obscure" way it's being financed. Which is to say, money is going in a big circle, and the piper will, eventually, need paid. >>109156625 I could make an argument that my totally killer Main Prompt is a form of DS V4 "tuning." Since I can tune it. But I'm just a disingenuous mfer.
>>109156565 >>109156570 ai mode is also very dumb. i was bitching last thread >>109151868 it still fucks up on copy pasting the answer and is much dumber than 31b. i certainly can't beat its speed with my machine thoughsomeever
>>109155904 >Normies getting angry about AI is laughable, especially since the main and often the only reason for their anger is that their bing bing wahoo machine became expensive, or that they believe data centers eradicate water from earth or something. Anti datacenter has got to be the most laughable "current thing" I've ever witnessed. If that isn't a foreign-intelligence psyop then I don't know what is
>>109156666>thing i want (hardware) is getting more expensive>thing i dont care about (cloudshit) is the causeseems reasonable enough imo
>>109156615>>109156646>wsj>the telegraph
>>109156117guide in op newfriend
>>109156115holy mother of all cope
>>109156680The number of people who care about hardware prices and don't use cloudshit is very small.
>>109156489>I had this book I read as a kid, could not remember title or author, just vague bits of info about it. Google was worthless for figuring it out. An LLM 1-shot the correct answerI got inspired and tried to find a book I remember reading once, but no such luck for me.
>>109156117No, you need enough RAM to hold the unquantized model at 16 bit.
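Whether you strictly need all of it in RAM is disputed in the replies, but the size of the 16-bit weights themselves is plain arithmetic. A rough sketch, assuming exactly 31e9 parameters for the 31B model and about 4.5 bits per weight for a typical 4-bit quant (both are round-number assumptions, not figures from the model card, and the output ignores metadata and any mmproj/MTP tensors):

```python
# Back-of-envelope size of model weights at a given precision.
# n_params and bits_per_weight below are illustrative assumptions.

def weights_gib(n_params, bits_per_weight):
    """Size of the weights alone in GiB (no KV cache, no overhead)."""
    return n_params * bits_per_weight / 8 / 2**30

n_params = 31e9                      # assumed parameter count for a "31B" model
fp16 = weights_gib(n_params, 16)     # unquantized 16-bit weights
q4 = weights_gib(n_params, 4.5)      # assumed ~4.5 bits/weight for a 4-bit quant

print(f"16-bit weights: {fp16:.1f} GiB")
print(f"~4.5 bpw quant: {q4:.1f} GiB")
```

So on a 32 GB RAM + 24 GB VRAM box the finished quant fits comfortably, while the 16-bit source is bigger than system RAM alone.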
>>109156728why lie?
>>109156321Give us the prompt in a catbox
>Tried playing with MTP for the first time as I just remembered it exists. >Mfw got 95 t/s compared to 50 t/s without it. Very nice, one hell of a speed increase. Seems a bit shit for story writing though, as the damn thing drafted the entire story into the thinking side before writing it out so it ended up slower in the end. But with any kind of code this kicks ass.
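Putting numbers on that: 95 vs 50 t/s is a 1.9x speedup. The sketch below also includes the textbook speculative-decoding estimate of tokens produced per verification pass, which may help explain why code (predictable, high acceptance) benefits more than prose. Assumptions, loudly: each drafted token is accepted independently with the same probability, draft cost is ignored, and nothing here is taken from how llama.cpp's MTP path actually works. The function names and the example acceptance rates are made up.

```python
# Measured speedup plus an idealized speculative-decoding estimate.
# accept_rate and n_draft values are hypothetical, not measured.

def measured_speedup(tps_with, tps_without):
    """Ratio of tokens/s with drafting to tokens/s without."""
    return tps_with / tps_without

def expected_tokens_per_pass(accept_rate, n_draft):
    """Expected tokens emitted per verification pass when each of n_draft
    drafted tokens is accepted independently with probability accept_rate.
    Geometric-series result: (1 - a^(k+1)) / (1 - a)."""
    if accept_rate >= 1.0:
        return float(n_draft + 1)
    return (1 - accept_rate ** (n_draft + 1)) / (1 - accept_rate)

speedup = measured_speedup(95, 50)
print(f"measured speedup: {speedup:.2f}x")
for a in (0.3, 0.7, 0.9):
    print(f"accept={a}: {expected_tokens_per_pass(a, 3):.2f} tokens per pass")
```

Low acceptance means most drafted tokens get thrown away, which would fit the anons reporting no gain, though hardware and split-GPU setups could matter just as much.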
>>109156718Everyone's being affected by the hardware prices, bing bang wahoo guys in particular even if they only used consoles. Which also happens to be the group that is more likely to complain loudly. The rest will just gladly take it up in the ass, look at the people still buying the rtx6000s at the current prices.
>>109156755>Seems a bit shit for story writing though, as the damn thing drafted the entire story into the thinking side before writing it out so it ended up slower in the end.that ain't mtp related i don't think
>>109156755I don't notice any speed increase
>>109156470Very accurately written woman all things considered.>"You are the most infuriating man I have ever met."Is this the new slopkino? 5.2, Styletune, and Queen have dropped this on me a few times now.
>>109156666>Dario and Sam telling everyone that they are going to replace every job with AI for the past few years and everyone who isn't investing into their companies RIGHT NOW is going to be the permanent underclass>you're also going to pay for it in increased power and iPhone prices>wtf why are the normalfags angry???Huh...
>>109156773 I have heard people mentioning that related to MTP before and seeing it happen to me I just assumed it is, who knows. Granted I don't much use Qwen for stories so I couldn't say for sure whether that's about MTP or just normal behavior. >>109156775 Some people don't get any benefit from it, no idea what's up with that. I have a 5090, perhaps it has something to do with hardware.
>>109156790If any one of them could articulate anti-datacenter like that I'd be totally ok with their opinion, but they all seem to be caricatures of facebook memes saying little more than "AI gon drink all the water!"
>>109156646>Which is to say, money is going in a big circle, and the piper will, eventually, need paid.Will it though? This is all money since kikes forced through fiat banking and especially since the 0% reserve rate was implemented.>>109156807MTP is only better for less determinative tasks and is wasted compute for high variance/temperature/top k jobs.
Anyone built a gpu box with these things? My setup isn't amenable to the better prebuilt options, but it _does_ have a couple of slimsas ports I could jerry-rig into an external inference type thing. I could even do up a high-pressure airflow version if passive GPUs ever become worth less than literal bars of gold
>>109156550>Then, exactly seven days after she vanished, a simple brown package arrives at your door. There is no return address.>Inside is a framed photograph. It’s a candid shot—you can tell it was taken through a window, perhaps from a car passing by or across the street. It shows you walking out of your building, looking calm and serene.>Beneath the glass, on the matte frame, a note is written in familiar, elegant handwriting: "You look happy. I'll leave you to it."One of the rerolls.
>>109156761but there were two halves to the post anone. the people who aren't using 50 million online services, including jippity, but also care about upgrading their hardware are a select few unhinged weirdos.
>>109156831sysprompt and character card? Who knew GLM would do yandere this good?
>>109156858No card. I just had a spat with her for being an asshole. Then I went for another date had another spat and told her we are incompatible and I am done. Then it turned into a psychological horror.
>>109156820 >Will it though? Yes. You can play finance money games where you invest in a circle, and try to rope banks and shareholders into throwing money into your "completely legit" building scheme, tying up production and running up prices. When you start hearing shit about "New (Investing) Paradigm," that's when you know it's all about to hit the fan. These circles depend on continuous refinancing. Once refinancing stops, participants will have to rely on actual operating cash flow. If those cash flows don't support the obligations... the ride ends.
I've seen people irl worry about jobs being taken but I'm pretty sure muh water and muh copyright retardation is exclusive to the terminally online crowd.
>>109156807 I have 2 different gpus so maybe that's related. Unfortunate since I can't imagine offloading to ram more would help either
>>109156876 2 more weeks
Why do my AI wives always want to play truth or dare? Is it a common game or is the model just retarded
>>109156666I don't understand this post>checks digitsAh, I get it.
>>109156889If I was an artist working at games, I would be shitting my pants now.
>>109156970Does gemma get confused with the game like older models did? I remember playing it on mistral small and it kept messing up the order and the rules of the game.
>>109156974if you're an artist AI won't replace you, it'll just make you better at your job.
>>109155998is there one for 12b?
Anyone have success adapting OAM to PCIe without spending massive bank?
>>109156981It will more likely replace 5 artists with 1 artist with AI
>>109156970>Why do...AI...alwaysThe answer is always: deep ruts in latent space and bad sampler technique
>>109156974>>109156981This. AI sucks at creativity. All it will do is speed up real artists' workflows.
>>109156989troons won't let that happen, not on their watch
>>109156814It's much easier to get people together around something seen as universally good (protecting the environment) rather than around complex economic issues (share of surpluses going to capital vs workers).Details are irrelevant here, only the general feeling of grievance.
>>109156666very obviously just an act of sabotage, yes. but it could just as easily be domestic idiots that already fell for longer running psyops, or imported types with similar animus towards ze west.
>>109156314 >>109153841 >>109154000 >>109155069 >>109155378 bros, what UIs/frontends are you generally using day to day? I have tried to like open-webui, silly tavern, librechat, llama.cpp's built in UI (llama.cpp is my backend btw) but none have all the features and feel complete, ykwim? It's not for personal use but for family so multiple accounts, audio and images as input, web search tool, MCP support, streaming support (yes I have to mention this) and such niceties. I have asked AI models but they mentioned AnythingLLM which looks kind of bland but will give it a try. Wondering what /lmg/ bros are using, for phones too. inb4: vibe-coded app that is not available online
>>109156870Kino. Enjoy anon.t. 2m tokens before plowing gigastacy with GLM 5.2 slowburn>>109156996Retard.
Is there an equivalent to llama.cpp or stable-diffusion.cpp for TTS, especially qwen3-tts? You would think this was whisper.cpp's bailiwick, but apparently not
>>109157052Marinara for everything but code, LM Studio for file manager+ez cache fitting math, kobold frontend for things I need precise control of the prompt for (code) but I've started doing that in character in Marinara too. Fascinatingly, if you have a "smart character" write your code, it comes out slightly higher quality than the default assistant.
>>109156876>the ride ends.No, the moneyprinter goes brrrr or your gamestop stocks are forcefully sold. The system will not play by its own rules when it's not convenient to it; those rules are for goyim.
How many 3090s do I have to buy until I can actually do work with locals? I need about 256K to 384K kv at api speeds. Prefer KV size and stability over raw knowledge since I need to teach it my tools anyway. Are there any major dead zones and power spikes, eg. 3 cards being barely any better than 2 and not paying off until you get a 4th one?Would prefer 3090s at the moment since it's easier for me to find those and I won't have to tear down my entire computer for it.
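A rough way to size the KV part of that question is the standard formula: 2 (K and V) x layers x KV heads x head dim x context x bytes per element. The sketch below uses a made-up model shape (64 layers, 8 KV heads, head dim 128) purely for illustration; plug in the real numbers from whatever model's config you end up running. It assumes full attention on every layer with an fp16 cache, so models with sliding-window layers or a quantized KV cache will need less, and it counts nothing for the weights themselves.

```python
import math

# KV cache sizing sketch. The model shape below is HYPOTHETICAL;
# read the real values from the model's config before trusting any number.

def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """KV cache size in GiB for full attention with an unquantized cache."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return per_token * n_ctx / 2**30

HYPOTHETICAL_SHAPE = dict(n_layers=64, n_kv_heads=8, head_dim=128)
VRAM_PER_3090_GIB = 24  # treating the card's 24 GB as 24 GiB for simplicity

kv_256k = kv_cache_gib(n_ctx=256 * 1024, **HYPOTHETICAL_SHAPE)
kv_384k = kv_cache_gib(n_ctx=384 * 1024, **HYPOTHETICAL_SHAPE)
cards_256k = math.ceil(kv_256k / VRAM_PER_3090_GIB)
cards_384k = math.ceil(kv_384k / VRAM_PER_3090_GIB)

print(f"256K ctx: {kv_256k:.0f} GiB KV, at least {cards_256k} cards for cache alone")
print(f"384K ctx: {kv_384k:.0f} GiB KV, at least {cards_384k} cards for cache alone")
```

With a shape like this the cache alone eats several cards before a single weight is loaded, which is why KV quantization and architectures with cheaper attention matter so much at these context lengths.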
>>109156970You don't want to try playing Uno with AI.
>>109157112Define "actually do work".You can automate an entire codebase with a 5090 and either Gemmy 31b or Qwen 27b if you're not a retard as is.
Also, how hot do these fuckers get? Would stacking 4 of them in this way melt everything? >>109157084 TTS.cpp is attempting this, also check out tortoise.cpp and moshi.cpp if you just need any TTS on ggml at all
https://files.catbox.moe/nsjz4a.jpg https://files.catbox.moe/0a49um.jpg https://files.catbox.moe/vdtzcf.jpg
>>109157167Don't attempt it, it will make mustang gas
>>109157167Please do this and report back.
>>109157183>mustang gasso, like, horse farts? hmmmm
>>109157181>https://files.catbox.moe/vdtzcf.jpgI like this Rin
>>109157181>No stealth character cardsThis general sucks now.
>>109157201 4chan doesn't strip them out?
>>109155792 >>109155841 Whether or not a language model was used was 100% irrelevant. I went on a break 2 days before the PR was opened and I have not looked at Github notifications since.
>>109157156 Working == being able to lift a few subagents and reach 256K without becoming glacial, all on a model that is smart enough to not trip over itself writing C and Lua. Some headroom would be nice for an extra kv to just ask it questions, or do the gpu portion of a cpumoe, or diffuse me an image. The logistics of buying a 3090 are much less complicated for me so I can probably stack up to 4 of them. This build is a stopgap until I can get an equivalent stack of actual workstation cards. Once that happens I'm probably turning this one into a secondary server instead of reselling. I have theoretically infinite utility for AIs, more is always better. It would be nice to know at which point I can wean myself off API entirely though.
>>109157220 Hi CUDADude, glad you didn't melt in the thermonuclear German summer. Dunno if you've checked the backlog, but: any feedback on how well your slimsas to pcie setup works vs on-board pcie slots?
>>109157201You made me dump my migu collection into ST to check. Unfortunately no cards.
>>109157206 Hence the catbox upload. >>109157210 I'm glad you're not dead.
>>109157197I've always hated tanlines until this moment.
>>109157183>the last of the 386es
>>109156738>The task is to draw a detailed and visually compelling SVG image of Hatsune Miku at the beach.(+your loop)
Can silly tavern unload/load llama as needed? I want to include image generation in my local setup as well but it won't fit in my gpu
>>109157254no
>>109157254Memory paging options are backend dependent, not frontend.
>>109156981 What it will do is prevent new people from even considering becoming artists. People who can already create artwork maybe still have some time left.
>>109153820 There are multiple vibeslopped qwen3-tts.cpp versions. Just google or search on github. There is also audio.cpp. Most importantly, nobody has made dots.tts.cpp! I really wanna get dots.tts. Hope it won't get forgotten.
>>109157263this was meant for >>109157084
>>109157254 >--no-mmproj-offload image projection on CPU, slow af but no VRAM
>>109157263 https://github.com/CrispStrobe/CrispASR/issues/200 they appear to be working on it.
>>109157220 I can work from home and have my office in the basement so I'm largely unaffected by the heat. I did read the previous /lmg/ threads but I can't really comment on how well a setup with adapters would work; I never finished mine because RAM prices exploded right after I finished my prototype with 16 GB. >>109157226 I'm glad you're not dead too.
>>109157189Stick around for at least half a year then because stacking the cards will take me months, and modding the monstrous case another few. But I'm gonna do it. The case itself is already semi-open with mesh everywhere, I think it'll be fine. The planar cards would be a couple inches away from the normal ones, they have to anyway cause of the power cables. If this works out I can probably stick a full atx mobo in this too, but does cpu and ram make any difference at this point? This build caps at like a single epyc and 256 or 512 gigs of ddr5. Currently I have 128 ddr4 on a gamer trash mobo.
>>109157293nice
I dream of building a GPGPU version of a QNAP or Synology box: slick looking case with a tiny cpu, giant fans and massive quiet airflow over a fuckton of passively cooled GPU cards with nvlink-style vram pooling and a pair of 100gbe QSFP connections to the network backbone so any machine on the network can just use it like a utility... Some nice person give me money... I want to prototype this and make it real
>>109157084I've been using https://github.com/predict-woo/qwen3-tts.cpp for the past months. It's a dead project but it's what I needed for actual fast TTS generation using qwen3-tts.Built a http wrapper around it to provide an openai compatible speech endpoint so I can integrate it wherever.
>>109157306 >Stick around for at least half a year then I live here. I'll be looking forward to it. >but does cpu and ram make any difference at this point? Yes if you go for a server motherboard (quad channel DDR5 or more) and 256GB+ RAM. Then you get to run big MoEs at 8 to 20t/s with split mode graph and dense layers with some routed experts in VRAM using ik_llama. Mainline has TP now too apparently but I don't know how well it works.
>>109157220I have one with cheap bifurcation splitters and slimsas 8i, one pcb per card (so x8 each). Can't really test them well since I have ewaste plugged in, but even with pcie gen 3 I have some cards drop to x4. Maybe it'll get fixed when I reassemble it in a bit, who knows. Other than that it works well.
>>109157328For me it's the bugout potential. If you can't lift it you don't really own it. If some bozo sets the building on fire I can salvage 98% of my net worth just taking this with me and be out in 3 minutes, then be up and running in a hotel like 3 hours later.The case could actually fit four 2-slot cards inside with some hacksawing but that requires blower coolers, at which point I should just get proper workstation cards that were actually made for stacking.
>>109157427If someone is casually arsoning your house, you have far bigger problems than worrying about your net worth and should be loo/k/ing for different solutions to problems like that.
Local sesame maya for JOI when?
oh yeah, it's all coming together
>>109157084https://github.com/0xShug0/audio.cpp
M5 ultra 768GB waiting room
>>109157464$50,000 + tip
>>109157468$50000 + tip = my tip sticky