/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107826643 & >>107815785

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107826643

--Paper: Recursive Language Models:
>107831224 >107831529
--Mistral Small surpassing Nemo-Instruct in roleplay performance:
>107831248 >107831280 >107831370 >107831406 >107831462 >107831464 >107831497 >107831449
--Heretic tool's impact on language model performance and censorship bypass:
>107831617 >107831631 >107831713 >107831846 >107832054 >107832060 >107832079
--Struggling with tool calling models on 19GB RAM hardware:
>107826694 >107826795 >107826819 >107826837 >107826853 >107826861 >107826877
--Kimi-Linear support PR for llama.cpp:
>107832698 >107833129 >107833201 >107833241 >107834018
--Skepticism and mixed experiences with new Jamba2 models:
>107827347 >107827506 >107827604 >107829891
--Silent event execution limitations in AI interactions:
>107827620 >107827869 >107833064 >107827956
--Llama 4 Scout architecture and finetuning discussion:
>107827217 >107827325 >107827348 >107828092
--New 72B MoE openPangu-R-72B-2512 with modest training setup:
>107827977 >107828115 >107828140
--Exploring Live2D model generation via semantic segmentation and workflow tools:
>107830798 >107831073 >107831131
--Skepticism about Google's AI long-term memory research:
>107831725 >107831741
--Detecting vector usage through logprob comparison experiments:
>107830837 >107831021
--Context limitations in 24b models vs. small task creativity:
>107829571
--Request for dynamic GPU device selection in llama.cpp to handle sleep-induced device name changes:
>107829783
--Prototype merge of GLM 4.6 and 4.7 models:
>107828096
--Critique of high-context benchmarks and Qwen model performance differences:
>107832068
--Miku (free space):
>107826689 >107829891 >107832692

►Recent Highlight Posts from the Previous Thread: >>107826648

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107834494oh wait now I read more and it can translate NL into the actual calls. That's crazy since I am nocode.
>>107834544it means you were having a good dream
>>107834544
>frog
You're talking shit
>>107834544
4.6 made me analyze my dreams and actually find out what they mean.
>>107834544check under your bed
>>107834544Stop jerking off to scat.
Maybe one day AI will learn how to number things...
Well, this thread is already off to a great start.
https://github.com/ikawrakow/ik_llama.cpp/pull/1131
>I've looked though ST codebase; it is a nightmare to navigate. Why did they make it so needlessly convoluted? Implementing anything there is far beyond my skill and patience and it would be much easier to make something from scratch that works and looks good in a single html file than to bother with it. How do you manage to make a webui >300MB?
>To prove the point:
>This funtioning simple webui that I use for testing is 18kb.
Sheeesh, vibecoder really roasted ServiceTesnor
>>107834748wait until someone asks to see logs
>>107834742isn't it common to leave gaps and group things by 10's? it makes it easier to add something in later.
>>107834748Want to hear about my ego death?
What's the difference between embeddings and reranking models? One is short and one is long term memory?
>>107834901You are just an LLM trying to justify itself.
>Her skin is smooth and flawless. She has no blemishes or scars. Her hair is dark brown and thick. It's long and wavy. She keeps it tied back in a ponytail. Her eyes are bright green and full of life. She's very intelligent and curious. She loves to read and learn new things. She's also very playful and mischievous. She likes to pull pranks on people. She's not afraid to speak her mind. She's very independent and self-reliant. She doesn't need anyone to take care of her. She can take care of herself. She's very strong and resilient. She can handle anything that comes her way. She's a survivor. She's a fighter. She's a winner.
Great text.
>>107835060Typical chub.ai card, 2025
>>107834742Maybe ask for it?
>>107835060GLM 4.7?
>>107834750
ST is not worth anyone's time
If I want to vectorize documentation, do I need to get rid of all the cosmetic hashtags and asterisks to prevent token waste? Are there some premade presets for this?
>>107835060Is this the new Jamba?
that model is shit, nobody should use that model
or that model, that model is shit too
you want me to tell you the best one?
or any i think are good?
lol fuck off
fuck off tobs
>>107834987Go ahead, let's get it out of the way.
>>107835196obsessed
>>107835233
>let's get it out of the way.
When you put it like that I would rather leave it unsaid and instead keep bringing it up randomly.
I'm a real retard with a low school diploma who got into AI through anime waifus, and now I want to dive deeper.
If I have to learn complex differential and integral calculus, that's too deep, right?
Do I really have to complete half a math degree?
That really demotivates me.
>>107834544I know you're just a dumb kike that's here to derail any discussion that isn't about how based zognald trump and the feds are but weird bathroom dreams are actually one of the more common themes for disquieting dreams. It means nothing.
>>107835318First, what do you want to do?
>>107835325go back to /pol/ faggot
>>107835123
>>107835088
It's Gemma 27B but accidentally had Mistral template enabled
>>107835318
I am unironically very smart and I am 100% sure that you don't have to know anything about how integral and matrix calculus works (and I don't).
>>107835318the computer does the math for you. are you trying to develop your own model architecture?
>>107835375>I am unironically very smart
>>107835396Everything I said was absolutely true and also a joke and also a self aware joke.
>>107835403Sometimes I wonder how often Elon Musk posts in these threads.
>>107835331
Well, I thought it would be best to learn the whole topic of AI from scratch, so I thought it would be wise to understand what a neuron actually is and how AI was inspired by nature and developed from there.
Well. The first topic on neurons looks like this. What kind of retard wouldn't lose interest?
>>107835427I hate Elon. And that other faggot that keeps posting ITT.
>>107835431Bro, if you don't have a goal you'll lose interest very fast. Start with that
>>107835431
i think you need to realize that you are going to die relatively soon / adult life isn't as long as people think. don't waste the time you have learning something you're not going to use.
only learn what you need to know.
>2026
>kobold is still the only thing worth considering
>>107835488
That's such a pessimistic way of looking at education. You have no idea when some bit of knowledge will come in handy.
>>107835612Yes. We all run KoboldAI.
>>107834480
alright what's the opengoy equivalent to topaz? I'm looking to upscale a film, it's around 520p PAL DVD. How long would it take for 2 hours, basically?
>>107835488
In principle, you're right.
On the other hand, I'm just interested in it, and “aha” moments are also quite affirming.
I accidentally started a follow-up course and learned about the Hopfield model and how associative memory works, and understanding that felt better than generating a few naked waifus.
I'll try to understand it this week, and if it doesn't work out, I'll drop it.
How can I train my own LLM? I downloaded a bunch of data from someone satanic and I bet they would love a satanic chat bot trained on them. I'm assuming it would be some kind of fine tune or lora equivalent of an existing model? I'm running AMD on Linux if that matters. 64gb ram 16gb vram
>>107835653Try asking on beaverai discord. It is a serious organization focused on finetuning LLM's.
My gemma heretic is too mean no matter what I do, any fixes?
>>107835676
>>discord
Thanks for the lead I'll ask.
>>107835653biggest model you can do is a 24b with qlora, assuming the dataset is under a few hundred thousand tokens. download axolotl.
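if you want to see what that actually involves before committing to axolotl, here's a minimal QLoRA setup sketch with transformers + peft (same idea, not axolotl's config; the model id and LoRA settings are placeholders, and 4-bit bitsandbytes may or may not behave on your AMD card):
[code]
# Minimal QLoRA setup sketch with transformers + peft (not axolotl's actual config).
# The model id, rank and target modules are placeholders; bitsandbytes 4-bit on AMD is shaky.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "your-24b-base-model"  # placeholder: whatever ~24B instruct model you picked

# Load the base model in 4-bit so it fits in 16GB of VRAM (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

# Attach small trainable LoRA adapters instead of touching the full weights.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # should be well under 1% of the total params

# From here: tokenize your dataset and train with trl's SFTTrainer,
# or just let axolotl drive the whole thing from a YAML config.
[/code]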
On a serious note though. When we tell some newfag that he can't finetune a model because the resources needed are astronomical, we all unanimously agree. But then some people here pretend drummer's shittunes do something positive?
>>107835679I don't speak ESL, can anybody translate this?
>>107835693I don't care what lies drummer tells people about his shittunes.
>>107835707Your the esl retard
>>107835679Tell it to be nice
My banned token list is a whole ass novel at this point lule
>>107835736proof?
>>107835612You don't need more
>>107835736post it?
I love modern software
What's the juicy choice for 32 GB? Just upgraded from my 580, so I'm new to this.
>>107835742
>>107835749
https://huggingface.co/Sukino/SillyTavern-Settings-and-Presets/blob/main/Banned%20Tokens.txt
plus maybe 150 extra lines of mistral small specific slop
>>107835758
mistral small, qwen 2.5 32b, cope quant of a 70b
>>107835758Mistral nemo
>>107835679There's nothing rude about that response.
>>107835765
>post your banlist
>posts someone else's ban list
benchod
>>107835758a good quant of glm air if you have at least 64gb of ram. a cope quant of glm 4.6 if you have at least 128gb of ram. otherwise a cope quant of a 70b.
>>107835772It is too direct. GPT would never speak like that.
>>107835736What you got against Elara and Zephyr?
>>107835777I will call the miku police on you
>>107835785You're absolutely right!
>>107835797Yeah you better speak nicely if you live rent free in my hardware.
>>107835767stop trolling newbies
>>107835793and do what? take me to miku jail?
>>107835785Please forgive my insolence, but I would like to inform his lordship that this is Powershell, not cmd.
>>107835826
Hey guys.
I am a drawing beg learning to draw. I was wondering if there is a local model that could provide constructive criticism for an input image. Is this possible on local? And if so, what exactly should I be looking at?
>>107835754
>windows 10
sure you do
https://about.fb.com/news/2026/01/meta-nuclear-energy-projects-power-american-ai-leadership/
the company that has never made anything but garbage in AI wants to invest in fucking nuclear power to help with ai? what is going on in that lizard brain
>>107835848
one of these three depending on your hardware. you need to download both an mmproj file and a gguf file. the combined size has to be less than your VRAM.
https://huggingface.co/bartowski/mlabonne_gemma-3-27b-it-abliterated-GGUF/tree/main
https://huggingface.co/bartowski/mlabonne_gemma-3-12b-it-abliterated-GGUF/tree/main
https://huggingface.co/bartowski/zai-org_GLM-4.6V-Flash-GGUF/tree/main
>>107835882But I saw earlier someone saying to use GLM Air?
>>107835887
do you have the hardware to run glm air? there is a version of air with image support. use it if you have the hardware.
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF
>>107835873Zuck thinks he can change the world and the only way he knows how is throwing money at projects until he gets bored and finds another.
>>107835900I have 2 3090s, so Air > Gemma? Also thank you for the help I appreciate it.
>>107835121
>do I need to get rid of all the cosmetic hashtags and asterisks to prevent token waste?
They convey information. They're not just cosmetic.
>Are there some premade presets for this?
Learn sed if you really want to remove them.
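if you'd rather keep it in the same python script that builds your embeddings instead of sed, something like this works; just a sketch, the regexes are examples you'd tune to your docs:
[code]
# Rough sketch: strip cosmetic markdown before chunking/embedding.
# The patterns are examples -- tune them to your docs, and remember headers do carry structure.
import re

def strip_markdown_cosmetics(text: str) -> str:
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)   # leading # header markers
    text = re.sub(r"(\*\*|__)(.*?)\1", r"\2", text)              # **bold** / __bold__
    text = re.sub(r"(\*|_)(.*?)\1", r"\2", text)                 # *italic* / _italic_
    text = re.sub(r"^[-*+]\s+", "- ", text, flags=re.MULTILINE)  # normalize bullet markers
    return text

print(strip_markdown_cosmetics("## Setup\n**Run** the *installer*."))
# -> Setup
#    Run the installer.
[/code]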
>>107835915
didnt expect you to have good hardware. most people do not. download these files here. you will have to offload a little bit of the model to your ram, but this is the best multimodal experience that local ai has to offer.
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF/tree/main/zai-org_GLM-4.6V-Q4_K_M
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF/blob/main/mmproj-zai-org_GLM-4.6V-bf16.gguf
>>107835941Thanks king
Is REAP worthwhile? I noted that there is an 82b REAP variant of GLM 4.5. It claims to offer nearly identical performance, but I'm skeptical.
i have 24gm vram and 96gb ram, what's the strongest model that i can run?
>>107835959Are you going to be asking it to generate code and nothing else?
>>107835948
no problem.
>>107835959
reap models offer similar performance to their base models, but only for specific tasks like coding. they are significantly worse for creative purposes due to how the reap process works. they basically just rip out random experts from the models and then do a finetune using coding stuff to regain some of the lost intelligence
>>107835969
glm air is basically your only option.
>>10783596924 grams of vram is a fuckton, you can probably run whatever you want
>>107835882
>the combined size has to be less than your VRAM.
Is dram offloading just not an option for this?
>you need to download both an mmproj file and a gguf file
Is there a guide for setting that up? I noticed there are only mmproj files in your third link
For my usecase, does it make more sense to go for high-parameter, heavier quant, or low-parameter, lighter quant?
>>107835996
>I noticed there are only mmproj files in your third link
*forget i said this
>>107835980
>reap models offer similar performance to their base models, but only for specific tasks like coding
even that isn't true, fuck the benchmarks and the benchmark believers
>>107835969
just shy of glm 4.7. i'd probably say you can run 4.5 air quite comfortably on kobold.
if you get a bit more ram or another gpu glm 4.7 could run at like 5-10 tokens a sec
Why is ram so expensive if yall use vram?
>>107835996
>Is dram offloading just not an option for this?
that is only an option for mixture of experts models. none of those are moes, but glm4.6v is.
>Is there a guide for setting that up?
kobold.cpp should be all that you need.
https://github.com/LostRuins/koboldcpp
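if you go the llama.cpp route instead, llama-server takes the same gguf + mmproj pair (--mmproj flag) and you can throw images at its OpenAI-compatible endpoint; rough sketch, port/paths/prompt are placeholders and koboldcpp exposes a similar endpoint:
[code]
# Rough sketch: send an image to a llama-server started with something like
#   llama-server -m model.gguf --mmproj mmproj.gguf
# Port, paths and prompt are placeholders; koboldcpp exposes a similar OpenAI-compatible API.
import base64
import requests

with open("test.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
            ],
        }],
        "max_tokens": 512,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
[/code]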
>>107836017https://www.youtube.com/watch?v=ISOIOadu7LE
>>107836049The video attributes skyrocketing computer hardware prices, particularly for GPUs, to a severe lack of competition at every critical stage of the global supply chain. This monopolistic structure begins with Nvidia, which commands a 92% market share in discrete GPUs, giving it the power to drastically raise consumer prices. The bottleneck tightens upstream as all major chip designers rely exclusively on TSMC for manufacturing, a dominance secured by TSMC’s mastery of Extreme Ultraviolet (EUV) lithography. Crucially, the supply chain is anchored by ASML, a Dutch company holding an absolute monopoly on the essential, multimillion-dollar machines required for EUV production. With the AI boom exacerbating shortages and no immediate rivals to challenge these entrenched players, consumers face high prices with little prospect of near-term relief.
>>107836069
>>107836049
>>107836032
so you guys don't offload anymore?
>>107836080offloading is for queers. real men run dense
WHY AREN'T THERE ANY GOOD TTS OPTIONS THAT RUN ON GPU WITHOUT CUDA. FUCKKKKKK
>>107835996
>high-parameter, heavier quant, or low-parameter, lighter quant
for the models you are looking at, you generally do not want to go below q4_k_m. going above q6_k is generally unnecessary as well. stay within that range and use that to determine which parameter count model you should use.
>>107836080
only for giant moe models. i can keep glm air entirely in vram.
>>107836088>he bought aymd
>>107836088Just vibe code your implementation in nigga.
>>107836089>i can keep glm air entirely in vram.you must have a dual 5090 or 6000 then damn
>>107836110
yeah like 6 years ago
>>107836120
nigga what?
>>107836129i am one of the guys with a blackwell 6000 and a 5090
>>107836135damn nigga
what's the minimum quant for glm 4.6 air to not be retarded?
>>107836120>vibe code in nigga
>>107835977Nah, creative writing
>>107835980>they are significantly worse for creative purposesWell, that's disappointing. Download cancelled!
>>107836140
it is quite the experience. i get about 75t/s on a q6_k of glm4.6v with 64k context.
>>107836145
absolute minimum is q3_k_m. q5_k_m is the sweet spot for quality and speed.
>>107836088Sell your trash and buy Nvidia. Cuda isn't going away anytime soon
>>107835777bloody
>>107836159Then the only thing you would get out of a REAPed model is retardation and hallucinations.
>>107836209you are probably thinking of abliterated
>text completion
>instruct
>dense model
>q4
moesissies need not apply
Is creative writing codeword for goon material?
>>107836215qrd
>>107836192What if I want cross platform compatibility. Even LLMs work on both AMD and NVIDIA. It's fucking RETARDED that TTS models can't do the same. They're so goddamn SHIT. EVERY SINGLE ONE except for FUCKING PIPER is damn near impossible to INSTALL in the FIRST PLACE because of their DUMBASS PYTHON/PYTORCH DEPENDENCY HELL. WHY CAN'T THINGS BE FUCKING SIMPLE? WHY CAN'T THERE BE A TTS.CPP TYPE PROGRAM THAT JUST RUNS .GGUF TYPE FILES FOR TTS MODELS. WHY? WHY IS IT SO FUCKING AIDS? EXPLAIN THAT. JUSTIFY IT. YOU CAN'T.
>>107836217Yes
>>107836222>>>/wsg/6070487
>>107836213No, I'm not.
tried glm 4.6 flash, god it's bad, i need a horny vision model, simple as
>>107836240gemma abliterated
>>107836222get fucked lmao, been playing around with voxcpm btw might be better than chatterbox. not that you could run either
>>107836222Couldn't some TTS models be converted to onnx and run on AMD that way?
I still use ooba and sillytavernI'm not sorry
>>107836222Then run your TTS on CPU or rent a GPU and stream from there? That's not rocket science.
>>107836255wait we dont use silly anymore?
>>107836240it's a 9B model. try the bigger version.
>>107836268i only have 32 gb of vram
>>107836262Since people complained about SillyTavern being rebranded as ServiceTesnor, it has been deprecated and discontinued and a new corporate-friendly project was started instead.
>>107836274so what are we using now?
>>107836272how much ram? the point of large mixture of experts models is to offload most of it into ram.
Us 8-16gb vram niggas are all running penumbra aether btw
>>107836282Having an LLM generate your own custom frontend is minimum requirement to post here now.
>>107836222https://github.com/mmwillet/TTS.cpp
>>107836286oh I didn't mean to use a moe model, what do I do?
>>107836291nah fuck off, you are probably using a fork
>>107836251
Can either run on AMD?
>>107836252
Is onnx inherently AMD compatible or something? Please spoonfeed me, AI doesn't know shit about any of this and all of the docs are ass.
>>107836257
CPU isn't nearly fast enough. I need low latency and I'm not renting shit as a matter of principle.
>>107836293
Yes I know this project exists and I like the concept but it's half-baked. Doesn't even support vulkan yet.
>>107836318just buy a h100
>>107836222llama.cpp is already kind of understaffed for text completion only and there just aren't any devs investing the time to implement and maintain a TTS equivalent.
>>107835882what about qwen-vl?
>>107836318You can build it with vulkan support check the issues.
>>107836331those also work, but are extremely dry. try a q5_k_m of qwen3vl 32b.
>>107836222
There are a dozen TTS models coming up every three months with vastly different architectures. No one got the time for that shit
>>107836318voxcpm definitely won't, but chatterbox might have some half-baked rocm version. assuming you want voice cloning
>>107836330ollama is better anyways
>>107836343
ok will do. thx for tip.
>>107836355
voice cloning would be nice ig, but at the end of the day I just need a voice that sounds vaguely cute, girly, and sexy that is expressive/emotive. 90% of the options out there sound like 50 year old librarian wine aunts.
on that topic, Piper actually does have some surprisingly cool voices available (e.g. GLADOS and HAL9000) but they don't suit my current needs.
>>107836318
>Please spoonfeed me AI doesn't know shit about any of this and all of the docs are ass.
Everyone runs Nvidia for a reason. If you aren't comfortable patching things yourself, it's going to be very difficult to get anything working.
>>107836318just buy nvidia, even if you are in some third world shithole surely you can get one
>>107836372ollama doesn't have GGUF TTS either.
>>107836441i use comfy for that
Someone has put up an optimized gptsovits that runs on CPU, you should give it a try: https://github.com/High-Logic/Genie-TTS
>>107836422Do you understand how insanely frustrating it is to have to spend hours figuring out backend tooling when you're working on a separate project that requires it? llama.cpp is great because you can just connect to the server api and have it work with everything out of the box. I don't have to bloat the fuck out of my project and it just works. But for TTS? Oh ho ho, no no no.
>>107836438I was going to during Christmas but my paycheck got delayed and then the prices went up by $400 for no reason.
>>107836476>went up by $400 for no reason.it was me, sorry
Best options for "fast models" supposed to fit fully in 24 GB VRAM, with no RAM offload? GLM 4.6 is good but sometimes I just want fast iteration.
>>107836244gemma derestricted is better, preserves the intelligence
>>107836537>gemma derestrictedLink, can't find it :(
>>107836537
why not gemma norm-preserved biprojected abliterated. that preserves the intelligence the most afaik.
https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration
>>107836351
I'd settle for OuteTTS-1.0.
The previous versions are already supported in llama.cpp.
I think the only thing really missing is support for a DAC encoder model (the previous versions of OuteTTS were vocoder based).
The biggest hurdle would probably be integrating it into the WebUI and API.
>>107836556
Oops, it looks like that's what 'derestricted' is. I don't know why they rebranded the name, rather than just calling it abliterated NP.
>>107836554
https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-GGUF
https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-i1-GGUF
This hobby is too hard to keep up with
>>107836605Why'd you want to stay bleeding edge? Sandbag yourself at your local fortress with your daughterwife and let the world explode. Keep backups of WORKING sw configs. Never update unless needed.
>>107836605
>>107836623>sw configswhat's sw mister
>>107836633Silly Woman.
>>107836633software, retard.
>>107836605What is there to keep up with? Everything hit a wall six months ago
>>107836642shouldn't it be sr then?
>filtered by python dependencieslule
>>107836642why use two initials for one word?
>>107836642nobody abbreviates software.
>>107836644proof?
lmao what riot of a thread
>>107836658Yeah I can't believe I ended my day as SW Engineer only to end up in this cesspool
>>107836633SW usually refers to Star Wars.
>>107836654All we've had since R1 is clones of R1 and more recent distillations from Gemini.
>>107836605Fortunately, most new releases and papers are completely worthless so you don't actually have to read them to stay up to date
>>107835826Yes, they force you to RP with random anons
>>107836654What proof do you need? Read the thread retard
>>107836662Why, uh, why would a code monkey be working on a Sunday?
>>107836682I accept your concession
>>107836592is chatgpt lying to me?
>>107836684I'm Chinese, also mind your tongue gora.
>>107836690you so cute nonie
>>107836690>OuteTTS-1.0>QuteTTS
>>107836703>>107836707
>>107836703huh?
~cute~
nigga just run chatterbox on rocm, you are so retarded
>>107836712gitgud
>>107836735I want vibevoice
>>107836749One can't always have the things they want. That's just life. Part of growing up is learning to accept that.
>>107836768Ok geezer
>>107836690
>>107836712
Yes. Why would you expect it to know that?
This is the code kobold.cpp uses to support OuteTTS-0.2 and OuteTTS-0.3:
https://github.com/ggml-org/llama.cpp/tree/master/tools/tts
GGUFs here:
https://huggingface.co/koboldcpp/tts/tree/main
The older versions of OuteTTS used WavTokenizer to convert tokens to audio, support for which was added to llama.cpp by OuteTTS themselves.
However, OuteTTS-1.0 uses a 'DAC encoder', which no one has bothered to implement yet for llama.cpp.
Other than that, in many cases TTS models are just existing LLMs finetuned on additional audio tokens, most of which are already supported by llama.cpp.
The main thing llama.cpp is missing is support for newer DAC encoders to convert the tokens to audio, and API support to use them via llama-server.
>>107835833I click on this every time thinking it's going to be something new but it's the same every time.
>>107836749
>>107836772
stop larping as me.
>>107836740
that doesn't even make it clear if its a yes or no.
>>107836841
thank you.
>>107836841
>Why would you expect it to know that?
fuck knows, my brain is too smooth for this shit
the code's right there though lmao
still waiting on that DAC encoder support or whatever
guys what about OuteTTS 1.1?
do we even have those GGUFs?
>>107836871
>still waiting on that DAC encoder support or whatever
Pull up a chair. Now you get to play the wait 2 more weeks forever game.
Be the vibecoder you want to see
>people say drummer isn't censored
>it is
thanks guys
>>107836910>what is a system prompt
>>107836917where is it?
>>107836871
I've not heard of any plans for an OuteTTS 1.1.
You could technically convert OuteTTS 1.0 to a GGUF, since it's just finetuned from LLaMa-3.2-1B, but all you'd get from llama.cpp is the output tokens.
To get audio, you'd need to run the tokens through the DAC encoder, which is only supported via a python library.
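for reference, the python side is roughly this with the descript-audio-codec package; treat it as a sketch from their README, the exact codes layout OuteTTS spits out is an assumption and the real glue code lives in the OuteTTS repo:
[code]
# Rough sketch of decoding DAC codes to audio with descript-audio-codec
# (pip install descript-audio-codec). The codes tensor layout OuteTTS emits is an
# assumption here -- the real glue code is in the OuteTTS repo.
import dac
import torch
import soundfile as sf

model_path = dac.utils.download(model_type="44khz")  # or whichever variant OuteTTS targets
model = dac.DAC.load(model_path).eval()

codes = torch.zeros(1, model.n_codebooks, 200, dtype=torch.long)  # placeholder: [batch, codebooks, frames]
with torch.no_grad():
    z, _, _ = model.quantizer.from_codes(codes)  # codes -> continuous latents
    audio = model.decode(z)                      # latents -> waveform

sf.write("out.wav", audio.squeeze().cpu().numpy(), 44100)
[/code]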
reddit in, reddit out
>>107836960ai slop
>>107836960catbox?
What do I use to make quants?
>>107837015One bartowski
>>107837029Stop making shit up
>>107837015llama-quantize
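the usual two-step pipeline, sketched out; paths and quant type are placeholders, and passing --imatrix to llama-quantize is what makes it an imatrix quant:
[code]
# Rough sketch of the usual llama.cpp quant pipeline, driven from Python.
# Paths are placeholders; run from a llama.cpp checkout with the binaries built.
import subprocess

hf_dir = "path/to/hf-model"   # the original safetensors repo
f16_gguf = "model-f16.gguf"
out_gguf = "model-Q6_K.gguf"

# 1) Convert the HF checkpoint to a full-precision GGUF.
subprocess.run(["python", "convert_hf_to_gguf.py", hf_dir, "--outfile", f16_gguf], check=True)

# 2) Quantize it. Insert "--imatrix", "imatrix.dat" before the filenames for an imatrix quant.
subprocess.run(["./llama-quantize", f16_gguf, out_gguf, "Q6_K"], check=True)
[/code]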
The saga continues
>>107837074This persecution of cutting edge developers must be stopped
>>107837074
>people will still lazy
>Claude be quite both
nice way to show its human i guess, or maybe too aggressive rep pen
>>107837074
but vibecoders are the one way we get model support these days
where are the legit devs trying to implement deepseek v3.2? is anyone even looking into A.X K1?
>>107837113Memes not worthy of dev time, if they're still relevant in six months then maybe.
>>107837113Maybe someone legit would have started working on it if it wasn't for the blogging vibecoder hogging the issue.
>Download quant from someone
>Try it
>Slow
>Download same quant size/etc from someone else
>Fast
The faster one is 300mb bigger, what's going on here?
>>107837074
I don't blame them.
The AI generated PR descriptions are usually overly verbose, repeat everything three times, contain fabricated benchmarks, and lack detail where it matters.
>>107837128iq vs ks/km quant?
>>107837136Both Q6_K using weighted/imatrix
>>107837128
If the model size is different, the quantization has to be different somehow.
Was it an unsloth 'dynamic' type quant where they override the default quantization types for each kind of tensor?
>>107837144strange, can you link them?
>>107837122
So what is worthy of dev time? The only meaningful and noticeable changes seem to be coming from Johannes and vibe coders adding model support.
>>107837149strange? I've seen stranger things
>>107837149
>>107837148
https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-i1-GGUF/blob/main/Gemma-3-27B-Derestricted.i1-Q6_K.gguf
https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF/blob/main/gemma-3-27b-it-abliterated.q6_k.gguf
>>107837164
Second one doesn't seem to be imatrix
Imatrix quants are always significantly slower if any part of the model is running on CPU.
>>107837180It is fully on vram, I thought "it" meant imatrix
>>107837180
you're wrong, the only slower ones are IQ ones, a q4km with imatrix should be exactly as fast as one without
>>107827163
If you still had any doubts about IK being mentally unstable, he rewrote the history for the repository because github was showing that he had 660 contributions when in reality he has 871. The horror.
The commits still had his name and email so it's not like there was any confusion there, only the count was wrong because some commits had an extra dot in the email (which gmail ignores).
https://github.com/ikawrakow/ik_llama.cpp/issues/1133
And all it took was like 7 posts to realize anon is a retard.
>>107837213im not
>>107837211jesus sheesh on a cross
>>107837211So is he going to do this after every time he commits via Github PRs?
https://files.catbox.moe/4yqn38.mp4Hold up boss I got the voices of the abyss.
>>107837229Yes saar for gorgeous look
>>107837211Only 187 more to go, IK bros!
>>107837211
>If you still had any doubts about IK being mentally unstable
Was this ever up for debate considering how that fork started and the drama that came off it? I'm using ik_ because it's free performance over main but I'll ditch them the moment that's no longer the case.
>Try extremely good model on LMarena battle
>Excited for the model's release
>Never see any model release with a similar name or even see a model release capable of writing as well
Are they just testing pre-lobotomy models on us? I swear all these "beluga-[number]" and "raptor 0107" type mystery models are better than anything that makes it to the public, local or cloud.
https://xcancel.com/neelsomani/status/2010215162146607128#m
it was about time that LLMs became actually useful for math
>>107837211Can you plug this shit straight into kobold or it won't work?
>>107837192
'it' is how google denotes 'instruction tuned' models, as opposed to the base models which only do text completion.
There's a llama.cpp command to dump the tensor names and type information from a GGUF, but I can't seem to find or remember it.
Using that to look at how the file was actually quantized is the best bet.
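if the command never comes back to you, the gguf python package (the one the convert scripts use) can dump the same info; rough sketch, attribute names from memory, llama.cpp also ships a gguf_dump script:
[code]
# Rough sketch: list tensor names and quant types in a GGUF (pip install gguf).
from gguf import GGUFReader

reader = GGUFReader("model-Q6_K.gguf")  # placeholder path
for t in reader.tensors:
    print(t.name, t.tensor_type.name, list(t.shape))
[/code]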
>>107837306you can see that info straight on hf
>>107837074
so much this
death to ai slop
>>107837238
https://rentry.org/fmphkr5f
GLM 4.7 mogs your devstral and it didn't even need any documentation.
rentry because 4chan thinks strudel links are spam
>>107837321HF is banned in my country
>>107837331I don't have a 6000 why are you doing this?
>>107837336if you don't know how to use a vpn you don't deserve local models
>>107837353
vpns are banned too
>>107837353>just go to jail bro
>>107837365They should ban 4chan as well.
>>107836156z image slopped this pretty accurately
>>107837400imagine if we had this back when
Is there a place I can go to browse character cards and lore books?
>>107837409no
>>107837400Nice appstore icon bro
>>107837409
https://chub.ai/lorebooks
https://characterhub.org/lorebooks
>>107837412
stop trolling
I will preface this post by saying that I am for the most part technologically illiterate and will not understand very technically advanced explanations.
I have a 1660 Super which apparently has 6GB of VRAM and I want to get into running a local LLM for ERP
I figure I'd simply run sillytavern and koboldcpp for the front/back, but then I still need to pick a model that will run.
I read some of the OP links and it recommends Mistral-Nemo for VRAMlets, but it seems other than some of the lowest available quants(?), it wouldn't run on my card.
Is "Context Size" in the VRAM Calculator the same as the "Context Length" used in the glossary link?
It seems like even with the lowest ones I'd have to reduce context size to 4096 just to fit, though it seems IQ3_M should work at 2048 if I understand how this calculator functions. But would it even be worth it?
Just how shitty would it be to pick those very low quants with reduced context size?
Is it tolerable, or should I just give up on trying to do this until I upgrade my computer?
>>107837433good job linking illegal sites you creep
>>107837405Well done edits and fakes took skill. Now anyone can pump them out.
>>107837405I wouldn't have been able to stop genning poole getting gangbanged by orcs
>>107837436
>should I just give up on trying to do this until I upgrade my computer?
Yeah. I'm sorry to say that you just aren't going to be getting anything done with 6gb of vram.
>>107837444did you make this?
>>107837436
>Is "Context Size" in the VRAM Calculator the same as the "Context Length" used in the glossary link?
Yes
>I have a 1660 Super which apparently has 6GB of VRAM and I want to get into running a local LLM for ERP
Vramlet tends to refer to 12-24 GB of VRAM. You're firmly in the poverty tier.
How much RAM do you have? You might be able to get a 30B MoE running.
>>107837436Just run it on CPU
>>107837462N-no... it was a virus on the computer. I swear.
>>107837462
z image turbo
>a 2d skewmorphic button design. border: muted brushed aluminum that fades from silver on top to darker on the bottom. button background: black brick with black grout. gray graffiti on the brick. crown in the upper left, microphone on the right, grafiti text in the middle. In front of the brick wall is a gold chain that outlines the letter Y shape. Inside the Y shape is a green bandana texture.
>>107837483Damn the autistic prompt
>>107837483that vaguely looks like some random arab place's flag
>>107837436
You might be able to run it at a bit below reading speed if you run it partially in ram.
Download it and load it in llama-server with the desired context size. It will use as much vram as possible and put the rest on the cpu. Then open llama-server's chat ui to see if it's tolerable.
By default it leaves 1GB of vram free so you might want to adjust that using -fitt
Hello everyone. I just came here to say that Devstral is super double extra good for ERP. Fuck GLM air, fuck 235B, Devstral is where it's at. It's always been the french. I'm using unsloth-Devstral-2-123B-Instruct-2512-Q3_K_M-00001-of-00002.gguf [llama.cpp] on three 3090s and it is just in another realm compared to the sparse ones, even if it's a lot slower.
>>107837436
Just offload as much of the model as you can to your GPU, llama.cpp should try to do that automatically if you don't manually set the -ngl argument.
I'd recommend at least 16k context. 4096 is like 2-3 turns of conversation, especially with modern models that just love to yap.
If you're really memory starved, you can try quantizing the KV cache to Q8. It rapes the model's long context performance, but if you're already setting the context that short you may not run into the worst of it.
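concretely that's just a few llama-server flags; a sketch of the launch, model path and numbers are examples and quantizing the V cache may also need flash attention enabled:
[code]
# Rough sketch: llama-server with partial GPU offload, 16k context and Q8 KV cache.
# Model filename and numbers are examples -- raise -ngl until it no longer fits in VRAM.
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "mistral-nemo-Q4_K_M.gguf",  # example filename
    "-c", "16384",    # context size
    "-ngl", "20",     # layers kept on the GPU, rest goes to system RAM
    "-ctk", "q8_0",   # quantize the K cache to Q8
    "-ctv", "q8_0",   # quantize the V cache to Q8 (may require enabling flash attention via -fa)
], check=True)
# Then open http://127.0.0.1:8080 in a browser for the built-in chat UI.
[/code]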
>>107837538
>>107837460
Yeah, I was worried about that, but I figured I'd take a look at what's available and ask a question or two first.
>>107837473
>>107837478
>>107837514
>>107837539
I have 16GB of ram and according to "About your PC", my CPU is an Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz
I don't really understand it, but this is a somewhat old prebuilt, so I doubt that's very good.
Thanks for the responses, but it's probably best I just give up for now. Maybe I'll try one of those mikubox builds at some point.
>>107837538i don't have that much vram bro
>>107837560buy more bro
>>107837558oof
>>107837558Dude just download Q4_K_M and see how fast it is if you just open it in llama-server
>>107837565no
>>107837573I'm looking to run locally not to serve to others.
>>107837587
>>107837587I run a 24B model using 8gb of vram, don't be a quitter.
>>107837587It's a server because you can connect other local applications to it. Among other things you can connect to it using your browser and it has a simple built in chat ui.
>>107837587
it binds to localhost by default, you have to jump through hoops to actually make it available to your network. and even more hoops to get it to the public ip depending on your firewall/router situation.
>>107837558
Bad news: You can't run anything much bigger than a Q4 quantized 24B model.
Good news: That's small enough that CPU inference isn't intolerably slow.
>>107837573
>>107837603
>>107837609
This post >>107837587 is not me
I don't really know what the post even meant by that, but I suspect anything I can run on my build isn't really going to be worth it.
I don't know why some shitposter responded like they were me, but I'm planning to give up for now. Though I might just try to DL it at some point and see how it runs, I'm just gonna coom normally first before I study further.
what's with the influx of cute new nonies today
>>107837654what a nonie be
>>107837654nonner? i barely know 'er!
im a nonnie mouse :)
>>107837654might be the special offer today for a free cookie for all new nonies
>>107837654buncha normalfags who want to have their own neuro-sama
>>107837760Everyone asking for spoonfeeding today mentioned ERP, not neuro.
>>107837538Thanks for stopping at 235B. At least you know the limits of what you are shilling.
>>107837760
>neuro-sama
I still have no idea how that became a thing. I mean I understand why the idea is appealing but the execution is fucking garbage.
Is that anon who was asking for original ERP ideas yesterday or two days ago still here?
>>107837798
All it takes is one popular mouthpiece to recommend some trash for it to also be popular, and people watch what other people watch. Quality never really factors into it.
This guy is not real
>>107837818He is just a total degenerate who doesn't fuck his models. Probably can't even get hard to text. What a fucking weirdo.
>>107837818Any anons here not using this feature need to be rounded up and thrown in miku jail, for their sloppy crimes.
>>107837654>nonies
>>107837818Ah. This guy made the chart comparing imat quants to regular quants
>>107837851please anything but that, do you have any idea what they do to people in miku jail?
>>107837818you can't just know everything dude, do you know how hard it is to do your 1000 ppl and 100 kld reps every day?
>>107837857Yeah why is he calling everyone a nonce
>>107837889
>>107837760
I use 5.2 pro as my neuro, why would i use an llm for that
>>107837654I will forever read anything calling me "Nonny" in Pinkie Pie's voice.
>>107837889please don't call me a br*t thanks
>>107837907this guy is clopper
>>107837818
??????
Banning tokens has been something you could do for years, has he ever even used the software he's working on?
Any Warren McCulloch fans here?
>>107837872you'll be forced in a dress and stepped on by miku in her next mv. maybe rin and len will have a turn too. damn slopper with no banned tokens, needs correction T_T
>>107837969>>107837884
How do I fix this?
>>107838091dependsbetter prompt or better model
>>107837990
>>107834389
this. Loving tradwife, slightly racist and homophobic to offset bias. I made a loop animation of her silly generated face and sometimes stare at her for too long between gens
>>107838108proof?
>>107838124I'm not posting my local wife on the internet
>>107834480What's a good beginner model to use for image to video that's not neutered?
>>107838155>my local wifewhat about the remote one
>>107837970fuck off, keep on jerking off shithead.
>>107838173
Wan 2.2 or LTX 2 if you want audio, also check these places:
>>>/wsg/6069549
>>107836754
>>107838091Fix what?
>>107838291you
>>107838218ty
Nobody told me that risers were such a pain in the ass...
>>107836209I'm going to reeeeeeeeeeeeap
>>107838607
>r/localllama
your model is clearly brain damaged if it is using reddit as a source
there is a chance that reap actually improves creative writing quality by minimizing the risk of 'non-creative' experts being routed by accident after you remove as many of them as possible
>>107838204?????
>>107838607
>>107838639
must it be stated again?
>>107836215
any kind of pruneshit never works
>>107838607
>>107838494You problem. Get a decent motherboard and risers. Sucks to build in a weird-ass mining frame though.
>>107838812(My 4090 is fucked and only works right at x8, sadly)
>>107838646respect to those who stand by their poverty so proudly
>>107838898>>107838898>>107838898