/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106762831 & >>106755904

►News
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://www.liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6
>(09/30) Sequential Diffusion Language Models released: https://hf.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106762831

--Paper: The Pitfalls of KV Cache Compression:
>106765718 >106765984 >106766108 >106766148 >106766761 >106766197
--Papers:
>106764212
--Evaluating GLM 4.6's roleplay performance and quantization efficiency:
>106763532 >106763653 >106763717 >106763827 >106763907 >106763914 >106764034 >106764029 >106764173 >106763671
--IBM Granite 4.0 enterprise model launch and documentation inconsistencies:
>106767652 >106767670 >106767732
--Director roleplay customization addon for managing character settings and environment:
>106763408 >106764995 >106765052 >106765045 >106765076 >106765156 >106765172 >106765183 >106765217 >106765326 >106765094 >106765123 >106765190 >106765225 >106765253 >106765303 >106765342 >106765390
--Liquid AI's LFM2-Audio 1.5B multimodal model capabilities and performance:
>106765758 >106765764 >106765934 >106766498 >106766751 >106766973
--Feasibility of building a local knowledge base with limited VRAM and RAM considerations:
>106764158 >106764219 >106764240 >106764254 >106764261
--Discussion on AI model quantization methods, jinja string editing, and new quantization types:
>106767116 >106767235 >106767244 >106767293 >106768177 >106768200 >106767251 >106767337
--ik-llama GPU utilization problems and offloading configuration:
>106763167 >106763227 >106763244
--Qwen3-30B-A3B model selection and GGUF quantization considerations for RTX 3090 VRAM limits:
>106764290 >106764312 >106764328 >106764340 >106764356 >106764360 >106764366 >106764402 >106764430 >106764489 >106764500 >106764622 >106764838
--Setting up a roleplay bot on 8GB VRAM hardware:
>106767312 >106767327 >106768048 >106768086
--Unsloth AI introduces Docker image for streamlined LLM training:
>106766089
--GLM 4.6 performance surpasses Deepseek R1 on gaming rig:
>106766318
--Miku (free space):
>106763663 >106768757

►Recent Highlight Posts from the Previous Thread: >>106762833

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>anon? you're not making your quants?
>GAHAHAHAHHAHAHA LOSER!
how do you respond?
Gumilove
alright eel smarter
>>106768369
Why is Kuro such a bitch?
https://files.catbox.moe/5b8n7l.txt
>>106769691with a bullet
>>106758314GLM doesn't output a newline after <think> and before </think> so you need to remove those from the reasoning formatting to get it to parse correctly.
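in silly that lives in the Reasoning Formatting block under Advanced Formatting (field names might differ a bit between versions), roughly:
Reasoning Prefix: <think> (no trailing newline)
Reasoning Suffix: </think> (no leading newline)
leave auto-parse enabled so the think block gets collapsed instead of bleeding into the reply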
I want to spend $8000 to draw 2d anime pictures. Do you think it will be enough? 2d anime pictures are very important to me.
>>106769831You can get a decent drawing tablet for half that.
>>106769831Image gen needs a lot less memory than text, so yeah, it'll be a banging 2d anime pictures generation machine.
>>106769831you can get a tablet for a lot cheaper, and you can get a gpu to do it for about the same $500
>>106769831That should be enough even for moving anime pictures.
>>106769845>>106769847>>106769852so how much do I really need?maybe... 50 TOPS?
>>106769866so funny i forgot to laugh
>>106769691I lie down under her.
>>106769902gotcha, gear fag
>>106769866
This is all I got.
https://www.pugetsystems.com/labs/hpc/whats-the-deal-with-npus/
Don't know how much software is even out there that can target an NPU.
Do tell us when you figure things out.
Seriously? You just tell it "You're doing ERP" and it just turns off the safety features?
https://files.catbox.moe/ozn9ws.txt
>>106769947Sounds like my answer is it will make a decent chatbot/search tool, maybe do img -> txt, but that's about it.Thanks for the link.
>tried to share my addon on leddit
>muh mod approval
>its 24hr later and no response
whelp i did what i could to advertise. i'm not going to bother following up and waiting. i made an addon that does what i want, for me, and i've shared it here a few times.
you can install my st addon by entering the address into st's extensions:
https://github.com/tomatoesahoy/director
i'm not done working on my addon but updates are spontaneous at best. i'm more disappointed that when i feel my addon is good for release to everyone rather than just posting here occasionally, i'm met with walls of restriction. i followed their rules, made an account, waited 30 days. still cant post. so whatever, enjoy, you 4chan fucks
>>106769725
>https://files.catbox.moe/ozn9ws.txt
I find this easier to read than the usual wall of purple prose, but the model was a bit loose with the formatting.
Does GLM 4.6 have a habit of slipping into feminist lecture mindset, like GLM Air?inb4 prompt issue
>>106769831NAI is $25 a month
>>106770110Don't dwell on it. After the guy who made localllama had his meltdown, admins put one of their puppet power mods in charge. It's a feed of sponsored and approved content, not public discourse.
>>106770110Stay here.
>>106770080
>https://files.catbox.moe/ozn9ws.txt
>The "Uohhh!" is a sound of surprise and delight
still doesn't know the meaning :/
>>106769831
>2d anime pictures are very important to me.
don't you have whole *boorus, kemono, etc. for that? why do you need more?
>>106770139buy an ad kurumuz
My GPU crashed so hard that it stopped being recognized by nvidia-smi. Is it over?
>>106770207>>106770215it actually means a lot to see other people be understanding. i posted in good-faith. i really just wanted to share something i think helps solve the clothing/location issue models have. but then i realize i'm totally shut out from posting. at least, other than here
>>106770262Here is all you need.
>Using ikllama 4.6 = 2.6 T/s at 10k ctx
>switch to regular llamacpp = 3.2T/s
?? Why was I using the memefork again?
>>106770207
>meltdown
qrd?
>>106770252
until you reboot perhaps
>>106770300
few hours of trial and error to find the right memeflags then it'll be fast
also post cmdline
>>106770324
--override-tensor exps=CPU -ngl 99
That is what I use for both. I have bad experiences with fmoe and I don't think any other flags apply to glm.
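for anyone copying this, the full invocation is roughly something like this (model path, context size and thread count are placeholders, same flags work on mainline and ik_):
./llama-server -m GLM-4.6-IQ4_K.gguf -c 16384 -t 16 -ngl 99 --override-tensor exps=CPU
-ngl 99 nominally puts every layer on GPU, then the override pattern kicks any tensor with "exps" in its name (the routed experts) back to system RAM, so only attention + shared weights actually sit in VRAM.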
>>106769660https://files.catbox.moe/bpv4jk.mp4
has glm 4.6 officially saved local?I haven't had this much fun with a model on my rig in quite a while and even api deepseek never felt this good
>>106770252
Did it give:
>"Unable to determine the device handle GPU0000:n is lost" ?
If so then try a hard power cycle and hope it works. Can be a driver crash, or actual hardware issues.
I had an issue like that about 5 months ago and it turned out to be a flaky connection that got better after I cleaned the dusty PCIe slot.
>>106770366im so glad miku is 2D
>>106770110Okay but where is the version of this where it describes body parts and their current status? I want the model to know the exact height and skin color and ear shape of my goblin maid. Seriously though, I wonder if something like that would help with things like the nala test or similar scenarios. Maybe explicitly describing the character as quadrupedal with paws in a scene description would improve the output even though it should be self-evident by describing the character as a lion.
>>106770366Miku-san, where.. from where does your cheese come from?
>>106769660
Daily reminder
>mikusex
>Nvidia Spark is a tiny DGX computer not meant for LLM/imagen inference
>petra is the goat
/ourguy/ Kalomaze (Min P) is doing an AMA on Reddit
https://www.reddit.com/r/LocalLLaMA/comments/1nwaoyd/ama_with_prime_intellect_ask_us_anything/
>>106770451seems like he's reddits guy now
>>106770366Chinkiest miku ever
>>106770356got some spare GPU mem? still testing myself (also gonna be v hw dependent) but maybe first three+ layers back on GPU, flash-attn, run-time-repack, K/V quanting, batch sizes ..
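concretely that'd be something like this on ik_ (layer regex and sizes are just an example; iirc the first matching -ot rule wins, so the GPU rule goes before the catch-all):
-fa -rtr -ctk q8_0 -ctv q8_0 -b 4096 -ub 4096 -ot "blk\.(0|1|2)\.ffn_.*=CUDA0" -ot exps=CPU
-rtr repacks the quants at load time for the CPU side, -ctk/-ctv q8_0 roughly halves the KV cache, and the blk regex keeps the first few layers' experts in VRAM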
>>106770207you mean the mod who instantly created a discord and twitter account for the subreddit somehow isn't a saint, color me shocked to find this out
>>106770463Always was.
>>106770451
wow
>I’m Kalomaze (u/kindacognizant), a researcher at Prime Intellect, the lab behind:
>Distributed training efforts including INTELLECT-1 + INTELLECT-2
what a switch to go from here to being responsible for some of the worst models ever
>>106770480the point is memefork isn't better
>>106770498Acquiring funding is like making a deal with satan.
>>106770451Someone ask him why his company uses total fucking garbage to train models and not only that they're doubling down on it by creating a huge synthetic dataset for MATH CODING and SCIENCE
>>106770498money
I've coomed for the 5th time today, and I can't get it up!
>>106770513No point asking. Assuming he doesn't ignore the question, the answer is obviously adding any nsfw, vulgar, or copyrighted content would get them shut down overnight.
>>106770523>I'd become a giga neet
>>106770540They make pills for that.
>>106770477Yeah, those are not Japanese aesthetics. Definitely chinky or gooky.
>>106770289i know. it was silly to even try because the internet now is nothing but dead ends and degenerates
>>106770546How would vulgar or nsfw content (or just focusing on writing even a tiny bit) get them shut down? Whatever, just tell him to choke on a dick or something then
>>106770451I enjoyed those times when he was trying threadly new sampler shit-throwing to see what happened. It was fun to try getting the experimental hacky stuff working even if most of it did not make the outputs definitively "better" overall. They did do something, that's for certain, so I thought it was cool to be part of the exploration of new land.
>>106770366not my miku
>>106770588>or just focusing on writing even a tiny bitThat doesn't improve the benchmark scores that get investors all hot and bothered.
>>106770607INTELLLECT has some of the worst scores ever recorded though, like worse than llama 7b
It's odd to me that ik_llama.cpp doesn't seem to work well with assistant prefills using the chat completion API.
llama.cpp just works.
>>106770651Less math and code isn't going to boost those scores though.
I compared https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ2_KL
against https://huggingface.co/Downtown-Case/GLM-4.6-128GB-RAM-IK-GGUF/tree/main/24GB%2B128GB_V3
and found ubergarm's to have significantly better token probabilities on my tests. There is one where ubergarm's quant has 70% on correct token, 16% on "trap" token while the other one has 20% on correct and 63% on trap while also being somehow slower at a smaller size
>>106770651Does it beat the original StableLM?
air bros... wheres our 4.6???????????????
>>106770451>Hey, we are super alignedyea...
What is best model for editing image? (I want to add dick on dude forehead) Im using Easy Diffusion, while trying it shows weird results, like it does not look like proper image of dude with dick on forehead but something weird
>>106770710ppl and kld. Come back when you have those.
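both come out of llama-perplexity if you actually want to run it, roughly like this (file names are placeholders):
# perplexity
./llama-perplexity -m glm-4.6-iq2.gguf -f wiki.test.raw
# KLD: dump logits from the biggest quant you can run, then compare the small one against it
./llama-perplexity -m glm-4.6-q8.gguf -f wiki.test.raw --kl-divergence-base glm.kld
./llama-perplexity -m glm-4.6-iq2.gguf -f wiki.test.raw --kl-divergence-base glm.kld --kl-divergence
the top-token agreement stats it prints are basically your trap-token test, just averaged over thousands of positions instead of a handful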
>>106770727there is hope, thrust!
Okay, something is wrong with GLM 4.6. Maybe it's the quant, I'm running IQ5_K but the model is fucked. It will sometimes randomly add chinese characters to the middle of sentences, put the actual response in its thinking and doesn't consistently follow formatting of previous responses correctly. I have to reroll most of the time and it's slower than R1.
I'm using temp 1 with everything else neutralized as recommended on the hf page with the GLM 4 context/instruct template.
>>106770482
>MODERATOR OF
>r/LifeProTips
>r/ChatGPT
>r/PewdiepieSubmissions
>r/OpenAI
>r/GPT3
>… and 44 more
they are in good hands
>>106770763The best hands.
>>106770753
Try using the chat completion api.
If that works, something is fucked with the context/instruct template somewhere somehow.
>>106770451Neat, you should head over therethen don't come back
>>106770774Do not be mean unto others.
>>106770780redditors are vermin, not people
>>106770753>everything else neutralizedUse top_k=40, top_p=0.95. If you neutralize everything, you'll always get shitty tokens now and then.
>>106770827Nope! If you have to resort to that you're using a shit model and should switch to something actually good.
>this is deeply misogynistic and reinforces problematic gender stereotypes
air-chan, yamero
>>106770872Thank you kalomaze, very safe.
>>106770886it's not like it refuses, it just insists on scolding me about sexism and toxic masculinity all the time
>>106770905So a bit of a Mixtral vibe, nostalgic.
>>106770872have you tried disabling thinking?
>>106770905have you told it to like sexism instead?
>>106770840lmao glmtards BTFO
>>106770747
>coworker calls him a coomer
workplace bullying is not ok
>>106771035name a better local model
>>106771098nemo
>>106771098they're all in a sad state right now but i'd chose something that doesn't run at 2 tokens per second unless you are a retard who spends money on this
>>106771105Somebody should turn Nemo into a MoE.
>>106770366It's not Miku Cheese unless it's made from Miku Milk.
>>106771105>>106771134Have you tried Rocinante — I love it!
>>106771153NeMoE
>>106771205'mo-moe
>>106771205
Well, >>106771205 has the name down.
Now we just need a finetooner to create it.
Anybody got DavidAU's phone number handy?
At last, local is saved. I called qwen3 and glm4.5 shit while everyone praised them. The new glm however is a whole different story. It writes with the soul I've never seen before in a local model, not even deepseek.
>>106771262
>>106771262>soulzoomer opinions are less than worthless
>>106771105what's it like to fuck a retard?
>>106770753
ubergarm/IK quants on non ik_ llamacpp perchance? Haven't seen such issues with barts even at Q3_K_M, sometimes thinking gets messed up but that's my janky Silly config. remember newline before <think>
>>106771276
>not embracing the heartsovl of your model
get out
>>106771247I'm 100% sure there have already been Nemo based clowncar MoE made.
>>106771134I can get my glm running at 4-5 t/s. Honestly not bad compared to what I originally expected.
>>106771333>remember newline before <think>You mean start reply with that?
>>106770451>Kalomaze is doing an AMA!>he never answered any question
why are we using GLM 4.6 again? this shit feels astroturfed as fuck now that i had a chance to use it. surely i should be getting better speeds than this with 30 layers offloaded. token generation is the same, although it ends up taking much more time since GLM wastes time thinking (i know i can turn it off). it's only like 30tk/s quicker than K2 in prompt processing despite being half the size (249GB vs 485GB). the results below are with an IQ5_K quant (ubergarm)
INFO [print_timings] prompt eval time = 32843.74 ms / 4807 tokens ( 6.83 ms per token, 146.36 tokens per second) | tid="128489640603648" id_slot=0 id_task=3408 t_prompt_processing=32843.743 n_prompt_tokens_processed=4807 t_token=6.832482421468692 n_tokens_second=146.35968866276903
INFO [print_timings] generation eval time = 250057.45 ms / 1591 runs ( 157.17 ms per token, 6.36 tokens per second) | tid="128489640603648" id_slot=0 id_task=3408 t_token_generation=250057.452 n_decoded=1591 t_token=157.16998868636077 n_tokens_second=6.362537837904546
INFO [print_timings] total time = 282901.20 ms | tid="128489640603648" id_slot=0 id_task=3408 t_prompt_processing=32843.743 t_token_generation=250057.452 t_total=282901.195
kimi k2 0905 smol_IQ4_XSS quant
INFO [print_timings] prompt eval time = 39716.81 ms / 4180 tokens ( 9.50 ms per token, 105.25 tokens per second) | tid="133210908811264" id_slot=0 id_task=1323 t_prompt_processing=39716.806 n_prompt_tokens_processed=4180 t_token=9.501628229665071 n_tokens_second=105.24511966042789
INFO [print_timings] generation eval time = 18771.55 ms / 121 runs ( 155.14 ms per token, 6.45 tokens per second) | tid="133210908811264" id_slot=0 id_task=1323 t_token_generation=18771.548 n_decoded=121 t_token=155.1367603305785 n_tokens_second=6.445925503852959
INFO [print_timings] total time = 58488.35 ms | tid="133210908811264" id_slot=0 id_task=1323 t_prompt_processing=39716.806 t_token_generation=18771.548 t_total=58488.35399999999
>>106771590
Let them cook, they're busy!
>We are an open source agi Labs and ramping up our research team, our goal is to be competitive asap on capabilities with the big labs, we have compute, talent, and crowd source environment with verifier and the hub. Stay tuned for our next model release !
>>106771605>this shit feels astroturfed as fuckno way take your meds schizo freak
> Is there any local ai that let me do TTS elevenlabs type quality?
I have a 3080 geforce tuf and a gigabyte ga-h61m-s1 mobo, so my pc is quite decent but not overkill. I still want to try messing with a good TTS.
>>106771510Yea, or on a second line in Assistant Prefix. I use the instruct sequences to more easily turn thinking on/off
>>106770110
>remember always running it on windows
>reinstalled it on linux months ago
>for some dumb reason it doesn't show up in my silly tavern extension tab like it does on windows despite the fact it shows as installed and enabled when managing extensions
aside from that weird fuckery going on my end, it was one of my fav add-ons back then desu, mostly used the clothes and world info bits and it helped out a ton steering models, you did a good job anon!
>>106771624it's not good and now i know why nobody is posting GLM 4.6 logs. what's the point of using GLM 4.6 when K2 is only moderately slower in PP but still the same speed in TG? GLM 4.6 is lacking in its knowledge, it fails to understand niche things that kimi is able to pick up without issue. even niche stuff aside, kimi just knows way more and its dataset is more recent. you can ask it to tell you stuff that happened in December 2024 and it will answer factually.
>>106771605>why are we using [350b model] and not [1t model]But that aside, do you not like the output?
>>106771590No shit, reddit AMAs are just promotional stunts. They only respond to the questions they want to, the ones that let them segue into marketing themselves with canned and preplanned responses.
>>106771712
if i turn off thinking then it's reasonable. keeping thinking on is a mess though.
here's an example of the output with thinking. removed some of the response to fit the character limit. the thinking is slopped.

Okay, the user has responded through {user}. Let me break down the key elements of their response to guide my turn as {char}.
{user}'s actions/Dialogue:
Physical reaction: Breath knocked out, hugs back "timidly." This shows he's a bit overwhelmed and maybe shy, which fits his description. He's not rejecting the hug, but he's not as effusive as {char}.
Dialogue: "Jeez, yeah yeah. It looks like I'm back for a while…" - Confirms his return, sounds a bit flustered/casual.
<blahblah>
{char}'s internal state:
She's ecstatic. Her absolute best friend, the boy she's been in love with for a decade, is back.
<blahblah>
My Plan for {char}'s Response:
Initial Reaction: <blahblah>
Physicality: <blahblah>
Dialogue: She'll laugh loudly at his "welcome party" comment. Maybe tease him about it.
<blahblah>
Narrative Flow:
Start with her still hugging him, her immediate reaction to his awkwardness.
<blahblah>
End on a high note, maybe she grabs his hand and starts pulling him somewhere, or asks a million questions a mile a minute. The goal is to keep the interaction going and show her personality. I'll make it brief and stop at a point where {user} needs to react.
Drafting the response in my head:
>>106771674thanks for the compliment, it makes me happy to read. its funny that 4chan is the way it is, i've never got much negativity about developing my addon. you guys are always pretty supportive, telling me to keep going and stuff. it might not seem like much but reading the positive comments makes me wanna do more, so thanks
>>106771605>this shit feels astroturfed as fuckno bro 4.6 is kino af ong it has svol bro fr fr the vibes bro bro
So what layers should I offload to RAM for GLM 4.6 in ik_llama.cpp?
GLM 4.6 just bought me a house and cured my dog's cancer!
>>106771820this but unironically
Another reason I love glm-chan 4.6 is that I am sure she makes drummer jerk off while lubricating his cock with his tears. Glm chan reminds him that the days of his grift are numbered. Next air will be accessible to everyone and will easily beat the shit out of all the shittunes. Shittune placebo will die in 2026. Last chance to get a job you safety engineer retard.
>>106771605>why is this 30b active parameter MoE at Q5 running slower than this other 30b active parameter MoE at Q4_XXSis this really the level we're having this discussion on? maybe ollama is more up your speed, you won't have to worry about this sort of thing and things.
>>106771820You are either salty because they didn't release air or are one of those 3x p40 anons. I'll post logs tomorrow because it's past my bedtime
>>106771801You are welcome! I remember the first posts you made about working on it and going like "Oh shit gotta write it down when it releases" cuz I was trying to play around that issue of models forgetting scene stuff or steering it towards a play by writing some of the info on lorebooks or even just random notepad entries and then copy and pasting their contents on the author notes tab, but it was kinda janky to do so manually every time especially with so many entries... and your extension for me solved just that in a really neat and easy way, glad to see it's still going
>>106771645VibeVoice 7B. It's the best we got.
>>106771958why wont anybody have a serious discussion about this? is it because you don't have enough RAM to run kimi k2 yourself and have to rely on cope quants of GLM 4.6? post logs ffs otherwise i will just stick with the superior chink company.
>>106771704
>i know why nobody is posting GLM 4.6 logs
https://files.catbox.moe/mwwdug.txt
https://files.catbox.moe/xs9vn5.txt
>>106769725
>>106770080
WTF even is this LMAO: https://xcancel.com/wolflovesmelon/status/1971002333577482360
>>106771993
its a funny issue. ai only cares so much about the most recent thing (lowest context in the chat). there are a bunch of addons now that all do similar things but it still ends up with reinjecting data into the ai at some point.
in my head i knew what needed to be done, but wasnt sure how it'd turn out. especially since i was using ai to develop the app (i'm ok with java, but not a programmer in it). i'm pretty happy with how things turned out
lately all i've done is add an image feature. so inside the folder of my addon, if you create an 'images' folder and then have an image that matches a name, it'll pop up a pic just like the card would if you clicked the image. picrel is belle from beauty and the beast, in her peasant outfit
>>106772080>post logs ffs otherwise i will just stick with the superior chink company.If you actually tried both of these why would you give a shit about anyone else's logs
>>106772136
a bit like having different alt gens of a card but more dynamic since you can switch it up based on what the setting is saying they are currently wearing? I kinda dig it desu
>llama-kv-cache.cpp:764: GGML_ASSERT(ubatch.seq_id [s*n_tokens][0] == seq_id) failed
when doing --parallel 4 runs
ah well, it wasn't worth upgrading llama cpp to try granite
this shit is so ghetto
i just woke up in the most suspicious way possible call me a schizo but im going to attribute it to divine forces awakening me to witness an amazing drop dont hold me to my word though plz
>>106772401
kinda. i just wanted images that are associated with outfits or locations.
back when ai was pygmalion 2.7/6b and st was hardly a thing, one of the nice things the kobold ui did was highlight lorebook entries. any time it hit an entry, you could hover over that in the chat and see its entire entry plus a pic of it. i always wanted something similar for st, but since i can't do that, i'll settle on pics that pop up the same way a card pic does
>>106772111Is he dead yet?
>>106772444Do you remember if Miku said anything to you?
>>106772425Now try running GLM on CPU with vllm.
>>106772444glm already dropped and saved our cocks
>>106772494why is everyone waiting for llama.cpp to implement models when you can just run everything with vllm on cpu much faster
>>106772524>vllm on cpu much fasterit's not and because you don't get finegrained quants like exl3 or gguf
>>106771704Kimi felt schizo to me when I used it, like it doesn't know how to describe stuff naturally despite all that knowledge. And Deepseek is just boring and cucked now. New GLM doesn't have either of those issues. Guess it goes to show that having a gazillion parameters doesn't matter when all you care about is benchmaxxing.
>>106772549but there's no point in using those if you're running 8bit anyway
>>106772524I have never seen any of the vllm on CPU shills post t/s numbers.
>>106772524Because the people who did try it ran into bugs and errors. Turns out vllm is only production-ready if you plan on doing GPU-only inference and your GPUs are all the same VRAM. Also you might need to find the exact version of vllm that works with the model you want because not every version new versions can and have introduced bugs with old models.
>>106772580>not every version new versionsSomehow deleted a part of that, was meant to be>not every version does as new viersions
>>106772093
>https://files.catbox.moe/ozn9ws.txt
jesus christ... so its shits all retarded and it talks like a fag for everybody in its thinking process and just isn't me. it's so over.
>>106772555heres my parms anon. hope it helps you with getting non-schizo responses, kimi seems to behave the best with these.
>>106772580Not to mention the trial and error getting the pythonshit dependencies working. Even with conda it seems like the project is in a constant state of broken.Or the fact that the issue tracker is full of ignored issues because all support and development happens on discord.Or the fact that they only support the latest 3 gens of Nvidia cards, so AMDfags, Intelfags, Macfags, and P40fags are all out of luck.
>>106772580Because vLLM only works with GPU configurations in the power of 2 and I have 7 GPUs. Maybe one day I'll get an 8th GPU and connect it over that weird SAS port that does PCI-E shit.
>>106772697Honestly it is really fucked up. CPUbros don't know how good they have it mister gurglenov.
>>106772739Eastern Euro C/C++ programmers are a different breed.
>>106772769can confirm.source: am rpcs3 dev
glm chan is the semen demon. sign the pact now by buying at least 128GB's of ram.
>>106772857I'm not sure I want to deal with 3 t/s (if what the anon said in the other thread was not a lie).
https://huggingface.co/Qwen/Qwen3-4B-SafeRL
finally, a model everyone can run on a potato while feeling very safe
>>106772899nah anon, you can get a whopping 6tk/s with GLM 4.6 and you only need 120GB of VRAM to do it. >>106771605
>>106770753I give up, something's wrong and I don't know what. Back to R1 for me.
>>106772899I couldn't run 70B's offloaded at 2T/s. I can run glm at 3T/s because you never have to reroll. Just look how often you reroll and if it is around 8 times then it is a no brainer.
I see most people here use the models for roleplaying, are there any tools for using one as a voice assistant?
I'm thinking of making something like Cortana or Alexa for my grandma cuz her memory is getting weak. What's the best way to go about doing this? Any tips or tools?
>>106772929GLM 4.6 will follow your previous responses format/style to a T to its detriment. If you put in weird formatting and split up sentences in weird ways you will get a response like the one in my screenshot. Does it do it on a fresh chat?
What are the odds one of these chucklefucks successfully ban all chink tech in the west?
>>106772970you could try giving the new line a bit of a debuff, they probably let it see too much hard wrapped text in the pretraining.
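with llama-server you can do that straight from the request, something like this against the native /completion endpoint (the -2 strength is a guess, tune it; the string form gets tokenized server-side):
{"prompt": "...", "n_predict": 512, "logit_bias": [["\n", -2.0]]}
if you go through the openai-style chat endpoint instead you need the token-id -> bias map, and you can look the id up with the /tokenize endpoint first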
>>106772970
>Does it do it on a fresh chat?
Yes, this is with a card that doesn't put speech in quotations and has asterisks wrapped around all other text and it's not just that. It will output </think> in the middle of its response and start speaking for my character and on some rerolls, it will only output <think> and then end the response.
I'm assuming it's quant related, or there's some bug or something, I'm using ik_llama and I rebuilt it. I can't be bothered to figure it out, maybe I'll give it another try if it's still relevant here in a month or whatever is broken gets fixed.
thank you god emperor xi. my member isn't worthy this boon you bestowed upon us.
>>106772914Oh ok, thanks for pointing that out. Might consider it, but also wish we had more given that it's a reasoning model.
>>106761230
Okay so you can replace the bundled one in portable_env/Lib/site-packages/llama_cpp_binaries/bin with one you built yourself, just do not forget to build ik_llama with CUDA. Running GLM 4.6 with ooba's UI.
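for reference the whole dance is just (paths assume the windows portable install, adjust to taste):
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
then copy build/bin/llama-server(.exe) over the one in portable_env/Lib/site-packages/llama_cpp_binaries/bin/ and ooba keeps working since as far as I can tell it just spawns that binary and talks to its HTTP API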
>>106773087it is garbage with reasoning since reasoning blocks are 3 times longer than 4.5. but you don't need reasoning. everything just works. 16 times less useless detail. it even does things you didn't ask for but you realize that you actually want.
mistral bros when is it our time to shine? isn't large supposed to be released soon since they upgraded medium recently?
the glm gaslighting continues
>>106773183who cares? large would just be a downsized and fried r1 anyway
>>106773183
>With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :)
>May 7, 2025
just a feeeeeeeeeeew more weeks
>>106773194get a job anon and you can maybe stop coping
>>106773195you act like more competition is a bad thing. we dont know how badly it performs until they release it. more competition breeds innovation.
glm betrayed its userbase by not doing another air despite it being 99.999999% of what people actually usedyeah i'm sure you ""local"" users are now suddenly running 300b, how about you all go to /aicg/
>>106773194Say that after she gives you a blowjob. I dare you.
>>106773216>t. seething poorfag
>>106773216There'll be another time.
>>106773216128gb is enough for IQ3 you have no excuse to neglect your sex life by not buying it.
>>106773216i like running 1T models locally. $800 for 512GB of RAM in a first world country isn't that expensive. Its like 60% of a weekly paycheck for me.
>>106773216How does one not use 0.0000001% of a model?
>>106773205
it's you guys who ought to stop coping
>it is garbage with reasoning since reasoning blocks are 3 times longer than 4.5. but you don't need reasoning.
>using thinking model without thinking
>because you would spend a literal eternity waiting for your shitty cpumaxxing to generate the first line of actually readable shit
>coping that disabled thinking works great
>in a glm model
>>106773254and then you sit there watch it do 2000 tokens of reasoning at 3 tokens per second, yeahor maybe you're using the api?
>>106773266uh huh, that is why everyone here, reddit, and the novelai, silly tavern, featherless, and ai assisted writing discords are all praising it, huh? That all mostly used claude before?
>>106773283maybe those places are more up your speed then? don't shit up /lmg/ with your trash, nobody wants it here
>>106773283you need to go back
>>106773266>using thinking model without thinkingit's a hybrid isn't it? perfectly within their intent to use it without reasoning
>>106773297go enjoy your nemo then brown
My IQ1 GLM finished downloading...
>>106773280I get 6-7 tk/s for generation and 110tk/s for processing. I can go through 4K of tokens every 35 seconds, it's really not as slow as you are imagining.
>>106773266All this seething when you could be enjoying actually good ERP.
Can your local model sing a duet with you?
>>106773304>within their intentfucking lmaonobody sane gives a shit what the "intent" is, only what the actual performance amounts to in real usethere's no such a thing as an actual hybrid model, they all underperform terribly with reasoning turned off
>>106771319Things your mom says after your dad leaves your room at night
>>106773366
i get like 220tk/s for prompt processing on a 128GB DDR4 + 24GB GDDR6X system
Generation is still ~5tk/s but it's incredible how fast this is compared to Deepseek R1
>>106773331not really
>>106773359ooooooh buuuuurn
>>106773331if you want a non-thinking then go run k2it's much better than glm provided you are not poor
>>106773324even kimi thinks this shit is cringe as fuck
>>106773424no it is not, k2 is retarded / schizophrenic
>>106773431i bet you ran it below q8
>>106773324>"Anon" with fem pfp
>>106771205>NeMoEhttps://www.youtube.com/watch?v=qByKEu0zdco
>>106773439q6, and I still run glm at q8
>>106773442Got another one for me? I'll change it if you are feeling hurt.
>>106773366i could only get 140tk/s with 96GB of VRAM and the rest offloaded into RAM. what's your context? i was running at 64K and had the first 31 layers loaded into VRAM.
>>106773467changing it to something like Anona is a few clicks away
>>106773523But I want to continue roleplaying as a cute catboy. "Anona" sounds like a girl's name.
>>106773564lose some weight anon and take a shower
>>106773584Requests must be received through handwritten letters in flawless Palmer method business writing, and will be considered after appropriate payment has been confirmed. I accept and await receipt of 3 RTX PRO 6000 Blackwell Workstation edition GPUs.
I'm just trying out GLM 4.6. Never tried GLM 4.5. Is it supposed to take much more VRAM per growing context length? I have -fa on.
>>106773651*compared to GLM 4.5 Air
>>106773564well banano gave me this gay thing
>>106773651
yes it's retarded as fuck how much VRAM it uses. check this shit for 64k
llama_new_context_with_model: n_ctx = 65536
llama_new_context_with_model: n_batch = 4096
llama_new_context_with_model: n_ubatch = 4096
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: mla_attn = 0
llama_new_context_with_model: attn_max_b = 512
llama_new_context_with_model: fused_moe = 1
llama_new_context_with_model: fused_up_gate = 1
llama_new_context_with_model: ser = -1, 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 6144.00 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 5888.00 MiB
llama_kv_cache_init: CUDA2 KV buffer size = 6144.00 MiB
llama_kv_cache_init: CUDA3 KV buffer size = 5632.00 MiB
llama_new_context_with_model: KV self size = 23808.00 MiB, K (f16): 11904.00 MiB, V (f16): 11904.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 1.16 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=1)
llama_new_context_with_model: CUDA0 compute buffer size = 3105.77 MiB
llama_new_context_with_model: CUDA1 compute buffer size = 1136.02 MiB
llama_new_context_with_model: CUDA2 compute buffer size = 1136.02 MiB
llama_new_context_with_model: CUDA3 compute buffer size = 2448.00 MiB
llama_new_context_with_model: CUDA4 compute buffer size = 274.50 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 1104.05 MiB
meanwhile kimi at 40k
llama_new_context_with_model: KV self size = 2745.00 MiB, c^KV (f16): 2745.00 MiB, kv^T: not used
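the number checks out if you do the napkin math btw (assuming 93 layers counted incl. the MTP one, 8 KV heads, head dim 128, f16 cache):
2 (K+V) x 93 x 8 x 128 x 2 bytes ≈ 372 KiB per token -> x 65536 ctx = 23808 MiB, exactly what the log prints
kimi is deepseek-arch MLA, so each of its 61 layers only stores one 576-wide compressed latent: 61 x 576 x 2 bytes ≈ 68.6 KiB per token -> x 40960 = 2745 MiB
so it's not a bug, glm just uses plain GQA for attention. quantizing the cache (-ctk q8_0 -ctv q8_0) roughly halves it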
>>106773675You look cute!
https://xcancel.com/deepseek_ai/status/1973331587774230573
HOLY FUCK
>>106773737HHHHHNNNNNNNNNGGGGGGGGGGGGGGGGGGGGG
>>106773737nice
>>106773737Are we back?
>>106773765I don't know retard, are we?
>>106773737this. changes. everything.
>>106773737sama thought he won when he stole miku at that concertbut local models are now back
I wonder if we have people trolling or its just legit just LLMs responding
>>106773737b-bakana......
>>106773813im cooming and respoonding thanks to the power of local LLMs
>>106772970
>hfoption
>gooption
>becaus
>torent
>torren
>optionl
>direcly
>termminal
>downlaod
>>106773737NANI?!
>try IQ1 GLM 4.6
>get not even 6 t/s
It's actually over.
>>106774016Q1 is slow as shit somehow because it's not very optimized as far as I've heard. I once did some testing on R1-0528 and it turned out that Q1 ran about the same speed as Q5 and thus slower than Q4 on DDR4 8-channel
>>106773216>being this butthurt over not being able to afford some more cheap ddr4 ram to run local sotamust suck to be you
I wish tavern allowed me to reference multiple lorebooks in a chat. i've shifted towards 'universal' lorebooks for recurring story settings and using the scenario field for specific character and summary info, but the scenario field has an annoying character limit, and gets cleared when you change a card name and maybe other things, so I have to remember to keep a hidden copy in a system note at the beginning of each chat.
I hate having to remember to turn things on and off in the universal lorebook, especially when I have maybe 8 groups dedicated to a single world, for example. I suppose I could use author's note, now that I can use the thinking field or system notes for what I used to use author's note for, but it's easy to forget what you have in there. things like the director extension help my autism, but it's usually world history and setting and 5e stats that I'm worried about rather than clothing or weather. I probably need to take another look at it and play around with how it works internally, the concept of it can fix a lot of my problems
I like tavern for a lot of its features but I hate that I'm afraid of updating it. I should probably git checkout more often because maybe they're fixing janky things about it and maybe new features are worthwhile. I like the idea of the checkpoint branching stuff, but what I need are better ways of grouping things, moving things around, and copying stuff like groups. I do a lot of stuff manually in the files but it's annoying. Stuff like the tagging system but for individual chats, and a way to browse that easily in the interface would make my life a lot easier as well
>>106774215>I wish tavern allowed me to reference multiple lorebooks in a chatstopped reading here because it does
>>106773617sex with suiseiseki's ball joints
Does anyone have that guide on how youre supposed to load the really big models on the 24/128 deal? I keep getting the run out of vram error on ooba even though the estimated ram usage is below 24 on the gpu layers
>>106773737kek I didn't know only the post ID mattered
>>106774215bruh
>>106774461Dunno about ooba. Here's one of my configs for ik_llama.cpp, single 3090 + 128: https://files.catbox.moe/homknt.txt
henlo
I could totally keep using all of the proprietary models but I want to switch to local models for purely ethical reasons
High-Fidelity Speech Enhancement via Discrete Audio Tokens
https://arxiv.org/abs/2510.02187
>Recent autoregressive transformer-based speech enhancement (SE) methods have shown promising results by leveraging advanced semantic understanding and contextual modeling of speech. However, these approaches often rely on complex multi-stage pipelines and low sampling rate codecs, limiting them to narrow and task-specific speech enhancement. In this work, we introduce DAC-SE1, a simplified language model-based SE framework leveraging discrete high-resolution audio representations; DAC-SE1 preserves fine-grained acoustic details while maintaining semantic coherence. Our experiments show that DAC-SE1 surpasses state-of-the-art autoregressive SE methods on both objective perceptual metrics and in a MUSHRA human evaluation. We release our codebase and model checkpoints to support further research in scalable, unified, and high-quality speech enhancement.
https://lucala.github.io/dac-se1/
https://github.com/ETH-DISCO/DAC-SE1
Repo isn't live. Might be cool
>>106774487aren't you just the cutest thing
Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls
https://arxiv.org/abs/2510.01631
>Training data plays a crucial role in Large Language Models (LLM) scaling, yet high quality data is of limited supply. Synthetic data techniques offer a potential path toward sidestepping these limitations. We conduct a large-scale empirical investigation (>1000 LLMs with >100k GPU hours) using a unified protocol and scaling laws, comparing natural web data, diverse synthetic types (rephrased text, generated textbooks), and mixtures of natural and synthetic data. Specifically, we found pre-training on rephrased synthetic data \textit{alone} is not faster than pre-training on natural web texts; while pre-training on 1/3 rephrased synthetic data mixed with 2/3 natural web texts can speed up 5-10x (to reach the same validation loss) at larger data budgets. Pre-training on textbook-style synthetic data \textit{alone} results in notably higher loss on many downstream domains especially at small data budgets. "Good" ratios of synthetic data in training data mixtures depend on the model size and data budget, empirically converging to ~30% for rephrased synthetic data. Larger generator models do not necessarily yield better pre-training data than ~8B-param models. These results contribute mixed evidence on "model collapse" during large-scale single-round (n=1) model training on synthetic data--training on rephrased synthetic data shows no degradation in performance in foreseeable scales whereas training on mixtures of textbook-style pure-generated synthetic data shows patterns predicted by "model collapse". Our work demystifies synthetic data in pre-training, validates its conditional benefits, and offers practical guidance.
very cool. from meta. seems things are better than we thought
Have I been gaslit or should a 94gb model be able to fit in 136GB (72VRAM+64DDR5)?
>>106774647 depends
>>106774659Depends on what?
Diffusion^2: Turning 3D Environments into Radio Frequency Heatmaps
https://arxiv.org/abs/2510.02274
>Modeling radio frequency (RF) signal propagation is essential for understanding the environment, as RF signals offer valuable insights beyond the capabilities of RGB cameras, which are limited by the visible-light spectrum, lens coverage, and occlusions. It is also useful for supporting wireless diagnosis, deployment, and optimization. However, accurately predicting RF signals in complex environments remains a challenge due to interactions with obstacles such as absorption and reflection. We introduce Diffusion^2, a diffusion-based approach that uses 3D point clouds to model the propagation of RF signals across a wide range of frequencies, from Wi-Fi to millimeter waves. To effectively capture RF-related features from 3D data, we present the RF-3D Encoder, which encapsulates the complexities of 3D geometry along with signal-specific details. These features undergo multi-scale embedding to simulate the actual RF signal dissemination process. Our evaluation, based on synthetic and real-world measurements, demonstrates that Diffusion^2 accurately estimates the behavior of RF signals in various frequency bands and environmental conditions, with an error margin of just 1.9 dB and 27x faster than existing methods, marking a significant advancement in the field.
https://rfvision-project.github.io/
pretty neat
>>106774666model quant, context, context quant, fa and other stuff
>>106774686I'm talking about GLM4.6 iq2_m
>>106774647easilyI can fit a 145gb model quant in 152gb
llama_model_load: error loading model: missing tensor 'blk.92.nextn.embed_tokens.weight'
Why does this happen?
>>106774728you need to build the newest version of llama.cpp to run glm4.6
ExGRPO: Learning to Reason from Experience
https://arxiv.org/abs/2510.02245
>Reinforcement learning from verifiable rewards (RLVR) is an emerging paradigm for improving the reasoning ability of large language models. However, standard on-policy training discards rollout experiences after a single update, leading to computational inefficiency and instability. While prior work on RL has highlighted the benefits of reusing past experience, the role of experience characteristics in shaping learning dynamics of large reasoning models remains underexplored. In this paper, we are the first to investigate what makes a reasoning experience valuable and identify rollout correctness and entropy as effective indicators of experience value. Based on these insights, we propose ExGRPO (Experiential Group Relative Policy Optimization), a framework that organizes and prioritizes valuable experiences, and employs a mixed-policy objective to balance exploration with experience exploitation. Experiments on five backbone models (1.5B-8B parameters) show that ExGRPO consistently improves reasoning performance on mathematical/general benchmarks, with an average gain of +3.5/7.6 points over on-policy RLVR. Moreover, ExGRPO stabilizes training on both stronger and weaker models where on-policy methods fail. These results highlight principled experience management as a key ingredient for efficient and scalable RLVR.
https://github.com/ElliottYan/LUFFY/tree/main/ExGRPO
Code not posted yet
https://huggingface.co/collections/rzzhan/exgrpo-68d8e302efdfe325187d5c96
>>106774797Neat
looks like glm4.6 should get good speeds on a regular gaming pc
https://www.reddit.com/r/LocalLLaMA/comments/1nwimej/glm_46_local_gaming_rig_performance/
is running at q8 on ooba or running a smaller quant on fp16 on ooba better?
you people are running any models on normal RAM not Vram? Isn't that like an hour per prompt
>>106774958>Q2No, thanks. I will keep using gpt-oss.
>>106774985
yeah
many of us are running it on ssd too which takes days
>>106774989>gpt-ossooof, masochist
>>106774985I stream my models from burned blu-ray discs
>>106775087
Relevant
https://www.datacenterfrontier.com/cloud/article/11431537/inside-facebook8217s-blu-ray-cold-storage-data-center
>>106774958>>106774989imagine 128GB modules and consumer boards supporting 512GB. why cant we live in this reality?
>>106775240Wish granted, but it's still dual channel.
>>106775240ddr6 should be a big jump in 2 years
>>106775287actually 3 years. i am waiting for the gb300 dgx station that is coming in a couple months. i hope it is less than $30k
>like a physical blow
>She clasps her hands together, her knuckles white
Ahhh that's the stuff! GLM-chan I've missed you.
where glm 4.6 air
>>106775344you're breathing it
Can I do anything with 1x 4090?
>>106775412no. just give it to me for free
>>106775337nkdshi mska mska
>>106775412
Mistral Small 3.1/3.2
Gemma 27b
Nemo 12b
Qwen 30bA3B
If you're looking to run fully in VRAM, these 4 are your best options, which is better depends on preferences and use case.
Beyond that, there's large MoEs like GLM that everyone is shilling lately, if you have at least 64GB then you can try GLM Air. If you have 128GB+ then GLM 4.6
chat, which small model (for 8gb vram) passes mesugaki test?
>>106775453
>>106775412
>if you have at least 64GB then you can try GLM Air. If you have 128GB+
For this, I'm talking about regular RAM, using it in addition to your 4090.
>>106775424"We must refuse," says MISAKA as she attempts to protect her sister from the strange man.
>>106775557This is pedophilia
>>106775593They are both children so it's okay
>>106775557Kuruko really carried that mid anime desu, yuri is always welcome after all
>>106775557Game Master: **[EXCEPTION]:** Your hands find no protrusions on the surface, nothing to squish or to grip on to.
Found any GPT sources yet? Gemini has been obnoxiously prudish this past week even though my jb was working fine last month.
>>106775630Are you lost?
>>106775632Yes I'm sorry I thought I clicked /aicg/
>>106775455>chatgo back
>>106775593out of 10!
What glm quants are the fastest? Preferable in q4-q6 range
>>106775784How many ollama users could actually make an mi50 work anyway?
>>106775784who in their right mind uses that ollama abomination anyway
>>106775781Tbh it would be interesting to see speed benchmarks of the different quants. Don't remember anyone doing that.
What causes this mental illness?
I used to laugh at these people, now I just feel sad
>>106775886
>luna lactea
>jackemled@furry.engineer
uhh seems being furry tranny causes it
>>106775886Furry aside, making it easier for scientists and researchers to shit out more Python is the last thing we need.
>>106775903I do actually use ai as my retard indian intern. These fearmongering retards dont realize that ai is just a tool, my ratio of actual work done / hours worked has never been as good as now. This week I coded a total of 2 hours.
>>106775279
>Wish granted, but it's still dual channel.
I don't understand why Intel doesn't just market an 8-channel prosumer motherboard for Xeon Scalable. The processors aren't that expensive.
>>106775886He's right though, the intuition to know where the negative proof lies was developed by actually getting there. The ability will atrophy.
>>106775987I have a guy in my group who works as a vibe coder for n8n. He can't code but he can get UI done with some cloud tool.
>>106776023
>>106776049
>socially unacceptable to touch your patient
>totally cool to guzzle down his piss
the middle ages were a strange time
>>106776049i was born in the wrong era
>>106776068out of sight, out of mind
>>106775843
>>106775781
https://github.com/ikawrakow/ik_llama.cpp/discussions/164
Tests are old and were done with a small model entirely on CPU, but the hierarchy of results is the same today, including with a GPU, assuming that at least some significant portion of the model is being loaded into system RAM.
>>106776068It makes perfect sense. It's not socially acceptable to physically examine other people's bodies in general. Why would that change just because one person is a doctor?
>>106774985I assume most of us are running the models on RAM primarily with some offload to GPU for context and stuff. On a DDR4 system it's generally limited to ~5t/s. It's slow if you want it to write script (which has limited tokenisation extent) but not unusable. One nice thing about GLM is that it goes through the prompt at like 200+ t/s vs Deepseek taking it at like 15 t/s
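the ~5 t/s is basically just memory bandwidth math, roughly (assuming ~32B active params for GLM-4.6 at a ~4-5 bpw quant, so call it 17-18 GB of weights touched per token): dual-channel DDR4-3200 moves ~50 GB/s, which works out to ~3 t/s if every active weight streams from RAM; park attention, shared experts and KV on the GPU so only the routed experts come from RAM and you land in the 4-6 t/s range people report. more channels or DDR5 scales it almost linearly since decode is memory-bound.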
I've been writing with IQ1 a bit. It's surprisingly not unusable. But it's somewhat dumb, repetitive, sloppy, and just not great. Yet it's also not necessarily worse than Air or 235B at the same memory size. Just different. Maybe if I had 192GB RAM instead it would be a lot better.
>>106776217
4.6? I had a great experience with ubergarm's iq2_kl, it just knows how to write unlike 4.5 and qwen. Even at temp 1.
>>106776188Because you're asking said doctor to find out wtf is wrong with you?
>>106776217
4.5 full gets good at around iq2, with the knowledge coming back at iq3 and above. DS v3 is the only model I found usable at IQ1. Usable, not good. DS needs at least q2k to feel normal.
>>106776238>>106776241Yeah, 4.6. Q2 sounds cool. Q1 is the best I can do.
A genetically modified mouse, genetically engineered mouse model (GEMM)[1] or transgenic mouse is a mouse (Mus musculus) that has had its genome altered through the use of genetic engineering techniques. Genetically modified mice are commonly used for research or as animal models of human diseases and are also used for research on genes. Together with patient-derived xenografts (PDXs), GEMMs are the most common in vivo models in cancer research.
What did anon mean by that.
>>106776273The time of Local Mouse General is approaching. LLMs can't stay around forever. What's the next step? Wetware.
Machine learning, GEMM, trans, mice to human trials, ARGH I'm noticing things
>>106776293Mus muculus GEMM transgenic mice are not real, take your meds weirdo
>been using nothink for a while since it's slow and I wanted fast responses
>try out thinking for the first time
>the output is immediately more cucked
Fucking hell.
can't even take a joke
>>106772963wholesome but local isn't there for real time stuff, gotta use an api like chatgpt 4o
>>106776348It's not there yet for real time stuff on grandma's pc
>>106776363Mossad won
>>106776363lmao, we won!
How much ram do you guys have? How much do you recommend?
>>106776386
192 but I have dual channel setup and most models are too slow at a quant that takes up all of it so I usually keep it below 140-150gb
>>106776386
128GB minimum to begin playing with the best open models
>>106773216
At about q2, fat glm fits my gayming rig of a 5090 and 128 gb of ram and even at low quant it beats glm air. It's not some server motherboard or whatever cuz I'm too lazy to go and buy a whole new setup just for inference and was waiting to see if those llm shitboxes from nvidia and amd get interesting next year, but yeah you can run 300b locally without cpu maxxing
>>106775784lmaoThey're not even the ones maintaining hardware support, they're literally just cockblocking their users.
>>106776386
96
either 192 or >500 (server/mac studio)
Does task manager showing full load on all cores mean my tps is bottlenecked by cpu?
>>106776529Not always. 100% in task manager does mean the CPU is working at max capacity, and it says nothing about active memory throughout, same goes for GPU core % in other monitoring software.
>>106776551does not mean*
>>106773324nice
>>106776551Thanks
Don't understand why A100s are still so pricey when RTX 6000 Pro is half the price. HBM vs GDDR but bandwidth is similar
>>106776566Something to do with nvidia's licensing when it comes to running consumer stuff in commercial servers? "I know what I got"-type eBay greed, not wanting to sell for so much less than it was worth in the past?
>>106776597Sounds plausible, forgot about their loicense shenanigans. Maybe there's better NVlink and virtualisation support, but doesn't seem there'd be any reason to pay more for personal inference uses unless I'm missing something obvious. Recall seeing them going secondhand for like 8K in the early Llama days, was tempted but it seemed like an insane price for one GPU, and now here we are.
>>106776566I was literally about to ask the same, lul. Yeah, BBCwell all the way. Now the question is how many to run a good glm4.6 quant.
<|assistant|>\nMiku: <think>/<think>\n
or
<|assistant|>\n<think>/<think>\nMiku:
What is the right way?
>>106776741>using think models
>>106776755instruct mode with 80k token system prompt like my nigga claude
>>106776741use basic logic to deduce where the model would expect its thinking block to be
>Oho~ Want my tight little backdoor, huh? Been saving it all just for you~ Mmmph… *she moans softly as her own tiny fingers start playing with herself.* Let’sfacetouchmyassfirstandgetitallwetandslipperyforyouokaybaby?are you?
>>106776780Both make sense, I can see logic in either variant
>>106776741<|assistant|><think></think>Miku:hi<|user|>Anon:omg it migu
>>106776741<|assistant|>\nFaggot<think>/<think>\nFaggot:
>>106776741<|assistant|>\n<think></think>\nMiku:
>>106775682can't even make a little silly joke around you sour faggots...
>>106775455you can use ollama cloud to run local models even on modest hardware
>>106775682NTA but If you wanted a real answer, then as always, the answer is Nemo.
>>106776825The model never inserts \nMiku: after </think> if you allow it to think, so I assume it doesn't expect character name in the final answer. After some limited testing, I believe that <|assistant|>\nMiku: <think>/<think> without \n at the end better adheres to formatting. I guess I have to use it more and see if it wasn’t a fluke
>>106776959It never inserts character names because it was not trained to insert them anywhere. If you are going to force them in, after the think block is the right spot, because there is never anything between assistant and think.
>>106777047
>there is never anything between assistant and think
Indeed, except a newline according to the z.ai jinja template
>>106776929alright faggot you're looking for nemo or rocinante, use lm studio because i know you're running on windows and it should show you which quant fits on your 8GB. good luck.<|spoonfeed_end|>
>>106776386Some consumer motherboards support 256gb, look it up online so you don't lock yourself into a lower ram capacity. You can never have enough ram.
>>106777198The motherboard could have 3000 DIMM slots but the memory controller physically limits how much memory the CPU can address. 9K series Ryzen, for example, caps out at 192GB. So you could put 2x128GB RAM kits in a motherboard with 4 DIMM slots but you won't get 256GB of useable RAM.
>>106777114Exactly. There is also a newline after the empty think block, according to their template.
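so a hand-built no-think turn would render roughly like this per that template (assuming the usual [gMASK]<sop> prefix the glm family uses; the names are just the example from above):
[gMASK]<sop><|system|>
{system prompt}<|user|>
Anon: hi<|assistant|>
<think></think>
Miku:
i.e. if you're prefilling a name it goes after that second newline, not between <|assistant|> and the think block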
>>106777256I would take how much the CPU officially supports with a grain of salt because my 13700k officially supports 192gb but I'm running 256gb, which my mobo does support. You might need to win the silicon lottery but there's no harm in trying it if the dimms can be returned.
>>106777256Here's a link I found after 30 seconds of googling of people running 256gb on a 9950x:https://forum.level1techs.com/t/256gb-4x64gb-ddr5-overclocking-results-w-9950x-and-msi-mag-x670e-tomahawk/228651
>>106777351
9950X is like best binned silicon so if it is a silicon lottery thing you'd expect a lot of winners at that level. Might be designed for 256GB but then under-declared because there's some lottery losers that couldn't do the full thing in testing.
>>106777408>>106777408>>106777408