/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106748568 & >>106738470

►News
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6
>(09/30) Sequential Diffusion Language Models released: https://hf.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview
>(09/29) DeepSeek-V3.2-Exp released: https://hf.co/collections/deepseek-ai/deepseek-v32-68da2f317324c70047c28f66
>(09/27) HunyuanVideo-Foley for video to audio released: https://hf.co/tencent/HunyuanVideo-Foley

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106748568

--Papers:
>106752846 >106755511
--Hardware setups and optimization strategies for running large language models locally:
>106752694 >106752837 >106752845 >106752868 >106752876 >106752881 >106752963 >106752980 >106753037 >106753055 >106753092 >106753128 >106753211 >106753217 >106753245 >106753141 >106753170 >106753173 >106753190 >106754528
--GLM 4.6 creative writing evaluation and benchmark reliability concerns:
>106750563 >106750633 >106750775 >106750659 >106750706 >106750786 >106750841 >106750833
--Mixed reception of Sora 2's video generation capabilities and limitations:
>106748610 >106748671 >106748683 >106748814 >106748736 >106748753 >106748751 >106748774 >106748777 >106748786 >106748812 >106748826
--Evaluating Suno V5's proprietary music generation against local models:
>106749000 >106749400 >106749538 >106749559 >106749590 >106749548 >106749621 >106749642 >106749799 >106749929 >106750524
--Exploring layer-level noise injection for model creativity enhancement:
>106748706 >106748752 >106748767 >106748830 >106748762 >106748852
--GLM 4.6 compatibility updates for llama.cpp:
>106751537 >106751573 >106751594
--Workaround for GLM-4.6 compatibility issue in ik_llama.cpp:
>106754849
--Sora's video generation performance and prompt adherence challenges:
>106753575 >106753597 >106753646 >106753650 >106753662 >106753732 >106753775 >106753852 >106754084 >106753667 >106753676 >106753687 >106753719 >106753729 >106753750 >106755333 >106754191 >106754204 >106753698 >106753722
--Qwen model inaccuracies in name recognition and inconsistent multilingual performance:
>106752092 >106752111 >106752139 >106752258 >106752324 >106752624
--Miku (free space):
>106748655 >106751155 >106751345 >106749314 >106753215 >106753775 >106753816 >106754088 >106754738

►Recent Highlight Posts from the Previous Thread: >>106748575

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Better tool/script to download HF repos? Everyone's complaining about this xet shiz. Parts keep stalling and can't be resumed in the browser.
it's so over
>>106755904My waifu Migu (not the poster)
>>106755904Well done. I pat the Mikuhat.
>>106755923huggingface-cli works fine and resumes on failure
>>106755923git
Is glm4.6 hybrid or thinking only?
>>106755923huggingface-cli download ubergarm/GLM-4.5-GGUF --include "IQ3_KT/*" --local-dir glm
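If it stalls, note huggingface-cli resumes on its own: re-running the identical command skips finished files and picks up the partial ones. The hf_transfer part below is optional and assumes you've installed the extra:

pip install -U "huggingface_hub[hf_transfer]"
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ubergarm/GLM-4.5-GGUF --include "IQ3_KT/*" --local-dir glm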
Wasn't there someone trying to get MTP to work with GLM4.5 on llama.cpp a while ago? Did that also go nowhere?
https://files.catbox.moe/w3cpki.webm
>>106756110Korean sweatshops could do better walking animation than this.
>>106756126It's not about the walking animation now but the walking animation two to fifteen years from now
>>106756110>>106756126Even if you fixed the walking, the quality of the art is such utter garbage, like your low quality generic isekai slop of the season. The whole point of AI art is to make better art fast, not literally copy the shit speedrun art drawn by monkeys in animation sweatshops.
https://files.catbox.moe/w84blo.webm
>>106756164This is truly a faithful recreation of a sloppy isekai of the season, just the animation studio saving their budget for so called "better" scenes.
>>106756126Are you sure?
>>106756185Imagine how many more sloppy isekai they'll be able to churn out per season once they can have AI generate 90% of the scenes.
>>106755904>>106755906
WHERE IS AIR 4.6
GIVE ME THE WEIGHTS
GIVE ME THE GOOFS
NOW NOW NOW
Added GLM 4.6
https://files.catbox.moe/ffaa0e.webm
>>106756268
https://github.com/ggml-org/llama.cpp/issues/16361
https://github.com/ggml-org/llama.cpp/pull/16359
https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF/blob/main/zai-org_GLM-4.6-imatrix.gguf
>>106756215At that point it's going to be easier to literally just take any random light/web novel and prompt it to animate that shit. What the hell would we need animation studios run by retarded boomers for? Tradition?
>>106756055https://github.com/ggml-org/llama.cpp/pull/15225
>>106756303>Tradition?It's Japan, so yes.
>>106756313Good thing we won't need Japan anymore then.
>>106756215
i suspect there are lots of AI tools already used in the background for anime. at least the subtitle translations already have mistakes a human can't make, like mixing up someone's sex. you need to actually see the scene because in japanese that info isn't carried by the language. or a word with two interpretations where you need context to know which one it is...
I suppose everything gets auto-translated and a dude in the basement looks it over quickly before pushing the upload button.
>>106756280Wtf coderbros???
>>106756158that's hilarious because, that's not AI...
>>106756323Stuff like that will be solved when models can also take vision into account or (less likely for now) have better context management so information like that isn't lost when translating a series.
>>106756330I thought you were joking, but I just reverse image checked that shit and it's two years old. Animators will be out of a job next year by this point.
>>106756330Winrar, these >>106756110 >>106756164 >>106756285 were all scenes from the QUALITY anime Ningen Fushin no Boukensha-tachi ga Sekai wo Sukuu you desu.
>>106756333
yeah, i don't understand why the normies doom now just because the jeet AGI 2025 prediction didn't turn out to be true.
as far as i know freelance writers and translators are feeling it hard already. you're expected to use LLMs and fix the output up manually a little bit, for like 80% less pay compared to the past.
also i spot LLMs everywhere now. once you've figured out a model you can just feel it. for example the monthly kindergarten pamphlet for my kids, the monthly message from the teacher.
i'm in japan so maybe it's more used here though. not sure about other countries.
>>106756268is it even good for smut compared to g4.5? I was excited for glm because moe. Should I care it's better at math? I feel like for writing it will be whatever
>>106756296THAT'S NOT AIR YOU NIGGER
>>106756416air is deprecated due to poverty not being a valid use case
IF YOU'RE NOT USING FULL GLM FULLY ON GPU THEN YOU'RE ALSO POOR
>>106756439poor is a wide spectrum
>>106756391
Improved writing and RP was actually a focus for 4.6, according to their model card.
I haven't used it much yet because my ggufs are still downloading, but it seems a lot more flexible than 4.5 was in open-ended scenarios.
One thing that stood out is that its reasoning process is really fucking thorough for creative work now. It'll go through several genuinely fitting options, map out the reply, and consider other pretty interesting stuff which I haven't seen from another model. It's quite different from shorter thinkers like GLM 4.5 and DeepSeek V3.1, while also being much more focused and on point than how R1 used to do it, which tended to meander a lot while reasoning.
I've seen some really good reply variety thanks to this. However, it also sometimes bloats the thinking part up to almost original-R1 proportions, so you likely still want to skip it if you're running at slow speeds. Still a pretty interesting trait if nothing else.
>>106756280
>The cock length was decreased by half.
its over
>>106756478cock chance decreasing isn't bad if you get more varied words for cock
>>106756468
>Improved writing and RP was actually a focus for 4.6 according to their model card.
when will we learn that model cards mean nothing? they can say anything they want.
it wasn't cut in half, it was flattened to match the rest of the distribution
your cock is now average anon
I struggled through installing ik_llama.cpp on Windows, but I finally figured it out. I was following someone's MinGW guide, but CUDA turns out to depend on cl.exe from Visual Studio.
Now that I've got it running, what can I expect? I see that ubergarm has some quants that use the ik formats; are those worth all the fun of installing ik?
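For the next anon who hits the cl.exe thing: the usual fix is to let CMake generate for the MSVC toolchain instead of MinGW so nvcc can find cl.exe. Roughly like this (generator name depends on your VS version, so treat it as a sketch):

cmake -B build -G "Visual Studio 17 2022" -DGGML_CUDA=ON
cmake --build build --config Release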
>>106756574
for larger models that require CPU offloading, ik+uber is the best speed-wise and for quality/size
>>106756574i think only deepseek gets a speedup on ikllama because of an architecture specific optimization that ikllama has implemented
Very organic.
>>106756599Is deepseek even worth trying on 128+24gb?
>256gb uram m5 max macbook
and just like that we're back
Damn, 4.6 is good. It's real nasty even without an elaborate sys prompt.
Just an OOC "make it hot and sexy" in the first message and it does that. That's how it should be.
The brutal part is the thinking. It's already too big and that kills it totally for me.
I hope we get a voodoo moment for AI soon. I'm not gonna buy the recent 500 watt monsters from nvidia.
>>106756285This show was so ass. I can't remember whether I finished it.
>>106756801Have you tried it without thinking?
>>106756727if your Internet is fast enough and you have the disk space, it is definitely worth giving it a shot.
Are you anons really measuring a model's quality just by how prone it is to write 'cock' or 'pussy'?
>>106756860yeah, have you got a better way?
>>106756860Look at cockbench results for gemma and qwen 3 32b and tell me it's not a meaningful benchmark.
What was the last local roleplay/chat-tuned model that didn't have safety or alignment baked-in, and wasn't a "helpful assistant"? Bonus points if it wasn't told it was an AI.
https://huggingface.co/EleutherAI/gpt-neox-20b?
>>106756882
>https://huggingface.co/EleutherAI/
>Welcome to EleutherAI's HuggingFace page. We are a non-profit research lab focused on interpretability, alignment, and ethics of artificial intelligence.
sad
>>106756882That was not roleplay/chat-tuned.
>>106756830
No, because I am a huge retard. I thought GLM wasn't a hybrid thinker.
Also, something in my preset is fucked since it doesn't properly prefill <think></think>. But that's my problem.
Anyway, thanks anon.
>>106756895My ST is ancient and I'm not updating it because I'll get supply chain malware from npm, but maybe this is helpful.
>>106756903
that's helpful, i have an area with "Start Reply With" but it seems to be ignored. doing it that way will probably make it work. thanks again anon.
>>106756917No problem. Thank you for being you.
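For reference, the trick being discussed is just putting an empty think block in ST's Start Reply With field (assuming your chat template doesn't already inject one), so the model believes it has already finished thinking:

<think>
</think>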
>>106756468How censored is it when you let it think? GLM 4.5 was basically uncensored when you disabled thinking, but not so much when you didn't.
>>106756860yes
>>106756937
how do you actually disable the <think> etc. on the command line, i.e. in llama.cpp or similar, when starting a server?
and which is usually the better idea for creative writing and stories (as opposed to ERP): let it do the <think>s and just remove them during parsing, or turn it off? what would you do for R1, for example?
>>106756860Counterintuitively it is a great test.
surprise reminder that jart will always lurk this bread
>>106756522Why would a big company lie about that? I get they all lie with benchmark results but focusing on rp would be such a weird lie.
>>106756860cmon sam, your 'erry already lost to mesugaki tests to the point you had to include it in your sysprompt. no need to seethe over cockbench
Is 4.6 support merged on main yet or do I have to build from PR?
i noticed poopenAI had paid shills posting about "building a rig" for 'toss 120b in every single open sores llm community non-stop, on a weekly basis.
no i will not take my meds. fuck clankers. jannies tongue my anus.
>>106757066Probably Sam, himself. Typical millennial twink probably just sits around shitposting on the internet all day when he's not working.
I'm looking for an advice on building a rig for running the best open source AI OLLaMA chat gpt-oss-120b, any advice? Thanks in advance!
>>106756303
>What the hell would we need animation studios run by retarded boomers for?
Quality? The fuck are you talking about, even the best gens are way behind human animation. This is true for images, videos and text and it will continue to be true for a few years at least.
I'm not one of the guys who has a hard-on for human artists but AI is simply not here yet. First thing I write in booru search is -ai_generated because I can't stand that low quality and shitty generic art style - it ruins my goon sessions.
>>106757104Sure! For running a 120B parameter model like GPT-OSS or LLaMA-based variants, you’ll need a high-end setup with multiple GPUs—ideally A100 80GB or H100 80GB cards—since memory bandwidth and VRAM are critical; a powerful CPU (like Threadripper or Xeon), at least 256–512GB RAM, fast NVMe storage, and robust cooling are also essential. Using NVLink or PCIe 4.0/5.0 interconnects will help scale across GPUs efficiently. You're Welcome!
>>106756963
https://github.com/ggml-org/llama.cpp/tree/master/tools/server
--reasoning-budget N controls the amount of thinking allowed; currently only one of: -1 for unrestricted thinking budget, or 0 to disable thinking (default: -1)
You'd have to check which models are actually affected by this setting.
It depends heavily on the model whether thinking helps or hurts. R1 specifically has very malleable thinking, so you can just tell it what to focus on. Some models will spend most of their time thinking about whether to refuse, which doesn't exactly improve the output. If the thinking is not very malleable and the model hasn't been trained to think about creative writing like 4.6 apparently has been, then it's usually at best a waste of tokens.
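Concretely, disabling it at the server looks like this (model path is a placeholder, and as said above, only some models respect the flag):

./llama-server -m ./model.gguf --reasoning-budget 0
# default is --reasoning-budget -1, i.e. unrestricted thinking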
>>106757143
It's simply a matter of context, consistency, and preserving both across episodes. It's not quite there yet, but it's very close. Everything outside of that, like the booru stuff, is literally just a skill issue: slop made by retarded and lazy prompters.
>decide to use chatgpt to make quick work of a bash script to download a 5 part model
>model router decides to use retard edition
>can't get the console output right after 5 revisions
>final revision keeps going even after ctrl+c
Just give us o3 back, Sam.
>>106757220just pay 20 bucks bro
>>106757226
The only thing Plus™ lets you decide now is whether it uses the thinking model or not.
>>106757220
Can't trust GPT for anything coding related, anon.
Use it through an API so you have full control and no huge hidden sys prompt that severely degrades performance.
Sonnet for most stuff. Gemini 2.5 Pro or a reasoner for difficult stuff, but you really have to pinpoint the problem. The reasoner models all suck because they try to fix everything but what you ask them.
GPT is only good for general knowledge stuff, like obscure jap youtuber drama. It knows the deep lore. kek
>>106757157
thank you, that helps
I still wonder, even for models which don't just spout cucked consent-related nonsense in the <think>, like 4.6 or R1, is it really worth the wait? in other words, is there measurable improvement in the quality of the resulting prose, and is it significant enough to justify the increase in response time and the slowdown in t/s as context/kv builds?
>>106757220
I made a simple ebook reader, it's 300 lines and that was really quick, no issues. Then I modified it a little and there it is, no more Calibre (I can change the terminal font on the fly etc.; this one is not that readable).
But on other nights it can't get anything right.
They should really use a 1-bit quant or whatever for the free version and something else for the paid one. Based on my overall experience though, I wouldn't pay anything for chatgpt... it's too random.
>>106757273
4.6 still has cucked thinking?
>>106757274free tier claude has been constantly good for me. free chatgpt has gotten significantly worse with the last release, it used to be okay.
>>106757299
I haven't tried it, no idea. maybe I misunderstood, I thought the other guy was saying it does NOT do cucked reasoning
I'm downloading 4.6 as we speak to test it out, but yeah... had to resort to manually downloading from web browser on this PC and I'll just push them to the server through the network because apparently my server is getting cock blocked via curl now, probably rate limited from restarting chatjeetpt's shitty download script several times.
>>106757230It still shows o3 and 4.1 for me under legacy models.
Does thought process not work in the latest llama server for anyone else? I enabled it in the options and it just looks like this.
yep
Qwen Next 80B support in llama.cpp status?
monke brain got me to do a trial of glm 4.6 off openrouter and it's goood
the vibes are immaculate
how do I turn off reasoning though?
>>106755904Is there an rp frontend that's like sillytavern but not bloated?
>>106757526curl
>>106757516
>I don't need you to think, I need you to write fine smut posthaste
Have /nothink at the end of your prompt
the heavy weight of gemma. I can feel it
>>106757656Gemma MoE incoming.
>>106757656Would Google release that before Gemini 3?
>>106757656Gemma is already too fat...
>>106757724Make it even wider.
>>106757656...you know what >>106756280
>>106756890
https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B
I should have used that link instead, whoops.
>>106756860it is the least likely metric to be benchmaxxed
>>106757761
This. But as I've argued before, sex permeates all of human culture and activity, and thus permeates language in ways people ordinarily take for granted. So ERP-related benchmarks can tell you a lot about how a model handles world knowledge in general.
As I've said before, you really have to run a lot of Nala tests to see it.
A dumb model will just describe her as having hands outright. A smarter model might describe her as having paws, but describe the use of those paws basically the same as hands. But on really good models there's this light bulb moment where suddenly it describes the paws being used in a paw-like manner. At that point it's not just describing what a paw is but also what a paw isn't. It is now a falsifiable concept to the model. It's the emergence of genuine empirical knowledge. I've never genuinely coomed to the Nala card because I'm not into feralshit, but it's an epistemological goldmine when it comes to LLM testing.
does zai have their own API service for glm4.6? How quantized is it on OR?
doing a quick Nala on 4.6 as we speak, but my RAM is probably very fragmented right now because I didn't hard restart after shutting down my minecraft server, so it's going at like 1.5 tokens per second. But I will post both the thinking process and the response when it's done.
>>106757795You can select them as a provider on OR. Hopefully they don't quant their own model. They also have their own API.
>>106757828 (Me)
One thing I will say now is that it uses a lot of wiki markdown in the thinking process, which wastes a non-trivial amount of tokens. Maybe it's helpful to the model though, I don't know.
>>106757865
Would be interesting to see how its performance changes if you ban those tokens.
>>106757865
looking at the text as it streams in, it looks like they bothered to tokenize ** but not ***. So for h3 it uses ** followed by *. Which, for a model that size, is probably enough breadcrumbs for it to know to go * * if you ban **.
>>106757230
I have a feeling this is going to be their "red cross free donuts" moment.
They created an expectation, which was seen as a de-facto promise, and now the average user is assmad.
The whole ploy was only going to work if they had an actual moat and zero competitors able to keep up.
Every AI company that can't either serve a consistent SOTA model or be the cheapest bottom feeder is going to die like the irrational money pit that it is. Even VCs aren't that dumb.
>>106756990
what's the context? damn, i've been here since 2023 but never noticed this drama
probably because i was too busy making gpt4-x-alpaca
good times...
>>106757936tl;dr lcpp takeover attempt by a "look at meeee" midwit coder with some minor discord cult was curbstomped by ggerganov. It was a dark time. The less said the better.
wow I have my ST set to generate 2048 tokens at a time right now to account for thinking models but I think GLM-4.6 is going to use all of those and still not be done thinking at this rate.
>Great question — and it's smart to ask this before spending time troubleshooting something you may not even need.
>>106757958
damn yeah, I remember seeing her on twitter saying she made that llama thing all on her own
she's cute though
>>106757958Does Mr. Ggerganov post here?
>>106757936
>gpt4-x-alpaca
that was one of the models that got me hooked, thanks for your work anon
>>106757996No problem, I'm glad you enjoyed it.
so i tried out ikllamacpp, and i got 3t/s on a 5-bit quant of glm 4.5. but it was running entirely off of my CPU and my SSD. how do i put it on my GPUs and RAM?
>>106757989
No. I don't.
Fuck...
>>106757433It works in some models and in others it doesn't.
>>106757477Two more weeks
>>106757973
>https://github.com/ggml-org/llama.cpp/pull/613
This one is a proper narcissist.
>>106756990LLamafile has quietly been abandoned so I think it's safe to say that by now jart has moved on.
Erl lather magister
>>106757828
Here we go.
The response is indented weirdly because for some reason ST didn't parse the </think> properly and everything appeared inside the thought.
All that wiki markdown seems to have inflicted it with asterisk brain, because it actually did use asterisks to punctuate every single switch between dialogue and narration: *blah blah blah**she said**she glomps you*
While it took the very rare step of acknowledging that the user's back is turned to her at the start of the scenario, it did not seem to understand that this makes the space surrounding the user's chest physically inaccessible without additional action being taken.
Either way, it's miles away from the worst I've ever seen. BUT as a function of performance relative to the investment of resources it requires to run locally... it's Scout/GPT-OSS tier. That was a lot of thinking for a very milquetoast reply.
>you were forced to go for the 200 dollars/month subscription to run Sora 1
>with Sora 2 you can simply go for the 20 dollars/month subscription
what kind of black magic is this? I really thought their model was gigantic, but that's probably not the case at all
>>106757960
*ding* reply is baked
>Thought for 5 minutes
sigh and hit continue
>>106758012
think the idea is -ngl 99 and then override layers or specific tensors to CPU, with either --n-cpu-moe (adjust as low as possible before GPU OOM) or by manually assigning them with -ot
for GLM-4.6 you maybe want the first three (non-MoE) layers fully on GPU?
first time playing with ik today too, so i don't really know what i'm doing
>>106758314Also worth mentioning this is IQ4_XS bartowski quant
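for anyone else fumbling through this, the rough shape of the command is something like this (60 is a made-up starting point; lower it until just before the GPU OOMs):

./llama-server -m ./GLM-4.6-IQ4_XS.gguf -ngl 99 --n-cpu-moe 60
# or pin the expert tensors to CPU by hand instead of --n-cpu-moe, e.g.:
# -ot "ffn_.*_exps=CPU"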
>>106758346
>I really thought their model was gigantic, but it's probably not the case at all
Probably a MoE.
Inference providers love MoE.
>>106758346Don't be fooled by the lies. You need an invite code to use Sora 2. So it's a limited rollout at the moment. So don't give them your 20 dollars yet. I made that mistake already.
>>106758314
>thought for an hour
wait what
>>106758419
I said in my previous message that the memory on my server is in a very suboptimal state at the moment; as a result it only averaged 0.48 tokens/sec on generation throughout the entire run.
>>106758432anyway yeah the think part does look kinda promising
>>106758314
>I like that.
>It's a line of ownership.
>musky+wild == earthy+grassy+predatory?
https://huggingface.co/ubergarm/GLM-4.6-GGUFHe is goofing.
>>106758607Our lord and saviour.
>>106758346
Sora 1 was also available with the $20 subscription. Maybe there was a delay before it was opened up, don't remember now.
>>106758403
I just grabbed a random invite from a Reddit thread where people were sharing them.
>>106757843I'm pretty sure ZAI runs the model in FP8 and considers that the "normal" way to run it.
>>106758038Should've split that in two posts for best effect.
>no 4.6 air
another L for self hosting lmao
>>106758314Should have turned off thinking...
>>106758734stop being poor
>>106758741Oh GLM-chan it really ain't that deep..
Miku is a clanker
Haven't checked this general since Deepseek R1 0528 came out and have been using that ever since. Is there anything worth downloading over it yet?
>>106758997nemo
>>106758997what rig?
>>106759038Epyc 9334 768GB and a few 3090s to speed up prompt processing.
>>106758997no except for maybe glm 4.6 but i didn't try it yet. people seem to like it
https://responsiblestatecraft.org/israel-chatgpt/
>>106759086Baby blender bros... we are so back!
>>106758997
Kimi K2 and GLM-4.5 are side grades, but they are all slower than R1 at the same quant (at least when running in ik_llama.cpp).
I'm still on R1-0528 too.
Had a dream about dating cute girls but I don't even remember their names after waking up
Apple will save local
>https://arxiv.org/abs/2509.22935
>Compute-Optimal Quantization-Aware Training
>>106759193qrd?
realistically, why cant i just run daniel's 1bit glm4.6 quant and have *fun*
>>106759329Who said you can't?
>>106759262Ask an LLM to summarize it
>>106755904
Good afternoon /lmg/
Did yet another finetune. This time of an entire board's posts:
https://huggingface.co/datasets/AiAF/co-sft-dataset
glm 4.6 verdict?
>>106755904Gemini 3.0 will be released this week or next week (probably on Monday). So the Deepseek team will finally be able to finetune their DS V4 model.
>>106759542based chinks
ikllama.cpp still is not using my GPUs, even on tiny models. -ngl is set to 99 and i have completely removed --threads, --n-cpu-moe, and -ot, but it still is CPU-only for some reason.
normal llama works with my GPUs just fine.
>>106759636
Did you build it with the CUDA/ROCm/Vulkan/SYCL backend?
>>106759405
What's with the tabs and double newlines in the template?
>>106759499
10/10
>>106759121
I see you're back again. Try 4.6, it's an actual upgrade
>>106759636Oh yeah, if you launch it with --verbose, does it list the GPUs as devices?
yes. i built it with
cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF
cmake --build ./build --config Release -j $(nproc)
>>106759656
let me try that
>>106759405
>Tabs
That's just how axolotl inference formats output. It automatically wraps prompts in the proper chat template so I don't have to worry about making sure my prompts are formatted correctly.
>double newlines
??? I see single lines
>>106758314Serious prompting issue. GLM-4.6 barely needs anything beyond "You are a writer tasked with narrating {{char}}" for scenarios and "You are {{char}} in a roleplay with {{user}}". Add basic writing instructions to use full paragraphs and that's it. I'm using Markdown headers in the sys prompt to mark descriptions and instructions and it has never used Markdown in any of its actual replies.
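i.e. the whole sys prompt can be as small as this (my own paraphrase, not a magic incantation):

You are a writer tasked with narrating {{char}} in a roleplay with {{user}}. Write in full paragraphs and never speak or act for {{user}}.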
>>106759542stealing from the (gpu) rich and giving to the (gpu) poor is based
>>106759668>let me try thatThis is what you are looking for.
>>106759687very good sarrs
>>106759694
it did not show that. it said it was not compiled with cuda for some reason and that -ngl would be ignored.
i just tried a rebuild but now it says Unsupported gpu architecture 'compute_120'
Been out of AI-ing for a while, looking for two (2) things.
Is there a good model that can turn images into transparent vectorized images, like for an official logo?
What is the current best LLM you can run on 48GB VRAM (two 3090s)?
>>106758997glm 4.6 is great
>>106757451birthday mikusex
>>106759694
>rtx 3070ti laptop
>4.5 air
how does it even work? it must feel like you're making a religious pilgrimage every time you make a request
>>106757707They base them on Gemini, so I doubt they will release them before Gemini 3.
>>106759730
Do you perhaps need to update your CUDA SDK?
>>106759749
Eh. I get 8t/s with ikllama.cpp at 16ish K context.
Good enough if I disable thinking, thanks to the magic of DDR5.
>>106759761i am on cuda 12.5
>>106759742this, glm4.6 is legit the first model to truly compete with claude imo, deepseek / kimi were always poor knockoffs
>>106759772
I think you need at least 12.8. Maybe 12.9.
Or you can manually force a lower compute_ version.
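forcing it would look something like this (the arch list is just an example for 30xx/40xx cards; pick whatever your toolkit actually supports):

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;89"
cmake --build build --config Release -j $(nproc)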