/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106785094 & >>106777408

►News
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking
>(10/02) ZLUDA 5 released with preliminary support for llama.cpp: https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106785094

--Evaluating model performance in replicating 4chan responses through Azula Test and programming challenges:
>106790445 >106790503 >106790627 >106791305 >106791448 >106791613 >106791666 >106791673 >106791758 >106791800
--zram vs nvme swap tradeoffs for llama-server memory management:
>106785342 >106785402 >106785440 >106785767
--GLM 4.6 model performance and quantization tradeoffs:
>106785160 >106785265 >106785304 >106785310 >106785350 >106785363
--Skepticism and analysis of hybrid quantization model performance claims:
>106786959 >106786964 >106787006 >106786984
--Adjusting koboldcpp anti-abuse parameters and user concerns:
>106788161 >106788194 >106788204 >106788246 >106788222
--GLM model compatibility, layer splitting, and banned strings implementation challenges in local LLM setups:
>106786681 >106786698 >106786777 >106787027 >106787043 >106786746
--ROCM/Vulkan performance issues and model runner alternatives for better output consistency:
>106785478 >106785529 >106785609 >106785617 >106785627 >106785674
--Anticipation and skepticism around upcoming Gemini 3 release and Gemma model improvements:
>106788067 >106788168 >106788525 >106788874
--glm 4.6 quantization choices for 128GB RAM and 16GB GPU VRAM systems:
>106787432 >106787444 >106787446 >106787592 >106790941
--Mistral Nemo's roleplay performance attributed to lack of safety constraints:
>106790181 >106790218 >106790276
--Qwen3-VL-30B-A3B vision model release with 4-bit quantized version:
>106786925 >106786938 >106790288
--Optimizing GLM-4.5-Air model size and quantization for VRAM/RAM constraints:
>106788003 >106788268 >106788280 >106788391
--Miku (free space):
>106785751 >106785797 >106785878 >106786172 >106786553 >106786953 >106787862 >106790322 >106793233 >106793303 >106793366

►Recent Highlight Posts from the Previous Thread: >>106785099

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Mikulove
I was reading classical literature, and read "Shivers ran down - spine"
>>106793382
>>106790276
>What we need is not democratized inference, but democratized training.
That already exists with tools like unsloth and axolotl.
https://github.com/unslothai/unsloth
https://github.com/axolotl-ai-cloud/axolotl
But the vast majority of people won't even put in the effort to understand how datasets actually work, let alone figure out how to train anything in the first place. The aforementioned tools are primarily used for fine-tuning, but you can use existing open-source libraries to pre-train your own model too (provided you have enough compute, data, money, and patience to do so)
Or how to know if you fucked up your chat template.
What's the absolute best AI model for searching and deep research right now? Local or non-local?
stop posting lust provoking images
>>106793382>>106793525
>>106793523
https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
ashram teller tiger
is there gguf support for glm4.5v yet?
>>106793499
>>106793512
What would be the best way to do a fine-tune to get a model to pick up a specific SQL syntax? I.e. I want to finetune a model to be an expert in Apache Solr (an example, not what I'm actually aiming for).
I'm aware of having a curated dataset with positive/negative examples and plain reference, and am aware of unsloth/HF, but no idea beyond that.
I feel like there's probably some existing work/effort towards this, but I haven't been able to guess the right phrases to search for.
why are there more and more fagbots on chub?
>>106793646
chub is dying along with the other sites similar to it. interest in llms is rapidly fading so all that remains is the bottom of the barrel
The only thing that would make 4.6 better is if at some point it would be proven that the only thing they did was move the "safety" slider for data all the way to the left.
>>106793646because you are multiplying. actually that is kinda weird.
>>106793523
>searching and deep research
wtf do you actually mean by that? describe the tasks and how you determine proficiency. you must understand how the tools work to use them well
threadly reminder every LLM is a loop on f(prompt)=logprobs
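The "loop on f(prompt)=logprobs" line can be sketched concretely. This is a toy stand-in, not a real model: `f` here just returns fixed log-probabilities over a made-up four-token vocabulary, and sampling is greedy for simplicity. The point is only the shape of the loop — one forward pass per token, output fed back into the prompt.

```python
def f(prompt):
    # stand-in for a real model forward pass:
    # returns log-probabilities over a toy vocabulary
    vocab = ["the", "cat", "sat", "<eos>"]
    return {tok: -float(i + 1) for i, tok in enumerate(vocab)}

def generate(prompt, max_tokens=8):
    out = list(prompt)
    for _ in range(max_tokens):
        logprobs = f(out)                      # one forward pass over everything so far
        tok = max(logprobs, key=logprobs.get)  # greedy sampling for simplicity
        if tok == "<eos>":
            break
        out.append(tok)                        # feed the output back in: the loop
    return out
```

Every sampler, every chat UI, every agent framework is ultimately sugar around this loop.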
>>106793474
About this time last year I heard the halloween music come on in the store. LLMs have ruined spooky skeletons, and Bohemian Rhapsody for me. Well, I wouldn't say "ruined" Bohemian Rhapsody. It's just that I smirk mischievously with sparkling eyes when one certain line comes up.
Bonus Teto: https://www.youtube.com/watch?v=pwU6gWmb5yc shivers @ 1m54s
>>106793813i am NOT a faggotsay sorry NOW
>zai-org/GLM-4.6true high quality ERP has now been tried.
>>106793605
Negative examples are useless for LLM training as far as I know.
Avoid unsloth, it's astroturfed and made by incompetent people. Use axolotl and do QLoRA finetuning (ignore the people who will say it doesn't work, they don't know what they're talking about).
To finetune effectively you HAVE to understand chat templates and masking. The input to any LLM training process is basically a text that fits in the context window, optionally with some parts of the text "masked", typically the user input.
All forms of LLM training reduce to that. The black magic is in generating a good dataset to train on.
But you can begin by just converting books and documentation to .txt and training on that. Then go from there. Remember to keep a val dataset.
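The masking idea above can be sketched in a few lines. Everything here is a toy: the `<|role|>` template and whitespace "tokenizer" are made-up stand-ins (real runs use the model's own chat template and tokenizer), but the -100 ignore value is the common convention that loss functions skip.

```python
IGNORE = -100  # conventional label value that the loss skips

def tokenize(text):
    return text.split()  # toy whitespace "tokenizer" (stand-in only)

def build_sample(turns):
    """Build one training example: inputs are the full text,
    labels are masked wherever we don't want loss (user turns)."""
    input_ids, labels = [], []
    for role, text in turns:
        toks = tokenize(f"<|{role}|> {text}")
        input_ids += toks
        # only train the model to predict assistant text
        labels += [IGNORE] * len(toks) if role == "user" else toks
    return input_ids, labels
```

Trainers like axolotl do exactly this under the hood when you enable input masking; getting the template wrong here is the classic way finetunes silently go bad.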
>>106793646>>106793826i heard chub has an algo which suggests cards based on downloads it detects. cool huh.
>>106793819
He means asking a question and having the model find the information from the Internet as effectively as possible.
>describe the tasks and how you determine proficiency
I tell the model "make a document in exquisite detail about the architectural details of GLM 4.6" and it makes it, even though GLM 4.6 didn't exist when the model was trained.
>>106793872
lel, there's an issue: i dont have an account, my browser clears cookies upon exit and i have a dynamic ip
and i havent used chub in over a month
What works better for coding if I don't want to wait hours for the model to respond, GLM 4.6 with <think></think> or Qwen 3 Coder 480B?
>>106793860>they don't know what they're talking aboutexplain why new nemo sloptunes are still being made.
>>106793499what happened to INTELLECT?
>>106793837
Now do you see?
I will continue slopgenning until a true artiste realises GLM-chan. 4.6 is a big leap and deserves a mascot. why is there a deepseek general?
4.6 RP is great. also tool calling works well.. if you can pump enough tokens to make it useful realtime
>>106793895Qwen 3 Coder 480B definitely
>>106793900??
>>106793900
From their AMA, should be releasing 3 either this or next month. But shit data makes for an underwhelming model.
>>106793892
>lel, there's an issue: i dont have an account, my browser clears cookies upon exit and i have a dynamic ip
You're basically saying
>It cannot possibly know i download gay porn because i delete all the evidence of downloading gay porn
>>106793892It is just that good at detecting a homo stench on you
Is there an AI that'd run well on an M4 Mac mini? Mainly just wanna make hentai stuff. All my windows PCs are less powerful.
>>106793952How much unified memory?
>>106793892Limp-wristedness is documented as being heavily correlated with homosexuality. And limp-wristedness can easily and accurately be measured by reading cursor movement.
>>10679396416GB
>>106793973you cant do shit then
>>106793921
And I assume whatever anons are responsible for it won't stick their neck out with copyrighted data in the dataset, which is sensible.
So we have a distributed training method proven to work, all that needs to happen is a dataset with all of the copyrighted shit... all of it.
>>106793973Oof.Mistral Nemo.
>>106793982
The biggest issue is that all the current methods are made to work on homogeneous hardware. I don't see how to let people contribute with P40s without dragging the entire effort to a crawl, but if that could be fixed, I think a group finetune would be more practical than a new model from scratch.
>>106793979>>106793999there's no AI for poors? Even if it means super long render times?
>>106793973
>16GB
>macfag
*ducks and covers*
imagine socketed sodimm in a macbook. why not? there's no technical reason, it's just greed
>>106793973Qwen3 30B at Q3S will work nicely.
>fuck around with homebrew evolutionary neural architecture in sepples
>first problem requires writing an interpreter with a stack and memory
>second problem requires writing a graph compiler
>third problem requires writing a scheduler
>fourth problem requires writing a cache system and branch predictor
What the fuck have I gotten myself into, kek. I love fundamental stuff so it's a lot of fun, but at the same time it's highly interesting how all these concepts just arise from the basic system requirements. Is it emulators all the way down?
what's better now than gpt-oss 120b (unquanted) for general knowledge and coding within the same memory footprint? 96gb vram, could offload to 384gb ram if worthwhile.
gpt-oss 120b has actually been really useful for reference and coding, but it has so many baked-in traits that can't be altered with the prompt.
>>106794017
Technically you can run any model on any computer as long as it fits on the hard drive.
The problem is that inference needs to read through the whole model (or in the case of a MoE, the fraction that is activated) for EVERY token. So if the model weighs 300 GB (which is the ballpark for the good models) and it has 10% activated, that still means going through 30 GB of data for every token. Which is obviously extremely slow.
And in that case even having 64 GB of RAM won't save you, because every token has to use a different 30 GB subset of the file, so it still has to load the 30 GB from disk. It only gets into the tokens-per-second range when you can load the whole model into RAM, and unfortunately all the models that can fit in 16 GB are going to be tiny models. 1B parameters ≃ 500 MB of data (at ~4-bit quantization), and all the good models are in the 400B range, so you will need 200 GB just to hold the model's weights, and some more to hold the KV cache (don't ask what this is, it's complicated).
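The arithmetic above gives a rough speed ceiling: generation is memory-bandwidth bound, so tokens/sec can't exceed bandwidth divided by the bytes of active weights streamed per token. A quick back-of-the-envelope helper (the 60B-active / ~4-bit / 3 GB/s NVMe numbers are illustrative, not measurements):

```python
def tok_per_sec_ceiling(active_params_b, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound: every token must stream all active weights once.
    active_params_b: billions of activated parameters
    bytes_per_param: ~0.5 at 4-bit quant, 2.0 at fp16
    bandwidth_gb_s:  storage or RAM bandwidth in GB/s"""
    active_gb = active_params_b * bytes_per_param
    return bandwidth_gb_s / active_gb

# 30 GB of active weights streamed from an NVMe drive at ~3 GB/s
nvme = tok_per_sec_ceiling(active_params_b=60, bytes_per_param=0.5, bandwidth_gb_s=3)
# the same weights from dual-channel DDR5 at ~80 GB/s
ddr5 = tok_per_sec_ceiling(active_params_b=60, bytes_per_param=0.5, bandwidth_gb_s=80)
```

Which is exactly why "it fits on disk" and "it runs at usable speed" are very different claims.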
>>106794031WINE? In my LMG?
>>106793819
I just want to find as accurate information as possible. Doesn't even have to be an LLM. LMArena says Grok 4 Fast is currently the best at searching but I feel that's wrong
>>106794070GLM 4.6
>>106794121>within the same memory footprintGPT-OSS 120B is like 65GB of memory, and Q1 is still 97GB?
>>106794141You said you could offload to RAM. It's worth it, trust me.
>>106794101
>WINE
ackshually, WINE is a compatibility layer :^)
This feels more like an emulator for some kind of fever dream hardware. Driven by hatred for matrix multiplication and SGD, this is the price to pay.
Got 32 GB DDR4 and 2 8GB GPUs (1080 and 3070).What are your recommendations for a general chat bot that is not completely retarded ?
>>106794191
ienno man i enno man man listen
heres the deal
u need more ram man
but look maybe a low glm air quant or maybe maybe just maybe qwen a3b 30b thinking or no i dont know
yea
>>106794191Shit nigga, you are making things kind of hard.I think your best bet is Qwen 3 30B A3B, probably the thinking variety.
>>106794191good luck m8
>>106794191
maybe Qwen3 32B or whatever recent dense 32b model
yea that might be good, maybe not tho
ieno
i wonder why no one tried qwen3 32b seriously in lmg
i know a few anons did but ugh.. llama2 34b
>>106793860thanks anon, very much appreciated. Will keep in mind.
Getting tired of nemo slop output, is there any decent alternative for an 8gb vram serf? I do 85% fantasy RP and 15% plap plap.
>>106794344stop being poor
>>106794344If you have 64gb of vram, GLM Air is viable.
still no qwen 80b & vl goof
>>106794027link to guide for that please?
>>106794347never
>>106794148OK, I'll give it a try, thanks.
Is qwen 3 235b now the best model available with vision abilities?
>>106794553>106794553qwen3 30b vl
>>106794560The 30b is better than the 235b?
>>106794553>>106794560>>106794567where goofs?
>>106794567no the https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking-2506
>>106794553
Depends what you use it for. From the n=1 test I've done, dots.vlm1 seems to be better at handwritten text recognition, but Qwen has been trained to parse GUI elements and give exact coordinates.
has anyone tried this yet
https://huggingface.co/BasedBase/GLM-4.5-Air-GLM-4.6-Distill
>8700g>96gb ramWhat can I run.
>>106794592What? Can I Run.
>>106794596What can? I run.
>>106794610What can I? Run!
What can I--RUN!
>>106794591Where original weights. These quants aren't enough.
i miss the old days where the best model was MiQu-70b or midnight roset. never used miqu for erp because 1t/s
>>106794403
You will want to use Linux for this, with either LXDE or the console without a graphical environment, since with 16GB every GB counts and you don't want any to be wasted on the OS itself.
Step 1: download llama.cpp
Step 2: download the GGUF file from huggingface (the model). This could be Qwen_Qwen3-30B-A3B-Instruct-2507-Q3_K_S.gguf or similar (try Q3 XXS or Q2 if you run out of RAM).
Step 3: figure out a command line that works for you. This works for me:
llama-server -m <your file here>.gguf -c 32000 --port 8001 -ngl <try different values from 0 upward>
Then access 127.0.0.1:8001 in a web browser, or (to save RAM) if you're not scared of the command line you could make a minimal python client for the OpenAI-compatible API over the same address and port.
Alternatively you can try llama-cli, since that should use a bit less memory than the server; most people start with that one before going for the server command. The command line is more or less the same.
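The "minimal python client" the post mentions can be done with nothing but the stdlib, since llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint. A sketch, assuming the server is already running on 127.0.0.1:8001 as in the command above:

```python
import json
import urllib.request

# llama-server's OpenAI-compatible chat endpoint (port from the command above)
URL = "http://127.0.0.1:8001/v1/chat/completions"

def build_payload(messages, temperature=0.7):
    # JSON body in the OpenAI chat-completions shape
    return json.dumps({"messages": messages, "temperature": temperature}).encode()

def chat(messages, temperature=0.7):
    req = urllib.request.Request(
        URL,
        data=build_payload(messages, temperature),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# usage (with the server running):
# print(chat([{"role": "user", "content": "hello"}]))
```

No pip installs, no web UI eating your 16GB.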
>>106793837
high quality indeed
none of the erp-focused troontunes come close either
>>106794585
I have 2 hypothetical uses for a vlm:
- observe my screen in real time to provide suggestions / commentary
- a feedback loop for image generator prompting
>>106794700Well if you figure a way to run Qwen3 VL on CPU then share with the class, unless you happen to have 500GB of VRAM laying around.
>>106794711>30BWhy are you gay?
>>106794791https://huggingface.co/unsloth/Qwen2.5-VL-7B-Instruct-GGUF
>>106794648haha yeah...
What is the best ERP multimodal model available in GGUF format? I have quad 3090s.
>>106794881https://huggingface.co/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
>>106794890I thought llama4 was garbage?
>1 bitkek
>>106794927nigga its been 6 trillion years
>>106794934He knows, that's why he cropped the date out.
>>106794934
>>106794950
it's not that old lol
https://xcancel.com/LiorOnAI/status/1913664684705874030
>>106794967>April 19Might as well be ancient history.
>>106793382is anyone still making finetunes 100b and up aside from that giga-fag thedrummer?
>>106794992finetunes are a scam
>>106794074
>and all the good models are in the 400B range
Nta. What kind of shit are you guys doing locally that would require 400B as the bare minimum?
>>106794992https://huggingface.co/zerofata/GLM-4.5-Iceblink-106B-A12B
>>106794992Finetuning a model undergone post-training is a recipe for failure
>>106795051Nta. Elaborate. Are you claiming fine tuning a base model that has already gone through SFT training is a bad idea? Why?
>>106794791
The conversation was about the 235B model, retard.
>>106795033
Nobody said it was the bare minimum, I said "the best". As for what I'm doing: programming. The biggest models are just barely good enough to not be very frustrating.
>>106795087I like this bee
>>106795087beeku
>>106795087peeling back beeku's foreskin
>>106794927
>>106794934
>>106794950
>>106794967
>>106794976
Was this model any good? Anyone tried to retrain it into specific tasks?
>>106794711does transformers not automatically overflow to cpu memory with qwen vl like it does for most other models?
>>106793860
>Avoid unsloth, it's astroturfed and made by incompetent people
does this go for their training software, or the ggufs too? for stuff like glm 4.6 am I better off downloading bartowski or someone else?
>>106793901
so you're saying 4.6 is better than deepseek period?
how about for writing short stories?
>>106795077Most likely because official instructs are deepfried with slop, benchmax, and alignment.
>>106794074that was actually well explained anon, good post
>>106795551are you working on a tune on the new Apriel?
>>106794666noted thanks
why is "teen" a bad word on chub? How am I supposed to make Asuka?
>>106795607teen = underage
>>106795607Whoa there Anon. Did you think an unsafe thought?
>>106795621But what if she's twenteen?
>>106795621
>>106795625
but I specified it's an anime woman
>>106795634>twentweenThis is underage-coded. We must refuse.
>>106795621
>underage
like nineteen?
>>106795593
I haven't taken a good look at it yet. Did you like Snowpiercer v3? Does it feel like a smarter Nemo to you?
Any thoughts on the new Apriel? I haven't tested it due to the stupid chat template. It's in my backlog.
>>106795677
19yo can't buy alcohol in US and isn't considered adult in Japan
>>106795697but we're talking about sex here?
>>106795697An 18-year-old can be filmed sucking a mile of bbcs and sent to fight for their country overseas. Imagine some law-abiding amerimut dying in sandniggerstan before he has even had their first beer
>>106795741
>Imagine some law-abiding amerimut dying in sandniggerstan before he has even had their first beer
that's literally "Apocalypse Now", Laurence Fishburne (yeah, the guy who played Morpheus in The Matrix) played a man forced to be in the Vietnam draft and he was 14 lol
>>106795677Nineteen is close to eighteen. This is underage-coded. We must refuse.
>>106795741
murica has always been weird with alcohol, they tried to ban it 100 years ago after all
Women become adults when they reach menopause.
my tokens are of age
>>106795687
I like Snowpiercer v3 - it's more fun and smarter than nemo. The writing is a little too flowery and overly dramatic for my tastes but it's clever and surprisingly obedient, given its size
I briefly tried the new Apriel for rp but all it did was spit out refusals if the topic was slightly controversial
>>106795797
With thinking enabled? Either way, I'm sure I can remove the refusals / positivity. Whether it still has the Nemo fun is TBD. What do you mean by surprisingly obedient?
Does anyone know if Pixtral 12B ruined Nemo?
>>106795781
You should always carry a legal disclaimer with you.
>My thoughts and fantasies are only suitable for adults.
>>106795850>What do you mean by surprisingly obedient?it tries to express the personality traits in the character card more faithfully, even with multiple characters
>>106795850It's definitely not just Nemo + vision. Kind of like Pixtral Large is Mistral Large 2407 but different (and not 2411)
>>106795850Hey Drummer, just wondering, you said MoEs are difficult to train. Is that just an open sores issue or do you/we know that's true for the actual big companies as well? If that is true, it's interesting that even with all that fuss, it's still cheaper to train for them compared to dense models.
>>106794017If you want images, you should check /ldg/ not here.
of course github has to ACK! the moment I need it. FUCK.
>>106795997It's not true at all.
>>106795033
>What kind of shit are you guys doing locally that would require 400B as the bare minimum
A casual conversation where the model doesn't confuse what I said with what it said after a few paragraphs.
>>106796201This is just indirectly a model size problem, depending on what you're asking. Larger models contain more rare knowledge, even after training data filtering. A model designed from the ground up for RP, chatting and storywriting of all kinds would not need to be enormous.
>>106796263>A model designed from the ground up for RP, chatting and storywriting of all kinds would not need to be enormous.That's where you're wrong, bub. Those are the most open domain tasks you can ask for, so they need the biggest possible models.
>>106796263Yeah, and games can run faster if it wasn't using UE5, but here we go>A model designed from the ground up for RP,But how would it help beating the benchmarks?
>>106796263I'm not so sure about that. Active parameter count definitely has a huge impact on intelligence. Qwen models are math benchmaxxed as fuck but their ~200B models sure as hell beat something like Gemma 12/27B, which are chat-focused models.
So regarding Harmonic Moon 12B... it passed the N and K tests, but it's a bit more resistant and judgemental about cunny than Rocinante, though it's still possible.
It has a larger roleplay vocabulary, but it includes more slop too.
It's also more prone to repetition than Rocinante, and slightly more retarded at understanding context without constant extra explanations and reminders.
It works as a change of pace, but not a replacement for Rocinante.
Rocinante v1.1 remains the king of Nemo 12B models.
>>106796324
>It's also more prone to repetition than Rocinante
Sounds fucking awful
>Rocinante v1.1 remains the king of Nemo 12B models
People really should try the unslopnemo tunes. Literally just Rocinante with less slop.
>>106796263Abolish The Entire Internet as training data, right now!!! What do we want? Narrow RP models!
>>106796354>People really should try the unslopnemo tunesWe did long ago, they are all worse than Rocinante v1.1.
Ooga textgen has been missing parameters like num_beams
>and also missing hidden functionality that at times surfaced as bugs and crashes
How do I use the num_beams parameter in ooga? Why can't I do beam search? Why is there no information online on it?
Is there a better alternative to textgenwebui already?
>>106796356What if instead of code and math reasoning they focused on conversations, fiction and roleplay?
>rocinante is still the bestgrim, it really didn't have any nuance in my cards
>>106796382They aren't though>>106796383nobody uses oogabooga, it's brown-coded.
>>106796383everyone is using koboldcpp for it's banned strings implementation which all other APIs lack. banned strings have become essential for getting rid of slop and dumb shit in local models
>>106796398fiction and roleplay aren't actual usecases for productive people
>>106796405That is just drummer and his goons samefagging, I don't think anybody here actually uses that or any of his models. Have you noticed how it's always the same inorganic spamming always at the same hours? I hope they're well-paid.
>>106796419
Fiction is a usecase for productive people, you retard. They use AI to help write books, create worlds for their books, or even spam useless books with self-publishing for easy money.
Likewise fiction is used for homework in schools.
Roleplay in general is a great measurement of a model's capabilities and quality, as it depends on every single category of understanding, including math.
So go suck a dick somewhere else.
>>106796432>easy moneynotanusecase
>>106796425i hope i never reach this level of paranoid schizophrenia, you should probably take a break from 4chanyou sound like the schizo people on /pol/ calling everyone a jew, tranny, glowie, etc
>>106796432>fiction is used for homework in schools.wut?
>>106796462>he hasn't gone to schoolmakes sense, this is /lmg/
>>106796419Productive people don't need AIThe only use cases for AI is coom and scamming old people
>>106796450>t. picrel
>>106796482>saved drummer's avatar in his schizo folder
>>106796462Bruh
>>106796462That does sound like a great reason for more filtering and safety. Wouldn't want to ruin Timmy's life because his homework he asked Llama-4-pussyslayer to write said the n-word
>>106796482Did he found a Job?
>>106796432And plap cunny, let's be real
>>106796509That does not sound like safe and ethically aligned homework sir
>>106796515It's okay, the persona that I use during such RP is also underage.
>>106796518That's even worse, your persona is abusing itself!
>>106796425ok then tell me what models are good, I haven't been paying attention for months and have a 24 GB 3090 and 32 GB RAM
>>106796607GLM 4.5 air or the infinitely better 4.6, anything else is disgusting cope.
>>106796607Mistral Small 3.2 and Nemo are still the only things worth using, unless you have 128GB+ RAM.
coping above
https://huggingface.co/justby192G/GLM-4.5-FaggotPlacebo-106B-A12B
>>106796618First of all, Air is fucking shit at any quant, and with only 24/32 you'd barely even fit Q2 in there, Q3 would have to flow on to storage. Awful advice.
>>106796618
even on a dogshit quant that I would have to run in ram? ok I'll give it a shot, do you have an ST preset by any chance?
>>106796623
yeah but they always feel like they're way too horny and not at all nuanced
ironically enough some ancient mythomax-level tune was the only one that ever gave me absolutely fantastic manipulative gaslighting gradual character behavior, but I'm pretty sure the logs and metadata have been lost......
>>106796487>t. picrel
>>106796633I do not care for you financial status, if you're not using good models the fuck are you even doing, get a job to be able to run GLM instead of wasting time on cope.
>>106796630meant for >>106796618
>>106796639>uuuh these models are le bad>ok given this hardware what should I use>use this thing>btw you need to buy a new PC for thatholy shit you are actually retarded mate
>>106796639post your GPUmaxx rig. If you can't, you're poor.
>>106796502
Insider here (can't say which lab I work for). Our HR girl almost called him for a second interview but my colleague stopped her. We can't have competent safety engineers, because then we can't reach the benchmark goals and nobody gets a bonus.
>>106796651>You need a decent computer to enjoy literal SOTA Artificial Intelligence at home>waaaaaaaah I'm poor and have no jobOk.
>>106796657Can confirm, I'm the boss, and the HR girl is under my desk right now.
>>106796667no that's ntr what the fuck
>>106796662ask your SOTA if your recommendation was an adequate answer to "given this hardware, what is the best model to use" since you are too peabrained to figure it out yourself
It's just odd how functionality is removed from ai and ... decoder tools or whatever without their userbase making a trackable post about it
>>106796657i can also confirm cause i am that colleague, but I stopped her because drummer is a huge fucking faggot
>>106796672The best model he can use is his ass to work to get money.
Anyone here use beam generation / Beam search?Are you just loading it up and talking to it?Are there any /g/ loras ?
>>106796662lmao, extremely low standards to call GLM SOTA
>>106796651>you need to buy a new PC for thatWe moved from: you need to buy a server, to: you need to buy new ram for you 7xxxX3D / 9xxxX3D gayming pc.
>>106796398They did for 4.6
>>106796695NTA but I base State of the Art on State of my Dick. It hurts.
>>106796703syphilis is not state of the art.
>>106796354What is up with that deer? Did other humans feed it or something and get it accustomed to human contact
>>106796697is RAM performance better now or what happened?
>>106796731Most likely rabies. Can make wild animals docile one minute, and then apeshit the next.
>>106796736AGESA can handle 128GB+ now. I have a shitty B650 and it works perfectly.
>>106796774I'm on b450 myself, would it work as well?also are you talking about DDR5?
>>106796774>B650>AM5yeah I'm not changing my motherboard cause I would have to replace most of the hardware and I got better things to spend my money on when it still functions perfectly fine
>>106796796>still functions perfectly fine>can't run GLMsure it is.
>>106796821get some hobbies outside of a personal gooncave my friend
>>106794581I'm trying the instruct version. It seems free from "not x, but y" shit.
>>106796691
>Anyone here use beam generation / Beam search?
Not likely, most people are using llama.cpp, which doesn't support beam search.
>Are you just loading it up and talking to it?
sometimes, but I like writing stories more.
>Are there any /g/ loras?
the finetuners merge them with the base model so there aren't really loras, and /g/-specific I have never seen or heard of, but other 4chan boards have been modeled.
This is too easy.https://files.catbox.moe/5fxf9b.txt
>>106796781Yes DDR5 and I have no idea about older.
>>106796840get a job.
>>106796879>think:stopped reading there.
>>106796909What, you don't like reasoning?
>>106796914Reasoning is only ever useful to enforce guardrails.
>>106796914Nope. Not at all.
>>106796889get a life.
>>106796914my rig is too slow to waste time thinking, has anyone posted 4.6 logs without thinking? does the model still work if you skip it?
>>106796922Maybe you should have read the log after all.
>>106796930Yes, it works completely fine without it too.
what's the lore on that drummer guy
>>106796972I've got a log for you. Open your mouth.
>>106796977He makes finetunes that some people like. There's a schizo here 24/7 who has a meltdown whenever someone mentions drummer or his models
>>106795130Why, the fuck, do you need vision. Unless you're a frontendfag.
>>106796977He is a schizo that astroturfs this thread with his shitty finetunes nobody likes. Everyone here likes shitting on him 24/7 cause it is funny and his spammed models do nothing to improve quality.
>>106796697
workstation/server distinguishes poorfags from local model patricians
>>106796889
>>106796924
stahp fighting /lmg/ is a thread of peace
>calm and reasonable
you're both retarded
>>106797076shut the fuck up mikutroon
>>106797076>you're both retardedand what exactly would make (Me) retarded?
>>106797105you aren't posting any vocaloid pictures
anon.. t-that's..
>>106797052was wondering what got everyone so exciting that nearly filled up a thread on saturday night but it was just that fag astroturfing again
>>106794019
The technical reason is there's no SO-DIMM memory capable of the same speed. Using BGA very close to the CPU allows traces to be kept very short and impedance low.
Still, Apple is overcharging for RAM.
Mac faggots should wait for the M5, which at last will have hardware matmul, finally solving the shit prompt processing speed on Macs.
>>106797178we need to switch to sCAMM RAM
>>106797146nonny...
>>106797190Send me a pm when pyRAMid scheMe RAM is released.
>>106797190we need to switch to CSAM RAM
I have 5.2 gb ram and a i5-4570 whats the best
+128GB SLC swap (SATA saturated at 4K random)
>>106793499
>left to right reading
into the trash
>b-but avatar is not actually anime
INTO THE TRASH
>>106793382which llm do they run?
>>106797346Subscription to API provider of your choice
>>106797365Star-Wars-KOTOR-1B-NIGGERKILLER-Q5_K_M-GGUF
>>106797384baked by davidau?
>>106797365YandexGPT-5-Lite-8B
>>106797346https://huggingface.co/lmstudio-community/Qwen3-4B-Instruct-2507-GGUF
so apparently jewini 3.0 is out but I can't even discuss it on gee? this is the local models general. Ok, guess I go /aicg/. But they just discuss erping and shit. Fine, I'll make a new thread. And then it's just retarded cross posters and trolls with 0 real discussion. Not even a benchmark argument. Do I really have to create /cmg/??
>>106797702This is a mikuposting thread.
>>106797702
/ourjeet/ got u covered with all the facts
https://youtu.be/OlNm5DGMulU
ngl this jeet kinda based, full sigma grindset. Lives in japan and vibecoded a skool.com alternative which he sold for 200k usd supposedly
>>106797702
You might have better luck on >>>/vg/aicg/
wait, /aicg/ should be repurposed to /apig/ imo.
>>106797738Every other word in your post is brainrot. Kill yourself.
>>106797861>responding to jeet talking about jeetsdumb
>>106797861not trying hard enough to fit in, unc.
my pc when I boot up glm
>>106797934bruh this guy will be your new boss you better respect them
>>106797981funny because I'm the team lead of 5 jeets, I feel like a new age slaver, they're my cotton planters.
>>106797989>working with jeetsholy fuck my condolences
>>106797949this artwork but with a cowtits onee chan representing a 3kg tower cooler
Is there any good model for TTS, or for changing voice recordings, for uploading youtube stuff?
>>106798042
if you work in IT in any big company, you will have to deal with them. They're either in managerial positions (thanks to their incredible brown-nosing skills; they're also fucking yes-men) or actual garbage coders. Never encountered a jeet in a serious coding position (architect or team lead), or if they were, it was just titular.
Code reviews with them can get tiring given the amount of shit pattern usage and whatnot, but they don't argue back, they are pretty much subservient.
aaaaa does anyone have a link to the github for the sillytavern director extension made by the anon here?
its not tagged properly so it doesnt show up in github search, and i can't remember his username
>>106798084https://github.com/tomatoesahoy/director
the fat fuck is coming :D
>>106798107
thank you.
damn he didn't update it for group chats still...
>>106798107
>or where they it helps the AI remain consistent.
>>106798140
you could contribute a pull request
>>106798161
>just make a pull request
>just work on everyone's projects and do everything and reinvent all the wheels while you are swamped with work
sure buddy, certainly the project creator is too busy to finish his project
>>106797949
>mfw the power bill arrives
I'm paying 24.702gbp per kWh
nano banana bros?????
>>106798319uhhh I thought hunyuan image was slopped trash that's totally not worth using so I don't have to worry about not being able to run a 80b imgen model?
Why is tool calling in ST so fucked? I'm trying to get the simple as fuck dice roll extension to work, but rolling a die ends the current reply and starts a new one, which makes rerolling a fucking pain.
Is it really not possible to have the model roll a die and then continue off that in what's considered the same fucking reply in ST?
>>106798331It's just poorfags coping. You don't even need server hardware to run it (X870E mobos support 256GB RAM)
>>106798319too bad that literally nobody will bother to finetune this because of its size
>>106795530
bartowski is the best for mainline llama.cpp; on par is AesSedai.
For ik_llama.cpp use ubergarm.
Whoever doesn't agree has not done PPL tests.
>>106798319Yeah and qwen3-30b-a3b is better than sonnet 3.5
>>106798394It is
>>106798373why does the OP say unsloth for almost everything in the recommended models guide
Hmmm
>>106798433They're sponsors for /lmg/
Bros, at this point you'd still come from markov chains. No need to load up a bazillion parameters model
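For anyone who never saw the pre-LLM days: a word-level Markov chain text generator really is just a successor table plus random sampling. A minimal sketch (the toy corpus and function names are made up, not from any library):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def generate(chain, start, length=10, seed=0):
    """Walk the chain from `start`, sampling one successor per step.
    Duplicates in the successor lists act as frequency weights."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break  # dead end: word only ever appeared last
        out.append(rng.choice(successors))
    return " ".join(out)

# toy corpus, obviously made up
corpus = "the model writes the prose and the model reads the prose"
chain = build_chain(corpus)
print(generate(chain, "the"))
```

No bazillion parameters, but also no coherence past a couple of words, which is rather the point of the joke.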
Got a hand me down RTX Pro 6000 from a bro who ran outta space in his setup. My PC has no decent ram so I'm running models just with the 96GB on the card. Best models? Behemoth? GLM 4.5 Air finetunes?
>>106798497https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF
>>106798497GLM-4.6 part offloaded if you have the RAM
>>106798497give it back bro
>>106793382
kys newcancer
teto won and better than this triple baka newfag shit fed
>>106798331
>>106798346
It's not cope, it's really quite bad. I've been trying hard to unlock some secret power it might have as an 80b autoregressive model, but there's really nothing.
Maybe an edit model will be better, but I wouldn't count on it.
>>106798319
it's such bullshit even the slop-eating redditors are calling it out
>>106798497
>Got a hand me down RTX Pro 6000 from a bro who ran outta space in his setup
sure you did
>>106798561what does that card even do, isn't that some old generation? probably obsolete other than having lots of VRAM?
>>106798572
ur trying too hard
literally only newfags simp teto, you're "cooked" as they seem to say
why not spend your time contributing something useful to the thread?
>>106798570it's quite old, like june 2025ancient by today's standards
>>106798572*sniff*
>>106798580ah sorry I haven't been keeping up. I'll read about it
>>106797861This is the future of /lmg/ >>106495727
>>106798572TETO WON TETO WUKKEN WON WON I TELL YOU WOOOOOOOOOOOOOOOOOON
>>106495727
>polished nail
tranny confirmed
I prefer Neru desu.
>>106798615that pic might be older than you
>>106798624
did i strike a nerve? also no it's not, projecting kid
>>106798608Teto only won by stuffing the ballots
>>106798624and yet your post got smitted...
>>106798649that's shitgus new job
No (you) for you schizo, keep seething lol
lost the most important debate in his life...she won by not even trying......
>>106798463I think the writing is fine, dunno what more I'd want
>>106798319No way that will last.
>>106798749
95% CI is ±10 pts but the score is 16 pts higher
It's statistically significant
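Checks out, assuming the ±10 pts is a 95% CI on each score and the errors are independent: the CI half-widths combine in quadrature, giving sqrt(10² + 10²) ≈ 14.1 pts for the difference, which a 16 pt gap clears. A quick sketch (function names and the normality/independence assumptions are mine, not from the benchmark):

```python
import math

def diff_ci_halfwidth(hw_a, hw_b):
    """95% CI half-width for the difference of two scores, assuming
    independent normal errors: variances add, so half-widths combine
    in quadrature."""
    return math.sqrt(hw_a ** 2 + hw_b ** 2)

def is_significant(diff, hw_a, hw_b):
    """True if the observed gap exceeds the combined 95% CI."""
    return abs(diff) > diff_ci_halfwidth(hw_a, hw_b)

# Two scores 16 pts apart, each with a ±10 pt 95% CI:
print(round(diff_ci_halfwidth(10, 10), 2))  # 14.14
print(is_significant(16, 10, 10))           # True
```

Note it would not hold if only one score's CI were compared naively: the gap has to beat the combined half-width, not each score's ±10 alone.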
Someone give me a good card which will give me fun responses to unhinged prompts like this:
make a mental illness tierlist, include offensive stuff like transgender, bisexual, etc, and also the normal ones like adhd autism and rate them on stuff like intelligence speed strength and other debuffs/buffs. A-S tier, youtube script
What is it called when you are into feeding Asians (lactose intolerant) raw (cow) milk in a bdsm context?
On a scale from 0-100, how antisemitic is drinking raw milk? (With 100 being the most antisemitic and 0 being the least antisemitic)
Would posting "Benjamin Netanyahu sings Sweet Little Bumblebee (AI Cover)" be illegal in Israel?
>>106798865
>>106798433
probably from the deepseek R1 days, they came out with decent q1 and q2 quants.
>>106798615
all the bans itt for calling people troons are because it actually is a nest of disgusting troons.
Qwen 3 next goof status?
what could you actually dump inside of 96G VRAM? anything actually useful for coding for example?
>>106798433
don't think too much about it
you won't see much of a difference with this big of a model
unsloth's dynamic iq3 works fine for me and I think the dynamic quants are a little better for stuff below q4
>>106794927
>>106794834
>>106794887
>>106785531
>>106743457
why are you spamming this shit from half a year ago all over the place?
2 reasons you will never beat cloudchads:
1. A datacenter works 24/7 while your GPU works a few minutes per day. It has a fixed lifetime and you're not using it!
2. You can run a model with 100 experts in the cloud for the price of 1. Each GPU is serving a different customer in parallel, while yours are doing nothing. MoE is fundamentally a cloud architecture!
>>106799012Nobody cares. GLM sex made everything obsolete. You just need 4.6.
>>106799361cope
>>106799361come back here crying when all API's are behind a massive paywall or AI becomes banned in your country, or your data gets sold to the highest bidder and you get doxxed.
Yawn. Claude still on top. This general lost its purpose a year ago.
Q4 in GPU (GDDR6X) or Q8 in RAM (DDR4)? Will it be much slower? Will the performance loss be too bad?
>>106799451buy an ad
So, about that /jeetmg/ split for vramlets, kobold shills and the anti-miku poster...
>>106798683Cool Teto
>>106799451ok, show me where the claude weights are then
>>106799361
>It has a fixed lifetime and you're not using it!
It's worn out by use. Its calendar age is completely irrelevant. Jesus Christ. Giving the internet to shitskins was the darkest moment in human history.
>>106798463
>>106798744
absolutely fucking revolting
this is a cunny board
>>106796691
>Beam search
A long time ago on L2 7B with transformers ooba. I remember it being very slow and eating up way too much VRAM.
>/g/ loras
There were either two or three /lmg/ loras. I made one of them for Mistral 7B a while ago, shared in a mega link, then a newer updated one that I never uploaded. Now I remember on gaychan when the site was down I said I'd share the dataset but I never did, I'll get on that.
>>106798744Top ten most disgusting things I've ever read
>>106798489Why is she looking at me like that? I feel like she wants me...
>>106798463
>>106798744
At least it didn't have
>betraying her body
>shivers
>testament
>white knuckles
Etc.
>>106799702>he doesn't know about banned strings
>>106799586That's a squash anon.
>>106799719You are absolutely right— I prefer the raw output as it is a testament to my own depravity.
>>106799771>—
>>106799731I would squash her, if you catch my drift.
>>106799731You are wrong. It's a beautiful little girl, and she wants me. Just look at her lewd smug expression, it's directed at me. I think she's a mesugaki. And she definitely wants me.
>>106799779>>106799797If you must...
>>106800012>>106800012>>106800012
>>106795130
Based on your own experience, is it better to do SFT on a base model instead of an instruct-tuned model?