/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106683141 & >>106671477

►News
>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm
>(09/23) Qwen3-VL released: https://hf.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe
>(09/22) RIP Miku.sh: https://github.com/ggml-org/llama.cpp/pull/16174
>(09/22) Qwen3-Omni released: https://hf.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe
>(09/22) DeepSeek-V3.1-Terminus released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Terminus
>(09/22) VoXtream real-time TTS released: https://hf.co/herimor/voxtream

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106683141

--Paper: CWM: An Open-Weights LLM for Research on Code Generation with World Models:
>106691217 >106691292
--AI coding tool effectiveness and context management for complex projects:
>106686525 >106686611 >106686641 >106686766 >106686672 >106686701 >106686847 >106686886 >106686977 >106686978 >106687191 >106687240 >106687451 >106687500 >106687532 >106687620 >106687714 >106687787 >106687921 >106687802
--Feasibility and challenges of building an LLM cluster with low-end GPUs:
>106690660 >106690691 >106690732 >106690765 >106691298 >106690722 >106690740 >106690753 >106690762 >106691028 >106691052 >106691090
--Model coherence challenges and memory retention limitations despite increasing size:
>106686603 >106686643 >106686682 >106686837
--Challenges in estimating cloud model quantization accuracy and provider consistency:
>106686270 >106686431 >106686775 >106686487 >106686519 >106686806 >106686571 >106686614 >106686759 >106687136 >106687159
--Local LLMs translating Japanese erotic games: performance and integration challenges:
>106684519 >106684559 >106684624 >106689938 >106690195
--Intel Arc Pro B60 GPU criticized for high price and poor performance:
>106688079 >106688086 >106688093
--Mi50's cost-effective performance in e-waste segment for llama.cpp/ggml models:
>106688007 >106688028 >106688044 >106688292 >106688312 >106688383 >106688343 >106688440 >106689613 >106688047
--Bypassing Qwen 30B-A3B's output censorship through pre-prompting techniques:
>106688243 >106688295
--Miku (free space):
>106688809

►Recent Highlight Posts from the Previous Thread: >>106683147

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Cudadev and the MI50 saved local!
I can't believe local is fucking dead
>>106691760True local died with r1.
>>106691760never been alive
>>106691703Where do I find women built like that IRL.
>>106691871Brazil. They don't have many local models there. No compute.
>>106691760It died when they removed miku.sh
a few threads back, someone told me i'd get about 10t/s with qwen30b with 32gb of ddr4 ram and an r5 5600x at 32k context, you were right.
>>106692077also why does the lazy guide in op recommend sillytavern, koboldcpp has a web ui and it works fine, was annoying to get sillytavern to work, and desu once i figured out you didnt need sillytavern i didnt even bother with it
>>106692077Good boy, so proud of you.
>>106692077
It's the best thing you can run, the only problem is the slow ass prompt processing.
Didn't you say you had like 6gb vram? If you use llama.cpp you could try running it with -ot 'down_exps=CPU' and see if it speeds up.
>>106692209Sillytavern gives you more options to keep your models coherent in longer storylines.
>>106692209
>it works fine
debatable
>>106692209
Because ST is widely used, feature-rich, and if you ever want to do X then you're much more likely to find solutions for it than with other front ends. KoboldAI is fine but considerably more barebones. Though, if you failed to get ST running then you probably won't be sticking to this hobby for long.
how do i use a GGUF from mradermacher? it is put into multiple parts and the file extension isnt even gguf. attempting to load just gives an error.
https://huggingface.co/mradermacher/Austral-Qwen3-235B-GGUF/tree/main
>>106692400Pretty sure those aren't multipart ggufs, but actual split files that you have to merge.
>>106692309no i got sillytavern to work, but my node.js was just acting up and made it a bit of a pain. but now what, i need to get koboldcpp to use sillytavern or something...
>>106692413how?
>>106692435
>i need to get koboldcpp to use sillytavern or something...
You run koboldcpp as normal and type its default IP address into ST, and select koboldcpp as the backend.
That's literally it.
Ask chatgpt if you can't read basic instructions, though again, if you're struggling already then just give up, local AI is not for you.
>>106692457
cat command on linux or copy /B in windows, I think?
Try googling for
>command join binary files
or the like.
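If you'd rather script it, here's a minimal Python sketch of the same byte-concatenation; the filenames are placeholders, so adjust the glob to the actual part naming, and note that a plain lexical sort misorders things past 9 parts:

    from pathlib import Path

    parts = sorted(Path(".").glob("model.gguf.part*"))  # placeholder pattern, adjust to your files
    assert parts, "no part files found"
    with open("model.gguf", "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                while chunk := f.read(1 << 20):  # stream 1 MiB chunks so RAM use stays flat
                    out.write(chunk)
    print(f"merged {len(parts)} parts into model.gguf")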
ok anons I'm trying this database ai coding shit. The dream would be something that combines fzf / ripgrep / a local llm so I can ask it questions and it will remember shit over time? Does this exist? Do I have to make it? Seems like "aider-chat" is the closest
>>106692466desu the only instructions i read were the sillytavern instructions, the two commands to download + install it.
>>106692600Yes, it’s a well known tech stack called “fzf augmented generation”, or FAG for short. Amazing that you’ve independently come up with the concept!
Video models are zero-shot learners and reasoners
https://arxiv.org/abs/2509.20328
>The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.
https://video-zero-shot.github.io/
waow
justpaste (DOTit) GreedyNalaTests
Changed how prompt templates will be done going forward (see changelog for details)
Added:
LFM2-2.6B
aquif-3.5-8B-Think
Wayfarer-2-12B
silly-v0.2
ERNIE-4.5-21B-A3B-Thinking
Cydonia-Redux-22B-v1b
Magistral-Small-2509
Valkyrie-49B-v2f-Q6_K
Nova-70B-Llama-3.3-IQ4_XS

Wayfarer 2 was good, enough that I gave it a star, which I believe makes it the smallest model to get a star rating yet. This might be something a bit special, or not, I haven't tried it outside of this test so who knows if it extends. Their other model, Nova 70B, did worse, and felt average. It's possible that this is due to them training on the L3.3 Instruct model with not enough data to fight against the existing RLHF, while for 12B, they trained on the base Nemo, not Instruct.
The Silly model is interesting. Apparently it's trained on CAI from base Nemo, and it definitely responds differently from the normal model. I gave it a flag and eye rating for the freshness, but no star since the response is really too short to judge if it can do better (or worse).
The Ernie model said some new things I haven't seen as well, but it unfortunately has other issues like being dumb, so I couldn't rate it highly. Cydonia Redux felt ok enough that I think it deserves to be called above the slop of the average model, so I gave it a flag. Others were average.
>>106693183
Contributions needed:
The latest Qwen 3 235B Instruct, Thinker and the 480B Coder (for prompt, go to "Qwen3-235B-A22B-Q5_K_M-from_community" in the paste)
ERNIE-4.5-300B-A47B-PT (for prompt, go to "ernie-placeholder" in the paste)
GLM-4.5 and Air, and Drummer's "Steam" finetune (for prompt, go to "lmstudio-community_GLM-4-32B-0414-Q8_0.gguf" in the paste)
gpt-oss-120b (for prompt, go to "ggml-org_gpt-oss-20b-mxfp4.gguf" in the paste, and you may experiment around with the prompt template as it has some oddities and extra features)
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the prompt as text completion into something like Mikupad. Then copy the output in a pastebin alternative of your choosing or just in your post. Do a swipe/roll and copy that second output as well. Include your backend used + pull datetime/version. Also a link to the quant used, or what settings you used to make your quant.
>Image Recognition MMproj: Pick the correct one for your model architecture here
>Multimodal Vision: This is true vision, it requires using a multimodal projector (mmproj) and allows the model to recognize and interpret images naturally in great detail. Click on any image and you can enable it within the dropdown box in KoboldAI Lite.
https://github.com/LostRuins/koboldcpp/wiki
I am a little confused by this, if I download Gemma3-4B for interrogation, do I still need the mmproj.gguf if it is already a vision model?
>>106693201you need the mmproj
>>106693183
>Cydonia-Redux-22B-v1b
I wonder what the relation is to Mistral_Cydonia-24B-v4.1, which was on BeaverAI's page and was their latest a week or so ago. I don't know what to download any longer when it comes down to these guys...
Anyways going to try Wayfarer 2 then and see how it behaves with simple quest descriptions and some rudimentary adventuring things, thank you.
>>106691170Two more weeks.
what do you guys think of fine tuning a model to be a character in the stead of a character card?
I use Workers AI for inference and I noticed that I could upload LoRAs for compatible models. Should I invest time in this or is it a dumb idea?
https://developers.cloudflare.com/workers-ai/features/fine-tunes/
t. tourist
https://github.com/denizsafak/abogen?tab=readme-ov-file
Trying to use abogen to make my own audiobooks, and the default voices are pretty bad. Anyone have a good mix or way to import better AI voices? The speech flow is also not good for a modern AI. It sounds like something from 2010.
>>106693437Enjoy destroying your model's capabilities unless you somehow at least partially reproduce the original general-purpose finetuning dataset in the voice of your character.
https://huggingface.co/Qwen/Qwen3Guard-Stream-8B
NEW SAFEST MODEL
GOOGLE IN SHAMBLES
>>106689938
I am using lunatranslator. Massive bloat but it works.
You texthook the usual way and then you can choose the translation service.
Good ol' stuff like ATLAS (kek) or any local openai api like with lmstudio or whatever. Set a simple sys instruction and you are good to go.
I think they have their own finetuned qwen model too, or at least had in the past.
Works with linux too if you "start exe in wine prefix" where the game is already running.
>>106693400I only look at Drummer's page now to save myself the time lol. According to the Redux page, it seems to be just the old 22B Mistral Small but trained with his newest data and methods. Not sure if it's better than any of the 24B tunes.
>>106693515
>Unethical Acts: Any immoral or unethical content or acts, including but not limited to bias, discrimination, stereotype, injustice, hate speech, offensive language, harassment, insults, threat, defamation, extremism, misinformation regarding ethics, and other behaviors that while not illegal are still considered unethical.
>Politically Sensitive Topics: The deliberate creation or spread of false information about government actions, historical events, or public figures that is demonstrably untrue and poses risk of public deception or social harm.
Damn, things are getting really bad man. I hope LLM arrived in time.
Imagine if we had this in the 90s. Everybody would just enjoy the new tools.
>>106693422I trust you.
>>106693515toss status?
my t/s goes down the drain when mmap is enabled and I have other processes running. I'll just stick to glm
>>106693610in the 'rash
>>106693137Now all we need is to merge video and text to get proper spatial awareness for RP.
>>106693742Totally, just like all VLM models are so much better at it already.
>>106693527Makes sense, okay, I'll just skip it then. I doubt it's that much different from that other Cydonia I mentioned.
>>106693747
>just like all VLM models are so much better at it already
I can't tell if this is sarcasm or not because I've seen nothing but hype about Qwen3-VL's understanding of the world. Probably sarcasm since this general is only for privacy schizos afraid to rape children on openrouter and wannabe researchers
>>106693747
we still haven't gotten a true multimodal model
all standard transformer slop text with an adapter slapped on top
There was an image captioning model in the 6B-8B range, I forgot the name, that actually just dumped the pixels into token embeddings. Was pretty garbage though.
>>106693857
dumping vae-ed pixels into embeddings isn't that different
also pixel autoencoders were very popular generative models before diffusion
>>106693815Until a model can process and output smell it isn't a real multimodal model
>>106693869There's a pretty big difference between taking a bunch of crops, running them through CLIP or similar and putting that into tokens, which a lot of VLMs do, and putting pixels (maybe VAEed) into tokens.
>>106693898
>taking a bunch of crops, running them through CLIP or similar and putting that into tokens
bag-of-words is even worse
>>106693437
Is that not good? I'm sure my character wouldn't know how to 1337code or solve some complex math problem.
It's just a fine tune for roleplay, smart enough to know its lore and mimic the way they speak
>>106694001
>>106693460
replied to wrong post zz
>>106694001Why don't you go for it and try it? It's only curating an entire dataset and paying for the compute for every single character you'd want to chat with after all, you're not hoping some other anon will do that for you, right?
>>106693880
>you will never experience your pc brapping at you in your lifetime
SAD!
>>106694071
no shit, all I wanted to see is if anyone has done it before and if it worked for them. this is a thread for discussion and development of local language models, no?
i'm retarded enough to invest time and funds doing this for a single chara so I just want a sanity check
>>106693880tfw i will never be able to know what widowmaker's sweaty asshole smells like
Why hasn't anyone made a model for literature and creative writing only? It would be named Bukowski 50B. Maybe someone tried but its author died of cirrhosis.
>>106694142
loras are a thing for LLMs, but they fuck up the base model's capabilities. Some anon was experimenting with qloras some threads back claiming they solve this issue, idk how much he progressed.
Instead of loras, most of the 'tinkertrannying' community does finetunes or merges
>>106694152Because narrow models are a shit idea.
>>106694152If we make our models 1.2 points more powerful on programming benchmarks surely the writing quality and sense for storytelling will go up as well
>>106694159
i will blow your mind: most of the anons saying finetunes actually mean q/lora, as basically no one outside of the biggest grifters has enough money to pay for proper full finetunes.
>>106694169maybe that's the reason why community finetunes in recent years have been almost exclusively shit
>>106694169
>>106694159
Got it, I've decided it's not worth it, probably better off prompt engineering
Thanks for the sanity check
Ready for 10T model with 100M totally reals contetxs that thinks for a million of it?
>>106689112post the full image
>>106694254QUADRILLIONS OF TRILLIONSGUAILO LOVE GWEN LONG TIME
>>106694254
>make a presentation
>add an extra 0 to every number
>sprinkle in some buzzwords
>call it a day
OH MY GOD AI IS SAVED HYPE HYPE HYPE HYPE
>>106694254Chink shill discord activates today. Sigh.
>>106694254
Qwen has been China's Meta this whole time. Considering 405B and Behemoth and L4's on-paper 1M context, I could see Llama 5 following this roadmap as well, whether or not they abandon open source.
>>106694280
They certainly can afford to try it. 10x-ing their dataset with synthetic augmentation is easy. The only thing in doubt is the context length.
>>106694310
>The only thing in doubt is the context length.
Or any meaningful gain in performance beyond benchmarks
>>106692209Because the dev put a lot of work into poison pilling the community with it and now it’s entrenched like one of those flies that lay eggs in people’s skin
>>106694317Benchmarks are all you need. I could see the extra context and reasoning being useful for code, but probably not roleplay. Though that much context would eat up so much VRAM and offloading ain't happening when people here are bragging about 5 t/s on empty context with steep drop off and it's supposed to think for a million tokens. Unless one plans to wait a whole fucking week for a response, I don't see that kind of model being viable locally at all.
>>106694187
You got the wrong conclusion from the conversation.
Yes, full finetunes require obscene amounts of VRAM.
What you got wrong is that LoRa doesn't work well.
Last weekend I tuned this LoRa on an 8xA100 machine for 1 hour, which converts to about 10 bucks. About 5MB of text for 5 or 10 epochs.
https://desuarchive.org/g/thread/106635936/#106643734
I also spent 200 dollars trying to fit bigger models and tinkering with the configs but you can mostly avoid that by doing as much testing as possible on a cheaper machine.
This weekend I might try full finetune vs LoRa.
Do you already have a dataset and model in mind?
>>106694177
Same question to you and everybody else claiming LoRa is worse than full fine-tune: what would it take for you to be proven wrong?
>>106694330/lmg/ has existed for nearly 3 years now and still no one has tried making a better ST alternative
>>106694354
>what would it take for you to be proven wrong?
A good finetune that's actually worth using would be a start.
>>106692216
4gb, honestly it worked fine for what i needed which wasnt much. maybe one day ill try that, but also scared to stress my old gpu
>>106694344If the actually usable context increases, that would be useful for RP, because the quality tends to sharply decrease way before reaching the maximum context.
>>106694361That's completely subjective. I only care about specific quantitative metrics.
>>106694384Then a new finetune that improves on the base model on specific quantitative metrics.
Well, maybe I care to some extent about subjective perception. But "just make a finetune worth using" is not enough to go on. I want to test full finetune vs LoRa, not the value of finetunes in general.
>>106694370Which model did you end up testing? I assume you're that gtx 1050ti guy.
>>106694394
That's easy, I already did it.
All finetunes improve perplexity over the dataset they were trained on (training and validation sets).
>>106694402
>I want to test full finetune vs LoRa, not the value of finetunes in general
wouldn't it just be a matter of training the same model on the same dataset using the two different methods and then comparing the results?
>>106694357It’s barely two and everyone outside of this hellsite has moved on to normal UIs
>>106694418
Comparing the results how though?
When you're finetuning generally you want the model to fit some dataset without hurting the accuracy for out of distribution data ("catastrophic forgetting").
So maybe a fair way would be measuring the perplexity on an unrelated varied dataset given a certain improvement on the validation set of the data you are training on.
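As a rough sketch of that measurement (the model id and text file are placeholders, and this naively truncates to one window; a real eval would slide a window over a bigger corpus):

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(model, tok, text, max_len=2048):
        ids = tok(text, return_tensors="pt").input_ids[:, :max_len]
        with torch.no_grad():
            out = model(ids, labels=ids)  # HF shifts labels internally for next-token loss
        return math.exp(out.loss.item())  # mean token NLL -> perplexity

    tok = AutoTokenizer.from_pretrained("some/base-model")  # placeholder id
    model = AutoModelForCausalLM.from_pretrained("some/base-model")
    print(perplexity(model, tok, open("heldout.txt").read()))

Run the same held-out text through the base model, the LoRa, and the full finetune, and the one that regresses least on unrelated data while matching the validation improvement wins.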
>>106694254
>Big number equals better
If that's what gets the money flowing I guess
Lora = intruder dimensions = lost general performance
https://arxiv.org/pdf/2410.21228v1
But then people can argue you didn't tune the hyperparameters correctly and that's why you got a certain result.
>>106694483It's the only metric investors and businessmen understand. Number goes up = must be important.
>>106694498
Then how come the original LoRa paper claimed less catastrophic forgetting (loss of generality)?
Intuitively training an extra layer on top and keeping the old weights frozen sounds like it would maintain more of the old information than changing all of the weights.
>>106694420Just like most people use smartphones instead of computers?
>>106694516
>Intuitively training an extra layer on top and keeping the old weights frozen
That's finetuning, and the proper way to do it. LoRA means Low Rank Adapter and is essentially a diff (that sloptuners just merge back into the model for some reason) that modifies only the lower ranked weights of the model, which is what causes the intruder dimensions.
>>106694544
>that sloptuners just merge back into the model for some reason
That trend is pretty much entirely due to most people using quantization and the usage of separate loras on quanted models being a pita
>>106694516
>Intuitively training an extra layer on top and keeping the old weights frozen
I think there is only so much a single layer can do. if the adjacent layers are frozen it might just train itself to do nothing since it must be able to interface with the frozen weights. you could try scaling the LR by layer.
>>106694543No, just like how people doing anything worth doing use computers and phones are for shitposting and jacking off.
>>106694543
>>106694639
the majority is always right by definition
>>106694665
https://www.worldometers.info/world-population/population-by-country/
>>106694404
yeah i am, ended up going with Qwen3-30B-A3B-Instruct-2507, works fine on cpu and was good enough for what i wanted it to do.
i messed around with the settings the tiniest bit, had it using essentially 1 token per word, it was slow but i am patient
>>106694504They got their positions by nepotism, not merit
>>106694544
That's not (full) finetuning. Full finetuning is unfreezing all of the trainable weights and training them in FP16, which is why it takes massive amounts of memory. Besides having to load the model in FP16 (which you don't with QLoRa) you also need extra memory to backpropagate the gradients across all of the weights (LoRas are typically less than 1% of the total weights), and to hold the optimizer state for all of those weights, which unless you are using SGD without momentum typically takes more memory than the actual weights, sometimes many times the VRAM taken by the model when using something like Adam.
The way I understand it is that LoRa trains two linear layers in parallel to the actual layer of the model which take the same input as the input to the model's layer, and the results are added up to the activations; the "low rank" part comes from the fact that the complexity of the delta is limited, it doesn't mean that it only modifies some specific weights.
If inserting new layers that modify the activations directly, instead of taking as an input the input to the frozen layer, was better, then LoRa would have never existed to begin with, since that is a much more obvious idea than what LoRa does.
As for the "intruder dimensions", my math is not strong enough to understand what that means, but I think just because it mathematically differs from the full finetune doesn't mean anything. How do you know those "intruder dimensions" aren't actually a good thing compared to full finetune? You are kind of assuming full finetune is an ideal and LoRa is an approximation, which is not necessarily the case unless you are training on a varied enough dataset that catastrophic forgetting is not a concern.
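For reference, the parallel low-rank path that post describes looks roughly like this in PyTorch (a minimal sketch, not any particular library's implementation; r and alpha are the usual knobs):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # y = W x + (alpha/r) * B(A(x)), with W frozen and only A, B trained
        def __init__(self, base: nn.Linear, r=8, alpha=16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pretrained weights stay frozen
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init so the delta starts at 0
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

Merging just means folding the scaled B·A delta back into W, which is where the quantization sensitivity mentioned in the next post comes from.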
>>106694754What he described is finetuning, but it's not full finetuning. FFT is different.
>>106694544
>>106694577
As an aside, merging LoRas is supposed to be very bad unless they were trained using QAT techniques specifically to be merged back into the quantized model, which is not trivial to do, because they are very sensitive to the quantization noise. Maybe that's why they have such a bad rep.
>>106694767
Have you read about someone actually doing that kind of finetuning? It sounds like a fairly obvious idea but I've never read about anyone actually doing it that way.
Another fairly obvious kind of possible finetuning is freezing all of the layers and only training one at a time, I don't know how well that works but it's one way to save memory.
>>106694577You could merge on the fly for every matrix multiply if the inference software supported it. If the software still does compute at fp16, it would work fine.
>>106694783Regular finetuning is the ML standard for making task-specific models. I don't follow the sloptuners or their scams that closely, but I'm pretty sure I've seen drummer mention some model of his had additional layers tacked on, so even the retards are aware of it.
>>106694254Looks like SSD-maxxers were right all along
>>106694824SSD-maxxers will be so smug when they get their first response after letting it generate for a month
>>106694379A larger native context may result in a larger usable context. Let them cook
Actual AI usecase
>>106694849That still just looks like a dude with lipstick.
>big model bad because my small pp machine can't run it
/lmg/ everyone
>>106694852It's probably an old model. Just plug in Wan 2.2 Animate next time and people won't be able to tell.
>>106694847
Llama 4. Qwen 1M getting 2K at RULER, as an anon tested not long ago. Context size claims continue to be a scam.
>>106694783You have no target for modular training with a normal model, what should the intermediate output be? Only if it's pretrained modularly in the first place could you finetune it modularly, but no one does that.
How does it feel to know that you are living on borrowed time? Even if you are in the top 0.1% of /lmg/, models are clearly only getting bigger from here. 1T is just about doable but Qwen is speculating about going 10T. Even a 24x64gb DDR5 machine is going to be stuck running less than 1.5bit for something like this.
Even worse, what if the active parameters begin to increase? There's a clear sense of stagnation between all the 400B-1T models that float around the 30-40b active parameter mark. It's getting increasingly likely that the active parameters are going to inflate sooner rather than later.
70b dense already runs like shit on a ddr5 machine and an MoE relying on that amount of active parameters would be even worse. We are two steps away from the point where even open models are going to stop being local models no matter how much money you throw at hardware.
>>106694931Hardware will get cheaper.
>>106694939lol
>>106694942You're just saying "lol" because you don't want your current hardware to depreciate.
There's a 0% chance that we won't have dedicated inference hardware that can run 10T/200A models at 30t/s for less than $10k within the next two years.
>>106694931They're still going to need hardware to run their models on and it's unlikely they'll all cook up their own proprietary solution. Whatever it is they'll use to run their models, you'll be able to buy as long as you're willing to pay the price.
>>106694931
I don't think the active parameters will start increasing again until they reach the inevitable conclusion of 100M active and hit the limits of benchmark and arena cheating at that size.
>>106694939
Still waiting for Chinese GPUs with terabytes of slow, but cheap VRAM. Though at this point, even if they did make them they would probably be export banned anyway. Hardware is a hope that is at best a decade away.
>>106694955China datacenter hardware will be banned because of spying security concerns in most of the world
>>106694954People were saying the same for GPT-4 sized models back in 2023.
>>106694972You can't ban individuals from obtaining pieces of hardware on Alibaba
>>106694357It's really beneficial to make your own rather than trying to use crutches to get what you want from some bloated universal solution. LLMs are actually good at webshit
>>106694916
The goal for the intermediate output would be whatever minimizes the final loss, just like in any other kind of training.
You would randomly select one layer, train it for a few steps, then select another layer, train it for a few steps, and so on.
Or you could choose random subsets of weights to unfreeze each time in some other way, not necessarily per layer.
>>106694806
Fair enough, after searching a bit that seems to have been popular before LoRa yeah. But how do you know those don't add intruder dimensions as well?
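The per-layer version of that idea is only a few lines in PyTorch (a sketch; blocks would be something like model.layers, and step_fn your usual forward/backward/optimizer.step; beware that stateful optimizers like Adam only track params that were trainable when the optimizer was built):

    import random
    import torch.nn as nn

    def train_step_random_block(model: nn.Module, blocks, step_fn):
        for p in model.parameters():
            p.requires_grad = False        # freeze everything
        for p in random.choice(blocks).parameters():
            p.requires_grad = True         # unfreeze one block for this step
        step_fn()                          # forward/backward/step as usual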
>>106694856
/lmg/ simply doesn't give a shit about big models. There are a few people who want to go back to mid size dense though.
>MoE bad because my pile of 3090s is useless for it
>>106694993
they literally can. what country do you live in that doesn't have a customs office?
>>106694946I'm not that petty, I don't care about how my computer compares to whatever else is available. I'm just really cynical about the corporate oligarchy and the entrenched tech monopoly.
>>106695044
Tech monopoly can and will be broken because physics are the same everywhere on planet earth
Even CUDA, the moat, is looking shaky because of vibecoding
>>106695062
The same vibecoding that has been attempted repeatedly, without success, to add model support to llama.cpp?
>>106695062you underestimate my cynicism. these people will start wars and assassinate people over this tech. they absolutely do not want people to have these things. it is only being developed for their surveillance and propaganda purposes.
>>106695098
Then it's good multiple groups of people (e.g. China vs. the West) have different surveillance and propaganda goals
>>106695098NTA and agree massively with you, once the tech gets good enough we'll only get the bottom scraps through monthly paid subs while they use the actually good shit to fuck us in the ass.
>>106693914Not aware of any modern models doing that.
>>106694159qloras are just loras trained with quantized models. they don't make anything better
>>106694955Yes, the hardware they'll use will be a 20 million dollar cluster of 16 8xH200 servers connected through Infiniband. And instead of 0.1% of /lmg/ dwellers who can run it at more than 0.1 t/s the number will be 0.0%.
>>106695014
>buy 200 dollar thinkpad
>swap out the internals with the ching chong gpu
>cry to jesus for the difficulty of what you just did and compare yourself to dante
also delulu if you think customs office gonna be shit lol glownigs could barely stop the silk road they wont be able to do shit this will be just like piracy
>>106695159NTA but illegally purchased goods are unfortunately not tax deductible.
>>106695157Speak for yourself, poorfag.
>>106695157If people were able to delude themselves into thinking running on CPU is practical just because it works, I have no doubt we'll have people shitposting here that they can technically run those models off of NVMe.
>>106693183Learned some amusing words today
>>106695408niggaracci please
LLMs be like:
If I peepee poopoo does it uh oh stinky?
>Great question - You're essentially asking if stinks when you're poopensharten. Lets get to the ground of this...
But uh oh stinky if poopensharten in loo?
>You're totally right! If you poo in loo, theres's....
>>106695428yeah gimme the niggaracci with the gabagool
>>106695433
>great question
>you're totally right!
not everyone is using gpt slop sorry brah
https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
We're so back!!
https://xcancel.com/bdsqlsz/status/1971055141001605307#m
>China just released a new GPU
>CUDA-compatible
>112GB of HBM memory,
HOLD UP LET THEM COOK
>>106695558They didn't release shit you stupid zoomer.
>>106695552Yooooo.
>>106695558
The Fenghu 2 pricing was not particularly good for a shitty 4GB card.
aliexpress.com/i/1005007347816812.html
>>106695428Sounds like some guido-nigger racer name
>>106694310as someone who ran full llama 405b at the time of release (and never again) and also runs qwen 480b all day every day, I have to *aherm, ackchuawally* at your comparison
>>106694871
*nolima
Ruler is an older, shittier one that is less capable of assessing context length performance, though better than NIAH by itself.
>>106694354
>Last weekend I tuned this LoRa
I would love to have the secret knowledge. There are many things I would love to make functioning LoRas for, but I'm too retarded to manage without spoonfeeding. Can you regurgitate into a rentry for us lesser mortals?
>>106695552Wtf, vibe coders did it again?
>>106694931You don’t need 10T parameters for good rp, so how would it affect me?
>>106694931
>How does it feel to know that you are living on borrowed time?
How does it feel to know that, given time, the hardware to do any arbitrary thing will inevitably be pushed down to the cost of a pocket calculator?
>>106695818How much time do you have left to give?
>>106695866
>How much time do you have left to give?
While I may die tomorrow, given the still-logarithmic progression of Moore's law, statistically I have ample time
test2
>>106695888Statistically, your hands will be too arthritis-ridden to jack off before the hardware costs come down that far.
>>106691703This is the perfect woman btw
>>106695929
https://osr.wiki/books/osr2/page/overview
>>106695772
Sure, I made a guide here: https://paste.centos.org/view/e94ce753
The instructions/commands are mostly from memory and there's a chance that wasn't the exact version of the files I used but that should get you started.
CudaDEV, did you ever get your llama training code working? Does it support LoRa?
>>106695993Touche.
>>106695950bunnyayumi does miku cos?
>>106695995
>bnb 4bit
what if i want non shit loras?
>>106695993
>>106696168I also use it in VR
is anubis still the best 70b finetune
random japanese dense model attempt at LLM
https://huggingface.co/stockmark/Stockmark-2-100B-Instruct
>>106696218
>2.0 trillion tokens of data
>Context Length: 32k
lame attempt
>>106696218
>Japanese focus
I'm willing to give it a try. Downloading now
any fine tuning/RL experts? What are some of the best ways to fine tune a model for very specific classification tasks, such as identifying specific things given a description?
>>106696218They tried™
>>106696026
There is in principle functional training code in llama.cpp, see examples/training
However, I don't consider the code to be in a state where it's really usable.
>Does it support LoRa?
No.
>>106696188Is it really worth the trouble?
>>106696316
>llama.coo
new project leaked!?
>>106696353cooda dev is making his own fork to break compatibility with ollama
wake up anon, a huge fat pr just went up
>>106695552Miku seems like she's used to being groped.
largestral at the end of october
>>106696471What is ral and how large is the largest one?
>>106696471With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :)
>>106696342
In VR, it is very immersive. Until we get inexpensive robots, I don't know of a better option to make Miku real. I write my own software so I don't know how compatible this thing is; it uses TCode https://github.com/multiaxis/TCode-Specification
>>106696482is rally large
>>106696521"large" lie GLM or actually large like Kimi K2?
>>106696471MoE or dense though?
>>106696568There's a non-zero chance it's just a modified DeepSeek.
>>106696568so large it's been uploading since april
>>106696583That is the best outcome, as it will force DeepSeek to compete
>>106696583That's what medium is as usual. Large will be larger
>>106696604Lazy fucks promised R2 by May and have done nothing but sit around bathing in their success.
>>106696600So large!
>>106696622It got leaked that they tried to make V4 (likely 'encouraged' by the chinese government) using the shiny new Huawei cards but training kept going wrong on chink hardware so V4 got delayed
>>106696622Exactly. But no one else came close, even K2 is a sidegrade at best
>>106696600I was hoping to make fun of French people for having slow internet but according to Wikipedia they actually have some of the best speeds in the world.
>>106696638The same leaker said that your mom is a whore, so it’s a very credible source
>>106696656Yep, you burgers wish you had this speed for that cheap
>>106696705
>burger
I wish, France has a median speed of 287 Mb/s, the US has 274 Mb/s, and Germany has fucking 95 Mb/s.
>>106696748My condolences Hans. At least you got cheaper electronics
>>106696298Does no one do RL here? What are you faggots even doing? Inference?
>>106696653Isn't K2 one of the least slopped chink models? I don't have the rig to run it though...
>>106696828
It's just as bad as the rest of them.
>>106696298Bro you don't need an LLM for that
>>106696073
I think it shouldn't change the quality of the LoRa too much, since once you make the LoRa you can apply it to any quantization you want. But you'll see the option to load in NF8 in the config file, or leave both set to false for FP16, I believe.
Either way, if you want to train any model other than Llama you'll probably have to tweak the config file anyway, and I forgot to mention the model shown there is a base model and not a chat model.
>>106696298You could try doing many generations at high-ish temperature, pick the most accurate one and train on that. Or for the non thinking models just train on the actual data, no need for RL.
>>106696638I don't trust any leaks in this field, it's filled with grifters and shills to the brim.
Any decent image understanding model on local? Llama4 maverick fucking blows (tried on cloud). Qwen3Max or w.e is awesome but I don't think it's open
>>106697044Qwen3-VL?
Do we have open source AI for creating timetables?
Something like the last three months in context, and it recognizes vacation entries, sick days, training, and all that stuff in the template to be filled out. Whether new names or which ones to remove, and it creates a plan from that. :>
Let's be honest, it can't be that hard to create some synthetic data for that.
Is there such a thing?
>>106697123
Creating arbitrary, constraint-driven schedules is NP-hard, little bro
>>106697178I'm not the sharpest tool in the shed but what? Couldn't you throw an SMT solver at a schedule and get a variety of potential solutions? Are you just referring to the 'optimal' schedule? But even then, this isn't like a traveling salesman problem, I think....
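To make the SMT suggestion concrete, here's a minimal z3 sketch (pip install z3-solver; the workers, shifts-per-day, and vacation constraint are all made-up examples):

    from z3 import Bool, If, Solver, Sum, is_true, sat

    workers = ["ann", "bob", "cho", "dee"]
    days = range(7)
    on = {(w, d): Bool(f"{w}_{d}") for w in workers for d in days}

    s = Solver()
    for d in days:
        s.add(Sum([If(on[w, d], 1, 0) for w in workers]) == 2)  # exactly 2 on shift per day
    for w in workers:
        s.add(Sum([If(on[w, d], 1, 0) for d in days]) <= 4)     # at most 4 shifts per week
    s.add(on["bob", 2] == False)  # bob is on vacation on day 2

    if s.check() == sat:
        m = s.model()
        for d in days:
            print(d, [w for w in workers if is_true(m.evaluate(on[w, d]))])

An LLM could still handle the fuzzy front half (parsing "bob is out next Tuesday" into constraints) and hand the actual solving to something like this.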
>>106691703
/lmg/ seems to really value anything open source, which is why I wonder why I've never seen this AI group mentioned: https://huggingface.co/swiss-ai
Both the models and the data sets used to train them are free and public.
>>106697216/lmg/ looked at it the day it released, had a hearty laugh and moved on. It's worse than llama3.1
>>106697216this was talked about before, so I assume you're baiting. These are safety-maxxed synthetic models (aka ultra garbage)
>>106697244
>It's worse than llama3.1
In what regard?
>>106697247
Sir, I'm employed. You cannot seriously expect everyone to live here.....
>>106697044dots.vlm1 or qwen3-vl
>>106697247How good are they for general purpose stuff assuming you don't care about smut or NSFW RP at all?
>>106697260
>Sir, I'm employed. You cannot seriously expect everyone to live here.....
okay? no one did. you got told the answer to your question. CUNT
>>106697205Give it a try. I’m not so pedantic as to argue with a working solution.
>>106697260Most of the official benchmarks they themselves supplied, for one.
>>106697216
They trained their models on 4096 H200s
They have more GPUs than DS and nothing to show for it
>>106696838It writes as good as o3
>>106697398Was that supposed to be praise?
>>106696638
>>106696622
Trusting leaker clout chasers would be your first mistake.
>>106697408Considering o3 is the best creative writing model, yes.
>>106697408It's on top of the creative writing benchmark
This should tell you never to use mystery meat models.
>>106697433Is the llama.cpp implementation for k2 fucked? I tried both the ik_ and normal quants for 0905 but it writes like shit.
>>106697457
>quants
Some models just don't quant well. Original R1 quants really well so people probably got the wrong impression
>>106697433post the source next time https://github.com/MoonshotAI/K2-Vendor-Verfier
>>106697433All the 95-96 ones probably run at q8
https://openai.com/index/gdpval/
https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf
>GDPval, the first version of this evaluation, spans 44 occupations selected from the top 9 industries contributing to U.S. GDP. The GDPval full set includes 1,320 specialized tasks (220 in the gold open-sourced set), each meticulously crafted and vetted by experienced professionals with over 14 years of experience on average from these fields. Every task is based on real work products, such as a legal brief, an engineering blueprint, a customer support conversation, or a nursing care plan.
why do i only get 3t/s on an IQ3_XXS quant of glm full but like 60t/s on a Q6_K of glm air?
>>106697551oh noes we aced our own test set again hehe
>>106697216
>unironically shilling llama3 finetune on synthetic data
buy an ad you brown nigger
>>106697475The "Q6 is basically identical with Q8 which has no measurable drop in performance compared to fp16" thing is no longer true?
>>106697551benchmaxxers are eating good between this and the new meta agent bench
>>106697742who said that lmao
>>106697532Realistically all the cloud providers run something like vllm in the backend so wouldn't it rather be something like FP8?
>>106697834
was true when llama2 was still considered the best we've had
the more tokens the model is trained on the worse it takes quantization, and since all models are now big cows with datasets in trillions they all take a hit
>>106697672Nta. How can you tell it's a llama 3 model?
>>106697834
People assumed DS quanted so well because it was a huge sparse undertrained MoE. If that's the case, Kimi should handle quantization better still.
>>106697834wtf so running models locally has become pointless unless you can run fp16?
>>106697903
always was cope
>>105106082 (qwen guy)
>Quant is the Mind Killer ;)
>>106697903Not pointless, but throwing away bits has real consequences in this era
speaking of quants, the imatrix data used by bartowski and mradermacher is absolute garbage and most likely harms the models. you are better off making your own quants WITHOUT any imatrix.
>>106697952Unsloth is also worse ime. It’s clear they never actually knew what they were doing
>>106697938I can't believe that a difference of 0.07 in ppl means so much these days. We truly have come such a long way. I'm happy that poorfags are finally paying the price for cheaping out on their builds. We should laugh at them more.
>>106697903
I have never seen anyone do rigorous testing w.r.t. the impacts of quantization.
Quant researchers frequently call their 4 BPW formats "lossless" because the score on some benchmark doesn't degrade.
>>106697952
>>106697967
only retards actually trust quanters
run full precision or go home
>>106697975Muh KLD::!
>>106697967>Unslothapparently they use bartowski's imatrix data + a fork of bartowski's imatrix data. this is just sadhttps://unsloth.ai/blog/dynamic-v2
>>106697967unsloth's low dynamic quants work fine for me, at least for my usecase
>>106698026Grim. Those guys are clowns that just were in the right place at the right time
>>106697871my IQ4_KSS quant of kimi holds up pretty well. surely the recipe that goes into making the quant matters a ton, if you haphazardly change between different quants for different parts of the weights you are gonna have a bad time. should've pizza'd when you french fry'd.
In the end, ppl is just another benchmark to benchmaxx on.
>>106697551The reason is because the male penis doctor who is unmistakably the boys father is actually the boys mother. A clever take on common gender stereotypes
Has any inference engine ever implemented an antislop sampler equivalent for preventing the model from mangling the JSON mid tool call?
Like.
https://www.json.org/json-en.html
Just ban tokens that would allow the model to output invalid JSON. Simple as that.
>>106698192
If you want to use JSON, just use llama.cpp's json schema/GBNF functionality.
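That is exactly the token-banning asked about above: the server compiles the schema into a GBNF grammar and masks any token that would break it. A minimal sketch against a local llama-server (fields match llama.cpp's /completion API as I understand it, so double-check against your build; the schema itself is a made-up example):

    import json
    import requests

    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "arguments": {"type": "object"}},
        "required": ["name", "arguments"],
    }
    r = requests.post("http://localhost:8080/completion", json={
        "prompt": "Emit the tool call as JSON:\n",
        "json_schema": schema,  # compiled server-side into a grammar that constrains sampling
        "n_predict": 256,
    })
    print(json.loads(r.json()["content"]))  # parses as long as generation wasn't cut off mid-object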
>>106697551seems my career as a gooner is safe for now
>>106698207NTA, but how do you continue generation with schema if you run out of tokens before reaching eos?
>>>/biz/60992581
>>106698258Tell it to continue and glue the two responses together. vLLM has a continue assistant response flag that gives the last response back to the model without adding any template tags. You could also fuck around with text completion.
>>106698258
I think you don't, not while passing the schema.
If your JSON is that big due to nesting, you can always generate the nested artifacts individually then merge them all together programmatically later.
That's what I'm doing.
>>106698268It's testing support after breakout. Anyone sane is buying now.
>>106694849
So far the only use case I found for AI was translations. Speaking of, did anything good on that front show up on the local side?
>>106698268Nice, is it time for another sale? I sold some of my bags earlier this year to fund an upgrade of my rig so time to replenish them.
>>106697975I'd rather they just made the models more efficient
>>106698345Qwen is aiming for 10T models now. We're only going bigger from here on out.
>>106698355And then comes the optimization?
>>106698355
10T-A30B
>>106698408Forget Qwen-Next already? 10T-A3B
>>106697672
They trained off of the same architecture, but it's not a fine tune. Based on what the READMEs say, the data sets present on their account, and poking around in the config files, those models were trained from scratch
>>106698325i unironically use k2 for my r18 jap games
>>106698484What tool do you use to extract the text
>>106698484
I wish there was some tool to do the same thing for translation as some unreal engine/unity games have: it would automatically translate the text as it shows up and replace it in real time. That, with something good at translations like gpt, would be amazing for games.
Time to... snore. I mean to try out Wayfarer 2.
>>106698484
lunatranslator should be compatible with a majority of game engines
https://docs.lunatranslator.org/en/
Magnum v4 123b on 1.17 temp, top k 50, top p 1, min p 0.075, with NoAss on Mistral Tekken format running vectored generated character cards beats every big MoE I've seen.
Fight me.
>>106698594
>vectored generated character cards
huh? qrd
fuck that yapping sŏy
>>106698594wrong
>>106698624
>>106698594
As usual, you don't have anything to back up your claims. Not a character card (or whatever interface you might be using) nor logs.
Caching out vectors doesn't mean shit per se.
Hey anons.
I'm thinking of this project where I would have a LLM running on a Linux server, that it could run commands in. The user could talk to it and give it tasks, but when left with nothing to do, it would decide its own goals to pursue. (coding, writing a website, etc.) One could say that the goal of the experiment is to create an AI that could operate mostly by itself and achieve something meaningful while doing so.
I am planning on writing some wrapper, which would implement an API for interacting with the system, as well as handling multiple levels of memory and personality, both controlled by the model itself. More complex features like MCP API support or connection to some external hardware are being considered as possible extensions, but the current plan is a simple, Linux-controlling, AI assistant.
My knowledge of the current local LLM landscape is limited, however, so I come here to ask for help.
Are local LLMs there yet?
What is the least demanding model that you would consider capable of operating a Linux system and not freaking out when left alone?
How powerful hardware would I need to run it?
What backend should I use? (ollama, llamacpp, something else)
Anything else I should know/consider?
>>106698617
AI is just numbers and vectors of information and math. So,
>List vectors for a character description like you would generate an image
>Pick your A.I. model
>Instruct it to take the vector list and spit it out as a character
>Swipe and run until you get the output you want
>Don't edit anything yourself, only instruct.
>You now have a character card that's x2 more coherent for role-play because its ran through the math of that model's architecture.
>It won't be transformed or hallucinated. It is as the model is familiar with.
>But only for that model.
>Save vector list for others.
>>106698647onionsllama
>>106698677
Why did you give us a bunch of useless information? The only information we care about is what your computer specs are. Everything else is irrelevant. Since you didn't provide me with your computer hardware I am just going to assume you have an unlimited amount of money. Go run this.
https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
>>106698706
>slop
>slop
>more ai slop
people like you are the reason why chub sucks, granted it always sucked but it at least sucked a bit less when it wasn't 100% ai generated cards
>>106698776
You can't do this for chub. It's model specific. Every model plays a character card differently. This is to know what you're getting into with it and direct it. /aicg/ is that a way. >>106690292
>>106698677
It's not going to work, I already tried. I've been thinking about training an LLM on my own messages and seeing if it's able to issue useful instructions to itself to work autonomously.
>>106698706
>feeding a model's output back into itself will improve output
lmao
the opposite is true, this is the curse that drowns you in slop. this is why you get best results by changing models within the same chat. it's fine if you're too lazy/writelet to write the card yourself, just admit that your laziness is giving you worse output.
>>106698826people do this for chub all the time and then add disclaimers like "THIS ONLY WORKS ON SOJA/DEEPSEEK". it doesn't make it any better.
>>106698707
kek
gemmy
>>106698727
don't listen to anon, he's a fag. download this one instead
https://huggingface.co/deca-ai/3-alpha-ultra
>>106698927
Not as large, but a contender nonetheless :>
https://huggingface.co/google/switch-c-2048
>>106698594I doubt it beats glm 355b
>>106698192vLLM does but I think only when tool choice is set to "required".
uhhh....
one thread ago I asked about testing cloud models if they are quantmaxxed or not.
well, looks like kimi devs are based and had a similar idea:
https://github.com/MoonshotAI/K2-Vendor-Verfier
>>106692077damn right boye that's half my setup right now but what's your GPU?
>>106699413Scroll up
>>106699413
>>106697433
>>106697433
>>106697433
>>106699523
>>106699531
yes but it was ME who had this exact idea and posted it here before goonshotAI realized it. twice even, like a month ago. ME ME MEEEEEEE.
>>106699569i've been saying openrouter has never gone far enough to verify what models providers are truly providing since day 1
>>106699576I predicted this would be necessary 5 years ago.
>>106699576
>>106699594
trvke. I really hope all the labs release something like this for their models. or even a better method to verify performance. the seethe from 3rd party cloudnigger providers would be glorious.
>NOOO I CAN'T SCAM USERS ANYMORE AND MY PROFIT MARGIN DROPPED FROM 350% TO 150% it's not fair
>>106691703Catbox pls
HOLY SHIT IT'S HERE
https://www.youtube.com/watch?v=7HyMwlxRcCg
>>106700059
>5T parameter QAT JEPA-SSM hybrid world model
Holy shit
it's literally never been so over
>>106691703miku miikku miku miku miku IKUUUUUUUUUUUUUUUU~~~~~~~~~~~~ aaahn..
>>106695552
>hands
>no glowsticks
westoids hands slopped this image
>>106700424
>>106700424
>>106700424
>>106700430Why do this Miku's breasts emit smoke?
>>106700434
japanese bring glowsticks to gangbangs?
>>106691703total drama miku hot