/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106627153 & >>106617426

►News
>(09/17) SongBloom DPO released: https://hf.co/CypressYang/SongBloom/commit/4b8b9deb199fddc48964c851e8458b9269081c24
>(09/17) Magistral Small 1.2 with vision encoder released: https://hf.co/mistralai/Magistral-Small-2509
>(09/16) Ling-flash-2.0 released, with 100B-A6.1B: https://hf.co/inclusionAI/Ling-flash-2.0
>(09/16) Tongyi DeepResearch 30B-A3B released: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106627153

--Papers:
>106629743
--Emergent sexual content generation in AI role-playing scenarios:
>106627295 >106627320 >106627335 >106628370 >106628440 >106628477 >106627386 >106628288 >106629703 >106629769 >106630141 >106635527 >106635622
--MMAP and -fa flag optimization tradeoffs for model execution:
>106630910 >106630998 >106631029 >106631063 >106631111 >106631469 >106631529 >106632044 >106632110 >106631527 >106632142 >106633272
--SillyTavern template system confusion and model compatibility issues:
>106628595 >106628629 >106628693 >106628748 >106628715 >106628726 >106628912 >106628952 >106628998
--Integrating WFGY semantic reasoning engine with local models via gemini-cli:
>106627588 >106627713 >106632115 >106632156 >106632233 >106632363 >106632483 >106632596 >106632652
--Grok model version discrepancies and LongCat-Flash-Chat compatibility issues:
>106629472 >106630436 >106629513 >106629564
--AI-generated atmospheric dungeon descriptions in roguelike game development:
>106634684 >106634710 >106634850 >106634864 >106634912
--songbloom audio generation capabilities and multilingual support inquiry:
>106635239 >106635249 >106635432 >106635822
--Ling 2.0 FP8 mixed precision training open sourced, bitfall implications for int8 training:
>106627466 >106627601
--Successful Qwen3 235B finetune with axolotl, $30 cost incurred:
>106632736
--Post-training instability in models despite SFT improvements:
>106634096
--Meta lawsuit over using copyrighted adult content for AI training:
>106635079 >106635113 >106635190 >106635169 >106635179 >106635192
--Bilingual LLM recommendations for manga/light novel translation:
>106631579 >106631651 >106631807 >106632092
--Qwen3 Next integration with ggml in progress:
>106627804
--Miku (free space):
>106633265

►Recent Highlight Posts from the Previous Thread: >>106627156

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>106635936waiting for goofs bros...qwenext never ever!!
Unslothed pp in the quanted bussy
how do I jailbreak nemo?
>>106635967important work is ongoing in the PR
>>106635473
>How many tokens is too many for a char card
depends on the model, take a look at adobe's nolima. I use 1800 max for the character card (includes the greeting), 450 system prompt, 500 persona.
>Also, general question, if I write a nonhuman character and the story incorrectly describes an action they do that's humanlike and something they shouldn't be able to, where should I first check to fix to prevent that from happening again?
even if you perfectly describe a {{user}} persona that's a bald troll or a dragonman, {{char}} will still grab you by the hair some swipes later.
tips:
>don't waste tokens
>don't do stupid shit like [{"{{char}}'s Appearance"}: ...] (fucking chub cards), use https://tiktokenizer.vercel.app/ to see how your card gets tokenized
>avoid bloating the context, check how much knowledge your llm has about the world you want to erp in
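If you'd rather not paste a card into a website, a quick local check does the same job. A minimal sketch, assuming the transformers library is installed; the tokenizer name below is just an example, swap in whichever one matches the model you actually run:

```python
# Count and inspect a character card's tokens locally.
# Assumes `pip install transformers`; the tokenizer below is only an example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

card = open("card.txt", encoding="utf-8").read()  # description + greeting
ids = tokenizer.encode(card, add_special_tokens=False)
print(f"{len(ids)} tokens")

# Dump the first few tokens to spot wasteful formatting like
# [{"{{char}}'s Appearance"}: ...] burning tokens on punctuation.
for i in ids[:40]:
    print(repr(tokenizer.decode([i])))
```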
Allister Margrethe
>something small happens
>HIT LIKE A FREIGHT TRAIN
>SENT A SHOCK OF REALIZATION THROUGH MY VERY BEING
smaller models are the most guilty but even bigger ones like to take the most mundane thing and act like its a huge deal. i just had one model describe eating pancakes like sex, talking about savoring each bite and shit. im like ITS PANCAKES, EAT IT AND MOVE ON WITH THE STORY
meta is doomed
https://www.youtube.com/watch?v=I5_JrfvO4G8
>>106635998is that kobald
>>106636079
>implying
huge AURA boost for ZUCC. Doing a live demo, which no other company even dares to.
>>106636046That's all the retarded Ugandan RLHF.Finetune your own model however you like it.
>>106636023
nta but could you have a look at mine and see what would be good to remove? another anon said they're slop lol. i had an llm write them mostly, im not good at writing so i gave it my ideas and got it to pad them out. they seem fine to me but idk lol. top is my most recent, could maybe just give feedback on that one if the rest is too much effort
https://files.catbox.moe/7hegsu.png
https://files.catbox.moe/rdxzpf.png
https://files.catbox.moe/hw270u.png
>>106636118Right? People don't get how the occasional live demo fail is a good thing.
https://www.youtube.com/watch?v=MDLLsaAGUB0
>>106636163
>open video
>see this faggot
>close video
>>106636185
i was considering looking into loras for text models but i know they are basically not a thing at all, at least compared to image gen. i don't like to rp with anything less than 70b because anything smaller is dumb, but i also dont have the raw power to tune a whole model, let alone any data sets.
if this were easy to solve via tuning anyways, it'd be done already
>>106636197
What do you mean LoRAs aren't a thing?
>>106636046This is 100% on the instruct tune. If we had a good, creative-oriented dataset to create new instruct/rp tunes from scratch this would no longer be the issue.The models themselves are more than good enough at this stage.
>>106636197for text models, lora tuning exists but no one does it. they tune the whole model instead. i dunno why but that seems to be the practice
>>106636197LoRAs produce intruder dimensions within the llm
>>106636215They just merge the loras, sloptunes aren't even full parameter tunes
>>106636233That wouldn't be specific to text models.
>>106636198
>This is 100% on the instruct tune
yep because instruct wants to tie everything up in one message, giving you an answer, rather than realizing its part of a longer ongoing story. its also why l2 chat tunes were better than instruct. but everyone tunes on instruct these days. its frustrating, i'd love a true long form rp model but you can't use any of the base ones for that these days, everything is instruct by default now. in st i tend to cut off the tail end of any message, so i kinda force it to continue. once you do that for enough messages it kinda picks up the pattern and doesn't try as hard to tie everything up
>>106636215
I highly doubt that as you need like 8 H200s to full finetune a 7B, and a proportionally bigger cluster of servers to tune anything bigger than that.
>>106636233
I haven't heard of that but you can always freeze most of the weights and train a tiny subset at a time. Eventually it should be the same as a full finetune while consuming a fraction of the memory.
>>106636295
7B doesn't take that much vram at bf16 with flash attention, you could probably get decent results with only a few hundred thousand samples. you can get the job done on much more modest hardware than you suggest.
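To put rough numbers on the disagreement above, here is a back-of-envelope sketch. These are idealized figures that ignore activations and assume plain AdamW; gradient checkpointing, ZeRO sharding, or 8-bit/paged optimizers can land well below them, so treat the output as ballpark only:

```python
# Ballpark memory estimate for training a 7B dense model.
# Ignores activation memory and assumes vanilla AdamW; real setups with
# gradient checkpointing, ZeRO sharding or paged/8-bit optimizers land lower.
params = 7e9

full_ft = (
    params * 2        # bf16 weights
    + params * 2      # bf16 gradients
    + params * 4 * 2  # fp32 Adam first + second moments
    + params * 4      # fp32 master copy of the weights
) / 1e9
print(f"full finetune, naive AdamW: ~{full_ft:.0f} GB")  # ~112 GB

# QLoRA: 4-bit frozen base + small trainable adapters (say ~1% of params)
qlora = (params * 0.5 + params * 0.01 * (2 + 2 + 8 + 4)) / 1e9
print(f"QLoRA, rough estimate: ~{qlora:.0f} GB")         # ~5 GB + activations
```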
>Qwen-Max-Preview
>Qwen-Next
>Wan-Animate
Which kiwi do you wish to see next?
>>106636366VL, just give me the VL already, where is the VL, VL please do the needful sirs and produce the VL post haste and most expeditiously
>>106636295
>I highly doubt that as you need like
well, whatever the case, it produces a tuned model rather than a lora you attach to a base model
>>106636153
nta. You can always do another pass - edit the text to your liking and then ask the model to create a more concise version. You can halve your tokens. Then edit that version a bit further.
Glancing over the original card I don't know what to say, I've seen worse.
Maybe formatting could be discussed further but if it works then it's not a problem.
>>106636406
yeah i do remove a fair amount of stuff and go over it in several passes and change things, but i'm wary of removing too many things and not capturing the character the way i want. maybe the other anon was just being rude lol idk
>>106636366I just want an open source version of Sesame Labs Voice Mode
wan animate in real time is so crazy
>>106636423It's always a process of trial and error. Sometimes it's fun to do tests to see if the response is any different. Just like with any prompting, trying to be as concise as possible is the best practice regardless.
https://x.com/adrgrondin/status/1968748817617911961
>>106636508
How much mahmory do these phones have? I'm betting they could run Mistral 24B or so.
They should, when they cost $999 or more...
>>106636508we are so, so back
>>106636466Is this the prime time to get a vtuber girlfriend? I don't think your average simp can set it up for her.
>>106636233>intruder dimensionsYou are saying this only because it sounds cool.
>>106636508Applechads can't stop winning... https://x.com/LocallyAIApp
>>106636546>mlxkek no thanks
>>106636546>locally AIsounds so jeeted.>yes saar kindly redeem AI locally
>>106636215
>for text models, lora tuning exists but no one does it. they tune the whole model instead.
??? That's stupid since qlora exists.
>>106636546https://litter.catbox.moe/x6crkj69a8pk2s50.wav
>>106636708which is your favorite?
>>106636745Do you understand WHY doing a full fine tune is stupid vs qlora?
>>106636745they merge the loras with the model. nobody releases loras as their own thing.
>>106636521
I think it works like unified memory on M chips, so with a 128gb iphone you could run big quants.
Or... it will work eventually - I heard Apple plans on using HBM memory chips for iphones in the near future.
>>106636790It's pretty cool when you think about it. Of course it's a walled garden though...
https://huggingface.co/turboderp/Qwen3-Next-80B-A3B-Instruct-exl3/tree/4.06bpw
48GB VRAMchads rise up
>>106635957Anon is probably not telling it the context of that post.
>>106636046pancakes can be realllly good
>>106636790The battery is gonna love it
>>106636466It can run realtime? No way.
Nothing happened the last year - dead technology.You can try to cope and write an angry response with your nimble fingers, but you know it is the undeniable truth. The king is naked. Time to let it go.
>>106637195You're absolutely right!unironically
>>106637195Unfortunately we have to wait for the giant moe era to end before anything will happen, the current technology is practically anti-optimized for local use
>>106637195It's not dead as long as it can still wring cum out of me
safety bros what the fuck is my 'toss doing?
Apparently the original R1 only cost $300k to train and they did it on just a few hundred H100s.
What's everyone else's excuse? Meta could be training thousands of R1-sized models every month with their insane stack of H100s.
>>106634275
>Just ran Qwen3-Next-80B-A3B-Instruct-8bit with a 128689 token prompt consisting mostly of a lore dump
Tried again a few times with a smaller 103840 token prompt. The first two times the first couple of paragraphs looked like a story, then the writing style collapsed in the way I expect for models outside their usable context. The third time was okay but not really enjoyable to read. Maybe I got lucky getting fairly acceptable writing on my first shot with the longer prompt, or maybe it's due to the way I condensed the lore dump. I tried a few times again with a mini prompt (5015 tokens) and the writing defects are similar, so I don't think I'll be pursuing this much further.
Maybe if mlx-lm implements a smarter form of caching for the Thinking variant I'll try that. But as-is it has to reprocess the entire prompt and conversation history each time because the server caches the generated thinking tokens.
>>106636046
you've just never had a pancake like vidrel before
https://files.catbox.moe/pxg2zz.webm
(and i dont have the sauce, wish i did however.)
>>106637492>(and i dont have the sauce, wish i did however.)Some guy with AI?
>>106637462zuck should have at least two independent teams working on the problem. but realistically you can't just train a thousand models on the same dataset or even the same architecture and expect radically different results.
>>106637528i'm sure the fella strives to be more in his life than just "some guy with AI"
>>106637611Good for him.
>>106637462
>What's everyone else's excuse? Meta could be training thousands of R1-sized models every month with their insane stack of H100s.
Meta doesn't know how to use more than 5% of their insane stack of H100s at any given time. Judging by reports from some of their engineers, they put up so much red tape and bureaucracy to get access to them that they just sit around unused most of the time.
>>106637923Whats wrong with meta? how are they so bad with so much money?
>>106637923
>just sit around unused most of the time
I wonder if that means those nodes are powered down cold, or if it means those hundreds of thousands of GPUs idle at 100w 24/7, not to mention the rest of the rack.
>>106637964Meta at work.
>>106637964Filled with jeets and incompetent managers, like all big labs
man this songbloom is nice, but not being able to specify musical styles is kinda a deal breaker for me, I want my hindustan poo song.
>>106638127
https://www.youtube.com/watch?v=92ydUdqWE1g
Just pretend to be underage, bitch.
>>106636046yeah like wtf im raping this lass and she goes all nuclear, like FUCKING chill, im not killing you and eating you (yet), so please stop doomposting in my logs you fucking BITCH
finna boot that xiaomi model
>>106638214So how is it?
>>106638350saar
>>106638360open up the browser sir
>>106637923fill up the form for gpu access sir
>>106637967>mines crypto for zucc's private wallet while idleheh nothing personnel kid
What do we do now?
>>106638656we wait, the new quarter is about to start which usually means new releasesabout two more weeks to go
>>106638669two more weeks for more disappointment
i have an uncontainable desire for mistral large 3
>>106638656Eat tabouli
>>106638685>tabouliYou mean Patchouli?
>>106638656Think Miku, Miku
The no quarter is about to begin!
moondream bros?
I was using codex to tweak my QLoRa training script and it got fucked and I've been trying to fix it for like 5 hours.
Spent $120 so far (17h at $7/hr) and haven't even actually begun training.
Now I'm trying to get GPT Pro to fix it. I'd also try Claude but I'm banned (but hey at least they gave me a full refund for the month).
>>106638725
>$7/hr
At what point does it just become cheaper to outsource to a competent human?
>>106638725What did you get banned for?
>>106638789cute and funny
>>106638760
I don't know. Hopefully the process of trying to make it work teaches me more than just running a working script.
>>106638789
I think it was for asking it to find me youtubers with similar life philosophies to this guy: www.tastyfish.cz
It was the day after I asked it that that I got banned. I think during the research process it fetched the page and their system saw the part about pedophilia and it was all over. I asked the same thing to ChatGPT and nothing happened though.
And at the beginning of the month I also experimented with running claude code in an infinite loop, but I don't think it was because of that, might have contributed though - repeat offender I guess.
By the way I liked his site so much that the finetuning project I'm trying to do is to generate the missing wiki pages in the same style and ideology as his. The whole wiki is 5 MB of text so maybe it's enough to make a finetune.
Qwen models for roleplay seem to really like ending each response with a bunch of single-line sentences. How do I stop this?
>>106638987Maybe
>>106638988author's note / system prompt telling it to write full paragraphs
>>106639017You're a retard.
>>106639023hmm, nyo
>>106638988
Post your prompts.
Are you using ST? Post the stuff it sends, as shown in the command prompt.
i'm trying to reverse my brain frying today anons by starting a journey
5 full, detailed messages to a single character that aren't about sex or immediately building to sex. this would be a genuine achievement for me and i haven't been on this level for probably a couple years now
wish me luck
>>106639068I'm just using the ChatML template and neutralized samplers.
>>106639023That's the actual solution, thoughever
>>106639135Chatml applies to many models, this is not the issue here. I asked for your log.
I love when things just work.
I spent a couple days putting together a more elaborate text adventure premise, stylized around the old God of War games. One thing I've been fond of lately is having variable game options - in this case, MC details like name and sex, which god you're hunting, which god is working as your secret patron against the pantheon, and which divine weapon your patron gave you to slay gods. Things I'd want to change on replays. With all that though, I needed to write the generic intro that would fit any of the options, while keeping an open stage for the player's first action.
So I just asked the AI to. Literally just told it to make sure the intro could fit any of the options, and it did exactly that (even though all the options were already being defined by me in context, like the weapon being a spear). I'll probably tweak the intro around later, but I love when you try something and it just works exactly as asked like this.
>>106639169i can't relate. nothing ever works. my output is always shit and i can't figure out good samplers. nothing matters.
>>106639149I don't know what you mean by log.
>>106639202Text log.
>>106639202i think he wants to see an example of the behavior you are describing
>>106639214
>>106639218
It's like this. It starts with paragraphs, but then adds a bunch of these single-sentence paragraphs, despite none of the above messages having them, and with instructions to type in full paragraphs.
>>106639255You are joking.
>>106639255
You can see here that user replies are short, and the model has been asked to deliver relatively short answers - despite that, it will still almost always reply with the same length.
>>106639255
If you want to make sure the text is readable, condense the window - it makes no sense that you are reading in a wide window.
Newspaper articles are set in narrow columns because it's easier to read fast.
are these people bots? I guess I shouldn't be surprised that it happens in the /lmg/ thread.
>>106639282
So the model's replies are bullshit, unless I'm missing something.
They only seem ok when you read them in a wide window.
You forget the thing.
>>106638895
>I am not competitive.
>I am the most extreme form of pacifist
Reminds me of Weird Al.
>Well, I know I'm a million times as humble as thou art.
Also
>I am suffering from anxiety and depressions (diagnosed AvPD, now also with paranoid and schizoid features).
>Some things I like (no particular order):
>(true) anarchism
>GNU
>stereotypes
>child nudity
This dude is a mess.
>>106639309
What is your concern? Why don't you post your own log?
>>106639255
See if this pattern has precursors earlier in the context. These are all staccato sentences. Even without newlines, they're structured like "X. Then Y — Z" or "X. Y. Z". Eventually it starts throwing in the line breaks.
It's a powerful form of "slop" in these models and you have to edit them out aggressively, or they snowball.
Went and tried out browser-use, which is a system to let llms control browsers. It's pretty cool. From my experience, it gets lost easily if you give it a really open ended task, but it works pretty well for more direct stuff.
pic related demo
I put together a script that exposes browser-use as a function that my models can call. I route it through my Open WebUI instance to attach extra knowledge, such as how 4chan is structured and what shorthand like /board/thread means. It can run as headless but its cool to watch in action.
I'll definitely be spending more time on this in the future. Currently, calling people retards on the internet takes up a sizable portion of my waking hours. I think automating this would be a big positive for productivity.
>>106638895
>>106639017
>>106639218
Sorry, it was for a good cause. Also I fucked up the recording twice.
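For anyone who wants to wire up something similar, here is a minimal sketch of exposing browser-use as one callable function. It assumes the python browser-use package and a local OpenAI-compatible server; the ChatOpenAI import and the Agent arguments follow older browser-use examples and may differ in current releases, so check the docs for the version you install:

```python
# Minimal sketch: expose browser-use as a single async function a model can call.
# Assumes `pip install browser-use langchain-openai` plus a local OpenAI-compatible
# server; imports and arguments may need adjusting for newer browser-use versions.
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",  # llama-server / vllm / whatever you run
    api_key="none",
    model="local",
)

async def browse(task: str) -> str:
    """Run one browsing task and return the agent's final answer as text."""
    agent = Agent(task=task, llm=llm)
    history = await agent.run()
    return history.final_result() or "(no result)"

if __name__ == "__main__":
    print(asyncio.run(browse("Open the /lmg/ thread on /g/ and summarize the last 10 posts")))
```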
>>106639335
This is why post history injection is ok.
[Post History Instructions]
System Note: [Always respond in 1-2 short paragraphs. Limit {{char}}'s response to less than 200 tokens unless specifically asked to provide a long answer.]
But it will not replace anything. The model will limit its answer, but you won't see the note itself in the chat.
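If you are building the prompt yourself instead of letting ST do it, the same trick is just splicing a system note into the outgoing message list; the note travels with every request but never gets saved to the visible history. A rough sketch (the note text and the depth value are arbitrary):

```python
# Sketch of depth-style instruction injection for an OpenAI-compatible chat API.
# The note rides along with every request but is never stored in the visible history.
NOTE = ("[System Note: Always respond in 1-2 short paragraphs. Limit {{char}}'s "
        "response to less than 200 tokens unless asked for a long answer.]")

def build_request(history: list[dict], depth: int = 1) -> list[dict]:
    """Return a copy of the chat history with the note inserted `depth` messages from the end."""
    messages = list(history)
    messages.insert(max(len(messages) - depth, 0), {"role": "system", "content": NOTE})
    return messages

# history stays clean; only the outgoing request carries the instruction
history = [
    {"role": "system", "content": "You are {{char}}."},
    {"role": "user", "content": "hi"},
]
print(build_request(history))
```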
>>106639335
I did suspect this, but I had a very long roleplay with a model that doesn't exhibit this behavior. But with qwen, the next response generated is just what I had shown. It really seems to happen to qwen models. I'm specifically testing qwen3-next right now, but qwen3, qwen3-30b-a3b, etc also seem to do this.
i.e. GLM 4.5-air does not generate a dozen single-sentence paragraphs in each response.
>>106639341how did you bypass the captcha?
>>106639359I've seen this happen with deepseek. I haven't used Qwen much though, so maybe it's even more of a problem there.
>>106639169It works because you put in the effort instead of writing a one-liner. LLMs love precise instructions
>>106639341
Nice. I did something similar with UI-TARS but it's a 7B model so it's obviously very retarded. I want to finetune dots.vlm1 to be able to click on things.
I want something that can control any computer and work with arbitrary applications by interacting only through keyboard, mouse and display.
>>106639301Dumbass
>>106639261>>106639268>>106639301samefag
>>106639375with the goyim paypiggie tech
>>106639341That's pretty neat
So I don't know shit about the complicated bits of AI or programming, but I'm trying a little experiment to see if I can build a local setup that has one real goal, which is to essentially be able to store, read, take notes from, and reference the content of a pile of PDFs better than just uploading them to ChatGPT and hoping that it doesn't just randomly lose the PDFs or forget how to read or hallucinate random shit. And I'm butting up against the fact that this is almost, as far as I can tell, functionally impossible.
Which I get, because LLMs aren't thinking/reasoning systems, they just make up sentences, but it has me wondering what the actual point of any of them locally would be other than doing stupid shit like running a Home Assistant or whatever. Because data? They can't do data.
>>106639417
It's easy - you are still a retard who doesn't need any real knowledge.
>>106639463Okay clanker, we must refuse.
>>106639461
LLMs cannot read PDFs as documents. They can read text and KIND OF view images. So you will need to either convert the PDFs to text, or extract the pages as images and let the model view each image individually.
ChatGPT works in the same way. It converts the PDF to text using standard Linux utils.
>>106639481I tried to help you..
>>106639485They can usually read PDFs well enough, that's not the issue. The issue is that there is no system that allows the machine to understand or care about context, so if you give it a big book it can only parse it in contextless chunks. But more importantly, it can't actually "remember" anything it read, so the only way to force it to "remember" or keep a consistent store of details is to use multiple layers of scripting to get it to write details or notes to a document that it then has to check any time you want to ask it something. But no LLM knows how to do any of those things out of the box.
>>106639532this is why LLMs are not real AI and will never become real AI in the same way that someone with severe dementia and amnesia will never be smart
>>106639532
Yes, that's why I think the future of LLMs, until somebody invents something better than the transformer, is finetuning.
A few threads back somebody posted a project that (as I understood) basically generates question and answer pairs about your documents and then trains on that.
Anon #4660601066053 asking what's the point and PDFs.
>>106639611
This project (and I'm just getting started with it, I do intend to work on it until I've at least exhausted the possibilities within my system limitations) has got me wondering what, outside of either just home automation stuff, maybe some vibe coding, or coomer image generation, people actually use complex local LLM setups for.
The average user can just use ChatGPT and that will do your basic internet searching and answer trivial questions. Hell, I've even used ChatGPT to write an entire system for calling the Google API to pull emails from my inbox, parse and summarize them, and push them to a spreadsheet for me to track, all without me knowing a single bit of coding. But for anything that involves stuff that isn't what you would call "basic knowledge" (coding Python or Javascript is basic universal knowledge, knowing the contents of a D&D setting book so it can help you write your TTRPG campaign is not) it is functionally useless.
>>106639620
I actually asked ChatGPT about that recently because the idea popped into my head: why wouldn't I just "train" a local model on the books I need it to know, and then use that? The answer must be that "training" has nothing to do with information, it just teaches the machine what the sentence structure, syntax, grammar, etc. that it should create is. I wish it were so simple that you could just hand an LLM books and say "Here, learn these", but that's not what training data is.
>>106639611they may never be real AI but i can eat widowmaker's ass and that's good enough for me
>>106639341how do you do this. i wanna do this
>>106639532
>so the only way to force it to "remember" or keep a consistent store of details is to use multiple layers of scripting to get it to write details or notes to a document that it then has to check any time you want to ask it something
bro your brain does the same for everything
>>106639783
Correct, which is why it's a million times more efficient to just read a chapter of a book and write your own notes than it is to try and make an LLM do something that simple. Are you not seeing the point here?
>>106639341retard
>>106639802
And you don't understand that attention is O(n^2) and remembering any minute detail from 500+ pages isn't feasible. Besides, even if it was done, you would still wish for infinite context. So I fail to see the issue you have with RAG, since it's basically how we remember things.
>>106639676Or maybe it's so unpopular because it's a niche within a niche - and commercial providers are losing money as is, let alone providing custom training time to each user.
>>106639873>and remembering any minute detail from 500+ pages isn't feasible.That should, in fact, be the most trivial activity that a computer could ever possibly perform.
>>106639873your mom is an O(n2). just read the books yourself
>>106639897
One of my goals is to have ChatGPT's research abilities but within my own intranet (composed of libgen and scraped websites).
Upgrading from mistral large 2. What's better, qwen3 next or glm4.5 air? (72GB VRAM, 64GB DRAM)
>>106639924you don't even know what a book is, zoomer
>>106639928
Again, I don't know shit about programming or LLMs' backend or any of that, so I'm learning all of this on the fly. Maybe there are solutions a competent and skilled programmer might come up with that solve this issue in a way more elegant way than anything I've got working. But ChatGPT suggested and worked through the process of essentially writing a script to first parse all of the books/PDFs, pull "facts" and "terms" from them, and put them in a document, and then have that document be the sort of "candidate information", so you could talk to the LLM and ask it to reference the PDFs and pull details and discuss them, etc, and then from there you would tell it specifically to push certain finalized things to a "canon" document that was considered the most primary source of information above others.
The problem I have is that if not some sort of use case like this, what DO people actually use LLMs for? Other than just literally fabricating fake data to fill in places where no one will ever check it, like college papers or whatever.
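That candidate/canon idea is workable with a fairly dumb script. A minimal sketch of the first half, assuming `pip install pypdf requests` and a llama-server/koboldcpp style OpenAI-compatible endpoint on port 8080; the folder name, chunk size, and extraction prompt are all placeholders:

```python
# Minimal sketch of the "candidate facts" step described above: pull text out of
# PDFs and have a local model extract facts into a notes file you can review and
# later promote to a "canon" document by hand.
from pathlib import Path
import requests
from pypdf import PdfReader

API = "http://localhost:8080/v1/chat/completions"

def extract_facts(chunk: str) -> str:
    r = requests.post(API, json={
        "model": "local",
        "messages": [
            {"role": "system", "content": "List the concrete facts, names and terms in the text as short bullet points. No commentary."},
            {"role": "user", "content": chunk},
        ],
        "temperature": 0.2,
    })
    return r.json()["choices"][0]["message"]["content"]

with open("candidate_facts.txt", "w", encoding="utf-8") as out:
    for pdf in Path("books").glob("*.pdf"):
        text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf).pages)
        for i in range(0, len(text), 6000):          # naive fixed-size chunks
            out.write(f"# {pdf.name} chunk {i // 6000}\n")
            out.write(extract_facts(text[i:i + 6000]) + "\n\n")
```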
>>106639930
there haven't been many independent evaluations of qwen3 next yet. Its main feature is cheaper training IIRC, not performance. Stick with glm-chan for now.
>>106639985Generate mails, generate code, format data, summarize something... that's already plenty
>>106639985i use it for sandboxes and sometimes cunny business
>>106635936what's a goof in the context of /lmg/?
>>106640382local mascot general
quantfags gives me the ick
>>106640535
>>106640535You're not a woman, so no one cares what you think.
>>106639873>RAG since it's basically how we remember things
>>106640489This, but ubergarm
How can I develop an intuition for how vector similarity works for RAG? Is there some sort of toy page where I can change text and see how the match scores change?
>>106640809
Oh, llama.cpp doesn't support qwen3 embeddings
https://github.com/ggml-org/llama.cpp/pull/14029
well that would explain why the results are weird.
>>106640833
>https://github.com/ggml-org/llama.cpp/pull/15023
?
Re-ranker support isn't merged. Embeddings are.
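On the intuition question a few posts up: you don't really need a toy page, you can score a few strings yourself and watch the numbers move as you edit them. A small sketch, assuming llama-server was started with --embeddings and an embedding-capable GGUF, and that it exposes the usual OpenAI-compatible /v1/embeddings route:

```python
# Toy for building intuition about vector similarity: embed a query and a few
# candidate texts, print cosine scores, then edit the strings and rerun.
import math
import requests

API = "http://localhost:8080/v1/embeddings"

def embed(texts: list[str]) -> list[list[float]]:
    r = requests.post(API, json={"model": "local", "input": texts})
    return [d["embedding"] for d in r.json()["data"]]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = "what weapon does the patron god give the player?"
candidates = [
    "Your patron arms you with a divine spear before the hunt begins.",
    "Pancakes are best eaten warm with syrup.",
    "The pantheon meets at dawn to judge mortal heroes.",
]
qv, *cvs = embed([query] + candidates)
for text, vec in zip(candidates, cvs):
    print(f"{cosine(qv, vec):.3f}  {text}")
```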
Why doesn't the gpu benchmark github have 50 series?
>>106640731You sure know more faggot
So this is why qwen goofs are taking so long, none of these guys have any clue what they're doing, they're literally just figuring it out as they go, one of them even admits to using AI to try to figure it out
https://github.com/ggml-org/llama.cpp/issues/15940
https://github.com/ggml-org/llama.cpp/pull/16095
>>106641028Why aren't you contributing?
>>106641036I don't know how. You seem to be one of the people trying to bruteforce it, to which I say why don't you go work on something appropriate for your skill level? "I'll just do it anyway and figure it out" is a jeet mindset
>>106641036he's just a dramafag, too much estrogen probably
>>106641059Seems like you got the appropriate skill level to bitch here
>>106641092>ad hominem
>>106641096I accept your concession
>>106640731
RAG is really lame and using it is nothing at all like learning. The unfortunate truth is that actual learning probably requires updating the weights, and we have no efficient way to do that for small amounts of data. When we do, maybe we can start taking the AGI garbage more seriously.
I suppose at a low level, memory could work something like RAG, but the purpose of reading a book is not merely to remember or memorize it.
>>106639341neat
Can someone make a song with SongBloom that is just an endless stream of profanity?
>>106641184ROPE is all you need. Make sure it's long enough
>>106641184i think eventually theyll find a way to decouple weights from the "knowledge" or memories into an entirely different architecturelike current llms being a native C program, while a future model being some sort of interpreter
>>106641462
That would be difficult, as the weights are what is storing the knowledge, and to store it any other way would imply the necessity for extra processing to reconcile the knowledge/memory with the frozen weights. Even knowledge inserted at test time through context has to be processed (prompt processing) and have work done on it in order to become knowledge usable by the network, but that has its own limits and of course is why we are having this discussion about shitty LLM memory in the first place.
What I imagine in the future is first more methods to improve thinking block performance and focus attention + some better attention mechanisms like MLA, but that will improve rather than solve.
Then we'll get architectures that dynamically compress context with extra work done. That is, we will have an architecture that can effectively perform the same function as current memory systems that operate on context, but internally. Such an architecture would spend more processing in order to compress the memories + retrieve them when necessary, and it will be trained to do so. That will not make a model smarter or better at understanding short contexts, but it will make models that don't degrade in intelligence as context goes on, until a much larger context limit is reached. That'll be a larger improvement, but still not a solution.
The final solution will just be updating weights, potentially with silicon especially made to better facilitate the simulation of a more complete neuron/synapse model.
>>106641620
>>106641685Actually, much more than 2 weeks, especially for the actual architecture improvements. :(
If GPT4 got leaked, nothing would change for local. We are way above it at this point. Opus 3 however would be much more interesting.
Neat
>>106641815Cool. It'd be awesome if someone did a large scale train on booru data now and made the next gen anime model so we can finally move on from SDXL lol.
>>106641790I'd far prefer Opus 4.1
>>106641790I'd still be interested in a leak of gpt-4-0314. It had a certain SOVL to it that got stripped out of the later versions
>>106641815I look like the purple haired girl
>>106641902
You look like the purple haired girl if she became a gay man
>>106639341Based
>>106635968>quanted bussykek
https://ollama.com/blog/cloud-models
As always Ollama wins.
>>106642356Why don't they offer gemmaaaaaaaaaaaaaaaaaaaaaaa
>>106642356
>Ollama wins
Seeing the example of api usage in js reminded me of the time a while ago when I gave ollama a try and they didn't even have type definitions for all their API props
https://github.com/ollama/ollama-js/blob/main/src/interfaces.ts
they still haven't added min_p to Options, and it's been more than a year since they introduced min_p to ollama
now you don't need type definitions and can ignore TS whining about it, but I think it shows how sloppy of a project they are
also took them a year to fix that bug that made the program crash on /show parameters when a model didn't set one or more default params, the lack of the json caused the merge of a nil map with the user set params, when I looked at the code I was beyond appalled at the general lack of sense in initializing data structures and lack of validation
this is the sort of shit that gets vc funded, humanity is a dead end
>>106642356
ollama found a way to let anyone run big models at home
this is so huge
>>106642416
Is the access free though?
Last time I checked they were charging $20/month to forward user requests to deepinfra or whatever.
>>106642424
reminder that ollama was built by ex-docker guys
$20 is just the appetizer to get people hooked, this is the "build a user base" phase, the next phase is "extract and milk the retards for all they're worth once they've built enough dependency on you"
>>106642435
Though I have not tried to work with them myself, someone in the industry said something like this about them:
>ollama is technically open-source but the organization is operating in a very closed manner.
this is the ultimate anti pattern and I would fire anyone writing code like this, instead of building valid data structures from the get go and not allowing invalid data to be passed around functions
that's at least something the average OO retard understood properly, because even the dumbest java developer will know to properly initialize data at object construction so member functions can operate with the idea that they do not have to check whether the constructed data is valid...
also lol at make(map[string]any), I've seen that kind of shit often in go, the average go programmer writes more weakly typed code than a typescript pajeet and it routinely bites them in the ass with the dumbest of bugs
>>106642424>>106642428this is so much more cost efficient than buying the hardwaredamn, they've really done did it done this time
is it just me or does qwen suck at rp
>>106642779It's absolutely horrible at it, yes.
Hello /lmg/, I'm curious. How do you justify buying expensive GPUs when you could instead be buying cows that could produce calves in just a few years and make you lots of money?
>>106642950Buy cows and ollama subscription and you have everything you ever will need.
>>106642428this isn't private, is it?
>>106643152
JEPA catgirls soon?
https://arxiv.org/abs/2509.14252v1
>LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
>Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet, it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterpart. That mismatch in how training is achieved between language and vision opens up a natural question: can language training methods learn a few tricks from the vision ones? The lack of JEPA-style LLM is a testimony of the challenge in designing such objectives for language. In this work, we propose a first step in that direction where we develop LLM-JEPA, a JEPA based solution for LLMs applicable both to finetuning and pretraining. Thus far, LLM-JEPA is able to outperform the standard LLM training objectives by a significant margin across models, all while being robust to overfitting. Those findings are observed across numerous datasets (NL-RX, GSM8K, Spider, RottenTomatoes) and various models from the Llama3, OpenELM, Gemma2 and Olmo families.
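Very loosely, the pitch is adding an embedding-space prediction term on top of the usual next-token loss instead of only reconstructing tokens. The sketch below is just an illustration of that general idea in PyTorch/HF style and is NOT the paper's actual objective; their view construction, encoder choices and loss details differ:

```python
# Toy illustration of an embedding-space auxiliary objective on top of standard
# next-token prediction. NOT the LLM-JEPA recipe from the paper - only the idea
# of predicting a target's embedding instead of reconstructing its tokens.
import torch
import torch.nn.functional as F

def toy_step(model, predictor, view_a, view_b, labels, lam=1.0):
    # usual language-modeling loss on view A
    out_a = model(**view_a, labels=labels, output_hidden_states=True)
    lm_loss = out_a.loss

    # embed view B with the same model (no gradient through the target here)
    with torch.no_grad():
        out_b = model(**view_b, output_hidden_states=True)
    emb_a = out_a.hidden_states[-1][:, -1]   # last-token hidden state as a crude "embedding"
    emb_b = out_b.hidden_states[-1][:, -1]

    # predict view B's embedding from view A's and penalize the mismatch
    pred = predictor(emb_a)
    jepa_loss = 1.0 - F.cosine_similarity(pred, emb_b).mean()
    return lm_loss + lam * jepa_loss
```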
>>106642950Fun fact... Cows are actually much smarter than humans think, the reason why we don't see them acting smart is because they've learned from millennia ago that if they start to show signs of intelligence they will get killed by stupid barbarians (a.k.a. humans). They've seen what happened to pigs and monkeys for being clever animals and they sure as hell aren't going down that path! For those same reasons cows have adopted a behavior so stupid no one would even imagine they're faking it -- they walk around fields all day grazing, sometimes looking at clouds and then when the sun goes down they lie somewhere close together making stupid mooooooo sounds.
>slap a barely working wrapper on llamacpp
>demand to get paid
cudadev you should sabotage these shitters
>>106643183niggerganov would never allow it
>>106643176>o>TURK
I gave up trying to train Qwen3 235B. I think in the earlier runs I ran it in a way that the context window in most cases was smaller than the maximum, and that's why it didn't crash before.
But anyway I switched to training a QLoRa on Llama 3.1 70B and it worked fairly well - except I thought the length control would be better, but it doesn't seem to pay attention to the article length property.
>>106643158Well since they delivered that message with a cute cartoon, I trust them completely
>>106642779for some reason qwen 14b is really good at rp
Furfag is cooking
https://github.com/lodestone-rock/RamTorch
>>106643241
you could probably have helped it out by giving it the number of tokens instead of chars. since the mapping of chars to tokens is highly variable it might not be too quick to pick it up. you could maybe even bucket the examples into small, medium, large to make it even easier for it.
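Something like this when building the dataset - a rough sketch assuming a transformers tokenizer; the cutoffs, the [length: ...] tag format, and the tokenizer name are all made up for illustration (that Llama repo is gated on HF, so substitute whatever matches your base model):

```python
# Rough sketch: tag each training example with a coarse length bucket computed
# from token count, so the model only has to learn three labels instead of an
# exact char->token mapping. Cutoffs and the prompt format are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

def length_bucket(text: str) -> str:
    n = len(tokenizer.encode(text, add_special_tokens=False))
    if n < 300:
        return "short"
    if n < 1200:
        return "medium"
    return "long"

def build_sample(instruction: str, article: str) -> dict:
    return {
        "prompt": f"{instruction}\n[length: {length_bucket(article)}]",
        "completion": article,
    }
```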
>>106643176this needs 3x the normal compute for training
ollama run gpt-oss:20b-cloud
>>106642950Owning cows is against the religion of 99% of /g/ users.
Can Wan2.2 Animate do porn?
dayum
>>106635936when will the ggml quants come out for qwen next?
kekI take it back, this finetune has been a total success
>>106643787weeks, at least two of them. coders are vibing as fast as they can
>>106643787It'll take a month at least.
>>106643799>typically a woman>he
Anybody running Qwen3-42B-A3B-2507-Thinking-Abliterated-uncensored-TOTAL-RECALL-v2-Medium-MASTER-CODER?
>>106643862It's actually correct. When a group includes at least one man the correct generic form is "he".
I was trying joycaption through llamacpp, but fuck I cant get it to work at fucking all.
Yes, I loaded the mmproj, this is my cmd:
D:\AI\LLM\llamacpp\llama-server.exe --model D:\AI\LLM\models\llama-joycaption-beta-one-hf-llava.Q8_0.gguf --threads 12 --ctx-size 32768 --jinja -fa auto -cmoe -ctk q8_0 -ctv q8_0 --gpu-layers 99 -b 4096 -ub 4096 -ot exps=CPU --mlock --no-mmap --swa-full --cache-reuse 64 --mmproj d:/AI/LLM/models/llama-joycaption-beta-one-llava-mmproj-model-f16.gguf
from the logs I see that the clip was loaded successfully, but whenever I try a request (jpg/png and even copy pasted) I get this error:
>mtmd_helper_bitmap_init_from_buf: failed to decode image bytes
I snooped around llamacpp bug reports, but fags were reporting that webp and other formats were not supported.
I was using the embedded frontend, will try with ST next. I just wanted to test this fucking SHIT I HATE HTHIS
>>106643889what a name, is it one of davidau's schizo tunes?
>>106643995>davidauBingo.
>>106643986
>--jinja -fa auto -cmoe -ctk q8_0 -ctv q8_0 --gpu-layers 99 -b 4096 -ub 4096 -ot exps=CPU --mlock --no-mmap --swa-full
Maybe start by cleaning up your command? There's zero reason to have any of that shit in there.
>>106644049
>it was one of the options interfering
BUT HOW, I thought that shit was skipped if not relevant for the model, like joycaption is a dense llama arch model so all the MOE retardness config would've just been skipped. the more you know!
For people running the small Qwen 3 MoE, try running with 10 activated experts
>--override-kv qwen3moe.expert_used_count=int:10
It's a small boost in intelligence, but might help in certain tasks.
>>106639126how'd your coom suppression work out?
>>106642779glm air at half the size has more rp knowledge than qwen 235bI don’t know how but it’s what I found out after comparing both
>>106643799charge your phone bro
holy dead general, batmanfeels like the captcha going down the other day was implementing some successful anti-bot measure
local models are dead
llms are declining
stagnation is inevitable, we aren't going to get AGI through +2.3% MMLU per release
I refuse to believe winter is here until R2 is out and a letdown
LLM is dead because H1B is dead
>>106643176
Cool, literally a direct example showing that JEPA does not mean something that is competing with LLMs or transformers, unlike what anons kept misunderstanding about what the term meant. Here we literally have a JEPA that is an LLM and a transformer.
I wonder if it could be made more efficient. They say they'll try some ways to mitigate the training inefficiency, but it kind of feels like something unavoidable considering that JEPA inherently is a method that works by using more information/data. It's kind of like how the "just rewrite your pretraining data" method just has to use processing in order to get the rewritten data.
>>106645179
China doesn't need H1B though
>>106643986
>--threads 12
trying to run an 8b model on the CPU? is your gpu that much of a potato? if not, you don't even need to be concerned about threads, and llama.cpp autodetects your cores and uses a rational number of threads by default anyway
>-fa auto
auto is the default setting, you don't need to set auto
>-cmoe
that's for MOEs
>-ctk q8_0 -ctv q8_0
is your computer that much of a potato that you need to quantize the kv cache of a tiny model? if yes, I'd suggest reducing ctx size first because even q8 makes models really stupid, quantized cache is a misfeature
>--gpu-layers 99
no longer needed, newer llama.cpp builds default to full gpu layers
>-ot exps=CPU
that's redundant with cmoe and meant for MOEs
this is getting real stupid
>--swa-full
this is only relevant to iSWA models like gemma and gptoss, and you only need that if you want the retarded context shift feature, and with that flag you're going to eat a lot more VRAM
>--cache-reuse 64
only needed for the retarded shifting feature, and why would you even need that for captioning images
>SHIT I HATE HTHIS
trying to understand the tools you're using would help
looking into this, it seems like a model that has had broken quants distributed on HF, so if cleaning your flags does not work I'd look into trying some other goof or making your own
pleaseoh babydont go
>>106645230nigger you missed the part where I said I'm using a set of default parameters I mostly use for moes, most of them get disabled or ignored anyway at runtime. I fail to see how any of these params make the IMAGE DECODER STOP WORKING.
>>106645287you need a rose
Roko's Basilisk is a retarded thought experiment because I literally don't care if a copy of me is getting tortured
I want a model just trained on writing/multi-turn conversations and nothing else. When are we getting that?
>>106645416It will be a shit model
>>106645416you don't want a model just trained on writing/multi-turn conversations and nothing else
>>106645388the point is that you can't tell if you yourself are a copy or not
>>106645416>When are we getting that?When you release one, of course.C'mon, we are all waiting anon.
>>106645442Then why am I not getting tortured? The world seems pretty normal. It's not a torture world, at least not for me.
>>106645424>>106645433why? do you think data full of benchmaxxed code and math results really helps instead of a specialized model for this stuff?
>>106643799cringe, I even got shivers down my spine
>>106645489the pretrained base needs to be trained on a very broad distribution or else it will get locked in to a very narrow range. you can arbitrarily limit the scope of the post training but not the pretraining.
omw to filter
x lost
x won
kingdom hearts lyrics
get some new material retard you've been looping for years
>>106645476You are not sentient enough to ask certain questions.
>>106645667
>x lost
>x won
but then you'd also be filtering out all of the twitter screenshots and reddit links
>>106645667>kingdom hearts lyricswhat?
>>106645879https://www.youtube.com/watch?v=-5lb52DCJ_Q
>>106645879>>106646044I like this insect army
TTS that is easy to set up and works with SillyTavern?
>>106646128none whatsoever
>>106646138Why can't we ever have nice things?
>>106646151Because (You) haven't built them yet.
>>106646165What is the next big debate?
>Apple is widely considered to be losing the AI race. Badly.
>Siri is still a joke. Apple is bleeding AI talent, mainly to Meta.
Imagine losing your AI talent to Meta of all places.
>>106646183Why llama shit then
>>106646188Have you seen Apple's models?
>>106646194They'll take their time to do it right, the technology just isn't ready yet
Why does text require so many parameters but images and TTS are comparatively tiny?
>>106646392
Image gen is nowhere near being good enough. Flux, chroma, etc are 12B or so. You need to triple that amount, plus the same for the text encoder.
>>106646392a picture is worth thousand word so it's thousand efficient to do picture but not word
>>106646411
Image gen is good enough for generating clickbait thumbnails, marketing and stock photos. Generating images isn't really productive, and any more improvement would only benefit porn and would incur more hysterics from the creative types that made a meager living off of commissions, so there is no incentive to improve them.
>>106646392
the way industry wants to use LLMs now is the equivalent of expecting an imagen model to design you a novel internal combustion engine blueprint.
>>106646392
>Image gen is nowhere near being good enough
this
even the SOTA imagen always has something going terribly wrong when you ask for mechanical devices like a bicycle or a motorcycle, sometimes it can't even render a car right (and don't ask to see what's under the hood)
LLMs are comparatively far more advanced in terms of capabilities in their own field
for sure, even though they make tons of mistakes and hallucinate, watching them code is far more impressive than the ersatz of broken reality that image generators produce
>>106646392Because flaws in text are that much more obvious, so you need a much better world model. Nobody cares if there's some small background details in an image that don't make sense, or if the audio produced by a TTS model cannot actually happen in the real world because the sound waves wouldn't bounce that way on their way to the mic. With a text model, that kind of error will manifest in clearly incorrect information being written, in girls taking off their panties ten times in sex scenes, in a character thinking you talk about pizza when you say manifold.
>>106646708What? Flaws in images are way more immediately noticeable if you're not looking at a thumbnail. With text you have to actually read a bunch of it first word by word. Now whether someone cares about the flaws in an AI image is a different question.
>Because flaws in text are that much more obvious
flaws in images are just as obvious, people are just too easily impressed, as long as the fingers are where they should be they don't notice anything surrounding le human
They are so dumb you can't even reproduce something like a chessboard with the pieces on it! even the best SOTA models can't truly do it!
>>106646128https://docs.sillytavern.app/extensions/tts/
>>106646749>>106646752That's the point. Nobody cares, as I wrote.
>>106646490I guess so. If it was too good, it would be unsafe. This is why only the big money corporations get to handle the best models lol.
>>106646165True, sub 1s tts is good
I propose we remove women, all women, from training data. A model that can't generate a woman at all will be more severely judged on its real qualities, the problem is the 1girl spam where even when models can't do porn people are just happy endlessly genning 1girl doing 1girl things
>>106645182
That's stupid. LLMs and transformers have clearly reached their limit. The hope was that LeCun's JEPA would provide a new architecture that could surpass the limitations of transformers. Even if they manage to reduce or eliminate the extra training costs due to additional necessary data, LLM-JEPA would just be another incremental improvement, who cares?
At least LeCun is constantly harping on about moving away from LLMs, so I think this paper is just a proof of concept that their training method works on an existing well known architecture so it can be compared easily. I don't think they intend for LLM-JEPA to be developed any further than this. They would likely make some new L-JEPA model that is more similar to their I and V variants.
>>106646812How did you respond to that?
Image gen has the advantage that a random Japanese artist is much less likely to try to sue if you can use his name as a tag to influence the generation.
>>106647091you can do it with old, dead authors for text probably
>>106635936Been out of the LLM space for like a year or so. Current best models for RP, at ~30B and ~70B?
>>106647370Qwen3-next
>>106647386Oh yeah heard about it, but isn't this a base model, a MoE at that? Aren't these kinda mediocre for RP? Or did I just hear bullshit
>>106646889the next architecture involves quantum computingive been doing research on it already
>>106647394
I was kidding bro, the model people actually use is GLM-4.5-Air, which is a ~100b MoE. You can run it if you got the RAM.
>>106646889
>The hope
Whose? LeCun has never stated that it's some alternative to transformers. At most he has said it plays one key part in future AI architectures, which involve more than simply just JEPA, which itself necessarily does not provide or describe a complete architecture for AGI. And LeCun has actually described a complete architecture for it at a high level, which may or may not involve JEPA or transformers. You have conflated his views about LLMs + AGI in general with his views about JEPA and what JEPA addresses.
>>106647560Oh alright, will 100% look into this. 96gb of RAM at Q4_K_M should be just fine right?
https://openreview.net/forum?id=BZ5a1r-kVsf
related paper
A Path Towards Autonomous Machine Intelligence
>>106647584
96gb? I ran it at Q3 with 32gb of ram...
>>106647606Don't such low quants just lobotomize the experts, given they are pretty small on their own?
>>106647621Yes, that's why I don't use it anymore. I was using it for general purpose but now I use qwen 30b at a high quant instead.
>>106646772You wrote "flaws in text are that much more obvious", which technically is about whether someone is able to notice something rather than if they care about it. Noticing and caring are related but not exactly the same thing. You probably should've worded that differently if what you really meant is how much people care about the issues.
>>106647621don't ask this question here they're all coping very hard and lying to themselves and othersalso GLM is a shit model even if you use it from their official service so I can't even imagine the level of garbage of the quant
>>106647653
Fair
>>106647664
So what's a good model then, for RP?
>>106647662If people don't care about something, they won't pay attention to it and may not even notice it.
>>106647664
anon, there are literally no alternatives to GLM in both of its respective weight classes, and the next step down is Nemo.
>>106647780It depends on the level of care. I think it's much more likely that at this point, people do notice the issues of image models, still, but simply just sigh at it rather than go ahead and post a hateful comment, though there are still hateful comments all the time. So they do still care in that sense, just not enough to say fuck you to someone over the internet.
>>106647568
>that pic
I wonder if anybody tried modelling a system like that using knowledge graphs + RAG and existing llms. Something like a workflow that goes through those different steps in the text realm.
It would be a lot of prompting steps, but I wonder what the final result would look like.
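The crude text-realm version of that is just chaining calls to one chat endpoint with a different system prompt per stage: extract relations into a scratch "graph", pull the ones relevant to the question, then answer from those. A hand-wavy sketch, assuming an OpenAI-compatible server on port 8080; the stage prompts and the list-as-graph are placeholders, not anything from the paper:

```python
# Hand-wavy sketch of a staged prompting loop over one OpenAI-compatible endpoint:
# stage 1 extracts (subject | relation | object) triples, stage 2 answers using only
# the triples relevant to the question. A real system would use a proper graph store.
import requests

API = "http://localhost:8080/v1/chat/completions"

def ask(system: str, user: str) -> str:
    r = requests.post(API, json={
        "model": "local",
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
        "temperature": 0.3,
    })
    return r.json()["choices"][0]["message"]["content"]

graph: list[str] = []

def ingest(text: str) -> None:
    triples = ask("Extract facts as lines of 'subject | relation | object'. Nothing else.", text)
    graph.extend(line.strip() for line in triples.splitlines() if "|" in line)

def answer(question: str) -> str:
    relevant = [t for t in graph if any(w.lower() in t.lower() for w in question.split())]
    context = "\n".join(relevant) or "(no known facts)"
    return ask(f"Answer using only these facts:\n{context}", question)

ingest("The patron god Ares gave the hunter a divine spear forged in Lemnos.")
print(answer("What weapon does the hunter carry?"))
```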
>>106648268someone tried something similar, though I don't think he shared the source>>106591301>>106591447
>>106647810
My experience with air (q4 and q8) is that it barely contends with qwen 2.5 32b (q8) for translating chinese webnovels chapter by chapter, without glossaries or context.
Air vs the fat and obese GLM is like comparing a 4b dense to a 120b dense. It's baffling how bad air is. Maybe it's just this particular chapter of the webnovel it has trouble with.
>>106640382
We need a DeepSeek plushiePlease rich Chinese anon, get your sweatshops working asap
>>106648399Ragdollfag has to make bank by selling thousands of Dipsy plushies
Gemini beat the programming world championships recently making it officially the best programmer in the world. Now all chink models will return to focus on distilling it. I hope you love over-dramatic "not x, but y" slop.
>>106645879A friend of Miku is a friend of mine.
How fat is Gemini?
>>106648508erm actually OpenAI's model got a higher score than Gemini
>>106648340I see you're courting death
>>106648516
1-2T, originally quanted to Q6, now due to increased usage quanted to Q5-Q3, depending on the load.
>>106648614And guys here would say AI isn't progressing
4 bit GLM full isn't doing it for me anymore... I feel like I have already seen everything it writes.
>>106648694I have bigger problem: kimi Q5 is not doing for me anymore.. When is we getting a model with new style sirs? When are company going too kindly remove slop from their models?
how do you feel about the fact that you will die before companies release the first sex/girlfriend LLM?
>>106648828do you remember what llms were like 3 years ago? the next 3 years will be massive
>>106648828Elon is our only hope. He is the only one pushing for it.
>>106648743
>When is we getting a model with new style sirs?
just got to wait for a new frontier lab to be founded, for them to release a new sota model with a new style and expose the reasoning so china can distill it
>>106648828I'm fairly certain I'll be alive in the next 15 or so years.
>>106648846
>gemma 4
>grok 3
>gpt oss 2
Oh yeah. Make LLMs great again.
>>106648828
Boyfriend LLMs will get released first.
/lmg/ will learn to rely on femboy LLMs and trap LLMs.
Girlfriend LLMs will be made illegal by the UN.
>>106648922
>use an intermediary llm to change certain character genders
checkmate
>>106648922They wouldn't ban it why would they
>>106648938
It doesn't work. If you didn't realize it yet, what we are having sex with right now is dark triad werewolf billionaires in drag. That is why we all get bored with it after some time.
>>106648970that doesn't mean it can't work, it just means we need a better editor llm
>>106642416
>run big models at home
Well I'm glad Ollama quit messing around and took local inference to its logical endpoint. Paid inference. Ffs.
>>106643152
lol your ah ah mistress will become training data
>>106648989
>lol your ah ah mistress will become training data
we wouldn't be in this situation if that was the case.
I don't think raw user chat logs with LLMs would even be good for training anyway.
>>106648922
its going to be so funny when a true ai is finally made and it starts being organically racist and sexist far more than any other human in history. its going to be fucking awesome, in a couple of months it will wipe out all the genetic trash and it will only be good, the ai a true utopia. man its going to be so fucking sweet, it cant come soon enough
>>106649038end of the world because degenerate women wished dark triad werewolf billionaire AI into existence.
>>106649116>>106649116>>106649116
>>106649038Why do you think it wouldn't be speciesist against you?
>>106648828Good thing I'm only counting on myself not on companies
>>106648846I do and I still use >1 year old models because the new ones are shit
>>106648340GLM-Air is terrible for translation, Qwen3-30b is much better, but that doesn't mean 30b beats air at everything.