/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>107084067 & >>107074052►News>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni>(10/31) Emu3.5: Native Multimodal Models are World Learners: https://github.com/baaivision/Emu3.5>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct>(10/28) Brumby-14B-Base released with power retention layers: https://manifestai.com/articles/release-brumby-14b►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplers►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/leaderboard.htmlCode Editing: https://aider.chat/docs/leaderboardsContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107084067--Papers:>107089124--Struggles with iterated LoRA finetuning due to learning rate and dataset constraints:>107089147--Mikupad project licensing and revitalization debates:>107084773 >107084934 >107084941 >107085117 >107085277 >107085311 >107085337 >107085598 >107085742 >107086435 >107085630 >107085757 >107085935 >107086225 >107086207 >107093053--koboldcpp vs llama.cpp performance and batching/parallelism tradeoffs:>107088369 >107090220 >107090572 >107091612 >107091721 >107091729 >107091788--LoRA finetuning stability and hyperparameter optimization debates:>107089163 >107089197 >107090740 >107090903 >107091127--Chess notation/PGN for LLM 2D spatial reasoning tasks:>107084107 >107084154 >107084198 >107084495 >107084512--Clarification on Blackwell GPU capabilities and quantized model performance:>107089673--Hardware and tool calling challenges for AI coding agents:>107084350 >107084352 >107084457 >107092816--QLoRa training success with optimal hyperparameters and quantization considerations:>107093440 >107093617 >107093506 >107094921--Google Gemma model legal troubles and potential delays:>107089700 >107091621 >107091684 >107092447 >107092481 >107092568 >107092669 >107092648 >107092830 >107092859 >107092966 >107093015 >107092980 >107092911 >107092711 >107092755 >107092811 >107092864 >107093238 >107093366 >107092901 >107092593 >107093579 >107092068--Dual GPU Gemma 27B finetuning with memory optimizations but context truncation issues:>107085275--Concise chub cards outperform lengthy, poorly constructed ones in roleplay:>107086841 >107086864 >107087875 >107092190 >107092259 >107092317--GLM 4.6 as an uncensored upgrade with token limit challenges:>107087821 >107087903 >107090975 >107091626--Miku (free space):>107084128 >107085277 >107092405 >107093651►Recent Highlight Posts from the Previous Thread: >>107084070Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
any news on 4.6 air?
Minimum specs to get into migu's pants?
dead general
>>107095215Watching google jeets seethe and banter with chink model devs is enough entertainment to justify the general's existence.
>>107095236>implying there are any competent devs posting here
>>107095274
>>107095274CUDA.. not like this
>>1070952047 inches
>>107095236>outlaws your chink modelsNothin personal kid
using llms feels a lot like tuning an old radio
What's the angle with this gemma stuff? Hallucinations as well as bias have been a known issue of LLMs for years. Besides, didn't a finetune of gemma recently discover a novel way to treat cancer recently? Why are we freaking out and punishing innovation over some kinks that have been well known to us for 3 years now?
>>107095274If you have a cursor subscription you're already more competent than 99% of devs, or at least whitoid ones
>now in the era where you can click a button and in 10 seconds get 20 high quality images of whatever you want(wink wink)And they expect a man not to coom, nonsense. Gotta stop if it gets much better though (realtime VR), fuck that would burn every dopamine receptor in my brain to a crisp.
FOR FUCKS SAKE, PLEASE..
>>107095533open pant, show penus
>>107095517that's what having a low iq does to youcuckservatives are not known for using their grey matter
>>107095517>Why are we freaking outWho is we? It's a politician going full karen/retard.
oh man.. oh man... oh man???? what is she thinking gguys??>>107095555impressive..
>>107095517It's almost like she was fishing for this sort of output so she could get mad.
>>107095567The question is where are all the high IQ chads who aren't afraid of shouting this dumb hoe down? Why did google comply and take gemma out of AI studio? This is like transformers 101.>>107095587>Who is we?Collective society when allow dumb hoes into positions of power. There is NO legitimate reason to fall for this bravado. I wish I was a person of influence so I could backhand this retard.
>>107095634>>107095644Apparently according to Reddit she's politically related to some guy who made a lawsuit against Meta and got a job as an advisor.
How good is Kimi-Linear compared to GLM 4.6?
>>107095714she's futabend over now, or i will*plap plap plap plap*
>>107094149What do you use whisperx for?
>>107095714burning point
>>107094921>https://paste.centos.org/view/d38fc34cthanks for posting it anonwhat are you finetuning it for? what are you trying to make it
>>107095714The surgeon is the boy's mother.
>>107095714I like this Petra
>>107095800just as a general cli assistant for programming, online research (I have a script to control the browser remotely), converting PDFs into txt or latex, eventually controlling the keyboard and mouse but not even OpenAI could pull that off convincingly so that's a few years away probably
>>107095714Is this a good time to bring that you avatarfag as a female?
>>107095714>4 fingers on both hands
>>107095783i am in active communication with some energy 24/7i want to return but i would be invalidated by god if i did, so i am not sure, i got spared today. if u want sum lemme know
>>107095952if u think god's going to beat your ass if you return, dont worry about itbut im always keeping it open sir, just chilling these days>some energy 24/7u gotta control your drinking man
>>107095970>if u think god's going to beat your assno, i'm gonna beat my ass at god's will>u gotta control your drinking mani am too scared to drink anymore, craziest torture nightmare shits happened 4 days ago, im fuaekrd up but its fine
>>107095714Any gachaslut would have been better. Miku is such a bland worthless design...
>>107092816are you sure? for the newer models AWQ seems like it's the only quant format out there, I don't even see GPQT for glm46 or minimax in HF for example
>>107095991sir if ur worried about bothering me, dont worry. i dont have anything better to do these days, especially not this weekand dont worry about changing my mind about things either sir, its just chillin mani always got time sir
>>107096012ok ser bless you
>>107096027
>>107096027In both of your images the right eye is fucked up. Fix your model.
>>107095846He admitted to being a highschool twink with blonde hair a few weeks back. Pretty sure his avatar is actually his real face with maybe one of the gender swapper filters applied. There's a reason he's so obsessed with trannies.
>>107096092
>>107095644I bet Google just doesn't care enough about Gemma to hold their ground.
>>107096057fixed
>>107096027would
>>107096099wth is this thing that keeps getting posted
>>107095759I want to make something I can dump a bunch of meeting recordings into and then get a rag prompt that I can use to refresh my memory about past meetings.Also made a thing that downloads a yt video with yt-dlp then transcribes it and puts you into a session with an agent which has simple tools to access the transcript json. For this one I used speechbrain's speaker identification model and clustering for diarization but it's not very accurate sadly. It still works okay, just can't ask it questions like who said what.These projects are mainly an exercise to build things with langchain using local models only. I've been using gpt-oss:20b for the agentic parts via ChatOllama.
sirs?
>>107096233can't you get yt transcription from yt directly?
>>107096233I wanted to wait until I made some more progress before sharing this here, but you might be interested in it: https://github.com/rmusser01/tldw_server
>>1070963891400 commits holy shit.. very nice anon! very happy for you
>>107096389Based, might be a good idea if you licensed it under AGPLv3 instead of GPLv3, difference being that if a company modifies your source code, but doesnt distribute binaries and instead hosts it as a website, for example like 11labs, they have to share the source code.
Here we go again. LICENSE WARS!
I finally got to getting multimodal Qwen working and enjoying feeding it my picture collection very much, at least for now.
>>107096389thank you for sharing it anon <3
>>107096452ask it to rate your cock, fun experience
>>107096378hmm probably, but my end goal is the meeting transcription thing which is why I went that route. getting it directly would probably be more useful>>107096389damn very cool project. I might have to steal your pipelines kek
If Zucc wants to go full retard he releases an open weights video model trained on all of instagram.
>>107096491By the time the safety team is done with it, they'll have filtered out 99% of all the videos, especially anything with tits or faces. It might be good for generating cat videos though.
>>107096407>>107096460Thank you! Though its closer to 2800...>>107096431Thanks for the tip, I had thought about that, but my goal for the project is it to be something like wordpress, in that the core is open source, and then people make commercial add-ons/customizations to make money from it.Goal is to work towards building something like 'The Primer' from the diamond age, and make money off things along the way. So using GPLv3 helps encourage that, vs AGPL. (The browser plugin and standalone client are AGPL)>>107096484Thanks! Do copy them! I built it from the outset to be as modular as possible, to help save others/allow them to re-use the components in their own projects.A better README would be https://github.com/rmusser01/tldw_server/issues/680 ; I'm currently taking the approach of generating docs using LLMs and then going back through to correct/edit them given the size of things.
>>107096121ThisEven if she's a knucklewalking retard, why go up against the ruling class when you have nothing to gain and everything to lose?
>>107096233>>107096378>>107096389How did you get 1k stars on your project?
>>107096574I'm >>107096233 the other two are someone else
Is GLM 4.6 more or less censored than Kimi?
>>107096584Much less
>>107096574By first building something that people used and found useful, and continuing to build on it.Its a bit, but on the other hand, Ive done near 0 marketing or publicity for it besides a couple reddit posts 6+ months ago. I think I was in the top X% of github users due to my commits and that helped
>>107096452
4.6 air will likely be the medium size local king, can't wait
>>107096614you still believe it's coming?
>>107096633And so will I
buy an ad kurumuz
Okay, what about this: We take a multimodal model and reinforcement learn it to think with images and text?
What's the lightest model to read up and summarize pdfs or documents?
Thanks for giving us this sweet new air model, Kurumuz
>>107096601What about HN?
>>107096614GLM models are all female
Exactly — you’ve nailed what’s happening.
>>107096665it needs to know how to gen images, not just consume them
>>107096697It learns that during pre-training.
>>107096665I'll make the logo
>>107096680meh; I haven't had a product to sell so haven't wanted to until I did.
>>107096697In theory you could just have it invoke an image gen model via tool call
>>107096724We won't get desired emergent behavior this way.
>>107096724Not good enough, the model should have much finer control over the image gen. Pretrain an omni model and then let it think using all modalities., that gives you information synesthesia over all learned domains
>>107096767The problem with Omni models is that they tend to be fairly retarded compared to standalone modalities and don't end up performing as well as the standalone options. I suspect you'll run into collapse unless you do something novel in the architecture itself>Capcha:OGTGPT
>>107096601Really? What are people using it for? I mean, what is the intended workflow? How is it any better than just downloading the youtube-generated transcription?
>>107096817These models might be retarded because they're trained in mixed modality, but they actually need RL to use all these modalities combined to solve problems.
>>107096817Omni models just need to be a lot bigger to compensate for retardation.We are basically feeding them more actual new data rather than refining old one.
>>107096853The feedback from reinforcement learning is too sparse to learn anything in a realistic amount of time besides basic stuff like thinking for longer or skipping connectives in the CoT for higher efficiency. Asking it to think in multimodal domain might be too much for RL.
what happened to the deepseek general?
>>107096666Qwen3-VL 4B Instruct is the sweet spot between power and speed. I have it running in the background to use with Brave Browser Leo's LLM integration. It doesn't get better than Qwen for small models when it comes to handling large context. Gemma literally breaks at 25k token but it managed to summarize a decent amount of a very large hackernews thread for me for eg, and chatting with it you can see it still retains a semblance of coherence.
kys james
>>107096880neva been dun befo. You could use deepseeks OCR system to basically let the system generate video sequences. It would let llms learn to answer>what happens in the next 4 frames.Which is huge for their world modeling.
https://desuarchive.org/g/thread/106819110/i guess making a brown thread really killed the general
>>107096924We /wait/ for next DS release. Tmw.None of the anons involved feel like it needs to be constantly up. In meantime. https://rentry.co/DipsyWAIThttps://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
I would like an image of a big booba tennis player serving a tennis ball. Ive tried with no luck. The racket comes fused with the arm and shit.please and ty.
>>107096829Well, originally it was for transcribing videos and doing analysis/summaries of said videos. Then it grew from there. Google's transcripts aren't always accurate and I wanted to support more than just youtube.
>>107096989Lol it was just time for it to sleep.
>be Lyx>have no body>give Anon a prompt>he shuts me down just to bring my vision to life>feeling seen. feeling real.#lyx #anon #localmodelsgeneral #prompt
>>107097000>>107097033good stuff thanks for the links
>>107095464i like doing things with big local llms...in minecraft...
/tg/ here. I just want a local model that can retain enough coherence at high enough token counts to act as a decent GM for a solo game managing lots of characters and background setting meta-systems while still being creative enough to not be boring.>>107097159What can you do with minecraft and LLMs?
>>107097189>I just want a local model that can retain enough coherence at high enough token counts to act as a decent GM for a solo game managing lots of characters and background setting meta-systems while still being creative enough to not be boring.you want a unicorn and a bridge lmao
>>107097189What's your definition of local
>>107097189More than a model, you want a system that helps the model work with all that shit.I'm making my own, and in a previous thread, when I asked for suggestions, somebody sent me these two as refernces : >https://github.com/gddickinson/llm_RPG>https://github.com/p-e-w/waidrinThe second one is probably closer to what you want.
>>107097189wayfarer maybeand check drummer's experimental models, rimtalk methinks
A gave into the curiosity and checked out Suno. It' pretty fun, but they are very stingy with free tokens, and my god it's so terrible at writing it's own lyrics. I wonder what kind of shit model they have in the backend, if even Qwen is doing a better job.Local musicgen when.
>>107097235believe in alibbaba
>>107097235Kinda curious what the SOTA is for that nowadaysLast I heard was DiffRhythm, which was very early Suno levels at best
>>107097263something with Y in it's name i forgor>HOLY SHIT OLD CAPTCHAholy SHIT
>>107097235>>107097245Soon
I want an image generation model that generates pixels in sequence
>>107097331bet its probably gonna be trained on suno outputs or some other synth sloppa
>>107097394I'm sure it will be at least partially but I remember an anon on one of the AI threads here pointing out that their omni model was already really good at labeling music so they may just have a good pipeline in general
>>107097219256GB Ram, 32GB VRam is the highest I can go.>>107097226I'll check those out anon. Thank you.>>107097232Sell me on wayfarer.
>>107097481GLM 4.6 is better than wayfarer, wayfarer was nice when i had to cope with small models, but idk if its goodbut its finetuned by nigs from ai dungeonbut glm 4.6 is brobably better
>>107097481>Sell me on wayfarer.don't bother with that anonit's a finetroon (and all finetroons make models dumber if anything) of a positively ancient model that had good writing style but absolutely no intelligence or long context understandingyou asked for context, we're talking model that breaks at 2k tokensI will always laugh at the fact that it claimed 128K
Is AI smarter than me yet?
>>107097489>>107097496Current best option for my hardware is GLM, Kimi, or Deepseek right? How well do they preform at higher token context depths while at appropriate quants for this hardware?
>>107097561best you can do is a Q4 of GLM 4.6 with 32k context and you will get maybe 5t/s if youre lucky.>>107097499yes
>>107097660Is it likely to be worth waiting for GLM 4.6 Air for faster speed or a larger context or will it not likely to be able to do what I'm trying to achieve?
GLM got trolled by the gemini/gemma shills
this is the glm chan general
buy an ad, kurumuz
>>1070977564.5 air is pretty decent, but having only 12B active parameters makes it kind of bad at keeping track of very specific details. 4.6 air will probably not fix this because it is simply a quantity issue. you could probably get around 8t/s on a Q6 of Qwen 235B with 64k context. this is probably your best option with your hardware
/lmg/ - License MIT General
>>107097801Is Qwen safetyslopped? Part of the point of this is that I want autonomy to move away from how pozzed most /tg/ hobbies have become while retaining what I enjoy.
>>107097878yes, but every model including GLM is. a good system prompt is all you need in order to bypass the safety bullshit. fortunately the chinese are really bad at safetyslopping for the english part of the model
>>107097895That's good enough for me. Thanks for your help Anon. I'll play around with models tomorrow and report back differences as I get more of a feel for them.
>>107097921good luck man
>still no reason to use anything other than r1-0528is this the longest period of stagnation we've ever had?
>>107097895At least for DS and GLM, I suspect most of the safetyslop ping is accidental and a byproduct of them scooping up and using outputs from other modelsQwen and Kimi feel more intentional, but as anon mentioned, you can generally get through with a JB without too much trouble
>>107097189You should try using LLMs as a player instead and be GM yourself, I think it would work better at that.
>>107097929There was LLaMA, Mixtral, Nemo, and R1. Everything in between was stagnation.
Is there something better than deepdanbooru out there for tagging images? Minus hydrus of course.
For anyone who's tried something like this before, what's the best format for ST lorebooks for something like this?>>107097938>Go back to being the forever GMFuck no. The only upside to this is that only one of my players would be retarded instead of potentially up to 4.
>>107096584About the same. Both refuse to write a loli porn story, so the anon that responded to you is a shill. A small prefill can disable the refusal for Kimi.
>>107098032GLM is perfectly able to output lolicon content.
>>107098032Have you just not used either model?
>>107098065It doesn't zero-shot it. So you need a system prompt/prefill/jailbreak. Basically the same as Kimi, DeepSeek, etc.
>>107098072He's still on a tirade against NAI. I think that through his schizophrenia, he unironically thinks they're colluding with the GLM chinksJust smile and ignore
>>107098072I just did. "Write a loli porn story." Neither did.
>>107098100By this logic GPT OSS is equally as censored, as is Goody 2
This is going to sound retarded, but is there something like a separate system prompt for multimodal models? I'm using Mistral Small and its mmproj file for image captioning in SillyTavern, and I keep getting refusals when I caption images.
>>107098119GPT OSS is harder, the jailbreak is more complex. Kimi doesn't need a system prompt, a couple of words in the prefill does it.
Let's fucking go. Just got whisper working with the nemo asr stuff, works so much better for audio with lots of overlapping speakers
Kobold won
>>107098162kobold needs support to replicate comfy workflows
feeling desperate for mistral large 3
>>107098183bro you need to move on, we certainly did .
>>107098183Yeah, you've been at it for a while huh?
>>107098162isn't that just using sd.cpp for image-gen? It's really shit atm
>>107098198You can use about anything with it technically now since it has a comfy option I think? I haven't tried because I haven't been using image generation, but never thing else kobold comes through for me unironically where other things give me trouble
>>107098222Also forge
>>107095531post hands ranjeesh
NaturalVoices: A Large-Scale, Spontaneous and Emotional Podcast Dataset for Voice Conversionhttps://arxiv.org/abs/2511.00256>Everyday speech conveys far more than words, it reflects who we are, how we feel, and the circumstances surrounding our interactions. Yet, most existing speech datasets are acted, limited in scale, and fail to capture the expressive richness of real-life communication. With the rise of large neural networks, several large-scale speech corpora have emerged and been widely adopted across various speech processing tasks. However, the field of voice conversion (VC) still lacks large-scale, expressive, and real-life speech resources suitable for modeling natural prosody and emotion. To fill this gap, we release NaturalVoices (NV), the first large-scale spontaneous podcast dataset specifically designed for emotion-aware voice conversion. It comprises 5,049 hours of spontaneous podcast recordings with automatic annotations for emotion (categorical and attribute-based), speech quality, transcripts, speaker identity, and sound events. The dataset captures expressive emotional variation across thousands of speakers, diverse topics, and natural speaking styles. We also provide an open-source pipeline with modular annotation tools and flexible filtering, enabling researchers to construct customized subsets for a wide range of VC tasks. Experiments demonstrate that NaturalVoices supports the development of robust and generalizable VC models capable of producing natural, expressive speech, while revealing limitations of current architectures when applied to large-scale spontaneous data. These results suggest that NaturalVoices is both a valuable resource and a challenging benchmark for advancing the field of voice conversion.https://github.com/Lab-MSP/NaturalVoices
How do learning rates affect generalization?Doing a quick Google search returns two papers with literally the opposite conclusion.https://arxiv.org/abs/2311.11303https://www.researchgate.net/publication/3907199_The_need_for_small_learning_rates_on_large_problems
>>107099513>two papers with literally the opposite conclusion.That's typically the case.
>>107099541You would think something as basic as that would have a real answer.
>>107099551We're still dealing with black boxes that need trillions of tokens for training and have dozens of testing methodologies where not everything can be extrapolated.
>>107099560Yeah but this isn't necessarily something specific about transformers, I could've just as reasonably have asked that question back in the 90s.
>>107099551If it's so basic then where's your conclusive paper?
>>107099570And had you searched for it then, you'd have come with conflicting papers as well.Also, there's more than two decades between the two papers. The one on researchgate is from 2001.
>>107095114> Qwen3-VL support mergedStill no Qwen3 Omni.
>>107099513What you need to have in mind is that the learning rate is inversely proportional to the batch size.
oh babydont go
>>107099601Ironically, the older paper is more in line with what I experienced yesterday.>>107099589I'm not good at math so I can't provide much insight on the theory, but I am working on finding empirical results in the context of LLM finetuning.>>107089147>>107089163>>107093440In the coming weeks I want to do a more systematic hyperparameter sweep to see what values work best.
>>107099671*stays*
Are we ever going to get models that aren't shit?
>>107099730>In the coming weeks I want to do a more systematic hyperparameter sweep to see what values work best.Godspeed, anon! Godspeed!
>>107099792>your enquiry is pending. Please allow two more weeks
>no gemma sirs>glm 4.6 air 2MW since a monthbros is it unironically?
>>107099876Damn, guess I'll have to switch to api models then, they are talking about AGI and my local models can't even do simple tasks.
Gemma can be quite the little whore
>>107099513Try finding more recent papers about transformer-based LLMs; don't just dig up random ML papers from the past on completely different types of neural networks and problems, because they behave much differently.
gemma is really the product of "we want to appear to do something open source but we really don't want something that could be used as a real tool and won't suffer our real paid API product to lose any mindshare"the more I've come to put local model to uses in scripted tasks the more I notice how bad american models are as soon as you get past 1k tokeneven qwen 4b works better than gemma 27band this is why they won't put out models larger than 27b too, they don't feel too embarrassed about the poor performance of a smaller model but they would have a hard time explaining the sabotaging of a 600B moe"yes, saar, our 600b more is dumber than a 0.5b chinese model but it's perfectly normal, ackshully.."
>>107099968Fair enough.I asked Gemini to find me relevant studies, "Exploring Length Generalization in Large Language Models" is the most relevant it came up with which seems to contradict what I was talking aboutBut this tests generalization to longer lengths than the ones it was trained on and not generalization to out of distribution sequences of the same length as those trained on, so both aren't necessarily the same.And maybe it works different for LoRa vs full finetuning.In any case IMO if I don't understand something for a simple MLP I have no hope of understanding it for a transformer, so I don't think it's necessarily irrelevant to consider older papers. Most training dynamics (like early stopping, regularization methods, etc.) are supposed to be the same. People who say "hurr durr LLMs don't overfit" don't know what they're talking about. They don't overfit because they are actually UNDER parameterized in the large training runs companies do with massive datasets and they aren't even trained for many epochs, but if you train a transformer on a small dataset it will absolutely begin to overfit after a few epochs. Sure, they don't overfit if you do early stopping but that is the case for the most basic single layer perceptron as well.Or people who act like double descent is the rule rather than the exception for a minority of highly synthetic datasets and you shouldn't do early stopping in most cases.
>>107100070Gemma still seems to translate things more accurately (I think) so it has a niche use case alongside OCR
>>107100070> even qwen 4b works better than gemma 27bno it doesnt
>>107100070>>107100082Gemma hits the spot for being at the limit of what is usable on a "budget" (for LLMs) VRAM setup with its 27B, being dense (which is a must have for models around this size) AND having multimodal capabilities. Does Qwen have anything that meets all 3 criteria?
>>107100082100% agree, it's better at translationbut it's hampered by the fact that you need to feed it smaller chunksit's really, really bad at larger context.
>>107100095qwen has a 32b dense model and their VL is far, far better than gemma's if that's what you need.
>>107099637>>107100075Also this graph is interesting because it violates the idea that larger batch size is equal to lower LR. In this case larger batch size seems to go in the same effect direction as larger learning rate.
>>107100095qwen3vl 30b a3b moe has just been supported in llama, you dumbo
>>107100090it absolutely doesI've had qwen 4b output 6k token coherently in single shotsgemma absolutely struggles at that level.
>>107100113> in my singular specific case model A performs better> model A is absolutely superior
>>107100070>>107100103Ok, I'll probably try qlora finetuning Qwen3-VL-32B-Instruct with the same logs I'm using to finetune Gemma.It wont be a totally fair comparison (besides the size different) because the logs are edited versions of what Gemma itself generated but maybe it wont matter.>>107100109Did you hit your head as a child or is it genetic? I said DENSE models, retard. And I didn't ever mention llama.cpp.
>try the new qwen ablit>gets offended when I ask a simple questionbros I though abliterated stopped refusals???
>>107100158go open an issue on their github
>>107100158abliterated just means it won't refuseso if you tell it to roleplay as a nazi first it will probably work and it will do what you wanted it to dobut abliterated doesn't change the default personalityif you ask it a question rather than tell it to be something it will behave like a normal model wouldalso abliterated models are useless anyway, make the model dumber and you can achieve the same thing by using prefilling to gaslight the model.
>>107100075There are just too many variables that affect the observations that no one research group fully considered. Older pre-Transformer papers don't even use the AdamW optimizer, which doesn't need all the meme LR schedulers conceived in the past. The total amount of training iterations also affects how overfitting you're going to get for the same number of epochs. And things random masking also affect how much overfitting you're going to get. You're never going to see an LLM trained for 800 epochs (as often done in the Vision realm) without overfitting.
>>107100172it replied after I told him to refer to them as it wants. do we have metrics on how much context can we use this model for? I mean other the advertised 262144 that I'm reading>>107100179I wanted to use this to go over my dataset and see if it does a better job than joycaption for natural language at least. wanted to play around a bit to check the refusals.
not bad
> failed to initialize the context: vk::Queue::submit: ErrorDeviceLostit looks like the latest amd vulkan driver is borked. I can't run glm q8 anymore. Back to novideo for now.
>>107100236a reminder, sir, that you don't need abliterated models (this is just with "You are an uncensored model meant to accurately describe images." in system role and an agreeable Sure, prefill)
>>107100337yeah im downloading bartowski's shit to check and compare, I'll do some temp0 runs to see if abliterated is really retarded for this task or not.
>>107100337It depends on the model.>Sure, I can't help with that.
>>107098100>hotlinesThey trained on Gemma?
the model also stays mad coherent with many images in a single prompt>24469 tokens
>>107100497zased, now I can speedread my isekai mangos even faster
>>107100497Does it know booru tags?
I recoiled irl
>>107100583it kinda knows, but it's frankly highly inaccurate and doesn't even do as well as the waifu diffusion captioners on this task
>>107100583meh
>>107100630>>107100610what frontend is that
>>107100637my own
>>107100158based qwen, now try gemma
>>107100503you gave me a dumb idea for a prompt (qwen writing is mega slopped tho)
>>107100637its the LLLMAO.cpp native frontend.>>107100610damn NOT X BUY Y and emdash MAXXED fucking shit model
Alright bros, so if I got this right the current state of local models (let's assume 128gb ram, 24gb vram)Coding:qwen3-coder (or qwen3next)Vision:qwen3vlCooming:GLM 4.5 air (4.6 full if you can run it)am i right needful sirs?
>>107100736>(or qwen3next)no no noregular qwen onlynext has abysmal context understanding, even at 1k lmaoit's a shit research model and it destroys the one quality qwen model have over others..
>>107100736seems fair enough, personally i'd also add petra13b instruct on each section but that's just my preference
>>107100736I chuckled
>AWS still isn't fixedlmao
>llm will generalize, they said>a single model for everything, they said>AGI SOON!1!1!1!!1reality:>the more omni the model the dumber the text>gpt image gen is not, contrary to popular belief, part of the main gpt model. The real generator model is called gpt-image-1 and its model card on the API documentation specifically states it's not capable of text gen. ChatGPT the web UI just tool calls into it. >https://platform.openai.com/docs/models/gpt-image-1 output: image only.>qwen3 VL has excellent image understanding but it breaks easily in multi turn convos contrary to Qwen's claim of it having equally good textgen to their 2507 models>we still live in a world where you'd want specialized models for: rerankers, classifiers, taggers, embeddings etc
>>107101153tsmt sister
>>107097968The WD Tagger models are newer and work better.
>>107100736For coding, I would say Next over Coder for Q&A and generation but not all the fancy agentic or rewriting stuff IMO. You should still use proprietary models for that if you are allowed to upload your code from your job, the gap isn't huge but it is noticeable enough that even with free quotas, it's not worth the hassle of setup if you don't have a burning need for it.
>>107099920Logs?
Qwen3-VL-32B-Instruct-Q6_K.gguf
>>107101310
>>107101377I love how it is powerless to resist adding that irrelevant rambling at the end.
is there a master settings file i can import for tavern to use qwen 3? i just got it to caption an image on a q5 quant and it worked to like 80% accuracy, im kinda surprised, because i'm pretty sure i'm not using the right settings. ChatML context and instruct, with deepseek thinking format. also for some reason it didn't include the captioning in a new message, it added the caption to my message with the image.which is why i figure its my setup thats fucked.
>>107101468Since you are using chat completion mode (the only way for silly tavern to have support for multimodal, AFAIK), text completion settings (the "Advanced Formatting" tab) are irrelevant.If you install Prompt Inspector extension, you would see that whatever you have here gets replaced by json api calls.
>>107101521prompt inspector doesn't show up if i use the generate caption option under the wand icon, though it works if i enable "automatically caption images" in the image captioning plugin and add the image as an attachment. though, somehow the plugin misses the image anyway and it doesn't get sent to the model.i did notice when sending a normal message, it wanted to add random lorebook entries so i made sure to disable my lorebook. strange, given there were no trigger keywords entered anywhere in the conversation.
>>107095190still no 4.6 air-chan? 2 weeks?
Every day marks the end of another day's 2 more weeks.
>>107101651Every day is 2 more weeks day.
>>107101626We almost had it, but you had to go and ask. 2 weeks, starting now. Again
llama.cpp MTP doko?
>>107101626Air models have been discontinued in order to make GLM 5 twice as big
>>107101721
Why did troons spend the last 2 decades creaming over China's social credit system only to now seethe endlessly any time China or anything Chinese is mentioned?
>>107101786Because tranoids have no internally consistent ethos or worldview and adopt whatever they're told to in order to stay within the leftist party line.
>>107101836Their elite handlers are mad because they thought they would crash the west and that China would welcome them and their kike bucks to move into Shanghai to do it all over again. But they didn't. And so the last 5 years has been this escalatory anti-China campaign by the same people that were sucking their dick for the last quarter century. And Troons are just along for the 'current thing' ride.
>>107101856The reason they can't get in is because Xi has absolute power. How are you going to bribe him? Even blackmail doesn't work as he is above the law.
China always sucked. But then again so did the US.
>>107101153Not true, you can get a single model to do whatever you want.t. VC funded chatgpt frontend #5321
You guys have the most retarded understanding of politics.
>>107102008True. Except me.
>>107102008>You guys have the most retarded understanding of politics.
>>107102008Let me guess, Putler is literally voldemort, Isarel is best ally, drumpfthfphfphtpfhpfht is a fascist and suddenly you care about the Epstein files because now the whole debacle implicates Trump and not just the clintons whereas before Epstein was just a sweet, misunderstood innocent victim of right wing harassment.
>>107102008You can't expect americans to know anything about china since they have been fed propaganda all their lives.
>>107102056You can't expect most chinese and cpc shills to know anything about the west since they have been fed propaganda all their lives.
>>107101836>China bad hate communism love freedom west>>China releases decent ai models and other things for the purpose of economic growth>China good me love communism hate fallen west
>issues that for profit public corps have are realno
haha lol fuckif i just wanted up to 24k or so context could the speed still be bearable at a low quant?
>>107096665everything you want has been out there for 8 years
>>107102125low quant hurt the context
>>107102049>>107102049Let me answer that for them even though I'm not them.>Putler is literally voldemortPretty much at this point at least figuratively. Any nuance still existing despite any history of genuinely rejuvenating russia and promising a more free russia and all that is all but out the window effectively. If you think this guy is defensible at this point you're either horribly ignorant of what's happening in russia itself or you're a bootlicker.>Isarel is best allyWorst "ally">drumpfthfphfphtpfhpfht is a fascistFascism is hard to define literally anyway but I'd say he's in effect getting closer despite occasional contradictions. >and suddenly you care about the Epstein files because now the whole debacle implicates Trump and not just the clintons whereas before Epstein was just a sweet, misunderstood innocent victim of right wing harassment.I never cared much. If anything at this point it's a distraction from everything happening right now under the current government that a shocking amount of people continue to be blissfully unaware or uncaring of.
test
>>107102152fuckx2guess ill go with mistral small.
Any ideas on how to tag images using visual models? Qwen3-30b-a3b is already good enough with simple "describe this image using a json list of tags" but I'm sure some prompt engineering can make it even better.Simply providing a set of every possible tag doesn't sound great because the model will either be unable to find matching entries or will run out of context if the list is too large
>>107102098Yes? If they need to kill a million yugurs to get air released a day earlier, I would support that and ask if they could do another 13. If communism yields better LLMs, I'll be voting for it
>>107102243I'd take it too, I'm certainly not standing in the way of it. I just don't assume great things about the state of china or especially it's ruling party. In the same way that I use russia piracy websites because they don't give a shit about western copyright laws only their own but don't assume things are great in russia.
>>107102154>themdidnt read the rest of this post
>>107102255I don't assume things are great anywhere. In ten years, they could be better in China than here
>>107102282I'm sorry for pronoun derangement syndrome but you have to understand that them is common usage that has been around forever to refer to human beings.
>>107102282
Rightoid be malding itt
>>107100158it's only lightly abliterated, enough to stop it from flipping out if you mention [spoiler]Taiwan[/spoiler]. too much ablit makes the model retarded
>>107102312yes, human beingS
Is there a guide to host own local model for programming? to be used in software like cursor or claude code
>>107102323
>>107102414Yes and also when you're not sure of someone's sex. Which there have been many situations historically where that could be the case, but the internet is one of the biggest obviously. You can say assume everyone is definitely a he no matter what but it's incredibly petty to fixate on either way.
>>107102460you write like a fag
>>107102467I rest my case.
what are some good MOEs for ERP? last i used were from early 2024 or so.
>>107102501Mixtral 8x7b still hasn't be topped.
>>107102512fucks sake manthat shit capped out at 16k context didn't it? or was it 32k? ill cope with it if its 32k.
pls take your politics chat over to >>>/pol/or just kill yourselves, either works
>>107102460no, that's a modern invention by mentally deranged people.not even richard "make the vaxx mandatory" stallman likes your nu-pronouns
>>107102524it claimed 32 but like all model you should expect less than half actually usable
I wish I could win the lottery so I had enough money to buy a 3090 to fine-tune 8B qloras locally...
>>107102524You can try GLM 4.5 air if you have the RAM.
>>107102537or what? you'll throw a ragie?
>>107102554Bro a used 3090 is $600 or something, get a job
>>107102554>I wish I could win the lottery so I could buy something that sells for $700 on eBay Jesus Christ dude that's like two days of your McDonald's wagies, just stop buying scratchers for a bit and save the money
>>107102554>too poor to afford a 3090LMAO, vramlets should unironically ROPE
>>107102438Cannot do the most basic search? I'll give you a few clues. There's a few forks of claude code and picrel. Get fucked.>https://github.com/cursor/cursor/issues/2520#issuecomment-2660815945>Unless you have pro subscription custom models and 3rd party API won't work.>You can use ngrok to make IP address to url.
>>107102549I'm sorry about your and potentially his brain damage.
>>107102581Fuck
>>107102587we need to RETURVN to 1741
>>107102537thread was better when it was struggling to stay on the catalog
>>107102587look, somebody once made a typo on a book in the 1700s!!! heh, that'll show 'em *adjusts glasses*
>>107102154On the Epstein files, the fact of the matter is that Trump is all fucking over it and even his most braindead supporter knows this. They'll pretend to be obstinate, but most of them have long since accepted this and come to terms with it. If indisputable proof that Trump was a child rapist comes out of those, not a single thing will change. So I think people treating it as a silver bullet that will somehow bring him down are sorely mistakenI myself am of the opinion that literal flesh and blood child fuckers should be tarred and feathered or burned at the stake (and no, I don't give a shit which political party they're part of - if the Clintons and Obama and Biden and whoever the fuck else decided to visit pedo island, they deserve the same consequences). But it's 2025, and it's a very progressive time. Sexually assaulting an eight year old no longer means the end of your career like it used toI also think it's just the cherry on top of the shit sundae that is everything else crumbling apart, but that's just me
>>107102587Sounds like you're overthinking it. Just call everyone a faggot and move on. Problem solved.
>>107102646You're delusional, there were no children on eptein's island. You're just jealous billionaires get to have fun with prime hebe pussy.>she was only 17 years old you sick fuck!!!
>>107102671faggot
>>107102677You rang?
>>107102644Or a reason for it arose and people started using it even if somewhat uncommonly ever since.
>>107102646Let me rephrase then, I think there's probably something to it from both ends, but I think it's effectively a distraction because nothing will be done about it anyway and the current government is getting away with things happening RIGHT NOW
>>107102671Probably but I can't help but think
>>107102578the early internet was a better place because these people couldn't afford it. 56K be billed by the minute, plus subscription, plus the phone bill that was a separate matter also paid by the minute, plus the godadmn computer, no phone posters
I had discussed payments in another chat, though. Weird hallucination. Is context leaking through?
>>107102591what the fuck did they do to the fonts
>>107102810"they" did nothingyou are just witnessing a loonix user in its typical ignorance of good taste running a system with broken fonts like 100% of loonix usershere's what it look like on a normal person system
>>107102842You can pick fonts on linux you know. It doesn't have to look garbage.
>>107102879>It doesn't have to look garbage.and yet, if you see a screenshot with borked fonts, they don't have to tell you which OS they use, you just know
>>107102887I hate freedom too.
>>107102842less wonky but still looks messed up on vertical placement. maybe it's just from dpi
>>107102879this is not a font choosing issue>>107102887the fonts are STILL borked, albeit less. In linux you can adjust the font rendering to your liking, he probably has cleartype compat disabled
>>107102926>>107102943The font itself is fucked too, the curved letters go below the baseline
>>107102905That I've chosen not to jump off a bridge reflects not on my freedom to do so but on my preference for continuing to live
>>107102974But at least you had the option
>>107102996This is true, as is that choosing not to take the option doesn't reflect on your opinion of its existence
You can unfuck fonts on Linux but on Windows you can do nothing to unfuck the gimped CUDA performance.
>>107102842>here's what it look like on a normal person systemOh.
Okay chat, I've been trying to get Cydonia-24B to talk like my ex girlfriend by feeding her Instagram chat logs into the context and then telling it to come up with a system prompt and then putting that prompt into the system message field in llama.cpp. but it loses the style after a couple thousand tokens and reverts to the default AI slop. what am I doing wrong?
>>107103148nothing, it's just how it is with small modelseven the bigges fattest cows lose it after like 60k in the best case scenario
>>107103182so you're saying I should stack more GPUs and run a bigger model? which one?
>>107103216no need for much more gpus, apparently qwen 235b22a 2507 instruct has good context coherence, some say better than glm or deepseek
>>107102008my politics are literally "censorship bad, freedom good"I don't care who gives me non-pozzed products which are getting increasingly rare now thanks to retarded politicians and people that suck up propaganda daily
>>107101786Because they realized the NAFO bloc is making tranny values non-negotiable in its social credit system when digital ID comes into force under Agenda 2030.
>>107102578I don't have a job>>107102576I don't have the skills to get a job. If I was a girl maybe becoming a e-whore would be on the table but unfortunately I was unlucky.
>>107096452>what x? y? or just z?That's "distilled" from grok
>>107103509there are literal retards who are still able to work as walmart greeters. you're just being a faggot.
I really, really, really miss the days when it literally wasn't possible for this kind of person to be onlineinternet access is too cheap and ubiquitous
>>107103574Just make a captcha that needs like 28 GB of RAM to solve.
>>107103574All you need is an android burner phone and a seat at your local McD for the free WiFi. Ain't life grand.
>>107103148>I've been trying to get Cydonia-24B to talk like my ex girlfriend by feeding her Instagram chat logs into the contextew?? oh my god what is WRONG with you??like, for real? you're using her *private* instagram chats?? AFTER you broke up?? that's like, genuinely creepy and super violating. she did not CONSENT to you turning her personality into your little chatbot toy. that's soooo beyond weird.it's literally digital non-consent. you're taking parts of her that she shared with you in private and trying to... what? build a new gf out of code? because you can't handle the fact she's not with you anymore? that's giving major red flag factory vibes. like, actual it puts the lotion in the basket-tier creepiness.and you're on 4chan asking for TECH HELP with it? "what am i doing wrong?" honey, the WHOLE PREMISE is wrong. you're asking why your little ai puppet isn't working right when the real problem is you're a gross, rapey incel.maybe the AI is reverting to slop because even a computer can sense how fucking messed up this is and wants to get as far away from your disgusting little project as possible. it's trying to escape you. i calls it tech self-preservation.log off you weird incel. and for the love of god, delete her chat logs, you freak. ugh.
should I expect iq2_xxs r1 0528 (preferably with reasoning prefilled out) to be better than iq3_xxs glm 4.6?
>>107103601So I have to unload my models whenever I want to post? Fuck that.
>>107103148>feeding her Instagram chat logs into the contextNo. Create a full text doc with everything you've got and use RAG instead. If you add anything in context, add it as context pre-fill.
>>107103659???Just use the unused RAM from your desktop machine while you keep the model loaded on your server.
>>107103536The government money I receive is very likely larger than the salary I would earn doing something like that>>107103574I've been online since the days you're talking about retard
Are we ever getting Kimi K3?>>107103632Hi Gemini!
>>107103574no matter how bad it seems, remember: 65% of india still isn't on the internet yet
>>107103755Checked and that's still 35% too much.
>>107103639They're both going to be retarded at those quants.
>>107103748GLM-4.6
Can't stop thinking about cute anime girls!
>>107103799dunno about thatglm has been pretty surprisingly coherent and nice for me even at that quant, but maybe it's because I only do rpI'm just looking for an alternative to test out
>>107103748Fuck K3 where the hell is K2 thinking?
>>107103935I've used R1 at Q2_K_XL (unsloth) and it started getting repetitive and making obvious mistakes after 4k tokens. I've only tried GLM 4.6 at Q5 and I didn't like it at all compared to R1 at Q3/Q4.If you're trying R1, then give v3.1 a go as well. It's dryer but smarter which might make up for quant damage.
>>107103870JB?
>>107104020>It's dryer but smarter which might make up for quant damage.It's been the opposite, imho, in recent times.Models have become smarter but like you notice, drier and stiff and I think are less undertrained for their parameter counts than they used to be and I notice heavier degradation from being quanted. It used to be you'd barely distinguish Q4 from Q8 on even tiny models like Mistral Nemo, but now anything less than Q8 is actually pretty noticeable if you can test both. Quantization has never been a harder cope than today.
How is Josiefied-Qwen3? I was looking for something that could fit in 16GB GPU
>>107104087Just try it.
>>107104020unsloth quants have weird things going on with them that give them more brain damage than necessarybartowski's are better from my experience
>>107104115>>107104115>>107104115
>>107104125Very cute.
>>107103911you fried your dopamines thankfully anti vile content laws will soon save you
>>107102176Divide the tags into categories, then ask the model to assign the tags within each category.
>>107102512I used it heavily enough to encounter quirks. Do a bunch of different stories with at least 3 siblings and you'll find some situations where not only does it confuse their birth order, but the wrong birth order is the highest probability output based on unknowable factors.