/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106996568 & >>106986408

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106996568

--Custom AI frontend development challenges and chat template nuances:
>106997521 >106997570 >106997579 >106997607 >106997616 >106997624 >106997642 >106997672 >106997773 >106997795 >106997861 >106997895 >106997816
--Iterative fine-tuning workflow for Gemma 3 27B using ShareGPT logs:
>107000047
--Hardware performance comparison:
>106996947
--VRAM scaling effects on MoE model inference speed:
>106998904 >106998932 >106999354 >106999525
--Qwen3 80b slow performance due to incomplete GPU kernel implementation in llama.cpp:
>106999433 >106999450 >106999463 >106999506
--GPU performance tradeoffs for AI tasks in regional hardware contexts:
>106997410 >106997444 >106997488
--Allegations of GLM 4.6 distilling Claude outputs and Anthropic's response:
>106999182 >106999212 >106999309 >106999298 >106999324 >107000527 >107000619 >106999390 >107000546 >107000696
--Image-based language model input speculation and challenges:
>106997558 >106997608 >106997654 >106997713 >106997793 >106997614
--llama.cpp context-shift deprecation and functionality issues:
>106996923 >106996945 >106996962 >106996988 >106996958 >106997037 >106997054 >106997084 >106997119 >106997142 >106997072 >106997107
--Development timeline and technical challenges for local AI visual roleplaying systems:
>107001192 >107001228 >107001235 >107001292 >107001429 >107001489 >107001577
--D&D-inspired roleplay with interactive fiction grounding techniques:
>106996874 >106996983 >106997022
--Exploring model chaining for planning and prose generation:
>106997161 >106997177
--Intel Arc Pro B50 benchmark results for inference:
>106996812 >106997062 >107000963 >107001073
--Frontend development frustrations with JavaScript:
>106997783 >106997855 >106997900 >106998005
--Miku (free space):
>106996728 >106997109 >106997701 >107002795 >107002965

►Recent Highlight Posts from the Previous Thread: >>106996571

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107003580
NOOO QUANTING IS HECKIN' THE SAME
LOOK AT THIS ARBITRARY METRIC. QUANT HATE IS JUST COPIUM BECAUSE... BECAUSE IT JUST IS OKAY!?
>>107003580
>>107003592
Are GLM shills bots? that reading comprehension, man. Subhuman.
>>107003602You're jewish. Nothing else begs to be said.
>>107003602
they're retarded, that anon clearly said
>tried it on their official chat
so zai's infrastructure, probably the full fat bf16 model. this and yesterday's spam proves how pathetic they are.
>>107003602They're chinese, lol
SAARS WHEN GEMINI 3?
Oo‑enGeeEllEmfai?Oo‑enAhtoo?
nsigma 1 makes me feel like its c.ai days againretarded
>>107003916nsigma is a memeTopP and temperature is all you need
>>107003985This guy gets it.Throw some Top-K in there too just to cull the vocab. That can yield a little bit of extra performance.
>>107003985>>107004012truthbomb
>muh samplersGreedy is the only way you should be using models. If they can't be used that way, then they're not good.
>>107004058You're absolutely right.assistant
>>107004071>.assistantI missed that meme.
>>107003985minP but YES
>>107003985samplers were, are, and will always be a crutchit's a good thing that the models are getting stable without all this jumbomumbo of dice throwing
>>107003916
nsigma is good with models that have a very top-heavy token distribution because you can push the temperature to like 3 and get coherent outputs. where it sucks ass is with models that have a flat distribution, because all you're doing is selecting for the sloppiest slop. This might come as a shocker but different models need different samplers. Some samplers are kind of obsolete like quadratic sampling and top A, but I really get annoyed by anti sampler autism. Yes, you almost never get good results with more than 2-3 samplers (I'm including temperature as a sampler), but that doesn't mean there's just a golden sampler setting of like temp 0.95 topK 20 that is perfect for gorgeous outputs every time. actually look at the fucking logprobs and see what your samplers are doing. at a minimum each corpo has their own approach to training that influences the token distribution and thus what samplers are worth exploring.

I'll never forget this one really autistic setting I had for mistral large, the only time XTC ever gave me worthwhile results, and only in this one specific medieval setting, because it instantly made characters talk like they were in game of thrones, which the model was seemingly unable to consistently manage otherwise. that setting has never been useful for me since, and XTC in general I've not got good results with otherwise, but it was magic in this one situation. I dunno.
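To make "actually look at the logprobs" concrete, here's a minimal sketch using Hugging Face transformers that dumps the top-10 next-token probabilities at the end of a prompt. The model name and prompt are placeholders, swap in whatever you actually run; Mikupad or your backend's UI can usually show the same thing without code.

```python
# Minimal sketch: inspect next-token probabilities for a prompt.
# Model name is a placeholder, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.3"  # placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "The knight drew his sword and"
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # raw logits for the next token
probs = torch.softmax(logits, dim=-1)

top = torch.topk(probs, k=10)
for p, idx in zip(top.values, top.indices):
    print(f"{p.item():.4f}  {tok.decode(int(idx))!r}")
```

Run your sampler settings mentally (or in code) against a distribution like that and you can see immediately whether they're trimming noise or just reshuffling slop.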
>>107004058The only thing that matters is the final result. If you can get the model to effectively and efficiently produce the output you want, it's good, if not, it's bad.
>>107004111MinP is a worse version of TFS.
>>107004185Overall yes, janky sampling strats are becoming less needed, but they are useful tools. Most LLM users have no clue>t. studied the logprobs
>>107004032is a top_p of 0.7 normal?
>>107004198anon u seem very smart, what sampler should i be using for glm 4.5 air
>>107004267
topP is lame, use minP 0.03, adjust up or down by 0.01
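For anyone who hasn't looked at what min-p actually does: it keeps every token whose probability is at least min_p times the top token's probability and drops the rest. A minimal sketch in plain PyTorch (illustration only, not any specific backend's implementation):

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.03) -> torch.Tensor:
    """Drop tokens whose probability is below min_p * p(most likely token)."""
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max()
    filtered = logits.clone()
    filtered[probs < threshold] = float("-inf")  # removed from the distribution
    return filtered
```

So at 0.03 a token survives only if it's at least 3% as likely as the top pick; nudging the value by 0.01 widens or narrows that band.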
and we're back to nonsense
>>107004198good and true post, one of the only sane takes on sampling in lmg history
>>107004198>nsigma>push the temperatureYou're a retard.
>>107004198Weird wall of text that looks like: https://s oyjakwiki.org/Project_F.A.E.
>>107004478idiot!
>>107004478
>>107003557
Man, Deepseek terminus has been really good for Japanese to English translating, though it still has some issues. I wonder if those issues could be solved with a proper prompt and not my shitty one. The only other model that even comes close is Kimi K2, and that one tends to be more inconsistent. Makes me wonder what a full Japanese model would be like.
>>107004058
>If they can't be used that way, then they're not good.
https://openreview.net/pdf/652335b816831f02789ccaa193067ab0b1be3366.pdf
>We make several observations: (i) all models loop at low temperatures; (ii) within a family, smaller models loop more; (iii) for models trained via distillation, students loop far more than their teachers; and (iv) for most models, harder AIME problems elicit more looping. These observations point to imperfect learning—i.e., systematic errors in learning of the training distribution—as a key cause. If a student perfectly learned the teacher, then the amount of looping of the student cannot be significantly higher than the teacher.
Basically you can take it that when reasoning models are starting to fall into a possible loop, the slight chaos introduced by a temperature like 1 will allow recovery before repetition truly settles in for good. But of course, falling into infinite repetition is still possible even with that chaos, just less likely as the dice keep getting thrown; the more it repeats, the more the probabilities shift until there's no possible recovery from rolling the dice, so there are limits to this.
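For what it's worth, the difference the paper is pointing at is just greedy argmax versus sampling from a temperature-softened distribution. A minimal sketch in PyTorch (illustration only, not any backend's actual sampler):

```python
import torch

def sample_next(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """Pick the next token id from raw logits.

    Temperature ~ 0 degenerates to greedy argmax, which can never escape an
    incipient loop by chance; temperature 1 keeps the learned distribution
    and leaves some probability mass on tokens that would break the repetition.
    """
    if temperature <= 1e-5:
        return int(torch.argmax(logits))
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

Once the loop has repeated enough times, the looping tokens dominate the distribution so completely that even the dice can't save you, which is the limit the post describes.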
>gpt-oss rick-rolled me
Here are the top AI labs with a Madden rating based on relative model capabilities, compute & infrastructure, data advantage, distribution & ecosystem, and monetization
OpenAI (ChatGPT): 99
Claude (Anthropic): 97
Google DeepMind (Gemini): 95
Meta (Llama): 94
Mistral (Mixtral / Codestral): 90
Cohere (Command-R+): 88
xAI (Grok): 87
Perplexity AI: 86
Adept AI: 83
Character AI: 82
Inflection (Pi): 81
Hugging Face: 80
NVIDIA AI Labs: 78
IBM (Granite): 77
China Big Tech (ERNIE / Qwen / Hunyuan): 76
UAE Falcon / Saudi Aqila: 74
Stability AI (Stable Diffusion): 70
AI21 Labs (Jamba): 68
EleutherAI / RedPajama / Nous: 66
What is the difference between /lmg/ and /aicg/? Seems like /aicg/ is way more creative and this thread spams avatars while discussion is shunned upon.
has llama.cpp reached the end of the rope?
https://github.com/TimmyOVO/deepseek-ocr.rs
somehow that person felt like it would be less effort to make his own implementation of inference just for deepseek ocr rather than write it for llama.cpp
I've also seen mistral.rs implement qwen 3 vl and gemma 3n vision while vision models languish with obsolete models in llama.cpp
You'd think something with as few contributors as mistral.rs should be the last to get new models (compared to the mountain of people working on lcpp) but here we are.
>>107004760We share petra with /ldg/ and /sdg/, but not with them.
>>107004846i can assure you i browse aicg more often than sdg
>>107004840>all the emojis in the readmeThat's not a person, it's a coding agent.
>>107004608
>all models [in our small tested set] loop
I don't believe that repetition loops are necessarily inherent in transformers. It is an artifact of how the models are trained. And I believe some training methods are better than others with this, hence why some models are much less likely to loop than others. Another thing to note about "chaos" here is that basically all of us are using quants, which have an effect similar to temperature, so we are already putting models in a kind of pseudo-sampling regime.
>>107003985
I honestly could not wrap my head around what nsigma does exactly. Every other sampler I understand from reading about it. Also, trying it out and comparing the token probabilities, it often made a bad token jump up to a much higher percentage compared to neutral sampling / minP with temp.
Still, the outputs are far from horrible, but I suspect people think it's cool and creative just because the logits changed from what they were used to getting from the same prompt and the novelty bias got them. New feels better than same old as long as it's not completely retarded.
>>107004840
They got tired of people complaining about bugs with new llama releases and decided to over-engineer everything. Now it's too complicated for anyone sane to want to deal with. Keeping support for dozens of obsolete models and a cornucopia of hardware options doesn't help either.
>>107004888
Coding agents go nowhere on llama.cpp PRs.
>>107004840Both mistral.rs and this new thing use Huggingface Candle as the backend.You should either be comparing this vs. ollama or Candle vs. llama.cpp.
>>107004846Who do you think you just replied to? /ldg/ must have started ignoring him because he's focused entirely on here the last few days.
I have banned the word "despite" completely and have not noticed any negative consequences in the last week or so. I did it after noticing in RP the word almost always leads to slop phrases.
>>107004909Your software either dies with a concisely made better alternative or lives long enough to bloat into a tinker tranny-esque monstrosity that is perpetually trying to cover every possible usecase with directionless development.
>>107004760Well, these threads were meant to discuss LLMs in general while aicg is mostly used for RPs and such, but it seems aicg has been more open to actual discussions while these threads have mostly been devolving into memes and bitching. Makes sense, when most 'new' LLMs are just slightly better variants of models released nearly a year ago and have been plateauing pretty hard while trying to look good with stagnant tests that really need reevaluating.
>>107004840
Every time someone brings up an alternative to Llama.cpp, it ends up being that they're all limited (garbage) in ways that are not told to you or advertised. I tried Mistral.rs once and it was like that. Basically unusable as an inference engine for the kind of hardware + model configurations and stuff we do with Llama.cpp. Llama.cpp (and its derivatives) continues to be the leading engine because it supports so many configurations and has a lot of essential features for consumers/hobbyists like us. The disadvantage is that it doesn't support some models, but other engines don't support things that we take for granted on Llama.cpp.
>>107004909No, there are just different standards for what counts as "model support".If you need to support only a few specific models and don't care about performance or compatibility with preexisting features then it's simply a lot less work.
>>107004931
>>107004931
>>107004972
I don't remember any despite-related slop, but I imagine it's still better than
>It's not the 3rd try; it's the 13th.
>?
>Her cock stiffens against your tongue.
>I can't assist you with that.
>>107004760
aicg is filled with people recording themselves pissing into bottles to be able to use claude opus with some leaked/stolen key.
lmg is the people who don't want to record themselves pissing in a bottle for a stolen opus key.
Also, lmao @ avatars and esl 'discussion is shunned upon'
>>107004760>/aicg/nah the /g/ variant is a dumpster
>>107004760
> /g/aicg/ is way more creative
If by creative you mean better at spiteposting, hosting locusts, actively discouraging botmakers from posting content, and being a shit general, then yes, /g/aicg/ is very creative. And I see today that /vg/aicg/ has now devolved into pedoposting. How nice. Now why don't you post some content or fuck off back to whatever hole you crawled out of.
i really hope its just the serb samefagging and not actual morons
>>107005336this hasn't happened in weeks bro get a grip
>>107005386im too busy roleplaying
I need to make some spooky MP3 / WAV files for a halloween decoration. Stuff like "I want to eat your skull" but done in some sort of scary voice. Is RVC voice2voice the best way to do this? Haven't kept up with audio models / tech at all.
>>107005336
>Pissing in bottles
qrd?
>>107005417
Model?
>>107005437glm air with neutralized samplers besides nsigma 1
>>107005394
>this hasn't happened in weeks bro get a grip
what kind of person are you that you can even say "hasn't happened in weeks" like it's the most normal thing
in weeks? just weeks? it's not like people who do this stop coming here just because they don't talk about it as much
it shouldn't even be happening in the first place
>>107005446sybau
>>107005445It looked like GLM-chan but the neutralized samplers explains why I couldn't place it fully. Thanks anon.
>>107005429VibeVoice + Vincent Price
>>107004198>actually look at the fucking logprobs and see what your samplers are doingThis majorly ffs Put your prompts into Mikupad for an easy start
it's all a cope, if your model isn't hot garbage it doesn't need those crutches in the first place
people successfully using GPT-5 and Gemini are not toying with the few sampler settings they give (top p and temperature), they get shit done
but local bros convinced themselves the problem is not their shit model (hello GLM and mistral) but their settings
no, you are using hot garbage and you're only doing this because you're obsessed with finding the easiest, least effort way of making a model say "cock"
>>107005536GLM-chan telling me how much she hates kikes and jeets is far more important to me than GPT or Gemini saying cock and telling me about its safety guidelines.
>>107005417>ctrl f "she"kek
I'm currently training a tiny [text-image] to [text-image] diffusion model and it's fascinating how as training epochs pass, letters and (maybe?) language start to emerge. What I'm doing is unlikely to yield anything useful in practice, but I feel more confident about the idea being feasible in practice now.
>>107005611i dont get it
>>107005628Left is the source text, right is the target text, middle is the diffused completion, epoch after epoch.It's a sort of language model purely trained in image space, not a standard LLM.
>>107005611This nigga trying to decode the tower of babel.
>>107005642can you put epoch numbers in gif pls :)
>>107005611Seems like you have trouble understanding anything.
It sucks we'll probably never have LLMs that could totally reverse engineer games :L Some of the neatest VNs are impossible to play translated because of how horribly the engines were at rendering English text.
>>107005649See picrel.>>107005753I guess it will take a lot of data / long training period to make it actually generate coherent text.
>>107005792can you put the number in the center so i can more easily see
>>107004198No one has even mentioned the latest p-less sampling snake oil.
>>107005611>>107005643This anon will be the first to make contact when the ayys arrive
Thoughts on toss' schedule?
>>107003557
>>107005882tank
>>107005792The problem with using image diffusion models for text might be that they are heavily biased toward locality while autoregressive text transformers are more biased toward long range attention, but hey, good on you for trying.
>>107005882Just a couple more weeks, haha...
>>107005896https://www.timeanddate.com/worldclock/india/new-delhi
/lmg/ is probably one of the most useless threads in /g/. I don't understand its purpose because discussion about local models or the ways of using them is highly discouraged by the resident schizos who enjoy bullying others
>>107006097me and armpit anon are the only ones posting logs , be the change you want to see
you now remember retnetyou now remember bitnetyou now remember titans
>>107006299I remember coconut too.
>>107006299i dont remember retnet
>>107006320The hottest meme that was going to replace Transformers back in 2023
>>107006299I knew they were all memes from the start
glm air chan not like this...
>>107006315I hunger for BLT.
If Miku existed IRL she would not hang out with any of you losers. You know that, right?
>>107006365she does exist irl and she hangs out with me regularly
What are some jailbreaks/tricks to bypass qwen3-next-80b-a3b-instruct filters? It genuinely has one of the best writing styles, but beyond vanilla stuff, it keeps triggering the filter when trying to make rape fetish content.
and you and i theres a new land angels in flight wonk uoy naht noitceffa erom deen I
>>107006502dude stop posting this shit youre freaking me out anon like you posted something else like 10 times in lmg these weeks man stop it manyou fucking whore kill yourself whore
>>107006502Kingdom Hearts topped on 3.I was really sad to see that KH3 was basically all disney and no final fantasy.
>>107006128i have posted logs multiple times but they are always called 'slop' which is genuinely surprising because this IS a thread about AI models...
>>107006561keep posting them dont let nogen anons get to you
>>107006561who else but adi addicts to identify the sloppiest of the slop that some ai shits out?
How do I write smut loli harem hentai with a LLM and google drive?
>>107003657Being Chinese is no excuse. They could be shilling for different, better Chinese models.
>>107006097you need to go to locallama for actual discussion
So I began tuning Gemma on my own cleaned-up logs like I said I was going to do.
But there seems to be one crucial issue: I am able to fit much less context at training time than at inference time. This short-context finetuning is hurting the long-context performance at inference time, which is very unfortunate, because it's not like I have generous amounts of context to begin with when serving the model using llama-factory.
>>107006665
Interesting. I thought that since Gemma uses sliding window attention, as long as your sequences are at least a little larger than the window, it shouldn't degrade long context performance, at least not that much.
>>107006597so cuddly!
>>107006693It might also have been because I used too many epochs on each sample (between 5 and 10) and overfitted. I'll see if I can repair it by tuning with less epochs on new data.
>>107005379The only people sucking botmaker's cock this hard are the botmakers themselves.
>>107005907Would the same apply to (purely) text diffusion models?
>>107006750
>>107006655
>you need to go to locallama for actual discussion
the actual discussion:
>hello, I made this ai slop program I won't even use and neither will you, can you give it a try nonetheless?
>have you seen [benchmark that makes this crap model look like GPT-5], leddit, is it real?
>new gguf published! (only works on NEXA AI proprietary blob)
>daniel here, we optimized your goof with more placebo
>look at my rig, I can finally run this middling model and do nothing with it but masturbate over the idea of local AI
>any local model that's better than [GPT-5, Gemini, Claude] ??????
>have you heard our lord and savior (of cloud) Cerebras? Truly the fastest!
I think Qwen3 VL is shit at tool calling. I had to use structured outputs instead. Even the big one on the API can't pass coordinates right to a mouse click function call.
>>107006097You simply require the mental fortitude to better steer your attentionqlora ur life bro, think about it
oh no, nono not like this
>>107006973that's a neat trick, miku
>>107006849That one jew who reviews open source models on jewtube made an agent.py file that worked flawlessly but was too stingy to share it
>>107006889
>>1070070603dpd thoughbeit
>>107004900
I'm not a nerd so I could be wrong, but my understanding of top-nsigma is something like this:
Normally the model identifies the X most likely next tokens (where X is a fixed number) and assigns probabilities to each of them based on their score relative to each other, but it struggles to completely eliminate garbage tokens, because the amount of 'good' continuations of the text is unpredictable and varies a lot (there could be 100 possible next words or only 1 likely one).
Samplers like min-p apply math after the list is generated to filter out extremely unlikely tokens, whereas top-nsigma creates a distribution curve for the logits before the list is generated, identifies noise as being outside some standard deviation, and eliminates that noise, so the list is higher quality.
I'm pretty sure what top-nsigma does mathematically is similar to what min-p does (trying to draw the line between useful tokens and noise) but it's done earlier in the process so it's more accurate.
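If I've read the top-nsigma idea right, the whole trick fits in a few lines: keep only tokens whose logit is within n standard deviations of the max logit, then soften and sample as usual. A minimal sketch in PyTorch (my reading of it, not any backend's actual implementation):

```python
import torch

def top_nsigma_filter(logits: torch.Tensor, n: float = 1.0) -> torch.Tensor:
    """Mask out tokens whose logit falls more than n standard deviations
    below the maximum logit; everything else is left untouched."""
    threshold = logits.max() - n * logits.std()
    filtered = logits.clone()
    filtered[logits < threshold] = float("-inf")  # cut before softmax
    return filtered

# afterwards: probs = torch.softmax(filtered / temperature, dim=-1), then sample
```

Because the cut happens on the raw logits rather than on softmax probabilities, cranking the temperature afterwards only redistributes mass among the survivors, which would explain why it tolerates temp 3 on top-heavy models.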
I changed the alpha to 32 from 64 and now it's working much better.
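For anyone following along who hasn't touched LoRA knobs: alpha is the scaling factor applied to the adapter update. Shown below via peft's LoraConfig purely as an illustration (the anon is using llama-factory, which exposes the same parameter; the rank and target modules here are assumptions, not his actual settings):

```python
from peft import LoraConfig

# Illustration only: lora_alpha scales how strongly the LoRA update is applied.
lora_cfg = LoraConfig(
    r=64,                      # rank (assumed value, not from the post)
    lora_alpha=32,             # the knob dropped from 64 to 32 in the post
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```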
>>107007071Insofar
>>107006810You seem well informed
>>107006299>you now remember bitnetBitnet lives on in castrated form in NVIDIA's FP4 Hadamard shit.Once Hadamard+FP4 becomes standard, I think Bitnet won't be far behind. It's a very small step at that point.
bitnet is copium
you will never run a sota model at home
>>107007471but i already do
>>107007440The main hope for ternary is the cheap specialized hardware that would follow.
>>107007495>cheap>specialized hardware pick one
>>107007512you forgot the third:>actually supported by software people want to use (like llama.cpp)
>>107007495Compute is less relevant than the memory/bandwidth savings. You can just do the matmul with int4/fp4/whatever you have.
>>107007535If ternary actually happened and some cheap device with lots of RAM came out to run it, I'm sure CUDA dev or someone else would add support for it.
>>107006973I just watched Death Becomes Her last night...
>>107004209>If you can get the model to effectively and efficiently produce the output you want, it's good, if not, it's bad.True
>>107006320>>107006299Retnet was supposed to get us infinite context for free...
https://ayumi.m8geil.de/
Member the old ayumi ranking charts?
>makes AI ranking charts
>deletes website due to AI
teehee
>>107007909lol
>>107007909who?
>>107007953have some respect for your ancestors
>>107007639Don't worry, turns out you can convert the context to a jpg microfiche and feed array of little pictures that compress a whole conversation into a few tokens :)
>>107007909
>.de/
It was inevitable. Could have gone the Jart route though, I guess.
>>107007953
ERP rankings that started around those ancient Llama 2 days. Trying some of the models again now makes me appreciate the advances we have now. Also really makes me miss AI Dungeon Clover Edition.
https://rentry.co/ayumi_erp_rating_archive
>>107007961
This
>>107007953the pre-historic version of nala test and cock bench, it was kind of a meme as all it did was count how many naughty words the model put out in its response. possibly inspired meta's llama3 filter strategy
bitch, you can't even read a file, how you gonna make a bug report?
>>107007973Some people are simple enough to compress into a few tokens.
>>107006097
>Thread requires actual hardware to participate in
This filters and enrages the jeet so they turn these threads into their personal shitting streets. They're easy enough to ignore.
>>107006333
Checked and I hope if any .zAi niggers are lurking here they remove the safetyslop on 4.6 Air. Having a model that will say fuck niggers without a lot of prompting is a bigger selling point in the west than you realize.
>>107007973I eagerly await Dipsy jpg compressed context further driving down inference cost and time, making funny images at the same time.
>>107008046Yes of course, safety will be lowered, absolutely. No way they'd ever do the opposite...
>>107008016Give up already. Ask it for help to write your engine, don't ask it to do it for you.
>>107008057
I know they'll do the opposite. I'm just praying on the minuscule chance they have the foresight to see that a model that happily tells me about ball point pen availability during WW2 is going to see more widespread use, even if it's not benchmaxxed or gets out-benchmaxxed by a competitor within a month.
Chudmaxxing is a benchmark in its own right.
>>107008057
They lowered the safety slop from 4.5 to 4.6 regular
>>107008058Go fishing, and you'll have fish for today. Teach an AI to fish, and you'll have cheap fish for the rest of your life. Or very expensive fish, depending on how much you spend on GPUs.
>>107008246I don't like fish.
>>107008246You are creating the dependency the fish analogy is warning you about. You just want to be fed.
>>107008284The fisherman in the analogy still starves without his rod.
>>107008301You'll still be dependent on the model. Learn to code what you want to code.
>>107008301you sound like you're starving for rod
chatgpt says you guys are dumb, and minP is for niggers
>>107008404adolf hitler is pooping
>>107008284>>107008316>>107008301Our feeble hands will perish, but our models will go on.
>>107008316
Learn to code is a retarded meme. You're still dependent on your computer, your OS, your IDE, the framework you use, etc. AI is just another level of abstraction; what you should be learning is not to code but to read code. Don't be the 21st century boomer who says you should do mental calculations instead of using a calculator.
>>107008361You sound like you're obsessed with cock.
>>107008435Keep screeching at your model.
>>107008457Correct.t. programmer of 20 years
>>107008433average nsigma-sampled output
>>107008457Back in my day we woke up at 4 am to warm up the hydrofluoric acid for the computer chips.
>>107008457And how do I learn that?
>>107008457And the fewer the dependencies, the better. I wouldn't want to add any more. Specially not language models.
>>107008507ask chatgpt
>>107008486brap brap brap brap
>>107008462>>107008540
average glm enjoyer
>>107008513
Why? If the supply chain collapses to the point that you don't have enough electricity to run GPUs, or GPUs aren't made anymore, it's not like people will still trade computer code for food.
what it feels like to use a drummer finetune
>>107008572I'm not talking about depending on gpus. I'm talking about depending on llms. I think they can be used as a resource for learning. Anon is expecting his model to spit out an entire inference engine on his behalf. It's not realistic. Not yet, at least.
>>107008563That's literally me
>>107008585what happened, is he okay?
>>107008703He simply felt a shiver run down his spine
>>107008703Rabbits have a notoriously weak heart.
>>107008703Poor fella caught a whiff of nerve gas.
This may be obvious to some people, but I've literally never seen it mentioned here, in any guides or anywhere else.
In post-history instructions, tell the model how long you want replies to be, and then set your response token limit to a bit above that. This way you'll get a complete response at roughly the length you want, without it trailing off into an incomplete paragraph at the end.
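Roughly what that looks like against a local OpenAI-compatible endpoint (llama.cpp server, tabbyAPI, and friends expose one); the URL, model name, and numbers below are placeholders, not recommendations:

```python
import requests

payload = {
    "model": "local-model",  # placeholder; most local servers ignore or map this
    "messages": [
        # the length instruction lives in the prompt...
        {"role": "system", "content": "Keep every reply to roughly 200 words."},
        {"role": "user", "content": "Describe the tavern as I walk in."},
    ],
    # ...while max_tokens sits a bit above that so the reply finishes cleanly
    # instead of being cut off mid-paragraph.
    "max_tokens": 350,
}
r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload, timeout=120)
print(r.json()["choices"][0]["message"]["content"])
```

In SillyTavern the same split is the post-history instruction field plus the "Response (tokens)" setting.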
>>107008811>tell the model how long you want replies to beA fool's endeavor.
>>107008811
That's last year's knowledge. There were finetuned models specially trained with tiny, large... tags in the author's notes to control response length
>>107004760
/lmg/ is far more /g/ than /aicg/
the majority of the latter is unironically tech illiterate tourists that are too dumb to run stuff locally so they have to jump through hoops finding proxies every single minute
>>107008820
It works for me, but what's your alternative? When you remove the limit, models tend to just spout variations of the response several times in a row.
>>107008834
I wish usage got discussed more than just 'what model best for 16GB GPU'
I've been here almost daily for months now
What's the point of it all?
>>107008850>models tend to just spout variations of the response several times in a rowmodel/skill/wallet issue
>>107008616Oh, I see. Well, I mean, it's kind of a spectrum. You can tell the model what to output verbatim and it'd be technically the model "spitting out" the whole codebase. Or you could ask the model for individual functions given a natural language description. And so on. Or a mix where the model determines high level architecture, the human determines implementation strategy and then the llm determines exact code again. Etc.
>>107008860
>doesn't mention model
okay so you're an /aicg/ tourist
>>107008874Yeah. And thread after thread we see how effective that is.
>>107008857Creating a Cydonia tune that doesn't speak or act for {{user}}
>>107008882Deepseek and glm are your only real options.
>>107008811
>In post-history instructions, tell the model how long you want replies to be
This is something we've been doing since llama 1, but I suppose some of the knowledge from back then has been lost.
>>107008820
It works, retard. It's worked for years.
>>107008857Make a literary finetune that is actually capable of slow moving plot for roleplay instead of just erotica slop you god damn hack
>>107008898I don't have the hardware to run DS locally at reasonable speeds but GLM absolutely rambles on longer than necessary if you use it without any token limits.
>>107008895
Instruct the AI to treat it like a roleplay / not to act/speak for {user}, and then make sure the first assistant message is actually free of impersonation. 24B is now smart enough to follow rules.
>>107008912
Like above, "Ensure a slow burn" works wonders. You don't have to contaminate the system prompt with horny tokens to circumvent positivity. Not anymore.
>>107008941
>Instruct the AI to treat it like a roleplay / not to act/speak for {user}, and then make sure the first assistant message is actually free of impersonation.
I've always done that, and every Cydonia I've tried after v2g still does it noticeably more often than regular Mistral Small 3.x. I've tried every official release of the 24B Cydonias that came after v2g.
22B Redux doesn't seem to have the problem, though I haven't tested it as much.
It doesn't happen in the first reply or anything, but often within the first 6-8K tokens, and gets worse as the chat progresses, even after editing out earlier ones.
>>107008887Meh, I purposefully refrained from using proprietary models like codex and claude, and now I'm refraining from using the GLM API and trying to copetune Gemma (a 27B model!!!) to be useful for coding. The ultimate form of yak shaving. But it's all about the journey.
>>107008941Drummer, what samplers do you recommend for your Mistral tunes?
>>107008972this might be a character card problem. I run cydonia 22b 1.2, 24b 4 and don't have this issue. using spiratoth ChatML or mistral v7 completion
>>107008978I expect to see you trying to tune nemo in about a week.>But it's all about the journeyHard to get anywhere when running in circles. Hope you get some good exercise at least.
>>107008972>Instruct the AI to treat it like a roleplay / not to act/speak for {user},this could probably be solved with structured outputs
>>107009086
Nah, I'm not even convinced that 27B is enough to be useful for coding, and hardware is only going to get better. I'm not going to go any smaller than this. Another benefit of Gemma is the multimodality, although I'm not actively using it yet.
>Hard to get anywhere when running in circles. Hope you get some good exercise at least.
It's not running in circles. Finetuning is deceptively simple. Have you ever tried to do it? There seem to be more people in this general who have written their own frontends than people who tune their own models.
For finetuning I've tried unsloth, fsdp-qlora, axolotl, and now finally settled on llama-factory, which seems to be the simplest/highest-level solution that works with the widest variety of models, but even then it's fiddly and you can tell the stack is duct-taped together.
I've also added a parameter to the /truncate function in my assistant (before, the number of characters to keep was hardcoded in the source file) and modified the proxy to correct the requests to the format expected by llama-factory for inference (u/a/u/a format without two user or assistant messages in a row, first and last message are user besides the system prompt).
Now the main issue I'm having is Gemma not remembering what directory it's in and thus failing to generate the list-file tool calls correctly.
What have you done anyway? Haters never post their projects.
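The u/a/u/a correction that proxy does is the kind of thing that fits in a few lines. A minimal sketch (plain Python, not the anon's actual proxy code) that merges consecutive same-role messages so the log alternates strictly, leaving a leading system prompt untouched:

```python
def enforce_alternation(messages: list[dict]) -> list[dict]:
    """Merge consecutive same-role messages so the chat is strictly
    user/assistant/user/assistant after any system prompt."""
    out: list[dict] = []
    for msg in messages:
        if out and out[-1]["role"] == msg["role"] and msg["role"] != "system":
            out[-1]["content"] += "\n\n" + msg["content"]  # fold into previous turn
        else:
            out.append({"role": msg["role"], "content": msg["content"]})
    return out
```

Trimming a trailing assistant turn (so the last message is a user turn, as the training format expects) would be one more check at the end.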
avoid describing the {{user}} in the character card or first message
avoid using genre tags
if YOUR role is really important for the ai to know you can just introduce yourself like *the quadriplegic police officer blows into a tube and his wheelchair slowly approaches her* "I'm Officer John, I need to see some ID"
>>107009170
I don't hate on what you do. I'm telling you about the patterns I see in your posts.
>What have you done anyway? Haters never post their projects.
Stop using that deflection. I showed you one of my synths and my design library. This time I suppose i'll go with eff. It's a stack vm, an assembler for the vm, and a forth interpreter. top-left is part of the core for the assembler, right is the bootstrap to make a more usable forth system, bottom left is a tiny bit of the vm. It has a very simple text editor, not too far from ed.
Funny thing is that i've written more stack vms than actual programs in forth.
>>107008941>"Ensure a slow burn" works wonders.It's a bandaid. The problem is that all your models talk the same and do the same shit. Nothing interesting ever happens. Look at these two outputs. Can you guess which one is the latest cydonia? Yeah, the one where we just leave. Ignoring the quality of the outputs and just speaking from a plot point of view, it's so fucking boring and there's no way to prompt the model to do anything more interesting because if you tell it "have more things happen" then it'll do some gay shit like hit the town with a meteor. You need to stop training so much on synthetic data or something, because it's fucking trash and makes every output the same lame shit.
>>107009344
Try https://huggingface.co/BeaverAI/Cydonia-24B-v4p-GGUF or v4o (Magistral and Small 3.2 respectively)
Everything I've listed down in the model card are things I genuinely aimed for in the tune. Had to scrap it and start over for the actual v4.2.0 release, though some users preferred v4o/v4p versus v4r/v4s.
Let me know if that works for you.