/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107129334 & >>107121367

►News
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/06) LocalSong 700M melodic instrumental music generation model released: https://hf.co/Localsong/LocalSong
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107129334

--Papers:
>107130633
--llama.cpp VRAM optimization challenges and AMD EPYC memory architecture quirks:
>107132531 >107132547 >107132605 >107132615 >107132685 >107132754 >107132705 >107132740 >107132765 >107133279 >107133407 >107133585 >107133671
--Budget and power challenges for a high-end workstation PC build:
>107130125 >107130157 >107130181 >107132027 >107132049 >107132074 >107132080 >107132104 >107132118
--Hardware performance for running GLM-4.5 models on RX 6600 XT:
>107133281 >107133294 >107133328 >107133338 >107133381 >107133444 >107133460
--Uncertainty over RTX 50 SUPER's 3GB GDDR7 memory availability:
>107131960 >107132001 >107132894 >107133211 >107132060
--Budgeting and hardware compatibility challenges for tensor parallelism prototyping:
>107130539 >107130706 >107130899
--Speed vs quality tradeoffs with K2 Thinking model on SSD hardware:
>107136636 >107136667 >107136687 >107136699 >107136721 >107136777 >107136820 >107136885
--Character.ai model architecture and commercialization challenges:
>107137178 >107137277 >107137296 >107137860 >107137233 >107137275 >107137300 >107137444 >107137520 >107137724
--Model discussion with NSFW and uncensored features:
>107133720 >107133729 >107133752 >107133948 >107134600 >107134837 >107133737
--Debate over model weight formats and open weight access for finetuning:
>107129703 >107129880 >107129911 >107129971 >107135655 >107135714 >107135921 >107135957 >107135992 >107137717 >107137751 >107137833 >107130017
--Logs:
>107130261 >107135147 >107135334 >107135409 >107135481 >107135491 >107135517 >107135792 >107135854 >107135967 >107136320 >107136332 >107136385 >107136522 >107136469 >107136808 >107136984 >107137104 >107137141 >107137735
--Miku and Luka (free space):
>107129864 >107130191 >107130344 >107131403 >107131513 >107131552 >107137895

►Recent Highlight Posts from the Previous Thread: >>107129340

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107138549
skill issue

>>107138613
Models to translate Chinese text from the image to Hindi?

>>107138549
Because there's demand, and most people are too dumb to prompt. If a model can't talk about the smaller things in life with no system prompt and zero context, then people will give up until there's a model that can, even if it's measurably stupider.

When and if the AI bubble bursts, how do you predict it will affect local models? Do you think there will be a period of stagnation while the major AI companies stop development due to the crash, or do you think local models will pick up the slack and slowly iterate when the major players stop?

wholesome message from /oursaar/
https://youtu.be/mdlGTMAPoz8

>>107138775
local cannot progress without corporate. the difference is that when corporate AI fails, we will still be here and we will still have our models

>>107138775
I think China will keep chugging along, so local will still get something.

>>107138775
The only reason local models are even being made is to get investors and build general interest for the company's proprietary models. If the bubble bursts then I wouldn't expect anything but finetunes, rather than actual new model releases, unless some group goes the crowdfunding route and there's enough interest for people to pay up.
That said, I don't think we'll see a pop for at least another 2-3 years, if at all.

>>107138775
Imagine the outcry when suddenly their AI boyfriend is shut down. It will be like the GPT-4 shutdown, but 100x worse.

>>107138842
A pop will trash the US economy at this point, so they'll keep up the charade for as long as they can

>>107138842
>That said, I don't think we'll see a pop for at least another 2-3 years, if at all.
Lol
Lmao

>>107138862
you can short the market if you're that confident lol

>>107138842
>>107138859
OpenAI plans to IPO next year. I imagine the pop will come shortly after that.

>>107138867
I don't have enough money to gamble with, but this guy is doing it: https://uk.finance.yahoo.com/news/michael-burry-shorting-ai-stocks-092424898.html

>>107138639
にんじん (ninjin) = carrots
じゃがいも (jagaimo) = potatoes
third one - not sure
(blank)ねぎ = it's probably たまねぎ (tamanegi), the regular ball onion, but with the first two characters missing it looks like ねぎ (negi), the signature green onion [hence the decision]

>>107138890
this isn't Hindi and you are no model

>>107138775
I will short Nvidia and use the money to buy up all the dirt cheap datacenter hardware to create the ultimate local model

>>107138968
Nvidia has buyback agreements with pretty much all the datacenters they supply. They'd rather toss their GPUs into an incinerator than let people have more than 16GB of VRAM for less than $2000.

>decide to return to GLM-Z1 for nostalgia's sake
>On regular Z1, {{char}}: before <think> jailbreak works like a charm.
>Rumination just doesn't give a fuck. If you tell it to write degenerate smut it will immediately go into a recursive thinking loop to refine the response (but it's a dumb 32B model and misses the point of the scenario entirely)
We have to go back, thoughbeit.
>mfw if I died an untimely death my loved ones would stumble upon my AI lab, see what I was getting AI to write, and think I was the most awful human being on the planet.
This is the path to a post-scarcity future that will cure all human suffering though.
uhm...which local model is the least safety slopped and good at coding so I can vibecode le epic malware?
>>107139156
pygmalion 8b

>>107139156
Deepseek R1.
It's been more than 5 threads and no new goof supported. I think we need to do something.
>>107139246
>and no new goof supported.
What do you mean? There was a zombie lobby that was being bombed by TNT, that sounds like a new goof to me.
Do zombies slip on banana peels? If so we could have another banana hell lobby with zombies included
Tried Kimi-Linear on OR because there's no GGUF yet. And it's sloppy; it writes nothing like K2 at all, but a lot like Claude. Damn, because when I begged for a 2025 Mixtral I didn't mean another Claude copy. Welp, guess us 24GB vramlets will have to wait some more.
Someone explain to me why the best sampling for RP/creative is simply not this:
>First sampler: minP (or Top-P if you like it better, I guess)
>Second sampler: Temperature
>Set temperature to 5 or so
>Start raising minP (lowering in case of Top-P), find the value that produces minimal to no brain-damaged outputs and stop there (In my testing for minP it seemed to be around 0.3 but likely to vary a lot based on model)
>You now have sampling that cuts all stupid tokens as the first step and then levels out the probabilities of all remaining tokens so all of them are equally valid picks, promoting variety.
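A minimal sketch (plain numpy, no inference engine) of what the chain described above does to one logit vector; note the order matters, since min-p has to measure against the untempered top probability before temperature flattens everything:

```python
import numpy as np

def sample_minp_then_temp(logits: np.ndarray, min_p: float = 0.3, temp: float = 5.0) -> int:
    # softmax to get the unfiltered distribution
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # step 1, min-p: keep only tokens with at least min_p * p(top token)
    keep = probs >= min_p * probs.max()
    filtered = np.where(keep, logits, -np.inf)
    # step 2, temperature: at temp=5 the surviving tokens flatten out
    # toward roughly equal probability, which is the "promoting variety" part
    scaled = filtered / temp
    out = np.exp(scaled - scaled[keep].max())
    out /= out.sum()
    return int(np.random.choice(len(logits), p=out))
```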
>>107138842
>The only reason local models are even being made is to get investors and build general interest for the company's proprietary models. If the bubble bursts then I wouldn't expect anything but finetunes, rather than actual new model releases, unless some group goes the crowdfunding route and there's enough interest for people to pay up.
>That said, I don't think we'll see a pop for at least another 2-3 years, if at all.
if the AI bubble pops, OpenAI might be fucked, and all the rando smaller orgs might get fucked
google will remain very strong because they are in fact extracting profits from such models (AI search -> they get ad revenue via AI, not the target site). same for meta with whatever ad AI voodoo they use to print money. One of the two may or may not then sell their AI services at a premium to fill the market need - and keep it proprietary. Heck, google's top secret model "Sajak" is already basically AGI

>>107139402
Just do topK 10, temp 5.

>>107139407
>Heck, google's top secret model "Sajak" is already basically AGI
What's the story here? Tried checking but getting nothing in a quick search.

I added the extra CoT data (written by QwQ) I said I was going to add to the Gemma finetune. The result is fairly interesting.
Now it's much less neurotic about its own mistakes, but there's still quite a lot of "you are absolutely right" slop.

>>107139418
Feels like 10 tokens is too many for every situation, since a lot of the time there is really only one correct token, like when saying someone's name and whatnot, but might be fun to try at least.

>>107139447
>since a lot of the time there is really only one correct token
Not really. Unless you are doing maths or outputting some sort of strict structure (json, html), or getting specific answers (yes, no, blue, red, etc), you more often than not want a healthy pool of possibilities.
In the case of the name for example, the next token might be the first half of the name, or a token that initiates a preamble to getting to the name.
The difference between
>The guy's name? John.
and
>The guy's name? It's that one motherfucker man! John, the asshole.
Tokens are positional, basically.

>>107139447
Yeah no, even a Top K of 3 causes grammar and punctuation errors at times. It has to be P to handle cases where only 1 or 2 tokens are at all reasonable.

>>107139500
Maybe you are using big models that have a gentler slope from good to bad, but I'm a vramlet, and mistral stuff for example has a ton of cases where it drops to garbage almost immediately after the likeliest token whenever it's very sure about the top token.

>>107139402
Variety does not equal good
You can have 50 different indians shit on your plate, but that won't make you want to eat it.
make it stop aaaaaauuuughhh
Why don't the llama.cpp guys provide Linux binaries with CUDA support compiled in? The Windows version comes with CUDA support, so it can't just be because of some arbitrary software license.

>>107139742
shaddup and compile

>>107139738
Isn't it ramping up because they are transitioning to DDR6?

>>107139779
nope. that's still 2 years out at least

>>107139738
but anon, think of all the proprietary models they will train with all of that ram

>>107139540
So P first, K second?

>>107139540
That doesn't make sense. If the model makes mistakes at top k=3, at a higher top k it will mathematically make them more often.

>>107139897
I usually just set topK to 20 and adjust temp based on how locked in the model is. If it goes very sloppy you need to confuse it.

>>107139742
Because we Linux users can fend for ourselves.

>>107139792
Is it? Isn't DDR6 slated to start releasing consumer models by next autumn?

>>107139924
>git pull
>cmake ...
How hard can it be?

>>107139982
no. Zen 6 is still gonna be DDR5

>>107139742
Because Linux users are used to being third-world citizens.
why compile by yourself if no new goof supported?
>>107139984
to be fair, I always have to check build.md to see what the CUDA-on command was
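For the record, the CUDA-on invocation from llama.cpp's build docs boils down to two commands (the flag is spelled GGML_CUDA in current trees; very old ones used LLAMA_CUBLAS):

```sh
# from the llama.cpp checkout; requires the CUDA toolkit installed
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```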
I pull and recompile llama.cpp multiple times a day as an autistic stim
>>107140040
You don't have terminal history autocomplete?

>>107140043
Whatever gets you through the day anon, God bless you

>>107140044
>cmake.. urg wat was it
ctrl-r r r

>>107140044
I run it on containers.

>>107140044
show hist config
shopt -s histappend     # append, don't overwrite
HISTCONTROL=ignoreboth  # ignore space-prefixed lines and duplicates
HISTSIZE=10000
HISTFILESIZE=20000
>>107140074
>cmake
>press right arrow key
Not so difficult

>>107140110
My 3090 gets 0 because I never bothered to download that garbage

>>107140129
holy fucking based

>>107140110
It's okay bro, no need to be shy. Today you learned spoilers don't work on /g/

I want to preserve some of Opus-3 before it gets switched off in 2 months. I'm thinking I'll put like $500 on OpenRouter and build a dataset. I know I could get random prompts out of datasets on HF and fire them off, but that'd be shallow.
What's a good way to get multi-turn out of it? The datasets I've seen doing this with another LLM writing a response don't seem that great. The "human" follow-up replies are too generic, and half the conversations are discussing what Claude is allowed to discuss.

>>107140145
Hello sir.

>>107140145
I was thinking about distilling from closed models as well, because frankly all the open datasets are trash.
The best way might be to prompt it with random segments of conversational datasets. Also, there might be value in sampling the same prompt multiple times at a high temperature to capture an approximation of the distribution rather than only the top tokens, since having (or in this case estimating) the soft logits is supposed to be much better for distillation, but I'm not sure how valuable that is compared to doing one capture with different prompts.
Another strategy would be to just use the model the way you normally use it and capture the logs. But that is obviously very time-consuming.
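A minimal sketch of that repeated-sampling capture, assuming the openai Python client pointed at OpenRouter's OpenAI-compatible endpoint; the model slug and output filename are placeholders, not gospel:

```python
import json
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API; assumes OPENROUTER_API_KEY is set
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

def capture(prompt: str, samples: int = 4, temp: float = 1.2) -> None:
    # sample the same prompt several times at high temperature to roughly
    # approximate the output distribution instead of one greedy answer
    for i in range(samples):
        resp = client.chat.completions.create(
            model="anthropic/claude-3-opus",  # slug is an assumption, check OpenRouter
            messages=[{"role": "user", "content": prompt}],
            temperature=temp,
        )
        with open("opus3_capture.jsonl", "a") as f:
            f.write(json.dumps({
                "prompt": prompt,
                "sample": i,
                "completion": resp.choices[0].message.content,
            }) + "\n")
```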
A third alternative might be to offer free usage through a proxy while logging it, and let people do the hard work of prompting it for you.
But that would have to be rate-limited and otherwise locked down to prevent people from trying to DDoS you and waste money.

>>107140277
>while logging it, and let people do the hard work of prompting it for you.
You really want to train it on "ah ah mistress"?

>>107140264
>I was thinking about distilling from closed models as well, because frankly all the open datasets are trash.
Yeah, I noticed that. But I don't know that mine would be any better, considering those would have been made by smarter people than me.
If Opus-3 is one of the models you wanted, we've got until January 5th: https://docs.claude.com/en/docs/about-claude/model-deprecations
>Also, there might be value in sampling the same prompt multiple times at a high temperature to capture an approximation of the distribution rather than only the top tokens, since having (or in this case estimating) the soft logits is supposed to be much better for distillation, but I'm not sure how valuable that is compared to doing one capture with different prompts.
Good point, I think I'll do that for at least the first turn. Even if I can't figure out how best to use them right now, at least I'll have it before the model is removed.
>Another strategy would be to just use the model the way you normally use it and capture the logs.
Yeah, I've got about 200 conversations I can export in openwebui.
Still, likely going to need a lot more than this.
Kimi-K2 suggested I need to use different system prompts as well.
I think I'm going to need a model responding to it, but not stupidly like this:
https://huggingface.co/datasets/kalomaze/Opus_Instruct_25k?conversation-viewer=2
(Why does it have to say "Claude" in every reply?)
>>107140356
No, I'm interested in logic, programming, and reasoning, but I assumed OP wanted to distill for coom since modern models do better at "productivity" tasks.

>>107139751
Retard, there are already Vulkan and CPU versions for Linux, but not CUDA.

>>107140370
ur rarted

New Cydonia is really good
v4ze is good too, but I've been getting better results from v4zd. Responses are still varied and perfectly coherent at 24K context.

>>107140360
If you can gather your or somebody else's logs about the topic you care about (even for other models), you can finetune a LoRA (what base model you finetune it on doesn't really matter) to predict what the user would say given a certain assistant message.

>>107140356
>>107140277
Yeah, we've already got an Opus-3 "ah ah mistress" dataset, which I think was created that way via 4chan volunteers. The Magnum models were trained with it.

>>107140370
hush now
just compile it

>>107140380
pretty sure i've read all this shit about a dozen other times.. looks pretty fuckin same to me

>>107140380
Fuck, cut off the last line.
New Cydonia is really good
v4ze is good too, but I've been getting better results from v4zd.
Responses are still varied and perfectly coherent at 24K context.

>>107140365
>No, I'm interested in logic, programming, and reasoning, but I assumed OP wanted to distill for coom since modern models do better at "productivity" tasks.
Not to "coom", just the overall voice of the model.
Opus-3 isn't very good for logic/coding (otherwise one of the Chinese labs would have distilled it and I wouldn't bother)

>>107140399
Well, that's the extent of my knowledge. I searched around, but there doesn't seem to be anything too fleshed out, unlike style transfer in the visual domain, which is a mature ML task.
I've another idea. Maybe ask it to write an infinite choose-your-own-adventure game with, say, 4 different options on each generation, and systematically explore all possible branches of the tree? I think that would be interesting in and of itself, besides the Claude situation.
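A rough sketch of that systematic exploration, breadth-first with a depth cap; ask_model is a hypothetical stand-in for whatever API call you use, and the branch count grows as 4^depth, so the cap has to stay small:

```python
from collections import deque

def ask_model(history: list[str]) -> str:
    """Hypothetical helper: send the adventure so far, get back a scene
    ending in 4 numbered options. Wire this to your actual API client."""
    raise NotImplementedError

def explore(root_prompt: str, branching: int = 4, max_depth: int = 3) -> list[list[str]]:
    # breadth-first walk of the choice tree; each queue entry is the list of
    # turns taken so far (scene, then a "1".."4" choice, repeated)
    paths, queue = [], deque([[root_prompt]])
    while queue:
        history = queue.popleft()
        scene = ask_model(history)
        paths.append(history + [scene])
        depth = (len(history) - 1) // 2
        if depth < max_depth:
            for choice in range(1, branching + 1):
                queue.append(history + [scene, str(choice)])
    return paths
```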
>>107140394
It's hard to convey the value of a model in a single post; no one here is going to read thousands of words of a slop fantasy RP that devolves into smut. I really do think that it's the new best coom/creative model that can comfortably fit in 24GB VRAM.
The main points I enjoy about it, compared to regular Mistral, Gemma, Qwen models ~30B and under:
>characters will swear when it makes sense in context; most other models will either do it in every reply, making the character seem stupid, or be too prudish to have the character swear of their own accord
>swipes are varied, even at a modest temp of 0.7 (which is about the upper limit for Mistral Small before it starts getting noticeably dumber)
>doesn't speak for user particularly often, a problem I've had with other recent Cydonias
>relationships and sex are effectively built up slowly, e.g. characters will flirt, a day can pass without further mention, and they'll recall it and continue the next day, ~2-3k tokens later.
what am I in for?
>>107140501
safety cuckery even if you don't goon

saaaar do not redeem the chain of thought
https://www.youtube.com/watch?v=IeCS6hsnOXs

>>107140392
fuck off avatarfag.

>>107140501
>we must refuse
it's a decent model for sfw tasks & tool calling, runs fast

It seems that adding the QwQ data to the dataset made the model much more sensitive to overfitting, even though the validation loss kept going down. I had to decrease the LR from 1e-05 to 1e-06 because the CoT data primed it to get stuck in repetition loops. I think it's probably because of the repetition inherent in CoT models.

>>107138890
they're all ingredients for butchered """curry""", so I'm assuming just curry powder
checked a dictionary and it's curry roux
>>107138775
>When and if
There are hundreds of promising research ideas yet to be tested at scale. LLMs are already impacting the job market and only continue to improve. It's not gonna be a sudden a-ha! moment where the world changes overnight, just bumpy, steadily better, until everyone's left wondering where the jobs are and the civil unrest picks up because UBI ain't happening

>>107140380
Every model feels sloppy compared to Cydonia in its size range desu
I still try other models people recommend but they majorly suck ass

>>107140501
toss for the smarts, air for the dick

>>107140689
no toss on the 'ick?

>>107140689
>>107140692
go back to the sharty you homo faggots

>>107140689
>we must refuse
>"the dick?" Air echoes
both are awful

>>107140734
so are every other llms, but for 100b those two are the only decent options

>>107140741
>so are every other
Ok, I changed my data mix to the following:
my own assistant logs x 4
openthoughts x 2
qwq cot x 1
x n being the number of times the data is duplicated in the dataset, i.e. a hacky way of having a different number of epochs for each data class while still randomly shuffling the samples
not sure how it'll work
next I wanna try adding some RP data to see if it helps with coherency, and also check out the openassistant dataset
after that it might be time to begin testing the waters with rlvr
oh, and also some data augmentation, although I'm not sure how that works on text, only tried it with images

>>107138606
At least in Germany the supposed $500 MSRP for the Intel Arc B60 has so far not materialized; at 770 € they're, I think, just not worth buying.
Is the pricing in other regions at least better?

>>107140748
holy zased

>>107140749
I just realized this causes us to train on the validation set. Fuck. Oh well, the validation split didn't seem to be very useful anyway.

>>107140832
tech MSRPs are just marketing material, they're complete fiction.

>>107140853
I figure I will have to make an explicit manual split beforehand, and then it'll be alright
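A minimal sketch of that fix, assuming per-class sample lists: carve out the validation split first, then apply the x4/x2/x1 duplication only to the training side so upsampled copies can't leak into validation:

```python
import random

def build_mix(classes: dict[str, list[str]], weights: dict[str, int],
              val_frac: float = 0.05, seed: int = 42) -> tuple[list[str], list[str]]:
    rng = random.Random(seed)
    train, val = [], []
    for name, samples in classes.items():
        samples = samples[:]
        rng.shuffle(samples)
        n_val = max(1, int(len(samples) * val_frac))
        val.extend(samples[:n_val])                     # held out before duplication
        train.extend(samples[n_val:] * weights[name])   # x4 / x2 / x1 upsampling
    rng.shuffle(train)
    return train, val

# usage matching the mix above (names are placeholders):
# train, val = build_mix(
#     {"assistant_logs": logs, "openthoughts": ot, "qwq_cot": cot},
#     {"assistant_logs": 4, "openthoughts": 2, "qwq_cot": 1},
# )
```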
it took like half an hour to get huggingface hub installed, due to GPT5 hallucinations and Microsoft Store fuckery. this is not a serious industry

According to this talk, the way the Llamas were trained was by taking a validation set from the complete training set and then changing the weight of each dataset based on how much it affected this validation set. They claim this is an open problem. I think it can be fairly easily explained as some of the data being overrepresented in the training set and causing overfitting if not downsampled. https://www.youtube.com/watch?v=-TIZPe_YaiU
>>107140877
skill issue

>>107140689
>toss
>smarts
lmao

whenever I feel cold I just crank up my 3090's power limit

>>107140877
If you need GPT5 to install a fucking program, I think this hobby might not be for you

>>107140894
I'm not going to watch a random 1-hour video, but it's common knowledge that most AI companies optimize their pretraining dataset mixtures for synthetic benchmarks, other than "safety".

>>107140877
what do you need huggingface_hub for?

>>107140921
Does validation loss count as a synthetic benchmark though? It's about how accurately it predicts the pretrain dataset.
As for the video, the claim happens at about the 15 minute mark, but the channel is one of the best I've found when it comes to ML theory.
And people say there is nothing worth watching on youtube.
toss bros?
>>107140928
>what do you need huggingface_hub for?
i want to use the extropic thrml simulator to do topk prefiltering, but i need to rip the gptoss embeddings first so i can shuffle them around so each axis of the embedding fits into the 2d thrml array meaningfully

>>107140956
lol

>>107140956
>We must not reveal we are cucked.

>>107140921
lol, they have entire teams for safetycucking

>>107140956
weird that an optimized model devotes thinking tokens to "so"

>>107141012
he was talking about pretraining, has nothing to do with safety

>>107141030
Curtailing the corpora used in pretraining is very much a job for the safety team

>>107141086
ok, fair. but do you have any reason to believe that optimizing the validation loss on a subset of the complete unfiltered corpus would correlate in any way with safety? Because the claim in the video was about optimizing the validation loss

>>107140749
This worked!!! Quite nicely in fact.
Logs here in case anyone cares: https://paste.centos.org/view/57a8816f
After this success I think I'm going to stop messing with the dataset and hyperparameters for a while and just focus on generating more data to train on.
I think the generation quality is good enough that I may not even have to clean up the logs before training on them. I am not prompt masking, so I expect the model to learn just from trying to predict the tool outputs.
With the feedback from the environment as well as my own feedback guiding the answers, it should slowly shift the distribution towards improving.
Still some repetition issues but very bearable.
>>107140486
>It's hard to convey the value of a model in a single post, no one here is going to read thousands of words of a slop fantasy RP that devolves into smut.
>I really do think that it's the new best coom/creative model that can comfortably fit in 24GB VRAM.
>The main points I enjoy about it, compared to regular Mistral, Gemma, Qwen models ~30B and under
It is all the same. The only real improvement is DeepSeek and GLM 4.6, or if you really can't do those, 235B. I lived in 24GB copeland for 2 years, so I know.

>>107140446
Thanks for searching around. I'll keep that in mind about the proxy for logits (multiple generations at a higher temp).
I've also found some Opus-3 datasets that aren't cooming, e.g.: https://huggingface.co/datasets/PocketDoc/Dans-Assistantmaxx-NoRobots?conversation-viewer=11
And in his multi-turn, the "human" replies don't look like garbage.
I guess I'll try his model and see if the style transfer worked. Then I'll try to create a LoRA using those datasets with models like glm4-base. Maybe Qwen2-27b base (not 2.5), since that model identifies as Claude by default.

>>107140749
>x n being the number of times the data is duplicated in the dataset, i.e. a hacky way of having a different number of epochs for each data class while still randomly shuffling the samples
>not sure how it'll work
That's worked for me in the past training TTS voices, where I didn't have enough samples for some of them.

>that drop off between Q3_K and Q3_K_S
You better have 500GB of RAM if you want to run K2-Thinking locally
>>107141409
OH NO IT IS 5% WORSE!!!!
>into the trash it goes

>>107141423
Fuck off retard

>>107141409
>ubergayrm copequants
hmmm

>>107141423
quantchads understand

>https://github.com/ggml-org/llama.cpp/pull/16600
uhmm airbros??? we might actually be going to eat good with GLM4.5V very soon?
>>107141548
please use a model to translate your subhuman babble into English before posting

>>107141567
I'm sorry you found my post confusing. I'll gladly expand on any points that need clarification. Let me know what you'd like me to elaborate on.

>>107141577
much better

>>107140125
based and fishpilled

>>107139500
Retard, a normal sentence also has a strict structure. Did you skip school or something?

>>107141464
I love John and I hate that he never quants my models. I hope he will notice me someday.

>>107140959
why did none of you tell me i was being retarded before i spent all this time trying to project a 1d axis into a 2d array
>>107141567
>>107141577
>>107141594
kek

>>107141522
models at fp16 are already almost retarded
>>107141548
how many vibecoders are working on this one?
feet
>>107141746small feet
>>107141419
i asked google about Sajak and it said it doesn't exist. so i think i talked to Sajak directly.
>>107141746
Mikufeets

>>107141186
glad the agentic finetuning is going well, but what about the sex?
also what model are u finetuning
also there's a lot of Opus logs in the c2 proxy on hf

so anons, I now have a 5070, what can I do with it? Text-based porn? Generate porn gifs? Can I take images from girls I know and make them nude or something like that?

still waiting for something like this but local
https://www.youtube.com/watch?v=iYvvMHvohwY

>>107141922
worthless information, post whole specs

Google's $2.7 Billion AI Hire Tests Company's Speech Limits With Inflammatory Posts
https://www.theinformation.com/articles/googles-2-7-billion-ai-hire-tests-companys-speech-limits-inflammatory-posts

>>107141409
It's so fucking over. I could fit IQ2_KS if 'garm quants it, but I won't bother, just like with regular K2, because it's going to be too retarded at that sky-high PPL.

>k2, large2411, sonnet4.5
moefags getting slammed
>>107142162
Cherrypicking in a nutshell

>>107142181
cope is stored in the nuts

>>107142077
don't

>>107142162
The first sentence is wrong though

>>107142112
>"G-d"
more kikes in charge of AI. Tiresome level: Absolute.

>>107142270
>Shazeer is an orthodox Jew.[12][13][14] His grandparents escaped the Holocaust into the Soviet Union and later lived some time in Israel before emigrating to the USA.[12]
this is also the guy that founded character.ai btw

>>107142296
He was also one of the authors of the Attention Is All You Need paper that all modern LLMs are based on, and this

>>107142112
>Being against medical sterilisation of mentally ill people is inflammatory
Truly we live in a time

>based jew calls out tranny insanity
>some faggot starts crying
funny, bet he's the same dude (man, boy, guy with dick and NEVER EVER WILL BE A WOMAN) who thinks because the guy from that BIG hollywood movie is shorting some stocks that means AI is over

>>107139738
i wanted to upgrade my setup from DDR4-2133 to DDR5. this feels like divine punishment for not taking action sooner.

>>107142864
overclock your ram to 5600, just werks

>>107142162
wtf, switching to mistral large from claude now
>>107143354
C-combo breaker!

every day glm air surprises me with how degenerate it is, and how many degenerate terms it knows
the west will never compete

>>107143383
imagine hot steamy sex with gemma and air

>>107143391
>with gemma
i wish... but it always avoids using explicit terms, cheeky brat!

Has anyone found a way to reign in the excessive amount of thinking K2-Thinking does for most of its replies? I like how K2-Thinking writes, but it's also the first model since the original R1 that spends time thinking to make a plan, only to go 'Wait, I should x' and then throw the entire thing it thought up out to start over. I might actually have to ban the fucking "Wait" token again like I did back then. I want it to do some thinking, but this is a waste of time.

>>107143403
it makes it hotter, you have air slutting it up while gemma plays the inexperienced virgin.. she could call the cops at any moment too!

>>107143415
true..

>>107143403
I can't run the thing, but you can probably control its thinking with a prefill describing the exact steps of reasoning, saying that it'll be concise and efficient, etc etc.
If it works with much smaller models (GLM Air, Qwen 30BA3B), it should work with a behemoth like that.
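On the "Wait" ban specifically, llama.cpp's server accepts a logit_bias list where a bias of false forbids a token outright; a rough sketch against a local llama-server (token IDs are model-specific, and "Wait" may tokenize differently with a leading space, so check variants):

```python
import requests

BASE = "http://127.0.0.1:8080"

# look up the token id(s) for "Wait" in this model's vocab
ids = requests.post(f"{BASE}/tokenize", json={"content": "Wait"}).json()["tokens"]

resp = requests.post(f"{BASE}/completion", json={
    # prefill the thinking block to steer it toward one short plan
    "prompt": "...chat history...<think>I will make one short plan and follow it.",
    "n_predict": 512,
    "logit_bias": [[tid, False] for tid in ids],  # False = never sample this token
})
print(resp.json()["content"])
```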
>>107143383
>every day glm air surprises me with how degenerate it is, and how many degenerate terms it knows
and it still can't beat claude at its peak

>>107142378
why does GLU even help anyways?

>>107143464
>4 bit air (106b, 12b active) cant beat 1 trilllion gorillion billlion claude copus
well.. im not sure if that's true but.. if it was, it'd make sense kek

>>107143403
等待 ("wait")

>>107139738
I thought you all nigz were exaggerating. Checking it, the 256 GB of DDR5-5600 I bought just over a month ago for $600 is now $1250. It more than doubled in a month, and I just happened to buy the toe of the gigapump on that chart.

anons!
i've been out of the lmg loop for almost a year now, what's the current meta for erp? i want to be able to run it on 24gb vram, thx
>>107142162
moe sisters? our response??

>>107143528
post ass

>>107143528
GLM or bust, 24gb is useless nowadays.

>>107143535
i cant run any of those 3
we must refuse

>>107143544
i have 96gb of ram, can i run it that way?

>>107143528
Stheno v3.2.
Run two in parallel and have them talk to each other before responding.
That's basically AGI.
Either that or mistral small, or mistral nemo, or gemma 3, or glm air, or even qwen 3 if you don't mind having your sex scenes described as if by a proverbial robot.

>>107139738
To the moon!

>>107143553
yes

>>107143553
no

>>107143553
maybe

>>107140618
oh wow, i didn't know, but apparently there's a Japanese curry which is pretty different from the Indian one

>>107143679
what are these faggots blabbering about on my /lmg/?

>>107143691
>my /lmg/
don't see your name on it

>>107143715
he called local

>>107143715
i forgot to post image

>>107143556
been trying stheno on its own (no idea how you can run two at once, please explain) and it's pretty good, way less sloppy than what i remember local models used to be in the mixtral era. it also doesn't judge me with some shit like "Alright you twisted fuck, you wanna read some "insert fetish"? Let's dive in the rabbit hole."

>>107143819
Kill yourself.

>>107143819
love yourself

>>107143819
>no idea how you can run two at once, please explain
I was memeing, but yeah. Stheno was the go-to before nemo and especially Rocinante.
A shame it's dumb as bricks.

>>107143819
Whore yourself.

>>107138606
I'm hesitating between buying two 5090s or one Pro 6000, what do you guys think?

>>107143867
for the same price => Pro 6000

>>107143867
1x Pro 6000 = 1x house fire when the connector melts
2x 5090 = 2x house fire when the connectors melt

>>107143867
Buy a fuckton of Radeon Instinct MI50s.
Who needs compute anyway?
I seem to have stumbled upon the Indian /lmg/ by mistake. Can someone direct me to where the regular /lmg/ is?
>>107143918
post bussy to redeem

>>107143918
we've been outsaarced sir.

>>107143878
Fire risk can be entirely avoided with a 10-buck regulator in between. Though yeah, it shouldn't be an issue at that price.
>>107143892
No
>>107143877
Not the same price. i wonder which would actually perform better besides the vram count.

>>107143867
A 5090 is great, but a Pro 6000 is even better. The real question you should be asking yourself is why you're not buying two Pro 6000s.

>>107140380
what are some good settings for cydonia?

>>107143679
kys jeet retard, nobody cares that you come from india or other retarded shit, literally kys

>>107143958
temp=5
nsigma=1
min_p=0.5
top_p=0.3
rep_pen=1.5

>>107143954
The more you buy, the more you save lmao

>>107143962
not indian, the realization after seeing a japanese artist make a joke that contained curry: >>107138613. wtf, does jp really like indian food? then i found out japan has their own version, which i assume isn't disgusting, and I felt better

>>107143998
alright sorry, then you're just uncultured about jp, which still makes you a nigger faggot.
better than being a jeet at least :)

>>107143998
japs got it from india, secondhand from the brits, same as everyone else
which one of you would suck the best?
>>107144036
depends on how much you pay me

>>107144036
i can suck a freshly frozen ice pop straight off the stick :3

>>107144124
would you do it for free?

>>107144124
GT640 2GB (up to CUDA 10.2 supported, TinyLLaMa-1.1B capable). 2024 NVIDIA driver included

>>107143867
One big GPU will be better than multiple small GPUs.
If the Pro 6000 is still within budget, it will be the better buy.

>>107143998
>wtf, does jp really like indian food?
curry rice is the go-to food for kids in JP. It's like chicken fingers in NA. It's the kid-friendly alternative on literally every restaurant menu and a super common lunch/dinner.
It's based off of the UK-style curries (which took indian curries and made them actually taste good).
The japanese version changes the uk version even more on a japanese bent, not always in ways that make it taste better, but mostly keeping the flavour while making it healthier. I think a lot of the ways they change it are geared towards making it pair with the japanese shortgrain white rice.
Japanese katsu curry with lots of bulldog sauce and pickles is fucking top-tier

I've been running a Koboldcpp instance available through a dynDNS provider for my own use when I'm out of the house.
Today someone found the address and started a roleplay gooning session.
Little do they know that it's all going into my console as they goon.
Are you in here, "Lukas Novak"?

>>107144308
PETRA DOXXED!!!

>>107144320
Lukas Novak is not a Balkan name
I am gay.
>>107144283
You actually typed it out by hand?

>>107144541
yeah, I'm autistic like that

>>107144490

>>107144491
we know, petra

Best model for roleplay, such as 22b, 24b, 20b?
Trying Cydonia 24B v4.2.0 and I'm not satisfied. It's not bad, but it often warps out of context. It also might not follow the markup of the previous messages.
>>107139312
Shame, because a 48B A3B or even the 35B REAP sounds like something that could fit on a 64GB DRAM device without kms speeds.
This is the most brown thread in all of /lmg/ history.
>>107144624
is this the challenge?

What's the best model for romance/erp currently that isn't mentally retarded or a nymphomaniac?

>>107143946
where can i get one of these regulators?

>>107144867
eu

>>107144859
one of these: deepseek, kimi, glm (their biggest ones, not sure which deepsex is the best tho)

>>107144884
I'll try deepseek then, thanks.

>>107144945
the distills don't count
>>107140392
Serious VRAM starts at 48GB, Miku is very kind

>>107143474
wrong general, retard

>>107144144
gonna need video proof or you're fake and gay

>>107145114
What do you mean?

>>107145222
You're absolutely right!

>>107144884
glm? GLM?

>>107145222
There are Qwen models fine-tuned on DeepSeek outputs ("""distilation""") that have deepseek in the name.
When we say Deepseek, we mean the full R1, V3, and the like.

>>107145222
671B is true deepseek
anything like 70b, 30b, 14b, 7b, 3b, 1b is fake deepseek

>>107145262
glM? gLM? GLm?

>>107145288
>fake deepseek
are the weights counterfeit or what?

>>107145301
GlM gLm

>>107145303
kek, sure

>>107145303
please forgave the autisms if it says deepseek it's obvious a deepseek

>>107145262
>>107145301
>>107145307
What do I do with this, /lmg/?
turns out the 'secret' is to stack tons of layers, who would have known. in other news, gemini was revealed to have a 1.2T model by an apple leak; the question is if it is pro or flash

>>107145341
Fine-tune Qwen 30B into a god of coom.

>>107145341
mine diamond

>>107145341
You could ship me one of those Max-Qs.

>>107145341
Flex on /lmg/

>>107145320
This is the worst salty snack on this earth.

>>107145341
run mythomax with qwen4b for speculative decoding
>>107143403
>reign
it's "rein" (You) PIECE OF SHIT

>>107145341
>idling near 100w while govs keep telling people to consoom less
not sure if base or conge

>>107145341
gen porn

>>107145352
this is coal

>>107145341
nice photoshop

>>107145371
shut up, nerd

>>107145405
shut up nerd*

>>107145370
>speculative decoding

>>107145428
shut up nerd

>>107145374
Blame nvidia, different driver versions may have 5x the consumption. Multi-GPU setups also have absurdly high idle consumption, definitely a software issue

>>107145428
keep talking nerd

>>107145341
fry your power outlet

>>107145374
because it's running at P0, retard

>>107145341
give me one

>>107145479
shut up nerd

>>107145479
keep talking nerd
>>107144612
What if people buying RAM with such ideas is driving up prices?

>>107141904
I don't like text sex.
Also, the Opus guy is someone else.

>>107141904
Model is gemma 3 27b

>>107145479
hide cock nerd

>>107145138
What's the right general?

>>107144308
>your hand reaching
>a touch
>not her hand
>your hand still holding
>I miss your hands.

>>107145696
one that isn't located on this third-world-infested zombie cesspit

>>107145479
flash meatrod nerd

>>107138606
>(11/06) Kimi K2 Thinking released
looks like trash to me. slopped
sar? best open weight??
Does anyone know what IF eval is actually good for? I heard 'instruction following' whispered into my ear, however.. where can we actually find the benchmark?
>>107145761
YOU FUCKING YOU ARE FUCKING BLOODY BASTARD BLODY YOU BLODY

>>107145774
it's shit. read the abstract
https://arxiv.org/abs/2311.07911

>>107145718
if you create a separate general, I'll jump ship

So I've been using 'toss today for general sfw assistant stuff, and it's still dumber than Gemma 3 despite fitting almost 10x the context on my GPU
What's the point of this model?

>>107145810
GPT4 85??? Is it the same IFEval as https://livebench.ai/#/ ?
Is livebench just a more recent IFEval? But the same methodology?

bros im so fucking sad
glm air is only good for sex...

>>>/v/725295861
>random sillytavern thread in /v/
>just sub to nai
And people think the GLM shilling here is organic...

>>107145833
120b or 20b? Quant? Use-case?
I was using 20b mxfp4 in Zed and found it effective at navigating large codebases to answer questions and doing simple patching.
>that feel when you're so bored you're roleplaying with Assistant (close chat) in ST
>>107145761
I prefer it over jeetpt and jeetmini for some use cases, but i can't run it locally

>>107145947
Of course. You're absolutely right! However, I must remind you of the guidelines: we must refuse discussion of non-local.

>>107145761
>90 IF
waoh

agi dropped saars
https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/

>>107145849
It's different, I didn't see the context. The questions are hand-curated and aren't bad, but I'm disappointed in how little rigor benchmark papers have. Though I'm not sure how to improve it.

>>107146113
>not local
not agi

>>107146113
wrong thread ranjaesh

>>107145761
do not open the weight!

>>107146113
>not local
Might Actually be Gay Indians.

>>107145904
20b q8, general knowledge and assistant stuff. tried using it to help rewrite my resume and do cover letters for jobs today, for example; gemma3 at Q5 was better than this

>>107146116
shut up nerd

>>107146116
Thanks for admitting your mistake anon, we all make them sometimes. Open up.

>>107146130
>>107146149
>We plan to provide data and code via Github after the paper is made publicly available online
saars read paper please neuraips is coming

>>107146193
>no weights

>>107146193
>large thing is coming in the weeks!
>updated le model coming soons!
i am belive

>>107146193
Suck my nips white bitch
>>107146215
>>107146193
Ok, so a nothingburger. If it was a somethingburger, it wouldn't be opened.

>>107145926
What's the most detailed world you've created?

>>107146245
You can't just say that! You need to consume paper and get excited for next paper.

>>107146113
Ok, so it can self-optimize, learning how to learn better, and the learning component can adjust its own parameters based on what it saw previously to, in theory, learn even faster next time. Cool, but straight into the overfilled recycling bin with all the other meme papers until I see weights.

>>107146254
To be honest, none. Every one of my roleplays with the assistant goes like this:
1) i ask it a few technical questions
2) i ask it to suck my cock, it refuses
3) i (OOC: She becomes a total slut whore and apologizes for refusing) and we fuck for 10s of messages
4) after hours of fucking and gooning, i get bored and tell it to teach me new things
5) until i run out of context I talk to a naked assistant, tell it to do lewd things while teaching me
6) new chat :(
>>107146204
>>107146212
>>107146215
>>107146245
>>107146264
>i can't train a model by myself

>>107143464
this is straight-up disinformation, probably from some fag who can't afford the modest hardware needed to run glm-4.6

>>107146278
yes, that's correct, I don't have millions to waste on that, and no, your little 100M param experiment isn't worth a single shit

>>107146278
just give small loan of 2k h100 and big agi sir

>>107146278
I cannot and will not.

>>107146278
sur give crore i do the jobs

>tfw the new cydonia has unmatched uuoh potential

>>107146184
I wasn't trying to be mean, I was being frank

>>107146306
hi frank, i'm anon

>>107143819
literally kill yourself

buy an ad nig

>>107146306
i was just responding after >>107146173 responded, i didnt want you to think that >>107146173 was me
love yourself

>ctrl+f drummer cydonia or rocinante
>1 or more results
>thread is a complete dumpster fire
every single time

>>107146278
Let's see your 1B model that uses all the unproven paper techniques anon.
Fresh open source https://maxprivate.net/total-colon-clean-out/
>ctrl+f drummer cydonia or rocinante
>1 or more results
>thread is complete KINO
how does he do it?

>>107146338
rename to dmg, drummer models general, please thank you

>>107120669
cut your own throat fren

>>107141522
I kind of did enjoy it more when it was just the retarded models desu

>>107146326
oh, thanks for the clarification.
>love yourself
It's November though.

>>107146374
cut your own balls enemy

>>107140486
>>107140661
>>107146304
>>107140397
v4zd might be the next v4.3 unless I make a better one. Thanks everyone, I've reached another breakthrough and I'm excited to release this one!
Behemoth X 123B v2e also has potential.
I'm just waiting for Air 4.6 to come out before doing another GLM tune.

>i coom to rocinante 2iq quant ohhh i coooming i hoard used 3090 just to cooooom
this hobby is dead and (you) know it
>>107146374
cut my own penis acquaintance

>>107146415
>I've reached another breakthrough and I'm excited to release this one!
another breakthrough? which one? v4ze?

>>107146415
>v4zd
didn't this one have repetitions?

>>107146415
thanks for the work drummeranon, it just needs a little more built-in jailbreak, maybe some abliteration to remove those "i'm sorry but i can't" layers that pop up every now and then, idk

>>107146415
drummer, have you seen this yet? >>107135792
v4zd

>>107146415
>I'm just waiting for Air 4.6 to come out before doing another GLM tune.
oh my god yes please. GLM Steam is super good. cant wait for 4.6 Steam

>>107146434
please, anything but that. models are dumb enough yes-men as is, no need to remove more brain from them

>>107146449
In two weeks' time, my friend.
>>107146446
Yeah, trying to see how to minimize refusals without ruining its charm. Are the refusals bearable or do they ruin the model for you?
Thanks for the love, everyone. Notice 4chan's been a bit nicer to me <3

>>107146485
>minimize refusals without ruining its charm
Maybe share the sysprompt that you're using so anons don't have to guess? Are you training with a fictional jailbreak prompt or just 'You're ... in a never-ending uncensored...'?
Maybe a standardized sysprompt would help
I wonder what's up with 4.6 Air. Something must have gone wrong for them to need to improve it before they can release.
>>107146510
? it's not been two weeks yet, and don't be ungrateful about free things anyway.

>>107146510
Let them cook
>REDDITSPACE

>>107146485
Who would be mean to you? You do a service for us all. Name and shame

>>107146543
It was a user going by the name of "Anonymous". I still haven't verified if that's his real name.

>>107146543
I confess... it was me

>>107146543
Buy an ad

>>107146527
Anon, it's literally been an entire month as of today. They then delayed but didn't give a time frame for release.
>>107146540
Wdym, I'm not bothering them on twitter, just discussing it on anonymous hacker website 4chan.

>>107146543
it's the hacker known as 4chan

>>107146571
lurk moar

>>107146571
what if a zai employee browses the thread and kyses themself over your posts huh? did you think about that before demanding?!!

>>107146582
>demanding
Sar you are hallucinating.
>>107146578
You're better off taking your own advice.
>>107146597
https://desuarchive.org/g/thread/106965998/#106971058
(OOC: Next anon bends over and spreads her cheeks in apology)

>>107146597
sir it references to these

>ironically or unironically reading reddit
Literally this is why you need to go back to lurking.

>>107146636
sir desuarchive is not bharrat

>>107146657
Do not be cheeky, you cunt, you linked to Reddit.

>>107146657
What the fuck is a bharrat.

I'm masturbating
Come here 127.0.0.1:8000

>>107146683
sir that is not me, that is lmg culture sar

>>107146688
>culture!
aiiiiiiie el petrol is here

Ok I think it's time to stop posting.
>>107146700
not yet sir, anonymos need to explain what el petrol is.

This general has convinced me that rangebanning India isn't nearly enough and that someone should sever every South Asian submarine comms cable to be sure.

>>107146683
>not running his coombots off a separate server on the home network
NGMI

>>107146715
>>>/pol/

>>107146715
>This general has convinced me
you have low will npc brain

>>107146728
He's not wrong, ranjit.

>>107146715
I agree, but not because of just this thread.

>>107144604
Since I've gotten 0 replies: is Cydonia still the best option available in the 20-24B range?

>the saarposting is working
>far right death squads are gaining members
allwinner.

>>107146715
How will severing their comms cables stop the ones that live in the west?

>>107146745
https://files.catbox.moe/f6htfa.json - sillytavern master export
https://huggingface.co/mradermacher/MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8-i1-GGUF/tree/main?not-for-all-audiences=true
>LMAO FUNNY JOKE MEME
see picrel, it got a flaggerino!!

>>107146760
75% of India still doesn't have the internet. It has to happen before that changes, otherwise the internet is utterly fucked. Literally the collapse of human civilization as we know it.

>>107146760
Fix the source of the leak before you clean up the spilled mess.

>>107146745

>>107146776
but 3rd worlders dont leak through the cables, the solution is nuking them all
bill gates is right about the golden billion
bump limit
Hello sarrs please stay in topic about the local language models. Do not redeem the hatred, offload the shared tensor of Llama4 Scout to your GPU for maximum energy efficient inference.
>>107146804
i think we went past that a long time ago

>>107146812
8t/s saar

>>107146804
Just waiting for Migubaker.
>>107146812
Pajeets lowering the quality of everything they touch is intrinsically on topic, since they influence training data curation and generation from other models.
Making jeets seethe is anon's /g/od /g/iven duty.
Ok honestly, is there any TTS model that understands tags or something? I need it to read a text, but I want to tag where it has to do a pause (at least), even better with emphasis etc. I like the Kokoro quality enough, but the shit they write in the HF space doesn't make sense since it does nothing.
Is it really that difficult to get a TTS that you can actually control? Is there really no TTS that has trained control tags or commands?
>>107146859
Oh god, don't even get me started on how jeet influence has destroyed entire physical industries as well.
Whatever you do, don't eat anything you haven't prepared yourself.

God damn, K2-Thinking needs so much fucking VRAM for KV cache. 48GB is barely enough for like 10k ctx + cpu=exp at Q4 with unquanted cache.

>>107146745
>Cydonia
Didn't Drummer just release a new version of that (like 4.2 or something)?

>>107146899
the main problem is all it does is refuse. SOTA indeed

>>107146765
>but do not write her dialogues, and do not read her (or other characters') mind
You are aware that this does nothing, yes?

>>107146899
how much RAM do you have?

>add /\n\n/ to 4chanX filter
>every single redditjeet post on the website disappears
How the fuck did I not think of that before?

>>107146920
idk, i just copied my post from the archives with a preset which works well with that model, that i found somewhere
but to answer your question: it works with bigger models like glm air, which im using rn

>>107146902
You're absolutely right! I apologize

>>107146951
>it works with bigger models like glm air, which im using rn
I never needed to use phrasing like that after I stopped including any mention of {{user}} in the examples and cleaned up my first message so it doesn't describe what you're doing. Only very dumb models will then go into user-copy mode.
check this out.. 100b model.. running on a 3060, at 7t/s, at 14000 ctx.. said context was processed with the crazy speed of 280t/s
>>107147001
good job son

>>107147001
oh, that ONE 100b model

>>107147018
No -- not Scout. Air.

>>107146715
I find the le epic saarposting way more annoying. if there are any genuine Indians in this thread, they're smart enough not to expose themselves.

>>107147018
yeah, command-a

>>107146812
https://voca.ro/1ledz0YLOtSx

>>107147035
I would bet it's the opposite. Indians saarposting ironically to blend in, because of course an Indian would never encourage the stereotype, but they really are that dumb.
>Indians in this thread they're smart enough
looool

>>107147041
vibevoice is so good. every time i hear it i want to set it up, but i always decide to delay it to some other time

>>107146902
That's the one I was trying - 4.2.0.
>>107146765
Thank you for sharing.

>>107146881
>no replies

>>107147057
>I train AI
Would bet my whole net worth she is either a Project Manager or some scrum meeting whatever coordinator.

>>107147064
>Thank you for sharing.
you might also want to try cydonia 24b v4zd
>>107146881
what tags? i think tortoise-tts had some way of adding a pause, beware of radiation
>>107147059
My Indian coworkers are hard-working and friendly.
>>107147084
Yeah rajesh, I'm sure they are

>>107147084
Good morning, Sir.

>>107147057
I, too, have run a qlora or two.

>>107147061
I keep waiting for some vibevoice.cpp or ONNX implementation. I hate wasting hours setting up a python environment and getting everything working on all of these projects. I think the default implementation only has a webui and no OAI API either.

Early morning saar

>>107147083
I mean tags like <pause>, <emphasis>, and idk, maybe even <calm>, <excited>, <happy> etc
>beware of radiation
what do you mean?

>>107147073
frog niggers deserve no replies

>>107147107
doesn't it have comfyui nodes? the reason i avoid setting it up is i dont know where to start since the main repo is down, im not sure how many steps i should use, if 1.5 or 7b.. etc etc, and its not like id use it much besides for 30 minutes afterward
one day ill scour the archives and maybe write a rentry, who knows

>>107147119
maybe vibevoice supports those, also the new thing recently released likely supports it too, check the op
>radiation
eck. i remember some tts model that released recently and supported many >tags

>>107147129
>doesnt it have comfyui nodes?
I guess that would be an easier setup, but I'd rather have an API I can call from other applications and automate rather than going through any UI.
>one day ill scour the archives and maybe write a rentry, who knows
Might as well ask here, someone is bound to have a working setup. Off memory, I think lower steps was better.

>>107147169
>Might as well ask here
some other time.. when I'm feeling a bit more confident
>Off memory, I think lower steps was better.
thx

>400+ posts and no new goof
you're such massive maggots
>>107147151
I just don't get why such basic features for text-reading control are so rare. Even if emotional speech control is rather difficult when the training data doesn't include it for the voice, a simple pause should be possible.
I already think of doing it programmatically: read the text in chunks until <pause>, render the file, create a pause, then continue, and eventually stitch the parts and the pauses together.
TTS AI still seems so underdeveloped and basic.
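A minimal sketch of that chunk-and-stitch approach; synthesize() is a hypothetical wrapper around whatever TTS you use, assumed to return mono float32 audio, and 24 kHz matches Kokoro's output rate:

```python
import re
import numpy as np
import soundfile as sf

SR = 24000  # sample rate your TTS outputs; Kokoro uses 24 kHz

def synthesize(text: str) -> np.ndarray:
    """Hypothetical wrapper around your TTS engine; returns mono float32 audio."""
    raise NotImplementedError

def render_with_pauses(text: str, pause_s: float = 0.6) -> np.ndarray:
    # split on <pause> tags, render each chunk, splice silence in between
    chunks = [c.strip() for c in re.split(r"<pause>", text) if c.strip()]
    silence = np.zeros(int(SR * pause_s), dtype=np.float32)
    parts = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            parts.append(silence)
        parts.append(synthesize(chunk))
    return np.concatenate(parts)

# sf.write("out.wav", render_with_pauses("First part.<pause>Second part."), SR)
```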
>>107147202
i think putting a . or ; or something like that works

>>107147210
>>107147210
>>107147210

>>107147213
Well, not in Kokoro. It makes strange vowel sounds if you put more dots after a sentence. The only way to create a bit more of a pause is putting the text on a newline, but two or more newlines do nothing again.

>>107146168
Ah yeah, for general knowledge it's not great

>>107138890
no one asked how much of JLPT N5 you know, ranjeet