/lmg/ - a general dedicated to the discussion and development of local language models.

New queen of /lmg/

Previous threads: >>103477986 & >>103473510

►News
>(12/10) HF decides not to limit public storage: https://huggingface.co/posts/julien-c/388331843225875
>(12/10) Upgraded version of DeepSeek-V2.5: https://hf.co/deepseek-ai/DeepSeek-V2.5-1210
>(12/09) LG releases EXAONE-3.5: https://hf.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct
>(12/06) Microsoft releases TRELLIS, a large 3D asset generation model: https://github.com/Microsoft/TRELLIS
>(12/06) Qwen2-VL released: https://hf.co/Qwen/Qwen2-VL-72B
>(12/06) InternVL2.5 released: https://hf.co/OpenGVLab/InternVL2_5-78B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Begun, the Sperg War has.
What's the best model right now that acts similar to character.ai? Or will any model work, and that's handled by sillytavern?
>>103487568
Mistral nemo or mistral large. Everything else is censored.
>>103487568
Original llama-1 or mythomax-13B. Not THAT similar, but close, if you want something quick.
>>103487586
mistral large is censored too
she's clearly hallucinating but at least she's cute about it.
there are times where, when outputting tokens, there's almost a flip-flopping between understanding the nuance of what's going on and not.
I guess any "thoughts" start and end between the last token in the context and the determination of the newly generated one. is it really luck that it arrives at the correct thought(s) each time?
in that case, having the model maintain a context between tokens seems like it would be a boon, surely?
>>103487773
Yeah we've talked about that idea for a while now. I guess no one has really succeeded in making it work.
>>103487786
this paper is the first attempt at it, right? >>103478321
>>103487813
Kind of. There's still the issue of whether it can actually be applied to a production model. We've been hoping for bitnet for a while now and it's seeming grim.
>>103487489
I think Coconut should be in the news.
>>103487867
more like poopoonut
>>103487867
I think Kokona should be the thread mascot because her paper is going to give us AGI.
►Recent Highlights from the Previous Thread: >>103477986

--Paper: Training Large Language Models to Reason in a Continuous Latent Space:
>103478321 >103478376 >103478946 >103478736 >103478891 >103480930
--Benchmark results for Llama 3.3 70B with different quantization levels:
>103480828 >103480834 >103480907 >103480962 >103481529 >103481844 >103480986
--Google's AI advantage and content filtering practices:
>103485975 >103485999 >103486015 >103486037 >103486070 >103486134 >103486187 >103486199 >103486239
--Closed-source character cards vs. open-source alternatives:
>103479205 >103479418 >103479509 >103479450 >103482151
--Nemo model quirks and performance optimization:
>103481705 >103481747 >103481777 >103481882 >103481928 >103481936
--Google's Gemini 2.0 Flash model impresses with SWE-Bench results:
>103485906 >103485913 >103485992 >103486118 >103486135
--Prompt wrangling and character personality in Llama-3.3 and Gemma-2-27B models:
>103480531 >103480635 >103480742 >103480788 >103480813
--New QTIP Quantized Models released, no UI support yet:
>103485961 >103485979 >103486283 >103486334
--Anon discusses the usefulness of sysprompts in LLMs:
>103481113 >103481584 >103481926
--TabbyAPI support for asymmetric parallel inference with mismatched GPUs:
>103479780 >103482468 >103482717
--Anon speculates about DeepSeek's updated 250B model release:
>103484679 >103484830 >103484853
--Comparing AI models and discussing their limitations and potential enhancements:
>103483783 >103483800 >103483857 >103483886 >103483911 >103484007 >103485541 >103485685 >103485716 >103485776 >103485823 >103485942 >103486092 >103486946 >103487395
--Anon's PC dies due to faulty Corsair RM850x power supply:
>103481759 >103481977 >103481991 >103482296 >103482662 >103482003 >103483018
--Miku (free space):
>103478013 >103480088 >103480784 >103485380 >103485885

►Recent Highlight Posts from the Previous Thread: >>103478921

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Flash 2 just below Sonnet 3.5 on LiveBench.
>>103488007
If Gemini Exp 1206 is supposed to be 2.0 Pro, that's one hell of a fucking black pill
>>103487568
Rocinante.
Should I just use temp and min-p only?
newfag here. what the fuck does "RP" mean in the context of these threads?
>>103488659
dunno. lame sex chat is all these fucking incels do all day. such a waste of time. i think RP refers to Ron Paul
>>103488659
There's no way you couldn't have figured this out if you spent maybe 2 seconds thinking about it, but it stands for "roleplay".
>>103488628
Agreed.
>>103488647
Yes.
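For the min-p anon: a minimal sketch of what the two knobs actually do (plain numpy, illustrative only, not any backend's real code):

[code]
import numpy as np

def sample(logits, temp=1.0, min_p=0.1):
    # temperature: <1 sharpens the distribution, >1 flattens it
    z = logits / temp
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    # min-p: discard every token whose probability is below
    # min_p times the probability of the most likely token
    probs[probs < min_p * probs.max()] = 0.0
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)
[/code]

Every other sampler is basically just a different way of zeroing out part of that distribution, so yes, temp + min-p alone is a sane starting point.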
new sex model when?
>>103488659
It’s shorthand for rape.
>>103487489
>>103487978
asuka blyat
>>103488245
It is. It's likely Anthropic scaled to max tokens before everyone else.
>>103487489
Damn, he actually made the asuka thread like I suggested
Aaaand... /lmg/ hasn't imploded yet, completely disproving the "miku obsessed troons" schizo(s). Cool.
>>103489504
/lmg/ just has a thing for redheads
>>103487489
i've been here since before /lmg/ was even a thread, but i've never quite got that petra thing. i must have missed the thread where it originated. can you give me a tl;dr lol?
>>103489953
https://desuarchive.org/g/search/text/petra%20vsg
i like asuka and everything, but is it just me, or did the op pic kill the thread
>>103489504
Don't make me dig up the archive where OP claimed everyone was one person and doxxed someone, thinking they were all the same person, because Kurisu was in the OP picture.
>>103489504
the miku obsessed baker tried to make a split of the last thread because it had kurisu, even though miku was in the OP with her. it's just that nobody used it; it's still up.
does your favorite streamer have an AI counterpart
linux spoonfeed guide for hunyuan video when?
remember the 12 hours or so when a bunch of people here pretended that Llama3.3-Instruct was really good for RP, and after half a day of glazing it they all instantly vanished forever
What the fuck was that about
>>103491078
>linux
>spoonfeed
You should go back to windows faggot
>>103490980
I don't base my waifu on vtumors. LLMs are already not that bright.
>>103491111
QwQ too. The model is horrible for RP.
Still not sure if it was one anon or multiple people.
>>103491111
>>103491446
This. Tried it for 3 gens at min-p 0.1, temp 1, and instantly went back to mythomax.
>>103491177
she does streams with an AI version of herself, I'm not talking about streamer "cards"
>>103491111
It is just the usual false prophets. You will recognize the final cooming not by shills here but by the thread dropping to 0 posts per hour for more than a week, since everyone will be too busy spilling their seed to post anything.
AI can't draw soulful anime. Only pseudo-3D slop
what happens to the last "thought" in coconut?
how is it incorporated into the token generation phase?
>After the latent mode finishes (t ≥ j), the input reverts to using the token embedding, i.e., Et = [e(x1), e(x2), ..., e(xi), hi, hi+1, ..., hj−1, e(xj), ..., e(xt)]. It is worth noting that the last hidden states have been processed by the final normalization layer, so they are not too large in magnitude. M(xt+1 |x≤t) is not defined when i < t < j, since the latent thought is not intended to be mapped back to language space. However, softmax(W ht) can still be calculated for probing purposes (see Section 4).
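The way I read it, the last thought isn't consumed anywhere special: h_{j-1} just sits in the input sequence, and every later token is generated by attending over it like any other position. Rough sketch of my reading (HF-style pseudocode, names illustrative, not the authors' actual code):

[code]
# Rough sketch of the Coconut latent loop as I understand the paper
# (illustrative; not the authors' code).
import torch

def coconut_generate_step(model, lm_head, embeds, num_latent):
    # embeds: (1, i, d) -- e(x_1)..e(x_i), the prompt in embedding space
    for _ in range(num_latent):
        h = model(inputs_embeds=embeds).last_hidden_state[:, -1:]
        embeds = torch.cat([embeds, h], dim=1)  # thought h_t becomes input t+1
    # back in language mode: the next real token is predicted from a forward
    # pass whose inputs still contain all the latent thoughts, including the
    # last one, h_{j-1}
    h_final = model(inputs_embeds=embeds).last_hidden_state[:, -1]
    return lm_head(h_final)  # softmax(W h) only happens here, in language mode
[/code]

So the last thought gets "incorporated" the same way every other position does: through attention, never through the vocabulary.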
>>103490980
I predict it's gonna be fun to watch AI models play their fake personalities better than the "actresses" behind the avatar. But then, just like everything related to vtubers, it is gonna be miserable. The companies are probably already thinking about the incoming apocalypse, so I am sure they will ensure that the fanbase culture is heavily against using AI models like that. At least the ones that aren't sold to you by the company. Kinda like warhammer figurines. Which makes me wonder why they aren't already heavily pushing AI chatting with vtubers on their fanbase.
best model for an rtx 4080 super + 32gb ram?
>>103491871
nemo
why do models suck with clothing? even mistral large thinks turtlenecks have cleavage
>>103492278
Lack of data describing various types of clothes
Seems like the general public is waking up to the fact that LLMs will never give us AGI and that the exponential trend is more like an asymptote. Now what?
>>103492881
the next path has already been decided. all in on the o1 meme
>>103492278
might be anime data getting in
>>103492278
Because they only see text
>>103492937
Text can be used to describe the coverage area of turtlenecks. Surely a description, definition, or explanation of what a turtleneck is must exist somewhere online.
>>103491749
Vtubers are already getting btfo by neuro-sama. The whole point of vtubing is pretending to be an anime girl, so you might as well cut out the middleman. At least you won't have drama on your hands every week from these mentally ill girls.
"abandon""adventur""barely above""beaming""blur""bond""borrowed""camaraderie""can we talk""challeng""chuckl""circle""clap""cling""coos""collect""cooed""corner""determ""drawl""dynamic""flicker""game""gleam""glint""grin""growl""heady""heart ""heavy ""hint""hiss""hmm?""hoarse""hooded""husk""hustle and bustle""impish""just maybe""lay ahead""lingers""lock""mix of""mock surrender""my spine""navigat""mischie""new world""not so bad""pad ""padded""palpable""pang""patterns""playing with""reality""renewed""ridden up""riding up""ride up""rides up""rode up""roll my""rolled my""sashay""saunter""second skin""shine""shiver""smirk""spark""spine""steel""talk to you""taut""thumb""tone""trace""tracing""tribulation""truth or dare""tug""twinkl""twisted""unfamiliar""undercurrent""vulnerab""wanton""well, well""well, well, well""padd""pop""purr""weight of""whatever comes""whirlwind""whisper""you know"
>>103494288
you could've just let this thread die instead of posting this
>>103494288
how's gguf training coming?
>>103494288
>model is retarded
>ban the most popular phrases it naturally uses
What could go wrong?
Is there a decent ~32b model for 24gb yet? I've been away but I vaguely heard about QwQ, I assumed it was a meme like all CoT was for RP. Is that not true? Do I dare to hope?
>>103496514
QwQ seems trash for RP, but it's a really good coding and general reasoning model.
>>103492958
Probably, but it's averaged out by the overabundance of cleavage-showing clothing usually found in rolep- I mean, high quality literature.
Remember, LLMs are just stacked layers of stochastic predictors; the average wins over factual information. Usually.
>>103496564
>trash for RP
nothingburger then?
>"""general reasoning"""
so, answering benchmark riddles, or am I missing a use case here?
Anyway, what's the 32b sota for RP then, is it Qwen-2.5? I found earlier versions of Qwen pretty bad and prone to spitting random chinese, which is never a good sign.
>>103496602
>32b sota for RP then
first commander. but not really. sota for 24GB is applying 2MW.
I have no good model and I must coom.
>>103494288
Thanks, I will add these too.
>>103496564
>>103496602
QwQ is literally the best local model right now, learn how to use it.
>>103497203
>QwQ is literally the best local model right now
I agree that it's the best for its size (and almost certainly the best balance of speed and smarts), but tulu, mistral large, deepseek, and L3 405b would all qualify as "better" on various metrics.
Is "Skip Special Tokens" in ST supposed to be checked or unchecked? When I press the neutralize samplers button, it gets checked, so it's the default, but is there a good reason it is? Wouldn't skipping special tokens be bad?
>>103488628
Which version? And what format works best for it?
>>103497203
any advice?
https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090
Phi 4 incoming.
>>103499412
>hasn't seen a dick or pussy in its life (training)
Yike!
>>103498457
they're pulling your leg anon, it's a meme model. there is definitely something to having a model make 2 passes before giving an answer, but it needs 2 real passes, not a <think step by step> tag
>>103497384
I think it fucks with llama 3 models.
>>103499433
How is the QwQ CoT even supposed to work? The template looks like ordinary chatml. Did anyone try some sort of cot prompt for RP with any decent results? It seems like it would be difficult to fit into ST's prompting but still I'm curious. My assumption would be that its reasoning would contaminate the prose so it would be shit and dry, but idk
>>103499700
it basically uses double the tokens to run a double-check on what it outputs. first it's supposed to figure out what you want (the step by step part), then give you the actual answer. you activate it by simply adding 'think step by step' somewhere in the prompt.
a while back, on the original deepseek 33b code model, i noticed it would never give a correct answer on the first try. it'd spit out code, you'd tell it part is wrong, THEN it corrects itself, so that's 2 passes and the ai gets to re-read what it already output. that's what they're trying to accomplish in 1 pass, but still at the cost of extra tokens
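If you want to force 2 real passes instead of trusting the tag, it's trivial to do yourself. Minimal sketch, where generate() is a stand-in for whatever backend you call (illustrative, not a specific API):

[code]
# Two real passes: the model re-reads its own output before answering,
# like the deepseek 33b example above. generate() is hypothetical.
def two_pass(generate, question):
    # pass 1: reason only, no answer yet
    plan = generate(question + "\nThink step by step about what is being "
                               "asked. Do not give the final answer yet.")
    # pass 2: the model gets to re-read what it already output and correct it
    return generate(question + "\nHere is your reasoning:\n" + plan +
                    "\nCheck it for mistakes, then give the final answer.")
[/code]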
LLM + Realtime gpt-sovits, I'm so far ahead of normalfags using cai it's not even funny
>>103499412
Like another anon said, previous Phis have all been incredibly brittle models that had obviously been gamed for benchmark scores and completely fall apart when given anything slightly OOD. No reason to get excited about this; no doubt it'll be the same again.
>>103499412
drummer are we getting a cream phi 4???
>>103500490
I'm bringing it to the masses through open source.
Working on adding something like chatgippity advanced voice talk with different voices.
>>103499758
This only makes sense to me for toy one-shots. Any examples of how this works in a multi-turn RP? Do you leave the thought blocks in the history? It seems like you'd have to, or it'd learn not to do it. But the thoughts would also probably mess up the prose, and it seems like a model would be prone to confusing thoughts and actions after a while, especially deep in the context.
Is it really just benchmark shit for riddlers or is there actual value somewhere?
>>103494288>"padd"damn not like this anon, my paddington bear card is gonna get no play
>>103494288
I counter with:
ministrations
audible pop
rivulets of
admit it
pet
the ball is in your court
the game is on
the choice is yours
I don't bite... unless you want me to
half-lidded eyes
she worries her bottom lip
warring with
arousal pooling in her belly
take your pleasure
fiddles with the hem of her skirt
kiss-bruised lips
a bruising kiss
despite herself
yours to take
wanton
with reckless abandon
torn between
knuckles turning white
grins wickedly
fiery red hair
long lashes
propriety be damned
the world narrows
pupils blown wide with pleasure
tongue darts out
chestnut eyes
grasps your chin and forces you to meet her gaze
bites your ear
nails raking angry red lines down your back
her cheeks flaming
cheeks hollowing
stars burst behind her eyes
inner walls clenching around nothing
puckered hole
her wet heat
she whimpers, biting her lip
dusky nipples
slick folds
still lodged deep inside her
heart, body and soul belong to you
the night is still young
...for now.
>>103500963
Chewing wood with Rin
>>103499426
https://www.youtube.com/watch?v=gBqUDXu9h-I
>>103500528
Well good luck with that. I almost remade the whole thing from scratch.
>>103502817
any tips or issues you'd care to share?
My plan is to implement sovits and then figure out how to add/enable easy training for custom models, so users can choose whatever they'd like
>>103505646
Why was this deleted?
Does anyone else have trouble with rocinante being obsessed with washing after each sex act?
https://xcancel.com/JustinLin610/status/1867619389065114040
>In 2025, Qwen models will be omni and smart, hopefully.
open omni bros...
>>103506512
Great. What happened to BitNet Qwen 3?
>>103506547
they only ever said they knew about bitnet. this is different in that they have confirmed they are actually working on it
>>103506568
>they only ever said they knew about bitnet
It's been a while, but didn't they promise more than that? Pretty sure I remember them saying they would experiment with it. If they just stop talking about it, it seems like a strong indication that BitNet doesn't work in practice.