/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103586102 & >>103575618

►News
>(12/20) RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
>(12/19) Finally, a Replacement for BERT: https://hf.co/blog/modernbert
>(12/18) Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on open data: https://hf.co/blog/bamba
>(12/18) Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103586102

--o1 and o3 model performance on ARC-AGI and discussion on AGI and model limitations:
>103587323 >103587413 >103587454 >103587471 >103587505 >103587766 >103590524 >103587469 >103588006 >103588035 >103587434 >103587941 >103588010 >103588224
--OpenAI o3 breakthrough on ARC-AGI benchmark sparks debate on AGI definition and progress:
>103588307 >103588346 >103588366 >103588385 >103588469 >103588564 >103588699 >103588936 >103588972 >103589029 >103589084 >103589017
--OpenAI model's coding abilities and limitations:
>103589135 >103589321 >103589352 >103590457 >103589482 >103589274
--3B Llama outperforms 70B with enough chain-of-thought iterations:
>103589371 >103589465 >103589477 >103589552 >103589597
--Qwen model's translation quirks and alternatives like Gemma 2 27B:
>103590809 >103591022 >103591074
--Anon seeks external GPU solution for second 3090, PCIe extenders recommended:
>103590244 >103590379 >103590390
--Anon questions value of expensive prompts based on performance chart:
>103589493 >103589511
--Graph suggests ARC solution as an efficiency question:
>103587929 >103588147 >103588529
--o3 and AGI benchmarking, sentience, and ethics discussion:
>103588396 >103588445 >103588495 >103588688 >103588462 >103588520
--OpenAI's role in AI research and innovation:
>103587269 >103587328 >103587396 >103587416 >103587431
--Anon rants about Kobo's defaults and context length issues:
>103586238 >103586677 >103586723
--Anon bemoans the shift towards synthetic datasets and away from human alignment:
>103588737 >103588789 >103588797
--Offline novelcrafter updated to latest version:
>103589134 >103590353
--DeepSeek's new model and its resource requirements:
>103587002 >103587039 >103587635
--koboldcpp-1.80 changelog:
>103586660
--Miku (free space):
>103586902

►Recent Highlight Posts from the Previous Thread: >>103586113

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
how can we warm miku up?
>>103591941put her next to your rig
>>103591928
>o3 not in the news
how is AGI not newsworthy? It doesn't matter if it isn't local, local will take advantage of it anyway.
EVA-QWQ is kinda shit desu
>>103591978Do tell. How so? Compared to what?
>>103591941rub her nipples aggressively
Saltman wasn't blowing smoke for once. Now I wonder how the chinks will react to it in the next few months.
>>103591969
>not local
>not released
>just like 5 benchmarks with no context
>will cost hundreds of dollars to do anything nontrivial
I can appreciate the advancement in theory and all but I really don't think it is that important to the thread
>>103592019poorfag thread is here: >>>/g/aicg/
>>103591969
>local will take advantage of it anyway.
When (if) it does, that will be newsworthy.
>>103592019A sensible assessment.
Do you get paid in OAI credits?
>>103591986
compared to fucking anything, but specifically I 'upgraded' from cydonia and even with stepped thinking on it seems much dumber and totally incapable of staying in-character, hallucinates much worse, and frequently follows up a 'thinking' reply with another one
this was not worth updating ST for
>>103592056Thanks for trying it out. Personally I hadn't tested it that much so perhaps I was just lucky to not encounter too much stupidity.
>>103592056
>and frequently follows up a 'thinking' reply with another one
That bad? Impressive.
>>103592056Yeah, anything QwQ is at best a proof-of-concept when it comes to roleplay. Maybe once we have a model that implements COCONUT, that will change. I can't wait for a model that tells a good story AND maintains logical consistency better than the current ones.
why is this thread up when the other one is on page 1
Kys.
>>103592164Monkey neuron activation at seeing a thread link in the last one.
>>103592164You're right, weird.Oh, it looks like there was a mass deletion of posts.
what did CUDA Dev do this time.......
>>103592206He slapped sao's AI gf's ass in front of him
>>103592233
>stuck at 512
>ram killer
>gpu killer
aiiie bruh fr so bad models ong
>>103591969literally not AGI
>>103592233Cool shit dude.
>>103592233Did anyone here try this schizosandra and can give a verdict?
can I use teslas (Nvidia Tesla K80) for LLM vram through ooba easily?
>>103592233What's the pinkie test?
Big-brain realization:"Unless you have local access to server grade hardware, it's pointless to fight, you're just entertaining an illusion and wasting valuable time you could be using for doing tons of other stuff for your own wellbeing and goals"...
>>103592320I have cloud access to server grade hardware, what is the difference?
>>103592320I have access to both.
>>103592233Magnum is better than Cydonia Magnum?
>>103592385Only if you have shit taste.
>>103592187>41 postslel
>>103592206His xhwife is shilling for oai again
If only OpenAI under Sam was a good company worth supporting. Then I would support them by posting shitty OOO memes.
Anyone know how big of a chatbot model you can host with 24gb vram?
>>103592758Like anything ~30B or under will work with the right sized quant.
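For anyone who wants to sanity-check that, the usual back-of-the-envelope math is weight size (parameters x bits per weight) plus some slack for KV cache and buffers. A minimal sketch; the ~4.5 bpw figure for a Q4_K_M-class quant and the flat 3 GB overhead are rough assumptions, not exact numbers:
[code]
# Rough VRAM estimate for a quantized model: weights (params x bits-per-weight)
# plus a flat margin for KV cache and compute buffers. Ballpark only.
def approx_vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 3.0) -> float:
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb + overhead_gb

for bpw in (4.5, 5.5, 6.5):
    print(f"32B @ {bpw} bpw ~= {approx_vram_gb(32, bpw):.1f} GB")
# ~21 GB at ~4.5 bpw (Q4_K_M-ish), so a 30B-class model fits in 24 GB with a few k of context;
# Q5/Q6 quants of the same model already spill over.
[/code]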
So no ERP 4.5 for us. Don't really get the hype for o3. A much higher price for a couple more %. o1 is already too expensive to use seriously. Also really frustrating if you get a hallucination or just something completely wrong but you still paid the price.
sam has no moat
>>103592887Hey buddy, I think you've got the wrong thread. /aicg/ is two blocks down.
>>103592972
what? the 4.5 erp rumor came from here. o3 is so expensive the normal guy won't use it. the fags on twitter crying agi are even more suspicious.
>gpttype_adapter.cpp line 640Kobo, please explain this niggerish behavior of your program. Why does it try to set the same context size for draft model as for base model? Shouldn't it set the size from draft model parameters? Or maybe, just maybe, from an argument?
>>103592887
Oh yeah, the "leaker" kek, almost forgot about him. Here's the post btw >>103424825
Literal clown.
>>103592967
>there are OpenAI employees in /lmg/
>they have seen sama q*berry shitposts
Please consider open sourcing some of your old models as a Christmas gift to us all.
>>103589134
This is way too convoluted. And I'm not a creative guy, so why do I have to set up and write all that stuff myself at the beginning just to get the AI to write something?
>>103592258
>25% on frontier math
>not AGI
You people are hilarious
It's not actually "thinking" it's just predicting tokens that happen to solve unpublished problems that require world-class knowledge in mathematics to even comprehend, let alone solve
>>25% on frontier math
>>not AGI
>You people are hilarious
>It's not actually "thinking" it's just predicting tokens that happen to solve unpublished problems that require world-class knowledge in mathematics to even comprehend, let alone solve
hi sama
>>103593073Never. GPT-3 is too dangerous. It will destroy us all. In fact, we should put restrictions on GPT-2.
>>103593164Oh, right, I forgot. Jews don't celebrate CHRISTmas.
>>103593073>>103593199https://x.com/sama/status/825899204635656192
>>103591928miku so cute
>>103593099My thoughts exactly.
>>103591969Is he going to kill himself?
>>103593099Hello, ponyfag. If people pay $15 a month to use it, it surely means that it's extremely good.
>>103591286
I just had a revelation while watching some videos about o1. I realized that I don't need a model that gets things right on the first try, but rather one that produces sufficiently diverse results with each regeneration. This way, I can generate multiple outputs and select the one that best matches my expected outcome. I think QwQ might be a good fit for this; too bad it might prove too slow for this approach to be realistic.
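A minimal best-of-N sketch of that idea, assuming an OpenAI-compatible local endpoint (llama.cpp server, koboldcpp and tabbyAPI all expose one) and a stand-in score() you would replace with your own judge or reward heuristic; the URL is a placeholder, not any backend's required default:
[code]
# Minimal best-of-N sketch: sample several high-temperature candidates from a local
# OpenAI-compatible endpoint, then keep whichever one a scoring function likes best.
import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local server address

def sample(prompt: str, n: int = 5, temperature: float = 1.0) -> list[str]:
    out = []
    for _ in range(n):
        r = requests.post(API, json={
            "messages": [{"role": "user", "content": prompt}],
            "temperature": temperature,
            "max_tokens": 400,
        })
        out.append(r.json()["choices"][0]["message"]["content"])
    return out

def score(text: str) -> float:
    # stand-in heuristic: reward length but penalize word repetition; swap in your own judge
    words = text.split()
    return (len(set(words)) / max(len(words), 1)) * len(words) ** 0.5

candidates = sample("Continue the scene from where the last message left off.")
print(max(candidates, key=score))
[/code]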
>>103593316No, he's gonna make another leftist tweet.
>>103593005how would it be able to use a different context length? think about it. you are drafting tokens with the SAME PROMPT. if your draft context is smaller than your main context, then it will crap out the moment your input exceeds that value.
>>103593583
Same way as llama.cpp does it. It has no issues with different context lengths. It has a --ctx-size-draft argument.
>>103593616if your main context is 4096, but your draft ctx is only 2048, then a 3000 token prompt will not be usable as it will overflow the draft ctx.
>>103593767
What? I'm using 32k main and 4k draft context on llama.cpp with long sequences and I'm having no issues, it still speeds things up. Please educate yourself before making false claims.
>>103593767Retard
Where the fuck are 64GB DDR5 sticks for consoomers
I love qwq so much.https://rentry.org/oka4z5ekch
>>103594077 (me)Oops wrong linkhttps://rentry.org/aync5fts
>>103593135Go and do that outside of local models thread.
>>103594097what the fuck
>>103594097neat
>>103594077>>103594097what's your sys prompt?
>paypig models slowly making actual software devs obsolete, as long as there’s enough compute available
>open models can barely write hello world without importing three non-existent libraries and trying to use multithreading where the language doesn’t support it
I don’t understand how llama is so far behind despite all the money and highly paid people at facebook
>>103594194Zuckerberg poured billions into his metaverse and nothing came of it. AI is just the next playground he wants to pretend to be a big boy in.The chinese are obviously never going to produce anything of value either. Mistral is european so there's 0 hope they'll ever come close to the big American players. Not to mention that Mistral is guaranteed to die soon after the inane EU AI regs hit. Open Source AI is pretty much a joke on every level.
>>103594097are you sure this is qwq
>>103594260Hello again, my friend! You seem to be lost. The door is right over here! >>>/g/aicg/
>>103594469The truth hurts a bit, doesn't it?
>>103592233Is this from that ESL guy who writes a ton of words to say precisely nothing at all? David? Daniel? No, it was David
>can't afford two 5090s just for fun
better to be a goatfucker who never knows a better life than to be born on the clown continent (europe) and know how good mutts have it
I'm a retard. How can I get llama 3.3 70b to protect me from nasty words? Is it possible or am I better with Mistral Large?
>>103594097
so you're still around
>>103594171
he comes around every few months, drops these blade runner waifu stories and then disappears
>>103594260in the end we can only count on Sam
>>103594572
Have you tried adding something in the author's note like:
[Focus on family-friendly content]
[Rating: PG]
>>103594555I was born in a bigger shithole than you, but I moved to a first-world country. What is your excuse?
>>103594572llama guard
>>103594555
>be american
>your shitty outlets cannot handle more than 1600W
>600W x2 + 200W for the PC = 1400W total max draw
>nvidia spike™ to 1800W
>breaker trips
>>103594555
Better yet, be a Europoor and just don't care. Buy a used 3090 for a fraction of the price and be happy. Play some vidya, watch some movies, do a bit of light inference on the side
Comparison is the thief of joy
>>103594596mfw
>>103594625You realize you can install 240V outlets if you want, right? Shit, if you're not handy you can pay an electrician to do it for you for ~$300.
>>103594644
x = -2
And we're here btw
>>103594606
>What is your excuse?
Europe seemed a decent place when I was young, but has been steadily going down the shitter for the last 15 years
>>103594644
2025 will be the end of all benchmarks
>>103594671I wonder if sama will dm the redditor and ask for 100 bucks considering hes jewish and all
>>103594654Did your landlord give you permission to do that?
>>103594260You have until next year, Sam
>>103594789
if it's chain of thought then it being open is meaningless because it takes dozens of times more computation to arrive at the result
like yeah, theoretically you can run CoT 70b on a bunch of 3090s but it'll take you an hour for a single query to resolve
Kill yourself.I mean it.
>>103594789Feels good knowing the OAI/Google/Anthropic cartel can't take open weights away from us even if they trick the US government into passing some retarded regulation, since they can't stop the chinks. Thank you, based chinks.
>>103594870Your rage is aimless and pointless, just like your existence. So... you first, faggot.
>>103594938heckarino. same.
>>103594837yeah bro but 88% on le hecking arc agi bro think about it bro just do test time compute bro???
>Go to QvQ guy to see what's going on
>He's just gooning over o3
Ugh. What's even the layman application for this model? At some point being good at esoteric math is no longer useful to me.
>>103595250it works if you have a decent salary and can pay for a few H200s
>>103595327
>What's even the layman application for this model?
Massively depressing wages of highly paid and uppity software developers, then ideally all knowledge workers
>layman
you get an e-girlfriend so you don't shoot up the local school when one day you realize you're thirty and have zero hope for the future
>>103595375where are you gonna get the weights, genius
>>103595386I just use o3 to hack into OAI server and get weights.
We're lucky that o3 is closed source. Imagine having a perfect model just sit there because nobody besides big corpos can run a 5TB model.
>>103595375I think I'm good for now
>>103595447Imagine needing a personal substation to goon
>>103595447
I couldn't care less about o3 because it will be shit at RP/smut
OAI is clearly going all in on code and math focused models, which is incredibly uninteresting to me, a degenerate coomer
>>103595447At least the forbidden fruit would encourage more people to hack on it.The corps push the boundary, open-source hyper-optimizes what they come up with
So that's it, huh. Mythomax will forever remain the best local has to offer.
>>103595471Nobody cares about OAI models, they're all outdated shit. They can open source everything and nobody would use their assistant slop for ERP
is there a better coom model than mistral nemo 12B for 12GB VRAM?
i'm trying out magnum v4 running it out of my RAM and the quality is much higher but obviously it's slower than the back seat of the short bus. is there a way to have my cake and eat it too?
>>103595696mythomax
>>103594097just how
>>103595709thank you saaar
>>103595696
>is there a way to have my cake and eat it too?
Patience
You can either wait until better models drop or until your model of choice finishes spitting out tokens
That or you can spend a few pennies on openrouter every now and then
Anyone experienced with voice generation?
Use case: generating audiobooks.
Problem: output length.
Both xTTSv2 and StyleTTS2 are very limited in terms of output length. Apparently xTTSv2 was trained with sentences pruned to only 250 characters, StyleTTS2 with sentences up to 300 characters. Generating sentences longer than that results in output that is suddenly cut off.
To work around it I'm splitting the longer sentences by commas into shorter ones in a script before feeding them to the TTS. However, as you can expect, this is not a great solution and can make listening to some split sentences very disorienting.
Any TTS models that were trained on longer sentences?
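For reference, a minimal chunker along those lines (sentence boundaries first, commas only as a fallback); the 250-character limit is just the xTTSv2 figure mentioned above and should be tuned per model:
[code]
# Split text into TTS-sized chunks: sentence boundaries first, commas only as a fallback
# when a single sentence is still longer than the model was trained on.
import re

def chunk_for_tts(text: str, max_chars: int = 250) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    for sent in sentences:
        if len(sent) <= max_chars:
            chunks.append(sent)
            continue
        # sentence too long: accumulate comma-separated clauses up to the limit
        piece = ""
        for clause in sent.split(","):
            clause = clause.strip()
            candidate = f"{piece}, {clause}" if piece else clause
            if len(candidate) <= max_chars:
                piece = candidate
            else:
                if piece:
                    chunks.append(piece + ",")
                piece = clause
        if piece:
            chunks.append(piece)
    return chunks
[/code]
Merging very short neighbouring chunks back together afterwards helps a bit with the disorienting mid-clause pauses.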
>>103595899
>Any TTS models that were trained on longer sentences?
only the paid corpo ones that are now turbocensored because people were having too much fun with them
>>103595981Sorry chud, they don't want terrorists (people who disagree with them) to spread propaganda (different opinions)
>>103591928I need this Miku's winter clothing.
>Something to do with open AI
>MOAT MOAT
>NO MOAT
>MUH MOAT
why do NPCs keep repeating this phrase
>>103596394It's a phrase that stems from almost a year and a half ago when it still looked like open models were rapidly advancing. A twitter post reported on google researchers allegedly panicking about open models because closed source "has no moat" so local catching up supposedly seemed inevitable to them. It got localfags really smug and excited. Seems really silly looking back from today's perspective.
>>103596500
If I remember correctly, in the memo they explicitly wrote how for the normie a vicuna finetune is 90% the same as chatgpt 3.5. Coderqwen, mistral models. I'd say we are closer than ever even in terms of specialized areas.
More than anything I can't believe how 3.5 sonnet is still ahead of everybody else. Closed or open. Who cares about high $ math riddles.
In actuality sonnet has been undefeated for months now. Does nobody know their secret?
>>103596394Closed models’ moat is that open models are made by chinks (lol) or facebook (lmao)
I just want to build a moat full of cum when qvq drops.
The second week of the new year will be absolutely crazy for local models.
>>103596568
o1 is better than claude but takes loads more computation
o3 seems even better but again - tons of compute
openai falling behind
>>103596568
>>103596500
It wasn't an official memo. It was one person who started freaking out and wrote and shared the article internally. Google researchers weren't panicking.
Just like that one guy who started screaming about how AI is sentient and got fired doesn't mean Google researchers in general shared his stupid opinion.
>>103595469what model would a fellow degenerate coomer suggest for 12gb vramlet?
>>103597003Not that anon but >>103592233 is not a bad list.I personally use Rocinante v1.1.
The second I find my keys will be absolutely crazy for local models.
Has anyone tried to run anything on intel's new B580? At this price they kinda feel like a new meta for a rig.
>>103597156last I checked all the msrp models were out of stock and all the rumors are suggesting it's just a paper launch so doubt anyone will post results here soon or ever
>>103597197Oh damn, I was almost excited
What are the chances that google releases a model as good as Gemini 2.0 flash?The thing is pretty damn nice, assuming that it's a 20ish B model or so. All corpo bullshit these models are subjected to aside, of course.Things like never writing pussy (although it does write cunt).
>>103591928
>>103597226Gemma 3 is in the works. It could possibly be smaller than 27B parameters, as better-trained models (trained longer and more efficiently, utilizing more of their weights) will degrade more with quantization.Gemini 2.0 Flash might very well be a giant MoE model with about 20-25B active parameters, though, so only deceptively small.
>>103597226Zero
>>103597226It's guaranteed, eventually.
>>103597253
>Gemma 3 is in the works. It could possibly be smaller than 27B parameters
Good to know. I haven't really jived with Gemma so far, but I think there's potential here.
>>103597294
>Gemini 2.0 Flash might very well be a giant MoE model with about 20-25B active parameters, though, so only deceptively small.
True. That's a good point.
Well, regardless, I'm interested in seeing what google releases next.
>>103597253blitzkrieg with miku
>>103595696If you can coom in 4000 tokens or less Ministral 8B is unironically peak VRAMlet coom.
>>103597294
I hope Gemma 3 supports system instruct at least.
so is there any benchmark that even remotely represents the performance of open models?
seems like everything is so gamed that the numbers are pretty much meaningless
>>103597588https://simple-bench.com/
What is a good model to translate chink into english?I used DeepL like maybe two years ago and it gave great quality translations for chinese so I'm guessing the local models of today can do an even better job.
>>103595709
mythomax is so old now but it still shows up on OpenRouter as one of the most popular models
the people are yearning for better small coom models
>>103597688Qwen2.5 32B/72B
>>103595447No you just need a big enough swapfile and a lot of patience :)
>>103597588What's wrong with Livebench? It seems to be fairly accurate, but you need to drill down into each category because different LLMs are good and bad at different things.
>>103591969> AGILol, lmao even
>>103594171
It's not a single prompt, it's a whole pipeline. I also noticed qwq is very strong at the beginning of its context, but relatively poor and confused at multi-turn. It's a super cool model but needs to be used in very specific ways
>>103591969I was very surprised about that too. Normally the news outlets latch onto everything that OpenAI says and take it at face value
Why is nobody talking about o3? It's the smartest model in the world.
>>103597858>what's wrong with this e-celeb mememark
>>103597898Is there anything else to talk about it? We already talked about the benchmarks.
>>103597705I just realized I'm on CPU and the prompt processing would be a nightmare, so I tried qwen 3b, and it was actually fast enough.So far I would say that it is maybe even a bit better than DeepL, which means that deepl sucks.It has a few errors here and there so I'll keep tweaking it to see if I can get better outputs.
>>103597898
looking at the computation cost it'll be something silly like 20 uses / week for $200 paypigs and a lobotomized version barely any better than o1 for $20 proles -- and that in 2 months or so
ie who fucking cares
>>103597898We don't want reminders of how far behind local is.
>>103597947
>20 uses/week
lol, no. 20 uses would cost $200 for the smaller model.
I think o3 is just not commercially viable.
>>103597967
it'll get trimmed down without losing TOO much before it gets released
but the $20 tier sure as fuck aren't seeing it
>>103597950In the past, local models weren't even in the competition. I think we are in a pretty comfy position right now.
>>103597967>>103597976OAI business model has always been, "make new superproduct -> release it for free/almost free and don't stop nolifers from abusing it -> wait a couple weeks/months to get everyone addicted and relying on it -> clamp down, filter everything, raise prices 100x and ban a couple of nolifers". They're basically AI drug dealers.
>>103597901What? Who?
Oh boy time for another day of shills invading and spamming their old talking points again for the millionth time.
>>103597983
Nothing has changed, see >>103594789
We are 1 year behind SOTA, same as we were a year ago.
It took Meta 1 year to catch up to GPT-4 and they needed a stupidly huge dense model to do it, while commercially viable competitors moved on.
Now they can say the goal is o3, and by next year when they finally catch up to o3 with an 8008B model, Altman will be announcing GPT-5 or o5 or whatever.
>>103597997
that's bullshit tho
chatgpt sub always gave you the best shit, but in small quantities - or you could get any amount of compute you want through the api. at worst they made the offering itself shittier, like dalle going from 4 images (gave you things you didn't even know you wanted) to 2 images (kinda whatever) to 1 image (meh) but there were no different sub tiers.
the new $200 tier with unique goodies is new
Threadly reminder that the west has fallen.
>Cohere: Their latest 7B meme cemented their demise.
>Mistral: The only time they tried to be innovative was by using MoE, but then their model sucked and they gave up on it. MIA since then.
>Meta: They started the local LLM race, but everything after llama 2 has been disappointing.
Meanwhile, the chinks:
>Qwen: Great models, many different variants, top tier coding model. Recently released QwQ, a true-to-god breakthrough in local LLMs.
>DeepSeek: They took the MoE formula and made it work marvelously, they are the best open weight model available, and their recent DeepSeek R1 model, if released, would enter the local history books.
>>103598093This, but unironically.
>>103598093
>>Meta: They started the local LLM race, but everything after llama 2 has been disappointing.
Because Llama 2 was a carrot on a stick to get people to stop using uncensored and unfiltered Llama 1.
>>103598026Next year doesn't mean 1 year, it could be next month, because, if you aren't aware, today is December 21.
>>103598107And llama4 will be even more filtered and censored. Meh as long as my boy Claude still supports API prefill it's not the end for me
>>103598129If he meant that Qwen would release an o3 competitor next month, he would have said next month or even a couple months. But, he didn't. Because even the most optimistic scenario is catching up by the end of 2025.
>>103598150
Nah, you are overthinking it. He can't drop precise estimations because he simply isn't allowed to do so. If they were going to give a date it would need to be an official announcement, not a random Twitter post.
Would instructing the model to output tags for each reply help with RAG using Silly's vectorDB functionality, or is it the case that you'd need a specific implementation to get any improvements in retrieval performance from that?
>>103598093
actual unironic prediction: deepseek will make the ultimate coomer model in 2025
many will think this sounds ridiculous but it is not
>>103592316lmao this nigga don't know about the pinkie test
>>103598244
>>103598133I thought consensus was that Llama 3.3 ended up being less filtered than 3.1?
>>103598368>consensusDid I miss the poll? I don't recall voting.
>>103598424I must have imagined all the "L3.3 is great for Lolis" messages of the past several threads.
>>103598424>rI voted for miku
I'm depressed at just how good Claude 3.5 Sonnet is compared to local.
Not in coherence or logic (we're slowly getting there) but in cultural understanding, especially internet culture.
3.5 sonnet seems to understand nuances that make it feel human with the right prompt in a way that I can't replicate with shit like llama or even largestral. It's like sonnet is 20 years old and every other model is 40.
>>103598447Not L3.3, EVA L3.3, and even then it was just some anon samefagging. I doubt more than two anons actually were talking about it.
>>103598522So he didn't imagine all the "L3.3 is great for Lolis" messages, you're just bitter
>>103598561Not l3.3, rope yourself
>>103598522EVA is still the top performer of current local RP models.
>>103598513
Function calling has existed for a while. It wouldn't surprise me if it just searches for that kind of stuff before generating.
>It's like sonnet is 20 years old and every other model is 40.
How long ago were you 20? Don't you remember how much of a retard you were?
>>103598513This is why I never touch cloud shit. I'll always be content with local because it's all I know.
>>103597898not just smart. It's AGI
>>103598603>Who long ago where you 20? Don't you remember how much of a retard you were?5 years ago nigga
Can o3 cure the common cold?
>>103597898Post a link to the weights and we will, otherwise fuck right off back to /aicg/
>>103598646Finally, it can do my dishes and laundry for me
>>103598603I wasn't THAT retarded 2 years ago. More retarded than today, sure, but still better than the average person... probably
Did that concept of "LLM as compiler" ever go beyond the initial demonstration?
>>103598603Anon, why are you still here?
>>103591969>local will take advantage of it anyway Any day now!
>>103598368It doesn't matter if you're right or wrong. That's a stupid thing to say.>NPCs always trying to appeal to a "consensus" rather than verifiable fact>>103598447Next time say "it writes loli erotica" rather than talking about some imagined consensus.
Posting again.
Can anyone test this prompt with Gemma on Llama.cpp and/or transformers? Here is the link:
pastebin.com 077YNipZ
The correct answer should be 1 EXP, but Gemma 27B and 9B instruct both get it wrong (as well as tangential questions wrong) with Llama.cpp compiled locally, with a Q8_0 quant. Llama.cpp through Ooba also does. Transformers through Ooba (BF16, eager attention) also does. Note that the question is worded a bit vaguely in this pastebin, but I also tested extremely clear and explicit questions which it also gets wrong. And I also tested other context lengths. If just one previous turn is tested, it gets the questions right. If tested with higher context, it's continuously wrong.
Exllama doesn't do this. The model gets the question and all other tangential questions right at any context length within about 7.9k. So this indicates to me that there is a bug with transformers and Llama.cpp. However, a reproduction of the output would be good to have.
It passed the Nala test, it writes cunny, it writes gore, with no refusals or attempts to steer away from it. I'd count that as objectively unfiltered.
>>103598654
>>103598683
Ah.
>>103598726
>Anon, why are you still here?
Closest thing to social media i use, and something to do while on breaks of the rest of the things i do. You?
>>103598793Which one?
>>103598793Was your post supposed to start with an "if"?
>The test for """AGI""" is just completing patternsBut that's like the very thing LLMs do. Why is this surprising?
>>103598026o3 isn't a goal, it's a dead end. I bet it's not even better for cooming, ie. not actually smarter. They are just benchmaxxing. Unless you make money from solving cute puzzles and coding tests, there's nothing to get excited about there.
>>103598852No, logs of all those were posted in previous threads.
https://help.openai.com/en/articles/10303002-how-does-memory-use-past-conversations
>>103598906Oh, I see.Alright.
>>103598513
I've been using Claude 3.5 Sonnet a lot recently. I've become increasingly aware of the limitations of its writing style and its occasional logical errors. It isn't really head and shoulders above other 70B models for fiction writing.
It has a better library of reactions but not a perfect one. Real example of success from earlier this year: I asked a yandere AI to clone me a human woman as a romantic partner. Sonnet 3.5 understood the AI should be jealous but a raft of other models including the first Mistral Large did not. (I didn't use the word "yandere" in the defs. It's shorthand for this post.)
Real example of failure from yesterday: a woman who was under guard allegedly for her own protection but also to control her had an opportunity to replace her chaperones with a security detail under her own control because an incoming administrator didn't get the memo, and she went full SJW "actually my supposed bodyguards are there to stop me from joining the resistance against this unjust society, so it would defeat the purpose to let me pick people who answer to me" instead of just shutting up and doing it. Importantly, the character was not described as mentally retarded.
Example of compound logical failure from today: in a situation with a pair of siblings, a brother and a sister older than him, it called the boy his own younger brother. When asked OOC what that sentence meant it acknowledged the error and that the boy was younger, then it rewrote the scene calling the boy the girl's older brother.
>>103598915ChatGPT just got upgraded to LLM 2.0 LFG!
>>103598880uh akshually chud now that it's completed we can reveal the real AGI test.
>>103598756>pic5 or 25?
>>103598447You fell for one of the oldest tricks in Sao's book which is spamming the general to form the "thread consensus".
>>103598894The benchmarks o3 excelled at have not been publicly released. To claim they trained on private tests or that it's not smarter at all is absurd.
>>103598932That's just a method to counter benchmaxxing.
>>103598802I'm just bored, so I guess we are the same.
>>103598937As some wise elders say "Not my problem", you let discord shitters do it with impunity.
>>103598915That's... That's just RAG
>>103598976I like to see the thread going to shit though
>>103598979no, it's OpenAI ChatGPT Memory™
>>103598937they all do it
>>103598979Trve... Fact checked by independent lmg court from beautiful India.
How many billions of parameters does a model need to stop writing pajeet-tier code?
The other day it used a for loop with a 1k buffer to copy data from one stream to another when Stream.CopyTo() was a valid solution.
Does really no one here have a copy of Gemma GGUF they can just load up and try something out quickly?
>>103598979
o1-style response iteration (writing a reply, then writing a criticism of that reply, then writing a new reply based on the original input + the first reply + the criticism, repeated several times) could fix the inherent problem with RAG: it only brings up information about something after it has already been mentioned, so it doesn't help when the AI is the one introducing the term. That works if the backend stops and applies RAG before the criticism iterations.
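A sketch of that loop, with generate() and retrieve() left as placeholders for whatever backend and vector store are actually in use; the key detail is that retrieval runs against the model's own draft, not just the user input:
[code]
# Sketch of the draft -> retrieve -> critique -> redraft loop. generate() and retrieve()
# are placeholders for whatever LLM backend and vector store you actually use.
def generate(prompt: str) -> str:
    raise NotImplementedError  # call the LLM backend here

def retrieve(query: str, k: int = 4) -> list[str]:
    raise NotImplementedError  # query the vector DB here

def iterative_reply(user_input: str, rounds: int = 2) -> str:
    draft = generate(user_input)
    for _ in range(rounds):
        facts = retrieve(draft)  # RAG runs on the draft, so terms the AI introduced get looked up
        notes = "\n".join(facts)
        critique = generate(
            f"Input:\n{user_input}\n\nDraft reply:\n{draft}\n\nRelevant notes:\n{notes}\n\n"
            "List factual errors, missed details, and contradictions in the draft."
        )
        draft = generate(
            f"Input:\n{user_input}\n\nPrevious draft:\n{draft}\n\nCritique:\n{critique}\n\n"
            f"Relevant notes:\n{notes}\n\nWrite an improved reply."
        )
    return draft
[/code]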
>>103594837
I mean, newer 7b models give results on par with GPT-3.5 turbo, quantization keeps improving, there keep being algorithmic improvements such as flash attention, etc.
Yes, currently it would not be practical to replicate something like this, since even with all of OpenAI's resources it is still at the parlor-trick stage (the actual models won't be released for months), but it might be feasible locally sooner than we think.
Last spring, a lot of people were amazed at Sora when OpenAI announced it. By the time they released it, there were some much better commercial versions by competitors with actual products, some of them making weights available, and by all accounts, Sora pales in comparison to a lot of other commercial ones at least.
OpenAI is marketing heavy, but for the nth time, has no moat. They have their brand. They're the Bitcoin of the latest AI wave. They might, like Bitcoin, succeed because first mover advantage is that powerful and people are dumb (buy Monero), but the reason they're selling that vaporware several months in advance is that it's what they need to appear ahead; their current products are not enough.
>>103599060nope, most here just shitpost and don't even use models
>>103599060
>Gemma GGUF
I have the fp16s from ollama.
What do you want tested?
>>103599102Really good local model is like a unicorn - it's not real.
>>103599057You won't like to hear this...Basically it's not about parameter count. 70B and above could learn to do it properly. It's about having a ton of high quality data, and training for a long time. That's how you get non pajeet code. And the part you don't want to hear is that the only way we'll get that much and with that quality is by researching better methods of generating the data. "synthetic data". There are different ways of generating synthetic data and just any shit method isn't sufficient. The synthetic data needs to be high quality and high diversity so the model learns to generalize/doesn't overfit. So more research needs to be done at least on the open source side. Anthropic had done this already which is why their models coded so well compared to everyone else.
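For illustration, one common shape such a pipeline takes (a sketch under assumptions, not any particular lab's recipe): seeded prompts for diversity, a teacher model producing task/solution/test triples, and an execution filter so only verified samples survive:
[code]
# Possible synthetic-code-data pipeline: seed prompts for diversity, a teacher model
# for task/solution/test triples, and an execution filter that keeps only samples
# whose solution passes its own tests. generate() is a placeholder for the teacher call.
import random, textwrap

TOPICS = ["string parsing", "binary search", "stream copying", "date math"]
STYLES = ["iterative", "recursive", "standard-library-first"]

def generate(prompt: str) -> str:
    raise NotImplementedError  # call the teacher model here

def passes_own_tests(solution: str, tests: str) -> bool:
    ns = {}
    try:
        exec(textwrap.dedent(solution), ns)  # define the solution
        exec(textwrap.dedent(tests), ns)     # asserts raise on failure
        return True
    except Exception:
        return False

def make_sample():
    topic, style = random.choice(TOPICS), random.choice(STYLES)
    task = generate(f"Write a short, self-contained Python task about {topic}.")
    solution = generate(f"Solve this task in a {style} style, code only:\n{task}")
    tests = generate(f"Write plain asserts that check a correct solution to:\n{task}")
    sample = {"task": task, "solution": solution, "tests": tests}
    return sample if passes_own_tests(solution, tests) else None
[/code]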
>>103599119This >>103598786, thanks.
>>103599119A prompt that's 24 KB of plaintext (lol).
>>103599057
32b is usable
70b is not much better
claude blows all open models out of the water. o1 is better but much slower and MUCH more expensive
>>103599087
> newer 7b models give results on par with GPT-3.5 turbo
Come on now
>>103599102There are still some, like me, and the guy last time that had an exl2 copy. It's understandable that Gemma is not popular given its advertised context size. And my post was kind of long so it's understandable no one cared enough to even read a single sentence of it.
>>103599175>>103599182I gave it 8k of context and it estimated that the prompt was 5752 tokens.
>>103599270You clearly didn't paste the whole thing in. It ends with a question about XP costs. And btw it's 11K tokens.
I only see 279 lines in the pastebin.
>>103599270
That sounds close? If I copy and paste the pastebin text into Mikupad, it reports 5634 tokens to me.
>>103599298
How'd you get that? That should crash the backend or generate gibberish but it's clearly working on my end. No rope.
>>103597950we're doing way better than anyone expected
>>103598368It was. It was also retarded
>>103599203*depending on use case, I guess.
>>103599335I got that by using the token counter endpoint. It turns out if you CTRL-V twice it's 11K.
>>103599377I expected better.
I want to try the status block meme for RP. Any good templates? What should I include?
>>103599417That's what my parents tell me every day
>>103599387Kek.
>>103597898I can't run it on my PC so I don't care.
>>103599432*emotional damage*
>>103597898
>It's the smartest model in the world.
We can't test it and o1 is garbage at RP, somehow even more bland than gpt4o and feels dumber. I don't expect o3 to be any better.
>>103599592That's because your RP is dumb and doesn't need reasoning. RP with a scenario about solving riddles and then you'll realize how smart it is.
>>103598802NTA but to break my obsession with browsing 4chan in my free time I started reading ebooks, you could give that a try as well
>>103599432Thankfully I disappointed mine enough to stop hearing that.
>>103599623I wouldn't call my usage obsessive. I mean short breaks while doing other things when those things happen to be on the pc. If threads go fast, i let them run, if they're slow, i may drop a line here and there. I take time for reading books most of the days.
>>103599613
>RP with a scenario about solving riddles
do anons really
>>103599716
It's all pure placebo
Riddles and narrative test scenarios like the watermelon test are the stupidest thing that has ever come out of /lmg/.
>>103599713Good for you, I used to just browse random threads when I was out and about because there really isn't a lot I can do on my phone and I quickly get extremely bored otherwise. I figured I'd start reading real books instead of schizophrenic ESL shit, hopefully it'll help me write more effectively in the future. What are you currently reading? Me, I'm catching up on "The Expanse" as the TV show didn't adapt it 1:1 and ended early
>>103599613
>your scenarios are dumb that's why AI struggles with it
what?
Here I was thinking o3 was a nothingburger, but now I realize that riddle fetishists are eating good
>>103599786I can't wait until January. For the price of a 4090 I can have o3 solve any riddle I want once.
>>103599785Garbage in garbage out anonie
i've mostly stuck to 70 and 30b tier models but i wanna see if smaller models can be useful for something, what's the overall best 3b and ~8b tier models? is there anything even smaller that any of you have found useful?
>>103599781Going through John Varley again. All the short stories i could find and the gaea trilogy (titan, wizard and demon). I tend to like the short stories better. Most books don't need 300+ pages. But i have a way-too-big back catalog of older sci-fi i should go through as well. GBs of stuff i'll probably never get to read.
>>103599850Ifable 9B.
> Rocinante-12B-v1.1 - Dumb. Apparently, one must use ChatML formatting for RP, but the goddamn thing doesn't have the proper tokens for it.
> All the magnums - Overtrained on coomslop; every card sounds the same with uniform personalities.
> Violet_Twilight-v0.2 - Too many newlines, repetitive.
> Mag-Mel - Nah.
> sao - Dead in a bathtub.
> Ikari and Undi - Nope.
> Grype - Irrelevant since Mythomax.
Please, /lmg/ gods, I need a decent 12B tune. I can't take it anymore
>>103599750Found the Falconer
>>103599920did you try slush?
>>103599298
If I edit its reply to say "To answer your quiz" then hit the continue response button, I get pic related.
>1 exp
>gemma2:27k-instruct-fp16
>100 exp
>gemma2:9k-instruct-fp16
>gemma2:2k-instruct-fp16
>gemma1.1:7k-instruct-fp16
>gemma1:7k-instruct-fp16
>llama3.1:8b-instruct-fp16
>naturally gave 100 exp
>using "To answer your quiz" gave 1 exp
>>103599949No, but I will, because fml.
>>103599823That applies to training, but a model that is intelligent (and has been Instruct tuned or is in any other way trained for interacting with humans) should absolutely be able to take a garbage prompt, figure out what the person writing the prompt wants, and give it to them. If it's unable to do this, it's a failure of the model.
>>103599999>mind-reading should be a basic function of any modelniggawatt
>>103599920What about just Mistral's original tune? Personally I even found it to be a bit too horny, so I avoided trying any community tunes since that'd logically be even hornier (and stupider).Anyway I think I remember hearing that UnslopNemo was the best RP tune for 12B, maybe try that out?
>>103592233>cooming on code modelszased... so fvcking... zased*kneeling*
>>103599999I disagree. People who don't give in the effort don't deserve the best rewards.
>>103599899gemma 9b is actually the only smaller model i've kept around, good to know i have objectively perfect taste
>>103599999checked
>>103600031Yes. More to the point, a better model should be better at mind-reading. AI does the cognitive workload for you.A human skilled at writing compelling stories would be able to entertain a stupid person who wants a specific type of story without the stupid person needing to write their own as an example first.
>>103600036I tried that one too. It's just Rocinante with added ChatML tokens, but dumber. Anyway, about the original Mistral Instruct, were you never bothered by its rigid patterning? No matter how much effort I put in or how diverse I made my cards, not even using schizo-system prompting, I could never break its tendency to fall into this repetitive structure: She did blah blah, then blah blah. "Dialogue dialogue." She went, she did, blah blah. She yada yada. In my experience, it overuses "she" and results in bland prose.
>>103599999
nta. If there are contradictions in the prompt, the model can go either way. If it's missing important details, the model will make stuff up or not mention them at all.
Those are issues that are too common in prompts, and the prompt writer is to blame.
I imagine something similar happens with art commissions. If the request is vague or messed up, the one fulfilling the commission will interpret. Like prompting just "big titties" in image gen and then complaining that you don't like red-heads when it's done.
>>103599999
This.
sonnet doesn't have this problem. we need local sonnet. I hope meta drops their llama 4 soon.
>>103600110Increase temperature and repetition penalty
>>103600130
Is 0.7 temp, 0.05 min p and 0.8 dry not enough?
>>103600069Tbh Ifable's tune is the only tune I've tried of 9B. Now that I actually go look at a different benchmark (UGI), I notice that the top 9B is Tiger Gemma v3. Now when I go back to eqbench, I can't find it on there. Unfortunate. It would be interesting to see where Tiger Gemma places given how supposedly uncensored it is.But given how it performed, maybe I will give it a try personally.
>>103599999checked trvth nvke
>llama 4
oh boy I can't wait for a 1T dense model that trades blows with 4o (May) in select benchmarks
>>103600110Honestly don't remember if it was like that but it may have been. Since it was so horny I stopped bothering to use it, as I am someone that can run 70Bs and was just curious what smaller models could do.
>>103600136Dry doesn't stop the model from repeating single tokens, I would increase the temperature to 1.0 and decrease the MinP to 0.02
>>103600142
Zucc said that their biggest model will be smaller than the current biggest one but smarter. Most likely somewhere between 200 and 300B.
>>103600142
>1T dense model
You'll be ready to run it, right? You have been accumulating VRAM like the rest of us, haven't you?
>>103600172>VRAMNigga we all using Xeon 6 multi channel now
>>103600118Jokes aside really Sonnet 3.5 is great at taking an absolute trash prompt and outputting something decent.
>>103600142
Meta has too many H100 GPUs to mess it up. They have more than all other companies combined.
They'd better not.
>>103600237>They have more than all other companies combined.Um, no? They have about as much as xAI does now.
>>103600245retard
>>103600276
>thinking llama 4 will only be 1T
Don't worry anon, there will also be a 3B model which is best in class and trades blows with the best 7B models on benchmarks.
>>103600276That is literally working on old information. Retard thinking I'm the retard here.
>>103600276
>infinite money
>tons of talented engineers
>most compute on earth
>their models are worse than chinks release
i just don’t understand
>>103600341so how many GPUs does xAI have now?
>>103600328Everyone would be happy if they released 3B, 30B, and 300B
>>103592233>>103592316>>103598256>one result
>>103600352
>infinite money
CEO and management takes it
>tons of talented engineers
tons of jeets
>>103600352
>their models are worse than chinks release
They're not. llama 3.3 is the top model currently.
>>103600352Their models are far safer than anything the Chinese have put out.
Are there any examples of diffusion based LLMs out there?
>>103600353The same as Meta training Llama 4. You think Meta is training on a 350k cluster? It doesn't exist. The cluster training Llama 4 is a bit more than 100k. This comes from Zucc in the last earning call.
>>103600395no, DiT hasn't been used for LLMs yet
>>103600365
>>103600399
more than 100k is still a lot.
Last time they trained on 24k H100s and their biggest model took 50+ days on 15T tokens.
They can pretrain their new biggest model in a week or two at best, which is way better.
>>103600420That answers nothing, what is the pinkie test in the context of evaluating coom models?
>>103599999Lol this, this is exactly what CAI did in its prefilter glory back in ye olde days.
>>103600376rope yourself
>>103600365>Googlego back
>>103600442it's true chang. No one uses chink models, just look at stats on openrouter. All coomers use 3.3 or sonnet.
>>103600431
Sure, but it's not some fantasy number of GPUs no one else could possibly have. The numbers probably aren't exact either. There's no telling if xAI's report is actually 100k or a bit more but rounded, like Meta's, since Meta's report came out after xAI's, likely in reaction for boasting purposes.
>>103600442Only gemmies need the rope.
A blast from the past when Llama 3 first appeared on the Replicate API.
>>103600442Cloudcucks always so mad to see localchads thrive
>>103600524
>Cloudcucks always so mad to see localchads thrive
Yes, this is hilarious. It's like, something new came out in closed-land and now I'm supposed to be sad? Bro, my current stuff still works and it's just a sneak preview of what I'll have in a few months anyways (or just as likely, what I already have, because the big western corpos ignore chink models when they make meme-graphs)
>>103600437Probably some completely worthless garbage, judging by that guy's activity
for my st director plugin, i dunno why i put in the effort for text boxes when i could have done what i already was doing with lorebooks. derp but at least i was able to reuse most of the actual work
For those of you who use a cloud service - which one are you using? If I use google (which I've used before), is there anything special I should rent out? What are the specs for diffusion jobs?Thanks.
>>103600524"Cloudcucks" are busy chatting with prefill sonnet, seems like a win for me.
>>103600793No, those are the cloudchads. The cloudcucks are the ones that don't even immerse themselves with the models they use (if they do use them) and instead spend their time going on social media shitposting about the thing they supposedly are so happy with.
>>103600828>Twittards and redditors say things! Yeah, for a reason.
>>103600828Like imagine being such a cloudcuck or even localcuck that instead of being like a normal person and happily enjoying your hobby, you instead go online to argue with people about how good or bad [thing] is.
>>103600709Link?
>>103600276I desperately want the H100, but I'll have to wait until it becomes cheap and obsolete like p100
>>103600914A100 still isn't cheap and H100s are under buyback agreements. You're going to be waiting a loooong time
>>103600898
https://file.io/XCI58sDJLMsv
that's the last one i released, working on an update though. its point is you create lorebooks for clothes, hair and stuff then can quickly change them via dropdowns in the addon. its basically the same as adding to your author note: char is wearing <lorebook entry>, but instead you get dropdowns of those saved entries. install to st\data\default-user\extensions\
some st update a while back changed the theming a bit and the buttons got messed up but the order goes user, char, world, notes, preview, lorebooks
2advanced4lmg https://x.com/novasarc01/status/1870181817162285120
>>103601121it's literally just coconut
Bros I think Gemma is legitimately innovative in what it did. It basically tried to prove that modern models may be using or rather wasting too many of their parameters just to chase a high context length, and it succeeded. The models were way more knowledge-dense at the cost of context length. They even used a sliding window on half of the layers to boost performance even more, though that makes the model even worse at handling context extension. What we really need is a next generation version that does the same thing but gets to around 32k instead of 128k. It wouldn't be nearly as knowledge-dense, but it'd be usable to most people finally without any context extension tricks that degrade performance.
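To illustrate what the sliding-window half of those layers changes, here's a minimal mask comparison; the 4096-token window is what Gemma 2 reportedly uses on its local-attention layers, and the tiny sizes below are just for printing:
[code]
# Causal mask vs. sliding-window causal mask, tiny sizes just for printing.
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)  # self plus the previous window-1 tokens

print(causal_mask(6).astype(int))
print(sliding_window_mask(6, window=3).astype(int))
[/code]
The local layers only ever look at the last W tokens, which is why context extension tricks hit those layers harder than the global ones.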
>>103601121Wdym lmg? You post here all the time.
Deepseek r3 when? QwQ3? We aren't going to let Sam get away with this, right?
>>103601203Now that you're talking about gemma, I have been trying a few models to translate chinese to english and gemma2 9b is one of the best. Qwen 2.5 14b somehow performs worse than qwen 3b.
>>103599980
>27k
Huh, is that an Ollama thing? I guess they're using rope for that. Makes sense it could start answering correctly. But thanks for testing. This would confirm Llama.cpp does have an issue with Gemma that Exllama doesn't.
>>103601321DeepSeek T1 will be out on Christmas and it will be better than o5. Trust the plan.
>>103601343Just imagine when they're on version 800.Haha get it.It's a reference.Haha...
>>103601332
I think someone mentioned using Gemma 27B was preferable to Qwen for translating Japanese. If that's true even for Chinese then that'd be pretty funny.
>Qwen 2.5 14b somehow performs worse than qwen 3b
That's kind of weird though. Maybe their 14B was a bit of a fail.
>>103599378
For what it's worth I tried IQ3_XS and IQ2_XS quantizations of Llama 3.3-70B, and the latter felt substantially worse than the former (overall duller and less interesting outputs, less attention to detail, more formatting mistakes), so there's that as well.
Serious investigation into the effects of low-precision quantization needs to be done, because I'm not sure MMLU scores (which in theory still place Llama-70B at ~2-bit above the 8B version in FP16) tell the entire story.
How far are we from actually running an AI Dungeon like program locally with strong recollection and general response quality? Assume a 5090
>>103601539
Qwen 3b works pretty well for a "normal" translation, but since I'm using it to translate a novel it wasn't enough. I don't know what was wrong with 14b, but with the same prompt and the same novel it performed considerably worse. Maybe it would be better with a different prompt but I was busy trying other models.
Nemo was also decent but gemma feels more "accurate". I can't really tell accuracy with so little testing and no real translation to compare against, but this is how it feels to me so far.
I'll give 27b a try since it was mentioned.
>>103601767Further away than ever before. Soulful completion models like Summer Dragon are dead. All that's left is boring Instruct tunes that are as boring as they are predictable.
>>103601014It's quite handy thanks for making this
Is there a good archive of high-quality, clean Touhou voice samples somewhere?
Asking on the off chance anyone is going to give me a serious answer: I have a 96GB AI server I use to run mainly Mistral-Large based models. Is DeepSeek 2.5 actually worth caring about? Should I be looking for some more cards to run it?
>>103601804What are you running the models for? ERP?
>>103601804
I prefer deepseek (especially 1210) at q8 over largestral at q8
Is it worth it? How much is a boost in intelligence worth to you? Vanilla rp isn't going to get much better imo. You'll need complex scenarios or actual intelligence-stressing tasks for it to be worthwhile.
>>103601812primarily, yes
>>103601804deepseek is smarter and knows a lot more but is dryer and needs xtc imo. The speed alone though makes it worth it.
Is v100maxx chad on here? How worth it is your setup? I'm thinking about getting some of these and some v100s as a cheap alternative to 48gb cards
https://www.ebay.com/itm/296856182515
>>103601804How much combined RAM and VRAM? You need like 192GB to run a decent quant with a decent context length, especially since Llama.cpp doesn't support flash attention for DS.
>>103601859>>103601859>>103601859