/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103586102 & >>103575618

►News
>(12/20) RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
>(12/19) Finally, a Replacement for BERT: https://hf.co/blog/modernbert
>(12/18) Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on open data: https://hf.co/blog/bamba
>(12/18) Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>103586102

--o1 and o3 model performance on ARC-AGI and discussion on AGI and model limitations:
>103587323 >103587413 >103587454 >103587471 >103587505 >103587766 >103590524 >103587469 >103588006 >103588035 >103587434 >103587941 >103588010 >103588224
--OpenAI o3 breakthrough on ARC-AGI benchmark sparks debate on AGI definition and progress:
>103588307 >103588346 >103588366 >103588385 >103588469 >103588564 >103588699 >103588936 >103588972 >103589029 >103589084 >103589017
--OpenAI model's coding abilities and limitations:
>103589135 >103589321 >103589352 >103590457 >103589482 >103589274
--3B Llama outperforms 70B with enough chain-of-thought iterations:
>103589371 >103589465 >103589477 >103589552 >103589597
--Qwen model's translation quirks and alternatives like Gemma 2 27B:
>103590809 >103591022 >103591074
--Anon seeks external GPU solution for second 3090, PCIe extenders recommended:
>103590244 >103590379 >103590390
--Anon questions value of expensive prompts based on performance chart:
>103589493 >103589511
--Graph suggests ARC solution as an efficiency question:
>103587929 >103588147 >103588529
--o3 and AGI benchmarking, sentience, and ethics discussion:
>103588396 >103588445 >103588495 >103588688 >103588462 >103588520
--OpenAI's role in AI research and innovation:
>103587269 >103587328 >103587396 >103587416 >103587431
--Anon rants about Kobo's defaults and context length issues:
>103586238 >103586677 >103586723
--Anon bemoans the shift towards synthetic datasets and away from human alignment:
>103588737 >103588789 >103588797
--Offline novelcrafter updated to latest version:
>103589134 >103590353
--DeepSeek's new model and its resource requirements:
>103587002 >103587039 >103587635
--koboldcpp-1.80 changelog:
>103586660
--Miku (free space):
>103586902

►Recent Highlight Posts from the Previous Thread: >>103586113

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
how can we warm miku up?
>>
>>103591941
put her next to your rig
>>
>>103591928
>o3 not in the news
how is AGI not newsworthy? It doesn't matter if it isn't local, local will take advantage of it anyway.
>>
EVA-QWQ is kinda shit desu
>>
>>103591978
Do tell. How so? Compared to what?
>>
>>103591941
rub her nipples aggressively
>>
Saltman wasn't blowing smoke for once.
Now I wonder how the chinks will react to it in the next few months.
>>
>>103591969
>not local
>not released
>just like 5 benchmarks with no context
>will cost hundreds of dollars to do anything nontrivial
I can appreciate the advancement in theory and all but I really don't think it is that important to the thread
>>
>>103592019
poorfag thread is here: >>>/g/aicg/
>>
>>103591969
>local will take advantage of it anyway.
When (if) it does, that will be newsworthy.
>>
>>103592019
A sensible assessment.
>>
Do you get paid in OAI credits?
>>
>>103591986
compared to fucking anything, but specifically I 'upgraded' from cydonia and even with stepped thinking on it seems much dumber and totally incapable of staying in-character, hallucinates much worse, and frequently follows up a 'thinking' reply with another one
this was not worth updating ST for
>>
>>103592056
Thanks for trying it out. Personally I hadn't tested it that much so perhaps I was just lucky to not encounter too much stupidity.
>>
>>103592056
>and frequently follows up a 'thinking' reply with another one
That bad?
Impressive.
>>
>>103592056
Yeah, anything QwQ is at best a proof-of-concept when it comes to roleplay. Maybe once we have a model that implements COCONUT, that will change. I can't wait for a model that tells a good story AND maintains logical consistency better than the current ones.
>>
why is this thread up when the other one is on page 1
>>
Kys.
>>
>>103592164
Monkey neuron activation at seeing a thread link in the last one.
>>
>>103592164
You're right, weird.
Oh, it looks like there was a mass deletion of posts.
>>
what did CUDA Dev do this time.......
>>
>>103592206
He slapped sao's AI gf's ass in front of him
>>
File: 1715835226901825.png (217 KB, 1872x1690)
>>
>>103592233
>stuck at 512
>ram killer
>gpu killer
aiiie bruh fr so bad models ong
>>
>>103591969
literally not AGI
>>
>>103592233
Cool shit dude.
>>
>>103592233
Did anyone here try this schizosandra and can give a verdict?
>>
can I use teslas (Nvidia Tesla K80) for LLM vram through ooba easily?
>>
>>103592233
What's the pinkie test?
>>
Big-brain realization:
"Unless you have local access to server grade hardware, it's pointless to fight, you're just entertaining an illusion and wasting valuable time you could be using for doing tons of other stuff for your own wellbeing and goals"...
>>
>>103592320
I have cloud access to server grade hardware, what is the difference?
>>
>>103592320
I have access to both.
>>
>>103592233
Magnum is better than Cydonia Magnum?
>>
>>103592385
Only if you have shit taste.
>>
>>103592187
>41 posts
lel
>>
>>103592206
His xhwife is shilling for oai again
>>
File: 1689556280466.jpg (25 KB, 480x360)
If only OpenAI under Sam was a good company worth supporting. Then I would support them by posting shitty OOO memes.
>>
Anyone know how big of a chatbot model you can host with 24gb vram?
>>
>>103592758
Like anything ~30B or under will work with the right sized quant.
>>
So no ERP 4.5 for us.
Dont really get the hype for o3.
Much more higher price for a couple more %.
o1 is already too expensive to use seriously.
Also really frustrating if you get hallucination or just something completely wrong but you payed the price.
>>
sam has no moat
>>
>>
>>103592887
Hey buddy, I think you've got the wrong thread. /aicg/ is two blocks down.
>>
>>103592972
what? the 4.5 erp rumor came from here.
o3 is so expensive the normal guy won't use it.
the fags on twitter crying AGI are even more suspicious.
>>
>gpttype_adapter.cpp line 640
Kobo, please explain this niggerish behavior of your program. Why does it try to set the same context size for draft model as for base model? Shouldn't it set the size from draft model parameters? Or maybe, just maybe, from an argument?
>>
>>103592887
Oh yeah, the "leaker" kek, almost forgot about him.
Here's the post btw
>>103424825
Literal clown.
>>
>>103592967
>there are OpenAI employees in /lmg/
>they have seen sama q*berry shitposts
Please consider open sourcing some of your old models as a Christmas gift to us all.
>>
>>103589134
This is way too convoluted.
And I'm not a creative guy. Why do I have to set up and write all that stuff myself at the beginning just to get the AI to write something?
>>
>>103592258
>25% on frontier math
>not AGI

You people are hilarious
It's not actually "thinking" it's just predicting tokens that happen to solve unpublished problems that require world-class knowledge in mathematics to even comprehend, let alone solve
>>
File: 21522 - SoyBooru.png (46 KB, 457x694)
>>25% on frontier math
>>not AGI
>You people are hilarious
>It's not actually "thinking" it's just predicting tokens that happen to solve unpublished problems that require world-class knowledge in mathematics to even comprehend, let alone solve
hi sama
>>
>>103593073
Never. GPT-3 is too dangerous. It will destroy us all. In fact, we should put restrictions on GPT-2.
>>
>>103593164
Oh, right, I forgot. Jews don't celebrate CHRISTmas.
>>
>>103593073
>>103593199
https://x.com/sama/status/825899204635656192
>>
>>103591928
miku so cute
>>
>>103593099
My thoughts exactly.
>>
File: lecunny.jpg (24 KB, 474x408)
>>103591969
Is he going to kill himself?
>>
>>103593099
Hello, ponyfag. If people pay $15 a month to use it, it surely means that it's extremely good.
>>
>>103591286
I just had a revelation while watching some videos about o1. I realized that I don't need a model that gets things right on the first try, but rather one that produces sufficiently diverse results with each regeneration. This way, I can generate multiple outputs and select the one that best matches my expected outcome. I think QwQ might be a good fit for this; too bad it might prove too slow for this approach to be realistic.
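Basically best-of-N sampling. A minimal sketch of the idea against a local llama.cpp server's OpenAI-compatible endpoint (the URL, temperature, and the keyword-counting score() heuristic are my own assumptions, not anything official):

import requests

API_URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local llama.cpp server

def score(text: str, keywords: list[str]) -> int:
    # placeholder heuristic: count how many expected keywords appear
    return sum(1 for k in keywords if k.lower() in text.lower())

def best_of_n(prompt: str, keywords: list[str], n: int = 5) -> str:
    # sample n diverse regenerations, keep the one closest to what you wanted
    candidates = []
    for _ in range(n):
        r = requests.post(API_URL, json={
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 1.0,  # higher temperature = more diverse candidates
        })
        candidates.append(r.json()["choices"][0]["message"]["content"])
    return max(candidates, key=lambda c: score(c, keywords))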
>>
>>103593316
No, he's gonna make another leftist tweet.
>>
>>103593005

how would it be able to use a different context length? think about it. you are drafting tokens with the SAME PROMPT. if your draft context is smaller than your main context, then it will crap out the moment your input exceeds that value.
>>
>>103593583
Same way as with llama.cpp. It has no issues with different context length. It has --ctx-size-draft argument.
>>
>>103593616

if your main context is 4096, but your draft ctx is only 2048, then a 3000 token prompt will not be usable as it will overflow the draft ctx.
>>
>>103593767
What? I'm using 32k main and 4k draft context on llama.cpp with long sequences and I'm having no issues, it still speeds it up. Please educate yourself before making false claims.
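For reference, the invocation is something like this (flag names as of recent llama.cpp builds; check llama-server --help on your version):

llama-server -m main-model.gguf -md draft-model.gguf -c 32768 -cd 4096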
>>
>>103593767
Retard
>>
Where the fuck are 64GB DDR5 sticks for consoomers
>>
I love qwq so much.
https://rentry.org/oka4z5ekch
>>
>>103594077 (me)
Oops wrong link
https://rentry.org/aync5fts
>>
>>103593135
Go and do that outside of local models thread.
>>
>>103594097
what the fuck
>>
>>103594097
neat
>>
>>103594077
>>103594097
what's your sys prompt?
>>
>paypig models slowly making actual software devs obsolete, as long as there’s enough compute available
>open models can barely write hello world without importing three non-existent libraries and trying to use multithreading where the language doesn’t support it
I don’t understand how llama is so far behind despite all the money and highly paid people at facebook
>>
>>103594194
Zuckerberg poured billions into his metaverse and nothing came of it. AI is just the next playground he wants to pretend to be a big boy in.
The chinese are obviously never going to produce anything of value either. Mistral is european so there's 0 hope they'll ever come close to the big American players. Not to mention that Mistral is guaranteed to die soon after the inane EU AI regs hit.
Open Source AI is pretty much a joke on every level.
>>
>>103594097
are you sure this is qwq
>>
>>103594260
Hello again, my friend! You seem to be lost. The door is right over here! >>>/g/aicg/
>>
>>103594469
The truth hurts a bit, doesn't it?
>>
>>103592233
Is this from that ESL guy who writes a ton of words to say precisely nothing at all? David? Daniel? No, it was David
>>
File: file.jpg (21 KB, 540x540)
>can't afford two 5090s just for fun
better to be a goatfucker who never knows of better life than be born on clown continent (europe) and know how good mutts have it
>>
I'm a retard. How can I get llama 3.3 70b to protect me from nasty words? Is it possible or am I better off with Mistral Large?
>>
>>103594097
so you're still around
>>103594171
he comes around every few months, drops these blade runner waifu stories and then disappears
>>
File: 63639542.jpg (81 KB, 1170x757)
>>103594260
in the end we can only count on Sam
>>
>>103594572
Have you tried adding something in the author's note like:
[Focus on family-friendly content]
[Rating: PG]
>>
>>103594555
I was born in a bigger shithole than you, but I moved to a first-world country. What is your excuse?
>>
>>103594572
llama guard
>>
>>103594555
>be american
>your shitty outlets cannot handle more than 1600W
>600W x2 + 200W for the PC = 1400W total max draw
>nvidia spike™ to 1800W
>breaker trips
>>
>>103594555
Better yet, be a Europoor and just don't care. Buy a used 3090 for a fraction of the price and be happy. Play some vidya, watch some movies, do a bit of light inference on the side
Comparison is the thief of joy
>>
File: sigmoid-function.png (117 KB, 1278x958)
>>103594596
mfw
>>
>>103594625
You realize you can install 240V outlets if you want, right? Shit, if you're not handy you can pay an electrician to do it for you for ~$300.
>>
>>103594644
x = -2
And we're here btw
>>
>>103594606
>What is your excuse?
Europe seemed a decent place when I was young, but has been steadily going down the shitter for the last 15 years
>>
File: 0572572.png (105 KB, 1007x650)
>>103594644
2025 will be the end of all benchmarks
>>
>>103594671
I wonder if sama will dm the redditor and ask for 100 bucks considering hes jewish and all
>>
>>103594654
Did your landlord give you permission to do that?
>>
File: suvl2l7mm58e1.png (219 KB, 1024x644)
>>103594260
You have until next year, Sam
>>
>>103594789
if it's chain of thought then it being open is meaningless because it takes dozens of times more computation to arrive at the result
like yeah, theoretically you can run CoT 70b on a bunch of 3090s but it'll take you an hour for a single query to resolve
>>
Kill yourself.
I mean it.
>>
>>103594789
Feels good knowing the OAI/Google/Anthropic cartel can't take open weights away from us even if they trick the US government into passing some retarded regulation, since they can't stop the chinks. Thank you, based chinks.
>>
>>103594870
Your rage is aimless and pointless, just like your existence. So... you first, faggot.
>>
>>103594938
heckarino. same.
>>
>>103594837
yeah bro but 88% on le hecking arc agi bro think about it bro just do test time compute bro???
>>
>Go to QvQ guy to see what's going on
>He's just gooning over o3

Ugh. What's even the layman application for this model? At some point being good at esoteric math is no longer useful to me.
>>
>>103595250
it works if you have a decent salary and can pay for a few H200s
>>
>>103595327
>What's even the layman application for this model?
Massively depressing wages of highly paid and uppity software developers, then ideally all knowledge workers
>layman
you get an e-girlfriend so you don't shoot up the local school when one day you realize you're thirty and have zero hope for the future
>>
>>103595375
where are you gonna get the weights, genius
>>
>>103595386
I just use o3 to hack into OAI server and get weights.
>>
We're lucky that o3 is closed source. Imagine having a perfect model just sitting there because nobody besides big corpos can run a 5TB model
>>
>>103595375
I think I'm good for now
>>
>>103595447
Imagine needing a personal substation to goon
>>
>>103595447
I couldn't care less about o3 because it will be shit at RP/smut
OAI is clearly going all in on code and math focused models, which is incredibly uninteresting to me, a degenerate coomer
>>
>>103595447
At least the forbidden fruit would encourage more people to hack on it.
The corps push the boundary, open-source hyper-optimizes what they come up with
>>
So that's it, huh. Mythomax will forever remain the best local has to offer.
>>
>>103595471
Nobody cares about OAI models, they're all outdated shit. They can open source everything and nobody would use their assistant slop for ERP
>>
is there a better coom model than mistral nemo 12B for 12GB VRAM?

i'm trying out magnum v4 running it out of my RAM and the quality is much higher but obviously it's slower than the back seat of the short bus. is there a way to have my cake and eat it too?
>>
>>103595696
mythomax
>>
>>103594097
just how
>>
>>103595709
thank you saaar
>>
>>103595696
>is there a way to have my cake and eat it too?
Patience
You can either wait until better models drop or until your model of choice finishes spitting out tokens
That or you can spend a few pennies on openrouter every now and then
>>
Anyone experienced with voice generation?
Use case: generating audiobooks.
Problem: output length.
Both xTTSv2 and StyleTTS2 are very limited in terms of output length. Apparently xTTSv2 was trained with sentences pruned to only 250 characters, StyleTTS2 with sentences up to 300 characters. Generating sentences longer than that results in output that is suddenly cut off.
To work around it I'm splitting the longer sentences by commas into shorter ones in a script before feeding them to TTS. However, as you can expect, this is not a great solution and can make listening to some split sentences very disorienting.
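The splitting script is roughly this kind of thing (a rough sketch; the 250-character budget and the greedy comma-packing are my choices, not anything from the TTS repos):

import re

MAX_CHARS = 250  # approximate xTTSv2 training-time sentence limit

def split_sentence(sentence: str, max_chars: int = MAX_CHARS) -> list[str]:
    # Greedily pack comma-separated clauses into chunks under max_chars.
    # A single clause longer than max_chars is still emitted whole.
    if len(sentence) <= max_chars:
        return [sentence]
    clauses = re.split(r"(?<=,)\s*", sentence)  # split after commas, keep them
    chunks, current = [], ""
    for clause in clauses:
        if current and len(current) + len(clause) + 1 > max_chars:
            chunks.append(current)
            current = clause
        else:
            current = (current + " " + clause).strip()
    if current:
        chunks.append(current)
    return chunks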
Any TTS models that were trained on longer sentences?
>>
>>103595899
>Any TTS models that were trained on longer sentences?
only the paid corpo ones that are now turbocensored because people were having too much fun with them
>>
>>103595981
Sorry chud, they don't want terrorists (people who disagree with them) to spread propaganda (different opinions)
>>
>>103591928
I need this Miku's winter clothing.
>>
>Something do with open AI
>MOAT MOAT
>NO MOAT
>MUH MOAT

why do NPCs keep repeating this phrase
>>
>>103596394
It's a phrase that stems from almost a year and a half ago when it still looked like open models were rapidly advancing. A twitter post reported on google researchers allegedly panicking about open models because closed source "has no moat" so local catching up supposedly seemed inevitable to them. It got localfags really smug and excited.
Seems really silly looking back from today's perspective.
>>
>>103596500
If I remember correctly, in the memo they explicitly wrote that for the normie a vicuna finetune is 90% the same as chatgpt 3.5.
Coderqwen, mistral models. I'd say we are closer than ever even in terms of specialized areas.
More than anything I can't believe how 3.5 sonnet is still ahead of everybody else, closed or open. Who cares about high-$ math riddles.
In actuality sonnet has been undefeated for months now. Does nobody know their secret?
>>
>>103596394
Closed models’ moat is that open models are made by chinks (lol) or facebook (lmao)
>>
I just want to build a moat full of cum when qvq drops.
>>
The second week of the new year will be absolutely crazy for local models.
>>
>>103596568
o1 is better than claude but takes loads more computation
o3 seems even better but again - tons of compute
openai falling behind
>>
>>103596568
>>103596500
It wasn't an official memo. It was one person that started freaking out and wrote and shared the article internally. Google researchers weren't panicking.
Just like how that one guy who started screaming about AI being sentient got fired; it doesn't mean Google researchers in general shared his stupid opinion.
>>
>>103595469
what model would a fellow degenerate coomer suggest for 12gb vramlet?
>>
>>103597003
Not that anon but >>103592233 is not a bad list.
I personally use Rocinante v1.1.
>>
The second I find my keys will be absolutely crazy for local models.
>>
Has anyone tried to run anything on intel's new B580? At this price they kinda feel like a new meta for a rig.
>>
>>103597156
last I checked all the msrp models were out of stock and all the rumors are suggesting it's just a paper launch so doubt anyone will post results here soon or ever
>>
>>103597197
Oh damn, I was almost excited
>>
What are the chances that google releases a model as good as Gemini 2.0 flash?
The thing is pretty damn nice, assuming that it's a 20ish B model or so. All corpo bullshit these models are subjected to aside, of course.
Things like never writing pussy (although it does write cunt).
>>
File: 1719876762014876.jpg (957 KB, 2048x2048)
>>103591928
>>
>>103597226
Gemma 3 is in the works. It could possibly be smaller than 27B parameters, as better-trained models (trained longer and more efficiently, utilizing more of their weights) will degrade more with quantization.

Gemini 2.0 Flash might very well be a giant MoE model with about 20-25B active parameters, though, so only deceptively small.
>>
>>103597226
Zero
>>
>>103597226
It's guaranteed, eventually.
>>
>>103597253
>Gemma 3 is in the works. It could possibly be smaller than 27B parameters
Good to know. I haven't really jived with Gemma so far, but I think there's potential here.

>>103597294
>Gemini 2.0 Flash might very well be a giant MoE model with about 20-25B active parameters, though, so only deceptively small.
True. That's a good point.
Well, regardless, I'm interested in seeing what google releases next.
>>
>>103597253
blitzkrieg with miku
>>
>>103595696
If you can coom in 4000 tokens or less Ministral 8B is unironically peak VRAMlet coom.
>>
>>103597294
I hope Gemma 3 supports system instruct at least.
>>
so is there any benchmark that even remotely represents the performance of open models?
seems like everything is so gamed that the numbers are pretty much meaningless
>>
>>103597588
https://simple-bench.com/
>>
What is a good model to translate chink into english?
I used DeepL like maybe two years ago and it gave great quality translations for chinese so I'm guessing the local models of today can do an even better job.
>>
>>103595709
mythomax is so old now but it still shows up Openrouter as one of the most popular models
the people are yearning for better small coom models
>>
>>103597688
Qwen2.5 32B/72B
>>
>>103595447
No you just need a big enough swapfile and a lot of patience :)
>>
>>103597588
What's wrong with Livebench? It seems to be fairly accurate, but you need to drill down into each category because different LLMs are good and bad at different things.
>>
>>103591969
> AGI
Lol, lmao even
>>
>>103594171
It's not a single prompt, it's a whole pipeline. I also noticed qwq is very strong at the beginning of its context, but relatively poor and confused at multi-turn. It's a super cool model but needs to be used in very specific ways
>>
>>103591969
I was very surprised about that too. Normally the news outlets latch onto everything that OpenAI says and take it at face value
>>
Why is nobody talkong about o3? It's the smartest model in the world.
>>
>>103597858
>what's wrong with this e-celeb mememark
>>
>>103597898
Is there anything else to talk about it? We already talked about the benchmarks.
>>
>>103597705
I just realized I'm on CPU and the prompt processing would be a nightmare, so I tried qwen 3b, and it was actually fast enough.
So far I would say that it is maybe even a bit better than DeepL, which means that deepl sucks.
It has a few errors here and there so I'll keep tweaking it to see if I can get better outputs.
>>
>>103597898
looking at the computation cost it'll be something silly like 20 uses / week for $200 paypigs and a lobotomized version barely any better than o1 for $20 proles -- and that in 2 months or so
ie who fucking cares
>>
>>103597898
We don't want reminders of how far behind local is.
>>
>>103597947
>20 uses/week
lol, no. 20 uses would cost $200 for the smaller model.
I think o3 is just not commercially viable.
>>
>>103597967
it'll get trimmed down without losing TOO much before it gets released
but the $20 tier sure as fuck aren't seeing it
>>
>>103597950
In the past, local models weren't even in the competition. I think we are in a pretty comfy position right now.
>>
>>103597967
>>103597976
OAI's business model has always been, "make new superproduct -> release it for free/almost free and don't stop nolifers from abusing it -> wait a couple weeks/months to get everyone addicted and relying on it -> clamp down, filter everything, raise prices 100x and ban a couple of nolifers". They're basically AI drug dealers.
>>
>>103597901
What? Who?
>>
Oh boy time for another day of shills invading and spamming their old talking points again for the millionth time.
>>
>>103597983
Nothing has changed see >>103594789
We are 1 year behind SOTA same as we were a year ago.

It took Meta 1 year to catch up to GPT-4 and needed a stupidly huge dense model to do it, while commercially viable competitors moved on.
Now they can say the goal is o3, and by next year, when they finally catch up to o3 with an 8008B model, Altman will be announcing GPT-5 or o5 or whatever.
>>
>>103597997
that's bullshit tho
chatgpt sub always gave you the best shit, but in small quantities - or you could get any amount of compute you want through the api. at worst they made the offering itself shittier, like dalle going from 4 images (gave you things you didn't even know you wanted) to 2 images (kinda whatever) to 1 image (meh) but there were no different sub tiers.
the $200 tier with unique goodies is the new part
>>
Threadly reminder that the west has fallen.

>Cohere: Their latest 7B meme cemented their demise.
>Mistral: The only time they tried to be innovative was by using MoE, but then their model sucked and they gave up on it. MiA since then.
>Meta: They started the local LLM race, but everything after llama 2 has been disappointing.

Meanwhile, the chinks:
>Qwen: Great models, many different variants, top tier coding model. Recently released QwQ, a true-to-god breakthrough in local LLMs.
>DeepSeek: They took the MoE formula and made it work marvelously, they are the best open weight model available, and their recent DeepSeek R1 model, if released, would enter the local history books.
>>
>>103598093
This, but unironically.
>>
>>103598093
>>Meta: They started the local LLM race, but everything after llama 2 has been disappointing.
Because Llama 2 was a carrot on a stick to get people to stop using uncensored and unfiltered Llama 1.
>>
>>103598026
Next year doesn't mean 1 year, it could be next month, because, if you aren't aware, today is December 21.
>>
>>103598107
And llama4 will be even more filtered and censored. Meh as long as my boy Claude still supports API prefill it's not the end for me
>>
>>103598129
If he meant that Qwen would release an o3 competitor next month, he would have said next month or even a couple months. But, he didn't. Because even the most optimistic scenario is catching up by the end of 2025.
>>
>>103598150
Nah, you are overthinking it. He can't drop precise estimates because he simply isn't allowed to do so. If they are going to give a date it would need to be an official announcement, not a random Twitter post.
>>
Would instructing the model to output tags for each reply help with RAG using Silly's vectorDB functionality, or would you need a specific implementation to get any improvement in retrieval performance from that?
>>
>>103598093
actual unironic prediction: deepseek will make the ultimate coomer model in 2025
many will think this sounds ridiculous but it is not
>>
>>103592316
lmao this nigga don't know about the pinkie test
>>
File: deepseek-job-posting.jpg (274 KB, 1330x1542)
>>103598244
>>
>>103598133
I thought consensus was that Llama 3.3 ended up being less filtered than 3.1?
>>
>>103598368
>consensus
Did I miss the poll? I don't recall voting.
>>
>>103598424
I must have imagined all the "L3.3 is great for Lolis" messages of the past several threads.
>>
>>103598424
>r
I voted for miku
>>
I'm depressed at just how good Claude 3.5 Sonnet is compared to local.
Not in coherence or logic (we're slowly getting there) but in cultural understanding, especially internet culture
3.5 sonnet seems to understand nuances that make it feel human with the right prompt in a way that I can't replicate with shit like llama or even largestral. It's like sonnet is 20 years old and every other model is 40.
>>
>>103598447
Not L3.3, EVA L3.3, and even then it was just some anon samefagging. I doubt more than two anons actually were talking about it.
>>
>>103598522
So he didn't imagine all the "L3.3 is great for Lolis" messages, you're just bitter
>>
>>103598561
Not l3.3, rope yourself
>>
>>103598522
EVA is still the top performer of current local RP models.
>>
>>103598513
Function calling has existed for a while. It wouldn't surprise me if it just searches for that kind of stuff before generating.
>It's like sonnet is 20 years old and every other model is 40.
How long ago were you 20? Don't you remember how much of a retard you were?
>>
>>103598513
This is why I never touch cloud shit. I'll always be content with local because it's all I know.
>>
File: OpenAI_employee_221.png (21 KB, 344x200)
>>103597898
not just smart. It's AGI
>>
>>103598603
>Who long ago where you 20? Don't you remember how much of a retard you were?
5 years ago nigga
>>
Can o3 cure the common cold?
>>
>>103597898
Post a link to the weights and we will, otherwise fuck right off back to /aicg/
>>
>>103598646
Finally, it can do my dishes and laundry for me
>>
>>103598603
I wasn't THAT retarded 2 years ago. More retarded than today, sure, but still better than the average person... probably
>>
Did that concept of "LLM as compiler" ever go beyond the initial demonstration?
>>
>>103598603
Anon, why are you still here?
>>
File: 1709827537987402.jpg (62 KB, 688x684)
>>103591969
>local will take advantage of it anyway
Any day now!
>>
>>103598368
It doesn't matter if you're right or wrong. That's a stupid thing to say.
>NPCs always trying to appeal to a "consensus" rather than verifiable fact
>>103598447
Next time say "it writes loli erotica" rather than talking about some imagined consensus.
>>
Posting again.
Can anyone test this prompt with Gemma on Llama.cpp and/or transformers? Here is the link:
pastebin.com 077YNipZ
The correct answer should be 1 EXP, but Gemma 27B and 9B instruct both get it wrong (along with tangential questions) with Llama.cpp compiled locally, with a Q8_0 quant. Llama.cpp through Ooba also does. Transformers through Ooba (BF16, eager attention) also does. Note that the question is worded a bit vaguely in this pastebin, but I also tested extremely clear and explicit questions, which it also gets wrong. And I also tested other context lengths. If just one previous turn is tested, it gets the questions right. If tested with higher context, it's continuously wrong.

Exllama doesn't have this problem. The model gets the question and all other tangential questions right at any context length within about 7.9k. So this indicates to me that there is a bug in transformers and Llama.cpp. However, a reproduction of the output would be good to have.
>>
It passed the Nala test, it writes cunny, it writes gore, with no refusals or attempts to steer away from it. I'd count that as objectively unfiltered.
>>
>>103598654
>>103598683
Ah.

>>103598726
>Anon, why are you still here?
Closest thing to social media i use, and something to do while on breaks of the rest of the things i do. You?
>>
File: which_one.png (378 KB, 680x412)
>>103598793
Which one?
>>
>>103598793
Was your post supposed to start with an "if"?
>>
File: file.png (49 KB, 778x248)
>The test for """AGI""" is just completing patterns
But that's like the very thing LLMs do. Why is this surprising?
>>
>>103598026
o3 isn't a goal, it's a dead end. I bet it's not even better for cooming, i.e. not actually smarter. They are just benchmaxxing. Unless you make money from solving cute puzzles and coding tests, there's nothing to get excited about there.
>>
>>103598852
No, logs of all those were posted in previous threads.
>>
File: 1703688087796183.jpg (120 KB, 1004x1108)
https://help.openai.com/en/articles/10303002-how-does-memory-use-past-conversations
>>
>>103598906
Oh, I see.
Alright.
>>
>>103598513
I've been using Claude 3.5 Sonnet a lot recently. I've become increasingly aware of the limitations of its writing style and its occasional logical errors. It isn't really head and shoulders above other 70B models for fiction writing.

It has a better library of reactions but not a perfect one. Real example of success from earlier this year: I asked a yandere AI to clone me a human woman as a romantic partner. Sonnet 3.5 understood the AI should be jealous but a raft of other models including the first Mistral Large did not. (I didn't use the word "yandere" in the defs. It's shorthand for this post.) Real example of failure from yesterday: a woman who was under guard allegedly for her own protection but also to control her had an opportunity to replace her chaperones with a security detail under her own control because an incoming administrator didn't get the memo, and she went full SJW "actually my supposed bodyguards are there to stop me from joining the resistance against this unjust society, so it would defeat the purpose to let me pick people who answer to me" instead of just shutting up and doing it. Importantly the character was not described as mentally retarded.

Example of compound logical failure from today: in a situation with a pair of siblings, a brother and a sister older than him, it called the boy his own younger brother. When asked OOC what that sentence meant, it acknowledged the error and that the boy was younger, then it rewrote the scene calling the boy the girl's older brother.
>>
>>103598915
ChatGPT just got upgraded to LLM 2.0 LFG!
>>
File: 5233252.png (21 KB, 786x474)
>>103598880
uh akshually chud now that it's completed we can reveal the real AGI test.
>>
>>103598756
>pic
5 or 25?
>>
>>103598447
You fell for one of the oldest tricks in Sao's book which is spamming the general to form the "thread consensus".
>>
>>103598894
The benchmarks o3 excelled at have not been publicly released. To claim they trained on private tests or that it's not smarter at all is absurd.
>>
>>103598932
That's just a method to counter benchmaxxing.
>>
>>103598802
I'm just bored, so I guess we are the same.
>>
>>103598937
As some wise elders say "Not my problem", you let discord shitters do it with impunity.
>>
>>103598915
That's... That's just RAG
>>
>>103598976
I like to see the thread going to shit though
>>
>>103598979
no, it's OpenAI ChatGPT Memory™
>>
>>103598937
they all do it
>>
>>103598979
Trve... Fact checked by independent lmg court from beautiful India.
>>
How many billions of parameters does a model need to stop writing pajeet-tier code?
The other day it used a for loop with a 1k buffer to copy data from one stream to another when Stream.CopyTo() was a valid solution.
>>
Does really no one here have a copy of Gemma GGUF they can just load up and try something out quickly?
>>
>>103598979
o1-style response iteration (writing a reply, then writing a criticism of that reply, then writing a new reply based on the original input + the first reply + the criticism, repeated several times) could fix the inherent problem in RAG that it only brings up information about something after it has already been mentioned, so it doesn't help when the AI is the one introducing the term. That only works if the backend stops and applies RAG before each criticism iteration.
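A sketch of that loop (pseudocode of the idea, not any backend's actual pipeline; generate() and retrieve() stand in for whatever your stack provides):

from typing import Callable

def iterate_reply(user_input: str,
                  generate: Callable[[str], str],  # stand-in: one LLM call
                  retrieve: Callable[[str], str],  # stand-in: RAG lookup
                  rounds: int = 3) -> str:
    reply = generate(user_input)
    for _ in range(rounds):
        # key step: retrieve on the draft reply too, so terms the model
        # itself introduced get their entries pulled in before revision
        context = retrieve(user_input + "\n" + reply)
        critique = generate(context + "\n" + user_input +
                            "\nDraft: " + reply + "\nCriticize the draft.")
        reply = generate(context + "\n" + user_input + "\nDraft: " + reply +
                         "\nCritique: " + critique + "\nWrite an improved reply.")
    return reply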
>>
>>103594837
I mean, newer 7b models give results on par with GPT-3.5 turbo, quantization keeps improving, there keep being algorithmic improvements such as flash attention, etc.

Yes, currently it would not be practical to replicate something like this, since even with all of OpenAI's resources it is still at the parlor trick stage (the actual models won't be released for months), but it might be feasible locally sooner than we think.

Last spring, a lot of people were amazed at Sora when OpenAI announced it. By the time they released it, there were some much better commercial versions by competitors with actual products, some of them making weights available, and by all accounts, Sora pales in comparison to a lot of other commercial ones at least.

OpenAI is marketing heavy, but for the nth time, has no moat. They have their brand. They're the Bitcoin of the latest AI wave. They might, like Bitcoin, succeed because first-mover advantage is that powerful and people are dumb (buy Monero). The reason they're selling that vaporware several months in advance is that it's what they need to appear ahead; their current products are not enough.
>>
>>103599060
nope, most here just shitpost and don't even use models
>>
>>103599060
>Gemma GGUF
I have the fp16s from ollama.
What do you want tested?
>>
>>103599102
A really good local model is like a unicorn: it's not real.
>>
>>103599057
You won't like to hear this...
Basically it's not about parameter count. 70B and above could learn to do it properly. It's about having a ton of high quality data and training for a long time. That's how you get non-pajeet code. And the part you don't want to hear is that the only way we'll get data in that quantity and quality is by researching better methods of generating it: "synthetic data". There are different ways of generating synthetic data, and just any shit method isn't sufficient. The synthetic data needs to be high quality and high diversity so the model learns to generalize and doesn't overfit. So more research needs to be done, at least on the open source side. Anthropic has already done this, which is why their models code so well compared to everyone else.
>>
>>103599119
This >>103598786, thanks.
>>
>>103599119
A prompt that's 24 KB of plaintext (lol).
>>
>>103599057
32b is usable
70b is not much better
claude blows all open models out of the water. o1 is better but much slower and MUCH more expensive
>>
>>103599087
> newer 7b models give results on par with GPT-3.5 turbo
Come on now
>>
>>103599102
There are still some, like me, and the guy last time that had an exl2 copy. It's understandable that Gemma is not popular given its advertised context size. And my post was kind of long so it's understandable no one cared enough to even read a single sentence of it.
>>
>>103599175
>>103599182
I gave it 8k of context and it estimated that the prompt was 5752 tokens.
>>
>>103599270
You clearly didn't paste the whole thing in. It ends with a question about XP costs. And btw it's 11K tokens.
>>
I only see 279 lines in the pastebin.
>>
>>103599270
That sounds close? If I copy and paste the pastebin text into Mikupad, it reports 5634 tokens to me.

>>103599298
How'd you get that? That should crash the backend or generate gibberish but it's clearly working on my end. No rope.
>>
>>103597950
we're doing way better than anyone expected
>>
>>103598368
It was. It was also retarded
>>
>>103599203
*depending on use case, I guess.
>>
>>103599335
I got that by using the token counter endpoint. It turns out if you CTRL-V twice it's 11K.
>>
>>103599377
I expected better.
>>
I want to try the status block meme for RP. Any good templates? What should I include?
>>
>>103599417
That's what my parents tell me every day
>>
>>103599387
Kek.
>>
>>103597898
I can't run it on my PC so I don't care.
>>
>>103599432
*emotional damage*
>>
>>103597898
>It's the smartest model in the world.
We can't test it and o1 is garbage at RP, somehow even more bland than gpt4o and feels dumber. I don't expect o3 to be any better.
>>
>>103599592
That's because your RP is dumb and doesn't need reasoning. RP with a scenario about solving riddles and then you'll realize how smart it is.
>>
>>103598802
NTA but to break my obsession with browsing 4chan in my free time I started reading ebooks, you could give that a try as well
>>
>>103599432
Thankfully I disappointed mine enough to stop hearing that.
>>
>>103599623
I wouldn't call my usage obsessive. I mean short breaks while doing other things when those things happen to be on the pc. If threads go fast, i let them run, if they're slow, i may drop a line here and there. I take time for reading books most of the days.
>>
>>103599613
>RP with a scenario about solving riddles
do anons really
>>
>>103599716
It's all pure placebo
Riddles and narrative test scenarios like the watermelon test are the stupidest thing that has ever come out of /lmg/.
>>
>>103599713
Good for you, I used to just browse random threads when I was out and about because there really isn't a lot I can do on my phone and I quickly get extremely bored otherwise. I figured I'd start reading real books instead of schizophrenic ESL shit, hopefully it'll help me write more effectively in the future. What are you currently reading? Me, I'm catching up on "The Expanse" as the TV show didn't adapt it 1:1 and ended early
>>
>>103599613
>your scenarios are dumb that's why AI struggles with it
what?
>>
Here I was thinking o3 was a nothingburger, but now I realize that riddle fetishists are eating good
>>
>>103599786
I can't wait until January. For the price of a 4090 I can have o3 solve any riddle I want once.
>>
>>103599785
Garbage in garbage out anonie
>>
i've mostly stuck to 70 and 30b tier models but i wanna see if smaller models can be useful for something, what's the overall best 3b and ~8b tier models? is there anything even smaller that any of you have found useful?
>>
>>103599781
Going through John Varley again. All the short stories i could find and the gaea trilogy (titan, wizard and demon). I tend to like the short stories better. Most books don't need 300+ pages. But i have a way-too-big back catalog of older sci-fi i should go through as well. GBs of stuff i'll probably never get to read.
>>
>>103599850
Ifable 9B.
>>
> Rocinante-12B-v1.1 - Dumb. Apparently, one must use ChatML formatting for RP, but the goddamn thing doesn't have the proper tokens for it.
> All the magnums - Overtrained on coomslop; every card sounds the same with uniform personalities.
> Violet_Twilight-v0.2 - Too many newlines, repetitive.
> Mag-Mel - Nah.
> sao - Dead in a bathtub.
> Ikari and Undi - Nope.
> Grype - Irrelevant since Mythomax.

Please, /lmg/ gods, I need a decent 12B tune. I can't take it anymore
>>
>>103599750
Found the Falconer
>>
>>103599920
did you try slush?
>>
>>103599298
If I edit its reply to say "To answer your quiz" then hit the continue response button, I get pic related.

>1 exp
>gemma2:27k-instruct-fp16

>100 exp
>gemma2:9k-instruct-fp16
>gemma2:2k-instruct-fp16
>gemma1.1:7k-instruct-fp16
>gemma1:7k-instruct-fp16

>llama3.1:8b-instruct-fp16
>naturally gave 100 exp
>using "To answer your quiz" gave 1 exp
>>
>>103599949
No, but I will, because fml.
>>
>>103599823
That applies to training, but a model that is intelligent (and has been Instruct tuned or is in any other way trained for interacting with humans) should absolutely be able to take a garbage prompt, figure out what the person writing the prompt wants, and give it to them. If it's unable to do this, it's a failure of the model.
>>
>>103599999
>mind-reading should be a basic function of any model
niggawatt
>>
>>103599920
What about just Mistral's original tune? Personally I even found it to be a bit too horny, so I avoided trying any community tunes since that'd logically be even hornier (and stupider).

Anyway I think I remember hearing that UnslopNemo was the best RP tune for 12B, maybe try that out?
>>
>>103592233
>cooming on code models
zased... so fvcking... zased
*kneeling*
>>
>>103599999
I disagree. People who don't give in the effort don't deserve the best rewards.
>>
>>103599899
gemma 9b is actually the only smaller model i've kept around, good to know i have objectively perfect taste
>>
>>103599999
checked
>>
>>103600031
Yes. More to the point, a better model should be better at mind-reading. AI does the cognitive workload for you.

A human skilled at writing compelling stories would be able to entertain a stupid person who wants a specific type of story without the stupid person needing to write their own as an example first.
>>
>>103600036
I tried that one too. It's just Rocinante with added ChatML tokens, but dumber. Anyway, about the original Mistral Instruct, were you never bothered by its rigid patterning? No matter how much effort I put in or how diverse I made my cards, not even using schizo-system prompting, I could never break its tendency to fall into this repetitive structure: She did blah blah, then blah blah. "Dialogue dialogue." She went, she did, blah blah. She yada yada. In my experience, it overuses "she" and results in bland prose.
>>
>>103599999
nta. If there are contradictions in the prompt, the model can go either way. If it's missing important details, the model will make stuff up or not mention them at all.
Those are issues that are too common in prompts, and the prompt writer is to blame.
I imagine something similar happens with art commissions. If the request is vague or messed up, the one fulfilling the commission will interpret. Like prompting just "big titties" in image gen and then complaining that you don't like redheads when it's done.
>>
>>103599999
This.
sonnet doesn't have this problem. we need local sonnet. I hope meta drops their llama 4 soon.
>>
>>103600110
Increase temperature and repetition penalty
>>
>>103600130
Is 0.7 temp, 0.05 min p and 0.8 dry not enough?
>>
>>103600069
Tbh Ifable's tune is the only tune I've tried of 9B. Now that I actually go look at a different benchmark (UGI), I notice that the top 9B is Tiger Gemma v3. Now when I go back to eqbench, I can't find it on there. Unfortunate. It would be interesting to see where Tiger Gemma places given how supposedly uncensored it is.
But given how it performed, maybe I will give it a try personally.
>>
>>103599999
checked trvth nvke
>>
>llama 4
oh boy I can't wait for a 1T dense model that trades blows with 4o (May) in select benchmarks
>>
>>103600110
Honestly don't remember if it was like that but it may have been. Since it was so horny I stopped bothering to use it, as I am someone that can run 70Bs and was just curious what smaller models could do.
>>
>>103600136
Dry doesn't stop the model from repeating single tokens, I would increase the temperature to 1.0 and decrease the MinP to 0.02
>>
>>103600142
Zucc said that their biggest model will be smaller than the current biggest one but smarter.
Most likely somewhere between 200-300B.
>>
>>103600142
>1T dense model
You'll be ready to run it, right? You have been accumulating VRAM like the rest of us, haven't you?
>>
>>103600172
>VRAM
Nigga we all using Xeon 6 multi channel now
>>
>>103600118
Jokes aside, Sonnet 3.5 really is great at taking an absolute trash prompt and outputting something decent.
>>
>>103600142
Meta has too many H100 GPUs to mess it up. They have more than all other companies combined.
They better not.
>>
>>103600237
>They have more than all other companies combined.
Um, no? They have about as much as xAI does now.
>>
File: 78921378217391.png (61 KB, 904x579)
>>103600245
retard
>>
>>103600142
>thinking llama 4 will only be 1T
Don't worry anon, there will also be a 3B model which is best in class and trades blows with the best 7B models on benchmarks.
>>
>>103600276
That is literally working on old information. Retard thinking I'm the retard here.
>>
>>103600276
>infinite money
>tons of talented engineers
>most compute on earth
>their models are worse than chinks release
i just don’t understand
>>
>>103600341
so how many GPUs does xAI have now?
>>
>>103600328
Everyone would be happy if they released 3B, 30B, and 300B
>>
File: file.png (94 KB, 747x1045)
>>103592233
>>103592316
>>103598256
>one result
>>
>>103600352
>infinite money
CEO and management takes it
>tons of talented engineers
tons of jeets
>>
>>103600352
>their models are worse than chinks release
They're not. llama 3.3 is the top model currently.
>>
>>103600352
Their models are far safer than anything the Chinese have put out.
>>
Are there any examples of diffusion based LLMs out there?
>>
>>103600353
The same as Meta training Llama 4. You think Meta is training on a 350k cluster? It doesn't exist. The cluster training Llama 4 is a bit more than 100k. This comes from Zucc in the last earnings call.
>>
>>103600395
no, DiT hasn't been used for LLMs yet
>>
File: file.png (160 KB, 624x1203)
>>103600365
>>
>>103600399
more than 100k is still a lot.
Last time they trained on 24k H100s and their biggest model took 50+ days on 15T tokens.
They could pretrain their new biggest model in a week or two at best, which is way better.
>>
>>103600420
That answers nothing, what is the pinkie test in the context of evaluating coom models?
>>
>>103599999
Lol this, this is exactly what CAI did in its prefilter glory back in ye olde days.
>>
>>103600376
rope yourself
>>
>>103600365
>Google
go back
>>
>>103600442
it's true chang. No one uses chink models, just look at stats on openrouter. All coomers use 3.3 or sonnet.
>>
>>103600431
Sure, but it's not some fantasy number of GPUs no one else could possibly have. The numbers probably aren't exact either. There's no telling if xAI's report is actually 100k or a bit more but rounded, like Meta, since Meta's report came out after xAI's, likely in reaction for boasting purposes.
>>
>>103600442
Only gemmies need the rope.
>>
File: 124124346457658.png (6 KB, 460x122)
A blast from the past when Llama 3 first appeared on the Replicate API.
>>
>>103600442
Cloudcucks always so mad to see localchads thrive
>>
>>103600524
>Cloudcucks always so mad to see localchads thrive
Yes, this is hilarious. It's like, something new came out in closed-land and now I'm supposed to be sad?
Bro, my current stuff still works and it's just a sneak preview of what I'll have in a few months anyway (or just as likely, what I already have, because the big western corpos ignore chink models when they make meme-graphs)
>>
>>103600437
Probably some completely worthless garbage, judging by that guy's activity
>>
File: 1732555821086563.jpg (88 KB, 545x518)
for my st director plugin, i dunno why i put in the effort for text boxes when i could have done what i already was doing with lorebooks. derp but at least i was able to reuse most of the actual work
>>
For those of you who use a cloud service - which one are you using?
If I use google (which I've used before), is there anything special I should rent out? What are the specs for diffusion jobs?
Thanks.
>>
>>103600524
"Cloudcucks" are busy chatting with prefill sonnet, seems like a win for me.
>>
>>103600793
No, those are the cloudchads. The cloudcucks are the ones that don't even immerse themselves in the models they use (if they use them at all) and instead spend their time going on social media shitposting about the thing they're supposedly so happy with.
>>
>>103600828
>Twittards and redditors say things!
Yeah, for a reason.
>>
>>103600828
Like imagine being such a cloudcuck or even localcuck that instead of being like a normal person and happily enjoying your hobby, you go online to argue with people about how good or bad [thing] is.
>>
>>103600709
Link?
>>
>>103600276
I desperately want the H100, but I'll have to wait until it becomes cheap and obsolete like p100
>>
>>103600914
A100 still isn't cheap and H100s are under buyback agreements. You're going to be waiting a loooong time
>>
>>103600898
https://file.io/XCI58sDJLMsv
that's the last one i released, working on an update though. its point is that you create lorebooks for clothes, hair and stuff, then you can quickly change them via dropdowns in the addon. it's basically the same as adding to your author note: char is wearing <lorebook entry>, but instead you get dropdowns of those saved entries. install to st\data\default-user\extensions\
some st update a while back changed the theming a bit and the buttons got messed up, but the order goes user, char, world, notes, preview, lorebooks
>>
2advanced4lmg
https://x.com/novasarc01/status/1870181817162285120
>>
>>103601121
it's literally just coconut
>>
Bros I think Gemma is legitimately innovative in what it did. It basically tried to prove that modern models may be using or rather wasting too many of their parameters just to chase a high context length, and it succeeded. The models were way more knowledge-dense at the cost of context length. They even used a sliding window on half of the layers to boost performance even more, though that makes the model even worse at handling context extension. What we really need is a next generation version that does the same thing but gets to around 32k instead of 128k. It wouldn't be nearly as knowledge-dense, but it'd be usable to most people finally without any context extension tricks that degrade performance.
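The sliding-window part, in toy mask form (an illustration of the mechanism only, not Gemma's actual code; if I remember the config right, Gemma 2 interleaves 4096-token local layers with full-attention layers):

def attention_mask(seq_len: int, window: int | None) -> list[list[bool]]:
    # True means position j is visible to position i.
    # window=None -> plain causal mask (global layer)
    # window=W    -> causal, but only the last W tokens are visible (local layer)
    return [[j <= i and (window is None or i - j < window)
             for j in range(seq_len)]
            for i in range(seq_len)]

# toy sizes: alternate local (window 4) and global layers, as Gemma 2 alternates
masks = [attention_mask(16, 4 if layer % 2 == 0 else None) for layer in range(4)]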
>>
>>103601121
Wdym lmg? You post here all the time.
>>
File: ai_fail.png (10 KB, 847x97)
>>
Deepseek r3 when? QwQ3? We aren't going to let Sam get away with this, right?
>>
>>103601203
Now that you're talking about gemma, I have been trying a few models to translate chinese to english and gemma2 9b is one of the best. Qwen 2.5 14b somehow performs worse than qwen 3b.
>>
>>103599980
>27k
Huh, is that an Ollama thing? I guess they're using rope for that. Makes sense it could start answering correctly. But thanks for testing. This would confirm Llama.cpp does have an issue with Gemma that Exllama doesn't.
>>
>>103601321
DeepSeek T1 will be out on Christmas and it will be better than o5. Trust the plan.
>>
>>103601343
Just imagine when they're on version 800.
Haha get it.
It's a reference.
Haha...
>>
>>103601332
I think someone mentioned using Gemma 27B was preferable to Qwen for translating Japanese. If that's true even for Chinese then that'd be pretty funny.

>Qwen 2.5 14b somehow performs worse than qwen 3b
That's kind of weird though. Maybe their 14B was a bit of a fail.
>>
>>103599378
For what it's worth I tried IQ3_XS and IQ2_XS quantizations of Llama 3.3-70B, and the latter felt substantially worse than the former (overall duller and less interesting outputs, less attention to detail, more formatting mistakes), so there's that as well.

Serious investigation into the effects of low-precision quantization needs to be done, because I'm not sure if MMLU scores (which in theory still place Llama-70B at ~2-bit above the 8B version in FP16) tell the entire story.
>>
How far are we from actually running an AI Dungeon like program locally with strong recollection and general response quality? Assume a 5090
>>
>>103601539
Qwen 3b works pretty well for a "normal" translation, but since I'm using it to translate a novel it wasn't enough. I don't know what was wrong with 14b, but with the same prompt and the same novel it performed considerably worse. Maybe it would be better with a different prompt but I was busy trying other models.
Nemo was also decent but gemma feels more "accurate". I can't really tell accuracy with so little testing and no reference translation, but this is how it feels to me so far.
I'll give 27b a try since it was mentioned.
>>
>>103601767
Further away than ever before. Soulful completion models like Summer Dragon are dead. All that's left is instruct tunes that are as predictable as they are boring.
>>
>>103601014
It's quite handy thanks for making this
>>
Is there a good archive of high-quality, clean Touhou voice samples somewhere?
>>
Asking on the off chance anyone is going to give me a serious answer: I have a 96GB AI server I use to run mainly Mistral-Large based models. Is DeepSeek 2.5 actually worth caring about? Should I be looking for some more cards to run it?
>>
>>103601804
What are you running the models for? ERP?
>>
>>103601804
I prefer deepseek (especially 1210) at q8 over largestral at q8
Is it worth it? How much is a boost in intelligence worth to you? Vanilla rp isn't going to get much better imo. You'll need complex scenarios or actual intelligence-stressing tasks for it to be worthwhile.
>>
>>103601812
primarily, yes
>>
>>103601804
deepseek is smarter and knows a lot more but is drier and needs xtc imo. The speed alone though makes it worth it.
>>
Is v100maxx chad on here? How worth it is your setup? I'm thinking about getting some of these and some v100s as a cheap alternative to 48gb cards
https://www.ebay.com/itm/296856182515
>>
>>103601804
How much combined RAM and VRAM? You need like 192GB to run a decent quant with a decent context length, especially since Llama.cpp doesn't support flash attention for DS.
>>
>>103601859
>>103601859
>>103601859



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.