/g/ - Technology




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101514682 & >>101507132

►News
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1689703011561.jpg (29 KB, 480x360)
►Recent Highlights from the Previous Thread: >>101514682

--Llama 3.1 benchmarks: >>101521353 >>101521436 >>101521438 >>101521521 >>101521558 >>101521590 >>101521501
--Llama 3 405B leaked base model discussion and distribution: >>101516633 >>101516705 >>101516732 >>101517331 >>101517074 >>101517390 >>101518290 >>101518318
--Techniques used for 12B model and multilingual model comparison: >>101515246 >>101515342
--Mistral-Nemo 128k context and its limitations: >>101519084 >>101519151 >>101519185 >>101519227
--Can language models do search?: >>101514990 >>101515020 >>101517627
--GGUF vs EXL2 performance and Llama-related tools discussion: >>101520034 >>101520119 >>101520143 >>101520235 >>101520710 >>101521084 >>101521257 >>101521405 >>101521513
--Speculation of a Meta insider and discussion of Yann LeCun's V-JEPA project: >>101517185 >>101517294 >>101517303 >>101517330 >>101517236 >>101517252 >>101517272
--Performance comparison between a 27 billion parameter model and WizardLM 8x22B: >>101515687 >>101515961
--Nemo and ooba compatibility, software interface for model deployment: >>101516907 >>101516947
--Flash attention works with exllama, llama.cpp, and gemma-27b: >>101515377 >>101518607
--Anthropic's Sonnet improvement and theories of API model degradation: >>101515487 >>101519037 >>101519196
--Anon celebrates Nemo support in llama.cpp: >>101517538 >>101517574 >>101519972 >>101520070
--Miku (free space): >>101519212 >>101519882 >>101516683

►Recent Highlight Posts from the Previous Thread: >>101515008
>>
Remember when Miqu was short for mistral quant?
>>
reminder that 60% of posters are underage coomer tourists
>>
so, 405B didn't improve from 2 months ago.
How over is it?
>>
>>101521779
wrong, I came here from Twitter after seeing the llama leak
>>
>>101521779
>he expects us to actually hang out here when miqu1 that was released last year is still better than miqu2
>>
>>101521800
Stop being a retard. Those are base model numbers, and they're ahead of GPT4 in most cases, including the 70B 3.1.
>>
Will column-r save us from meta's incompetence?
>>
>>101521762
terrible recap
>>
File: L3.1-benches.png (26 KB, 1293x101)
https://github.com/Azure/azureml-assets/pull/3180/files
A bit disappointing, but still good.
>>
>>101521839
>a bit
>>
>>101521762
Keep it up Recap-kun we love you!
>>
>>101521826
It's instruct.
>>
Where are the Instruct scores?
>>
>>101521839
yeah, I would have expected more of an improvement from a doubling of params, but maybe it just means that the larger model has more memory and needs to be trained for longer to make better use of it?
>>
File: ethical.png (106 KB, 893x435)
Llama 3.1 is going to be more ethical, fair and inclusive.

source: https://old.reddit.com/r/LocalLLaMA/comments/1e9hg7g/azure_llama_31_benchmarks/
>>
Is there a leaderboard that has the benchmarks for all LLMs, proprietary and free? The openLLM leaderboard only has open-weight models, and LMSys only has mt-bench and mmlu.
>>
I remember when recap anon used miku images the quality was so much better
>>
>>101521839
The community will fix it.
>>
>>101521839
no mmlu normal or mmlu pro?
>>
>>101521854
>>101521855
we only got the base as the leaked model right?
>>
>>101521826
If the base model is bad, the instruct will be bad too. Good models don't even need that stupid base/instruct distinction and can be used both ways, like the Cohere ones.
>>
>>101521860
I would rather be cucked using API models than using local models.
>>
>>101521921
>Good models don't even need that stupid base/instruct distinction and can be used both ways, like Cohere ones.
wait what? you can use base cohere models normally? it works?
>>
>>101521921
holy shit you're brain dead, lurk more
>>
>>101521513
Just tested with bigger context, asked to summarize this https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

38K context

>exl2 8.0bpw
Metrics: 256 tokens generated in 17.42 seconds (Queue: 0.0 s, Process: 87 cached tokens and 38615 new tokens at 5237.72 T/s, Generate: 25.49 T/s, Context: 38702 tokens)
14.3 t/s on ST

>gguf Q8_0
prompt eval time = 21019.59 ms / 38702 tokens ( 0.54 ms per token, 1841.24 tokens per second) | tid="131307092631552" timestamp=1721667015 id_slot=0 id_task=0 t_prompt_processing=21019.586 n_prompt_tokens_processed=38702 t_token=0.5431136892150277 n_tokens_second=1841.2351223282894
generation eval time = 14004.10 ms / 318 runs ( 44.04 ms per token, 22.71 tokens per second) | tid="131307092631552" timestamp=1721667015 id_slot=0 id_task=0 t_token_generation=14004.103 n_decoded=318 t_token=44.03805974842767 n_tokens_second=22.707630756500436
9.0 t/s on ST

The gguf quant didn't respond correctly, it cited other papers.

exl2 is definitely better at higher contexts
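For anyone checking the math: the backend numbers are generation-only, while the ST numbers are end-to-end (prompt processing included), which is why they look lower. A quick recompute from the values in the logs above:

# Sanity-checking the reported throughput; numbers taken straight from the logs above.
exl2_end_to_end = 256 / 17.42     # ~14.7 t/s, close to the ~14.3 t/s ST reports
gguf_prompt     = 38702 / 21.020  # ~1841 t/s prompt eval
gguf_generation = 318 / 14.004    # ~22.7 t/s generation only
print(exl2_end_to_end, gguf_prompt, gguf_generation)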
>>
>>101521917
Actually idk. Whatever, I'll wait for the actual announcement from Meta.
>>
>>101521917
no, we got instruct.
>>
>>101521939
I use Command models as completion and they just work.
>>
Is the announcement tomorrow or today?
>>
>>101521863
https://livebench.ai/ is good and the results align with my personal experience
>>
>>101521963
All models "just work" for completion...
>>
>>101521957
we definitely got the base only
>>
>>101521980
instruct afaict
>>
>>101522004
how do you know that?
>>
Man, Mistral Nemo is crazy sensitive to the instruct template if you are doing any sort of more complicated prompting.
Dayum.
But when it works, it seemingly works pretty well.
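For reference, the Mistral-style template it expects looks roughly like the sketch below. The exact whitespace and BOS/EOS handling here are assumptions that differ between tokenizer versions, so verify against the chat_template actually shipped with the model:

# Minimal sketch of a Mistral-style [INST] prompt builder (spacing is an assumption).
def build_prompt(turns, system=None):
    # turns: list of (user, assistant) pairs; assistant is None for the final turn
    out = "<s>"
    for i, (user, assistant) in enumerate(turns):
        if i == 0 and system:
            user = system + "\n\n" + user  # common convention: fold system into first user turn
        out += f"[INST] {user} [/INST]"
        if assistant is not None:
            out += f" {assistant}</s>"
    return out

print(build_prompt([("Hello, who are you?", None)]))
# -> <s>[INST] Hello, who are you? [/INST]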
>>
>>101521942
why is the common response to reading something you don't like but know is true to act like such a fool?
>>
>>101521980
No need to cope; not every day is for us to celebrate. I am more interested in how it compares to the new Mistral.
>>
>>101522026
Can you post example? I mean example what works for you.
>>
>>101522013
The embedding
>>
>>101522028
I'm not coping, that's a base model, why do you believe it's the instruct one?
>>
>>101522027
Low IQ... sweating like a pig in summer and being angry, take your pick. /lmg/ was always comfy.
>>
>>101522028
until mistral releases a 30b range model they aren't even worth acknowledging these 7 and 13b scraps are worthless
>>
>>101521980
No, you can tell by how well it performs at gsm8k benchmarks. There is no way, the model we have is just a base model.
>>
>>101522043
can you elaborate maybe? that's a bit vague
>>
>>101521839
>L3.1 70b social_iqa: 81.3%
roleplay bros we won
>>
>>101522055
>>101522043
oh so that means those benchmarks are the very best that 405b can do? it's joever...
>>
405b lost massively, 3.5 sonnet is probably ~70b yet it MOGS llama 3 405b in every conceivable way. Meta lost.
>>
>>101522027
>>101522047
bcs you're so fucking wrong I wish there was IQ tests in place to prevent NIGGERS from using 4chan
>>
>>101522085
3.5 is probably a moe like deepseek, but bigger
>>
>>101521970
I was looking for something that just recorded scores for pre-existing benchmarks (e.g. MMLU, TruthfulQA etc.), so I could compare the llama-3.1 benchmarks with existing models.
>>
Why are you still localkeks?
>>
>>101522125
Because I like hearing my laptop fan whirr.
>>
File: rat.gif (2.35 MB, 490x390)
>>101521839
> open weights on the level of GPT4 turbo with the same context size
> disappointing
>>
Wtf are some anons on about? Did they not even read the benchmarks? New 70B is a massive improvement and is supposed to be 128K context
>>
>>101522125
I am dating my computer.. What is the alternative really?
>>
>>101522149
>New 70B is a massive improvement and is supposed to be 128K context
It's worse than 3.5 sonnet though?
>>
>>101522145
I'm 90% sure it won't be as good as GPT 4 turbo because their dataset mixture is different.
>>
>>101522149
this happens every time there's a big release, it's just people trolling (and maybe a few easily-swayed people too dumb to tell)
>>
When will local beat 3.5 Sonnet?
>>
>>101522177
/thread
You won't be running this locally, so you might as well use Claude 3.5
It's over.
>>
>>101522177
It's more like Claude 3 Sonnet now, which is a big leap from before.

In fact the new human eval, the most important score for creative writing (which is what 90% of you faggots use it for), doubled.
>>
File: U WOT M8.gif (2.25 MB, 320x240)
>>101522179
>gpt4-slop when llama3.1-slop enters the room
>>
>>101522177
N-No, but the i-instruct...
>>
>no Bitnet as promised
>Worse than sonnet 3.5
It's fucking over. It's only downwards from here
>>
How many weeks until support in llama.cpp?
>>
>>101522222
Quints of truth. Owari desu.
>>
Um, anons, I don't feel so good about local models all of a sudden?
>>
>>101522207
humaneval is a code benchmark thoughbeit
>>
>>101522149
It is just that in practice, it will likely be a 5-10% improvement. I mean, it is not bad; the last two years were absolutely awesome, with every new release pushing the boundaries (haha) a lot. So these models, where we are definitely getting something better but not that much better, sound kind of lame.
>>
>>101522228
>How many weeks until support in llama.cpp?
supposedly same arch, so perhaps instant; perhaps a week if the tokenizer is different
>>
File: 1640539815379.png (184 KB, 768x859)
>>101522222
>digits
>>
>>101522222
Did you really expect a model better than 3.5 Sonnet? This is the same model that was still in training when L3.0 dropped... of course it's not gonna be better

my theory is that 3.5 sonnet is sonnet with turbocharged feature activation BS and I think local would benefit from looking into that
>>
>>101522222
Holy digits, what do we do now....
>>
>train 405B parameters model
>retrain 8B and 70B, just because we can
>no, you don't get other sizes

They're mocking us, aren't they?
>>
>Meta shoots itself in the foot
Did NovelAI just win?
>>
>>101522222
localniggers... did we lose?
>>
>>101522265
Neigh.
>>
>>101522265
They have already won even before this shit show.
>>
>>101522231
>>97223983
>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.
>everyone who attacks him is mindbroken incel scum

>>97062246
>I'm not Petra. Petra's an amateur. I'm something considerably worse.
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.

Go away Petrus!
>>
File: Capture.png (29 KB, 647x522)
Today is my first day with dual GPUs and I'm curious about the process. So far, the second has been zeroed out in every metric since the PC turned on, to the point I was wondering if something went wrong. Now that I got ye olde /aids/ installed and a bigger model to utilize it, I see that its VRAM is being leeched (at an even amount between the cards) but offering no processing power to the generation. Is that just how it'll be? Or is it conditional, like there just isn't enough to chug to make both cards work? I'm generating quite fast despite using 2/3 of my VRAM.
>>
>>101522222
Sonnet and GPT4 are already instruct-tuned. L3.1 evals are done on base model. I suspect Sonnet 3.5 is just Sonnet with smart vector (tell it to larp as einstein)
>>
>>101522294
>smart vector
What's that?
>>
>>101521767
No actually I never knew that. Clever.
>>
>>101522288
>4060 Ti
ngmi
>>
Bitnet?
>>
>>101522294
>>101522303
https://www.anthropic.com/research/mapping-mind-language-model

is there a legit reason why local can't do this as well?
>>
>>101521858
Means we need better data. Higher resolution compression of the same shit stops mattering past a certain point
>>
Where is our "Beats gpt4 + multi-modal" local model that we were promised?
>>
>>101522086
I'd sigh and call you an idiot too but we already know that. You know what they say about arguing with stupid people. It is depressing to know people of such low intelligence are capable of solving the captcha though. Not only because they bring down the quality of the conversation but because it's a black mark on the human condition of society as a whole that we as viewers have to see up front in real time.
>>
File: FGIsyCqXsAUnDCI.jpg (124 KB, 1200x1200)
>>101522294
>>
>>101522308
Bitnet
>>
>>101522307
I've already made it. I'm coming from a 1070. Everything feels like magic with how fast it all works now.
>>
>>101522310
That feature doesn't help improve the model though
>>
>>101522303
It connects you to India, where Rajet starts replying to your RP. Very smart.
>>
>base non-instruct 405B benchmarks putting it between gpt4-o and claude 3.5. Distilled 70B only a bit worse.
fail, let us down, what a shit show!!1!
>>
>>101522325
bitnet bitnet
>>
>>101522319
Do you also know what they say about stupid people? They can't recognize how stupid they are, often thinking they are smarter than they really are.
Now shut up, I'm not going to waste time explaining to you zigger why you are wrong.
>>
The common enemy is ngreedia
>>
File: nemokino.png (175 KB, 1381x763)
>tfw nemo unironically understands the shivermaxxing system prompt
finally, i feel at home
>>
>>101522365
Now THAT'S what we call purple prose
>>
>>101522365
*you can't help but*
Seriously, is this how you roleplay?
>>
>>101522353
That is unfortunately because you aren't capable of doing it. All you're truly capable of is getting mad at things you don't understand... like an idiot.
>>
you niggers are giving me mixed feelings.
Are we back or not? I'm tired of proxy faggotery.
>>
File: 1720546748706394.png (109 KB, 339x296)
>>101522361
You're just mad they're democratizing AI for the common man and allowing more people to inference in their machines than ever before
>>
>>101522405
nah it's joever. rope time.
>>
>>101522396
uhm sweaty, we're shivermaxxing here, this and "ahh ahh mistress" are the only acceptable answers
>>
>>101522405
Come back in 2 weeks. Every time a new model is released, the "people" can't help themselves and shitpost.
>>
>>101522405
We are back, Ignore the bad faithers / retards parroting them.
>>
>>101522405
Doom posting faggots, local can only go up.
We've got cohere as backup.
>>
>>101522414
Wouldn't call locking down consumer VRAM democratizing AI but ok
>>
>instruct Mistral to not use cliche phrases like "barely above a whisper"
>instead it uses "barely above a murmur"
I must be getting trolled at this point.
>>
>>101522270
Did we ever win?
>>
>>101522405
vramlets will be eating super good with 3.1 8b, slightly less vramlet too with 3.1 70b. 405b is doa
>>
>>101522401
>mad at things you don't understand
Oh yes, because you clearly understand what you're talking about, >>101521921 anon.
>>
>>101522405
Yes, we're back. Retard.
>>
>>101522405
70b and 8b get performance upgrades and way more context, it's a win for me
>>
>>101522445
Maybe the real 405B was the 8B and 70B we made along the way.
>>
>>101522328
it does though
>>
>>101522462
It only "improves" the "safety"
>>
Give me llama4
>>
>405b still writes worse than Nemo
Wow
>>
>>101522405
Don't trust hopefags, local is done after this.
>>
>>101522405
We will never be half as good as proxy models.
>>
>>101522405
Basically, once we get the instruct tunes, 405B should be around 3.5 Claude level, and 70B will be about 10-15% worse than that. Also 128K context
>>
>>101522405
come back when bitnet
>>
If the new 70B isn't far from 400B, doesn't this mean that distillation works really well? Why hasn't anyone else except Google pulled off successful distillations in the open weights category?
>>
>>101522472
>Nemo
>Write good
Lol >>101522365
>>
>>101522472
Nemo really is a modern Mythomax, it's a little retarded and can need a few retries but damn it has so much sovl
>>
>>101522445
How does it compare to Mistral?
>>
>>101522464
Did you read the paper?
>>
>>101522487
What do you think new gpt4-o / mini / claude sonnet 3.5 is? Distilled then further trained.
>>
File: mikucity.png (1.51 MB, 1024x1024)
>>101522405
With mistral-nemo we're back.
128k context.
12b size.
Not as smart as Wiz or CR+, but less positivity bias and more creativity to make up for it. And much faster of course.
8k context fags BTFO yet again
>>
>>101522464
>Claude 3
>Improved safety
>>
>>101522482
Mistral-Nemo also has 128k context on paper; in reality it sucks after 12k
>>
Any models finetuned for CoT RP? Because the only way for the model to take initiative is to specifically ask for it.
>>
>>101522502
>in the open weights category
>>
>>101522503
Same shit not even trying it
>>
>>101522492
You do know that the JB on the right there is a meme, right? That is what it's supposed to do.
>>
>>101522502
3.5 Sonnet is also open-weights if I leak them :)
>>
>>101522472
Did someone get it running?
>>
>>101522521
lol
>>
>>101522487
Sao successfully distilled Opus into Llama 3 8B.
>>
>>101522511
What backend? It took till 160K-ish context before I noticed any degradation of performance.
>>
>>101522521
do it
>>
Daily reminder that there's only one benchmark that matters and anyone obsessing over 'ofishal' benchmarks is a reddit pseud that needs to go back.
>>
dear all reddit trannies: die
>>
>>101522550
So true, and that benchmark is https://livebench.ai/
>>
>>101522525
literally impossible locally; 17+ 3090s required for 8bpw, you're not running that in your house
>>
File: 1675205956243558.webm (2.1 MB, 270x480)
>>101522540
Kobold. What are you using?
>>
>>101522550
Yeah, lmsys.
>>
>>101522567
No, lmsys is dogshit
>>
Datacenter GPUs have had higher VRAM than 24gb for a long time now. It's easier than ever to find them second hand. And as a bonus to you you're not giving them money directly.
>>
>>101522566
VLLM
>>
>>101521921
true
>>
>>101522552
Why yes, I do like diversity, inclusion and equity. Thank you!
>>
File: 1711063256087873.jpg (1.3 MB, 1700x1400)
>>101522579
I'll give it a try. Thanks.
>>
What system prompt are y'all using for Mistral Nemo? I really want to like it but it seems very slopped and heavy on positivity bias.
>>
>>101522594
ahh ahh mistress
>>
>>101522573
405÷48
~8.43
where are your 9 a6000?
>>
>>101522594
>heavy on positivity bias
Your lying. That is one of the benefits of it, no positivity bias at all.
>>
>>101522594
You're an expert roleplayer, you have 30 years of experience in professional storytelling and your expertise is writing natural and idiomatic English without common tropes of novel fiction.
>>
>>101522566
>Kobold
Wasn't aware it was supported there yet. Are you on that frankenkobold branch? I got it to 60k on llama.cpp without issues
>>
You don't fucking get it, it's over, Llama 3.1 didn't score 101% on the benchmarks, just go give money to openai as penance for hoping you could beat them!
>>
>>101522615
I can also confirm that llama.cpp didn't seem to have any issue.
>>
>>101522608
No, it does have some positivity bias. But that is an issue only if your card is... very innocent.
>>
>>101522615
Yeah, https://github.com/Nexesenex/kobold.cpp
>>
>>101522641
this but unironically.
>>
>>101522464
>>101522462
>>101522328
A "feature" can be whatever you want it to be. If you reroute coding requests to a model with boosted "expert coder in all languages" feature you can amplify a model's coding ability. Why not do the same for RP?
>>
>>101522594
Single-instruction Libra-style prompt. No discrete system message; it's part of the user prompt.
>>
>>101522659
3.5 sonnet is a normal dense model
>>
>>101522615
>>101522649
Maybe it does not have the latest repo. In that case my bad, I do not want to mislead anons.
>>
16 HOURS
>>
>>101522666
3.5 is very likely a moe, idiot.
>>
>>101522681
It's not.
>>
>>101522681
of course it is, kek
>>
>>101522681
>is very likely idiot
do you think before you write? you don't have to reply I know what you're typing right now.
>>
Hope there's something on the hardware side to run these giant models soon. Zucc needs to cannibalize nvidia by releasing his own pcie-compatible AI hardware with gigantic amounts of VRAM. I wouldn't even mind if it could only run llama models
>>
>>101522685
Where is your proof?
>>
>>101522601
The RTX 8000 is also a thing, 48gb for half the price of the A6000.
Besides who's going to try to run that at full precision anyway? It's a total waste unless you're CPUMAXXING or something.
>>
Where's the multi-modal model?
>>
any benchmark where 4o is higher than turbo is worthless for the record, 4o is dogshit
>>
>>101522704
? It's already known that GPT4 and the likes are MoE.
>>
>>101522715
8000 is basically a 48gb 2080, don't think that has fa2
>>
>>101522719
are we just shitposting now because miqu2 failed? 4o is still sota
>>
>>101522739
shit of the ass maybe
>>
>>101522731
where are you reading this genius?
>>
>>101522739
3.5 sonnet is SOTA, or are you brain dead?
>>
>>101522750
lurk moar
>>
>>101522472
do you have some logs so I can laugh a bit?
>>
why are people saying 405b is bad? isn't it almost on par with gpt4? nobody expected a model this size to beat the closed source ones right?
>>
>>101522770
what gpt4?
there are like 50 versions kek
>>
>>101522757
hurr durr sumbudy sad a thang I rad it I deed
I'm too fucking stupid to factcheck I'll just assume it's true and repeat the same shit when others ask
>>
>>101522770
GPT4 is history now. OpenAI will exit their temple with AGI (real)
>>
>>101522704
>>101522681
no one knows anything about what's happening with the API models, so let's not pretend otherwise. What if it's an improved transformer architecture? Because it seems like stacking parameters (hello L3-405b) is not the solution.
>>
Wait today is Monday. I thought it was Tuesday. FUCK. 24 hours to go until the actual release.
>>
>>101522770
Who cares if it's on par with GPT4 on meme benchmarks? The sota is Claude 3.5
NO ONE will use this model if there's a model better than it. That's also what made Grok Doa.
>>
tax refund just came in, I want an easy setup, is going for just one rtx 8000 worth it?
>>
>>101522801
Pretty much this.
>>
>>101522809
>Who cares if it's on par with GPT4 on meme benchmarks? The sota is Claude 3.5
this, if a 405b can't beat the 2nd best model yet, it's a failure, it was supposed to compete with C3.5 Sonnet
>>
>>101522820
You're better off going 2x 3090. You can get support for FlashAttention 2 which will let you fit more context. Unless space is a factor for you the dual 3090 setup will inference faster and be more future proof around the same budget (or less if you don't have to upgrade your PSU)
>>
>>101522288
check the graph dropdowns - look at CUDA or Compute or summit
>>
>>101522707
Meta quest 4 will feature 2 terabytes of VRAM
>>
>>101522719
Turbo is the worst of all GPT4 variations, what are you smoking?
>>
>>101522793
>peer-reviewed papers aren't trustworthy now
damn, what will the scientific community do now?
>>
>>101522731
>>101522863
>? It's already known that GPT4 and the likes are MoE.
It's only known that the original GPT-4 was MoE, you gorilla nigger.
>>
>>101522853
i don't have time to build shit myself, where can i just spent like 500€ more and get a pc thats already built
>>
>>101522868
And why would the lastest one not be MoE? MoE is cheaper to run inference on a large scale. It only makes sense that most if not all SOTA models are MoE.
>>
>>101522863
crying into their money. It was one of the biggest fuck-me moments when I realized that, in fact, no institute is immune to corruption, and the whole narrative that scientists should be celebrated more than some stupid celebrities with IQs around 80 was just a meme all along.
>>
File: truffle.jpg (70 KB, 1483x1032)
>>101522874
If you want to be as hands-off as possible with the build maybe this is more your speed.
Caps out at 100b it sounds like though so no CR+ or Llama 3 405b
>>
I for one just look forward to a day when 100k context is normal and doesn't require 10 gigs of vram. Fuck the latest FOTM model releases.
>>
>>101522959
That will never happen. Just wait for the day VRAM is cheap.
>>
>>101522862
4 turbo is good.
>>
>>101522801
Google has said Gemini pro is a Moe and flash is dense
>>
>>101522925
they don't mention quantization so I have no idea if its running unquantized or q2.
also this thing seems way too good to be true, and I can't even buy it.
I'd like to have something more real with a real gpu that i can also use for diffusion and stuff
>>
>>101523007 (me)
also there is this deceptive graph which makes me not trust them. (the difference between 18 and 20 is too large visually)
>>
>>101523006
Where?
>>
>>101523015
Buy an ad.
>>
>>101522615
>>101522649
So i tried the latest fork and it works just fine now.
>>
>>101522925
>200gb/s for $1299
dude that's worse than an m2 mac mini
>>
>>101523015
>hopefully we'll see some RP/ERP focused tunes for it
Buy an ad.
>>
>>101523041
lūrk mōrē
>>
>>101522925
Is there anything for around 600€?
>>
>>101523045
>>101523064
For what?
>>
>>101523015
>Stheno
Buy an ad.
>>
>>101522715
405gb is 8bpw not full...
>>
File: file.png (83 KB, 1338x402)
>>101523007
https://www.reddit.com/r/LocalLLaMA/comments/1bd2ekr/comment/kujxcd9/

"custom quantization algorithm"
"minimal accuracy loss"
" large gains in speed"
"soon™"
>>
>>101523074
For your finetune, what else?
>hopefully we'll see some RP/ERP focused tunes for it
The model doesn't need one.
>>
>>101522925
This looks like the Stadia of AI
>buy our over-priced useless box...
>...so you can also rent our servers and stream back to this box.
>Also, streaming text gens from us is the only thing this neutered pc can do.
>$1300 btw (plus server tip)
>>
File: 1721097824649005.jpg (47 KB, 562x675)
>>101523015
Buy an ad
>>
>>101523097
>scam
>raj
Every time
>>
>>101523015
Shills are back huh?
>>
File: 1706815557312128.jpg (32 KB, 480x692)
>>101523015
>l3 llama / gemma meme-tuners on suicide watch
>>
>>101523136
he praises stheno tho
>>
>>101523099
But I have never finetuned anything. I am a literal bottom tier user who just waits for shit to come out that can run on lower end hardware and has spent 0 dollars on anything AI related. Also I think it could definitely benefit from a tune that made it slightly more RP/story oriented compared to assistant-esque short and polite responder but also if you say prompt issue that might be true since I haven't spent that much time with it yet.
>>
>>101523159
It's a schizo that gets triggered by merges/finetunes, don't bother
>>
>>101523159
Your experience is completely decoupled from reality. Go back.
>>
>>101523159
>>101523185
samefag
>>
>>101523131
Not a shill, it just feels like the novelty bias in this thread is insane. Whenever any competent model comes out, people call it the best thing yet by far. It happened with Stheno too, and I think there are better Mistral models and better Llama2 models even.
>>
>>101523203
Name 4 (hard mode: don't shill your models)
>>
>>101523219
Solar, X-Norochronos, Utopia, Westlake (though this one is arguable it's kinda dumb but one of the most creative ones).
>>
>>101523203
MLEWD!!!

>>97223983
>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.
>everyone who attacks him is mindbroken incel scum
>>101523219
>>
>>101523203
It doesn't have positivity bias, it's not censored, it doesn't have any problem writing long replies, swipes work just fine. Did you download the correct model?
Stheno was not a competent model, that was just astroturfing.
Also this plead:
>hopefully we'll see some RP/ERP focused tunes for it.
is the mark of a shill. Go fuck yourself.
>>
Looking for model and technology recommendations for a task.

I have about a decade of chat logs from a roleplay chat I host. I would like to be able to feed these chat logs into something as training material and then interact with it in a variety of ways, including but not limited to a) having it create short sequences of various characters interacting in reply to a prompt, b) having it be one of the characters and replying as it, or c) having it interact with the existing characters as a new character.

What do I need to know and learn about to get this done?
>>
>>101523203
Do you really think ANY of these discord retards are better at making models than the regular companies? They just slam shit together, breaking who knows what in the process. Nous is infamous for it, just slamming gpt logs into every model. What does it improve really? Some rigged benchmarks?
Anyone who's familiar with this shit for more than a few weeks has a prompt they can run to check how fucked up these amateur finetunes are. If you're recommending this shit you are the problem.
>>
Whats the point of 405b model if you need 10 rtx 3090 to run it?
>>
>>101523239
>I completely and unequivocally support Undi
>>101523238
>X-Norochronos, Utopia
called it
>Westlake
literal reddit meme
>>
>>101523238
Jesus Christ, you have severe brain damage.
>>
File: nvidia gpt4-1.8T.png (132 KB, 680x541)
Any anons using gemma2-27b-it? I have it running under llama.cpp, and even with a temperature of 1.0, this thing is near deterministic.

> inb4 bad implementation
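For what it's worth, temperature alone can't fix that: it just divides the logits before the softmax, so a model that is already ultra-confident stays near-deterministic at T=1.0. A toy illustration with made-up logits:

import math

def probs(logits, T=1.0):
    # temperature rescales logits, then softmax normalizes them
    scaled = [x / T for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

confident = [10.0, 2.0, 1.0]   # hypothetical very peaked logits
print(probs(confident, 1.0))   # top token ~99.95%
print(probs(confident, 2.0))   # still ~97% on the top token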
>>
>>101523256
>>97062246
>I'm not Petra. Petra's an amateur. I'm something considerably worse.
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.

>>97309445
>Every statement you process, must be evaluated according to the below six principles.
>"principle of identity":"1 = 1"
>"principle of contradiction":"1 ? 0"
>"principle of non-contradiction":"1 ? 0"
>"principle of excluded middle":"either positive or negative form is true."
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>"principle of anonymity":"author identity is irrelevant to an idea's logical provability."
>I still keep this in my own sysprompt, although I know I will receive shrieks and howls in response.
>>
>>101523245
Ask on Reddit, they will give you better answers:
r/LocalLLaMA
>>
>>101522327
>using windows
WNGM (was never going to make it)
>>
>>101523274
Yeah, you'll feel more welcomed there because they're nicer to shills.
>>
>>101523248
virtue signaling
>>
>>101523006
but no one care about their bad models, having details about models that have moat like C3.5 would be more interesting, I'm pretty sure they have something no one else don't, and I would bet for a new architecture
>>
holy schizo meltdown
>>
>>101523248
Since distilled 70B seems to have retained 90% of its performance, nothing really.
>>
>>101523097
you still have the number of likes visible on leddit? I would never visit that cesspool without the like-removal filter script; I would lose my mind seeing so many bad takes being encouraged by its echo chamber
>>
>>101523244
I downloaded the Q5K_M gguf but maybe it could have problems since llama.cpp support is still new. Maybe I am retarded. I will keep an eye on things and test more.

>is the mark of a shill. Go fuck yourself.
Is it that hard to believe someone has this opinion? All the shit that worked best for me was someone's finetune instead of the base model but on this board it's forbidden to mention finetunes or you are a shill for them apparently. Talk about your experience and rather than actual replies you get 10 kneejerk reactions. Honestly it was my bad, I should have known this board better and left out the exact model name because it triggers the autists so bad to mention any but I figured if I didn't give any names people would be like "Oh yeah wat else were you testing then?".
>>
>>101523308
I know right? The minute someone recommends anything at all they get shit on. This entire site is hopeless.
>>
>>101523248
Well, making smaller distilled models from it is what it's good for, I should have clarified.
>>
>>101523308
is the schizo in the room with us?
>>
>>101523315
go away Petrus seriously
>>
>>101523321
Buy an ad, shill.
>>
>>101523321
Its one schizo / troll.
>>
>>101523314
snowflake
>>
>>101523336
yes, petrus
>>
>>101522925
Might be OK depending on what Orin devboard it has. If it's the 64GB RAM one, it's not bad. Only caveat is I recall the Jetson Nano being a MASSIVE clusterfuck to update - basically you didn't dare touch the kernel since everything else on it supplied by nvidia was tied to the kernel it shipped with, and if they updated that, it all had to be updated at once.
The moment T4 16GB cards touch $500 I'm buying one for my Odroid-H4U system. It'll be far better than this Truffle thing.
>>
>>101523336
Petrus is ruining my business, he must be stopped...
>>
>>101523321
What discord do you partake in?
>>
File: you.png (456 KB, 860x646)
>>101523324
.
>>
>>101523342
you know damn right that those likes are artificial, they censor and ban everyone that dares to disagree on a subreddit, so you know that it's not representative of the global opinion; reddit is a cesspool
>>
https://huggingface.co/spaces/Xenova/whisper-speaker-diarization
Source
https://huggingface.co/spaces/Xenova/whisper-speaker-diarization/tree/main/whisper-speaker-diarization
>>
>>101523246
I think there are a lot of autistic finetuners who just finetune to get top scores in the benchmarks, but then when you use the model in a real situation, surprise surprise, it's shit. The models that are good for RP don't usually have that high of a score in benchmarks even. Out of all the meme scores, Hellaswag is the one that even somewhat seemed to matter back when I paid attention to the scores, but I haven't really done that this year at all.
I think in a lot of cases when one of these finetunes is a pleasant surprise, luck is a large element in the process. It's basically a monkeys and typewriters situation: if everyone is finetuning, then one of the finetunes is gonna end up being good, at least for a specific purpose. It's not because the finetuner is some kind of megamind god smarter than a whole company of pros. I'm not here to suck anyone's dick off; I will use anything by anyone if it's free to download and I like the results.
>>
>>101522222
>no Bitnet as promised
there was no promise of bitnet by any company, retarded fudnigger; only the paper authors talked about creating 8B+ models for it
>>
File: image.png (20 KB, 544x347)
Wait, isn't this the base model? Do we have benchmarks for Instruct?
>>
>>101523388
>Wait, isn't this the base model?
yes
>Do we have benchmarks for Instruct?
no
>>
File: maxresdefault.jpg (60 KB, 1280x720)
>>101523362
Yeah, if you know that, why do you need a script to keep your ego from getting hurt?
I hate filterfags like you.
>>
>>101523248
The point is that the improved 70B could exist only because they made the 400B.
>>
>>101522327
>Everything feels like magic with how fast it all works now
if you could actually run wizard you would know actual magic and wouldn't mind 1.5+ t/s for great high-IQ responses every single time in basically every situation
>>
>>101523388
Yeah, it's instruct. Idk why some anons are calling that the base model. Just look at the gsm8k score.
>>
>>101523397
I just explained why. If you're too retarded to understand that, there's not much I can do for you; intelligence is not something you can get anywhere.
>>
>>101523410
>wizard
Dolphin Mixtral 2.5 is so much better.
>>
>>101523388
No, but like I said, instruct 405B should end up somewhere between GPT4-o and claude 3.5. 70B should be 10-15% worse. With 128K context, it's big.
>>
>>101523397
>Yeah, if you know that, why do you need a script to keep your ego from getting hurt?
Why would I not want to see some biased metrics? Geez I wonder why...
>>
>>101523430
surely you cant be talking about the 8x7 model...
>>
File: an-update-on-truffle.png (202 KB, 1186x1026)
>>101522925
I had a preorder for a Truffle, but they cancelled it. Looks like they're having manufacturing issues.
>>
>>101523421
>Idk why some anons are calling that the base model.
>model_name: Meta-Llama-3.1-405B
notice something missing?
>>
>>101523427
That ain't a matter of intelligence, you can't explain being a snowflake. Go back.
>>
>>101523410
Not him but even Wizard isn't enough for me. I want something smarter. What does feel like magic though (in a bad way) is trying a small model after being used to a big boy.
>>
File: BenchM.png (25 KB, 850x888)
>>101523410
Gemma 27B is smarter.
>>
>>101523442
>agentic personal ai computer
Oh so it's a scam for hipsters.
>>
>>101523466
>>101523443
Look at the other results in the repo
>>
>>101523441
I am, I know I'll get called a shill but it's one of the best models we have, it's not woke or biased at all.
>>
>>101523478
lol
lmao even
>>
Newbie here. Is there a correlation between model size on disk and vram requirements? If I have 12gb of vram, I should be able to load any model that's less than 12gb on disk, right?
>>
>>101523478
Not that anon but I thought people generally agreed that the Dolphin models were GPT-4 slopped?
>>
can anyone here even run L3 405b?
>>
>>101523454
Use Gemma 27B
>>
>>101523441
Still way better than the 12B NeMo or the 8B llama 3. It's old but it's smarter than anything else below 70B.
>>
>>101523485
kind of, it's not exact
>>
>>101523459
No anon, mememarks will never represent reality, no matter how much you'd want them to
>>
>>101523485
You also have to account for the context, so it's more like 10GB in disk for 12 in VRAM.
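A rough sketch of the estimate, if it helps; the layer/head numbers below are Llama-3-8B-style assumptions, so check the actual model config:

def kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, ctx=8192, bytes_per=2):
    # 2x for K and V; bytes_per=2 assumes an fp16 cache
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per * ctx / 1024**3

file_gib = 8.5                          # e.g. a ~Q8 8B gguf on disk (hypothetical)
print(file_gib + kv_cache_gib() + 0.5)  # +~0.5 GiB slop for buffers -> ~10 GiB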
>>
>>101523489
I don't even have the disk space for it.
>>
File: file.png (166 KB, 2365x418)
166 KB
166 KB PNG
>>101523454
>Wizard
>smart
Are you stupid?
>>
File: 1721268293103685.jpg (17 KB, 360x480)
>>101522414
>Democratizing AI when they're abusing their market dominance to never release a card with above 24gb vram for under 5k
They're leaving so much slack in the market with their monopoly that fucking ching chong china is doing a better job of democratizing it by soldering more memory on and rewriting the vbios of other cards. Do you have any idea how fucking anti-consumer your company has to be when such a labor-intensive and potentially card-ruining technique is more feasible than buying an equivalent card?
>>
>>101523459
>mememarks
try actually using wizard 8x22 at Q4+
>>
>>101523487
Try it for yourself, everyone here shits on everything without trying it.
>>
>>101523478
are you mentally retarded, anon?
>>
>>101523505
I have. Wizard is far overblown. Dry as fuck and garbage at dialogue. Gemma is smarter AND a better writer.
>>
>>101523492
Not enough context so I don't really care to. I regularly get up to 32k.
>>
Is it only L3-405B that has the new RoPE scaling tech, or do all the L3.1 models have it? 131k ctxt length feels a bit tight for RP purposes, but if it's really effective at that full length, then it might be good (compared to Opus/4/4o/Sonnet's only effective 30-50k ctxt length). Anyways, hope the new L3.1 models are way better at RP stuff.
Also are they really not releasing a 27b or 34b model?
>>
>>101523508
Why are you always so butthurt? Who hurt you?
>>
>>101523508
No? I taught LLMs how to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.
>>
File: denial.png (958 KB, 3330x2006)
A reminder.
>>
Now that the dust has settled do we all agree that llama 3 405b is an overhyped disappointment?
>>
>>101523506
I mean people make these kinds of claims all the time about models. What proof do you have that it's good and worth the time testing?
>>
>>101523525
people are just tired of reading stinky takes here; you're just lowering the overall IQ with your retarded takes, that's all
>>
>>101523523
It will be even drier than L3. Trust my words.
>>
>>101523528
>opus not in 1st place
aaand discarded
>>
>>101523523
>131k ctxt length feels a bit tight for RP purposes
wat
>>
any llms i can run that do live audio feed translation into english?
I have a 13500 and 64gb ram and 16gb VRAM and a fast m.2
Chatgpt gave me whisper to look at but it's missing the rest.
Thanks.
>>
>>101523097
>Custom quantization
Hmm now where have I heard this before..
>>
>>101523544
3.5 Sonnet is smarter.
>>
>>101523502
>trusting the benchmark that places that garbage Phi above so many other larger models
Are you?
>>
>>101523544
C3.5 sonnet is better than any model, even C3 opus, are you serious anon?
>>
>>101523551
Robert Sinclair! Savior of local LLM!
>>
>>101523443
While I've never seen Meta call their instruct model by the base name like that, at the same time it wouldn't make much sense to run question-answering benchmarks on a pure autocomplete model. Unless they didn't finish finetuning the instruct versions or something and just wanted some really shaky data as a preview, but if it's really officially releasing tomorrow then it'd surely be done by now to distribute to cloud providers.
>>
>>101523544
3.5 sonnet blows away opus. You are a retard who has never used either if you dont think so.
>>
>>101523533
Yeah, but I wasn't expecting anything anyway so whatever.
>>
>>101523525
i'm not butthurt. no one in their right mind thinks any flavor of mixtral is even remotely good. i refuse to believe you've even downloaded a model since dolphin mixtral was released if you think it's the best model available under 70b.
>>
>>101523518
Gemma IS more charming, but you're insane if you think it's more technically proficient at writing.
>>
>>101523539
Stop projecting.
>>
>>101523566
>at the same time it wouldn't make much sense to run question-answering benchmarks on a pure autocomplete models.
could be for comparison on azure next to instruct who can say for sure.
>>
>>101523504
And yet... no 32GB AMD or Intel card either...
>>
>>101523544
opus is only a bit better than c3 sonnet for creative writing/rp, and only because its not as finetuned for assistant tasks. For ABSOLUTELY ANYTHING ELSE 3.5 sonnet is way better. Even for RP if it's with complex characters.
>>
>>101523544
Retard.
>>
>>101523544
Are you living under a rock?
>>
>>101523579
I will say that 27B either needs some starting context or to be told to write in an author's style. Wizard has a better "default" writing style.
>>
>>101523567
NTA, but how...? The replies feel really terse, like it doesn't want to reply to what's going on, even with a prefill. Is there a magic prefill that makes it stop acting like such a baby/suddenly start writing at a third grade level during lewd?
>>
>>101523604
Have you tried presets like https://momoura.neocities.org/ momoSORBET for example?
>>
>>101523573
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>>
>>101523579
I tried Wizard for one output and I immediately deleted it after seeing how slopped it was.
>>
>>101523573
I guess you weren't in this general when Mixtral was released, many such cases.

The tourists are really taking this general quality down in the dumps.
>>
>>101523533
No schizo, no one here has used it yet. And its base model benchmarks put it above anything that isn't claude 3.5
>>
>>101523604
Also good 3.5 Sonnet presets in https://rentry.org/jb-listing, e.g.
https://rentry.org/SmileyJB
>>
>>101523618
>>96345096
>Mistal-Llama is fully /pol ready.
you talking about thread quality is very rich
>>
>>101523618
I was in this general, it was good just like Turbo was good at the time. You're insane if you're still using it. Seek medical attention.
>>
>>101523593
w6800 is 32gb and can be found around $900 second hand these days, and it's gfx1030 which is well supported on rocm
but it's still rocm so lmao
>>
File: RP-Trial1.png (7 KB, 634x56)
>>101523546
This is my current RP session with custom instructions, system prompt, and a character card. Sonnet-3.5 & Opus are already breaking down and struggling to do some 'needle in the haystack' stuff.
>>
>>101523504
Ngreedia is fucking over corpos too. Big companies know this and are looking to manufacture their own AI hardware now. Their existence is a bubble, kept alive by the belief that AI is a race and there's only one winner
>>
I'm a regular Wizard user and I'd say it is quite slopped by default but a lot better after you use a good system prompt tailored to get around slop for it. I'd use something like CR+ instead but Wizard is still pretty smart, while being faster, so it was worth it for me to test around with different system prompts to get it up to par.
>>
>>101523629
>You're insane if you're still using it.
>>97223983
>I know that the people who hate me will most likely try and use said post as a means of getting me banned.
>>
>>101523652
You need to learn how to prompt.
>>
>>101523567
I've been using both back to back for a couple of days. I really tried to like sonnet too, the charges for opus are eye-watering. But I feel like in any sort of real task, opus' advantage is immediately apparent. Even in RP, it's far better at understanding the context and "reading between the lines". Sonnet struggles with complex characters unless I clearly spell out what to write.
>>
>>101523652
>Sonnet-3.5 & Opus are already breaking down and struggling to do some 'need in the haystack' stuff.
no shit, aren't they like ~30 something k actual context?
>>
The real question is: How could you RP for more than 40~60 messages? The quality, no matter the model, takes a pretty sizable dip after that point, even before the end-of-context standard quality drop.
>>
>>101523660
Nah, 8x22B was a joke release. You're just a gullible idiot that fell for the Reddit hype.
>>
File: 1704056397584020.png (48 KB, 433x543)
>>101523680
Yeah, 3 Sonnet (no info about 3.5) is 28K native context, Opus - we don't know.

Check here https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html
>>
>>101523671
And you mean sonnet 3.5? 3.5 is night and day better at coding than opus. And for RP it feels like it "gets" the character it's trying to be far better / more naturally. I could never go back to opus after 3.5. Opus feels so robotic / dumb.
>>
>>101523685
>101523670
>You need to learn how to prompt.
>>
>>101523685
With corpo models you can easily RP for hundreds of messages
>>
>>101523685
I was RPing just fine for hundreds of message with NeMo
>>
>>101523690
That has nothing to do with the real context. It's the context length that fits in some GPUs, that's why the smaller models are higher.
>>
>>101523723
>That has nothing to do with the real context
No, retard, it has everything to do with real context. 28K is the native context that 3 Sonnet was trained on, and then they fine-tuned it on 200K context.
>>
>>101523688
I literally used CR+ for a while and I switched to Wizard. I'm basing this off of my own experience, not what other people have posted.
>>
>>101523731
>t. Anonymous4chan
>>
>>101523743
You're in /lmg/, not /aicg/. People here won't buy your stupid bait, retard.
>>
>>101523731
You don't know anything about how it was trained.
>>
is the real bitter lesson that to get a good small model you need to train something 50x the size to then distill from?
>>
>>101523749
but you bitted thoughbeit
>>
>>101523755
Then explain >>101523690
>>
>>101523749
no u
>>
>>101523737
I think you're mentally ill.
>>
>>101523757
Not sure if it's bitter but it seems that way. Looks like the new 70B is 90% of what 405B was.
>>
>>101523757
Honestly, I guess so. Maybe in the future Meta will not even release distills. They'll let others do it for them. And they'll only train the behemoths.
>>
>>101523760
>>101523749
Retard
>>
>>101523760
Read the first reply.
>>
>>101523694
Yes, 3.5. Coding is fine, true. But in RP I've had exact opposite experience on my own jb and a bunch of public ones. Opus had this understanding I would expect of a real reader. Guess it's a skill issue on my part or something.
>>
File: 11__00900_.png (1.31 MB, 1024x1024)
Had to check to make sure I was in /lmg/ for a second with all this damn cloud talk going on
>>
>>101523782
Read the reply to the first reply.
>>
>>101523680
yes. that's what I meant 'if it's really effective at that full length'. If they actually trained at 131k ctxt length limit instead of <50k native context, then it should be good for RP.
>>101523690
ty nigga. why doesn't Bedrock have Sonnet-3.5 and Opus-3 details though?
>>
>>101523793
>why doesn't Bedrock have Sonnet-3.5 and Opus-3 details though?
why leak details about sota tools?
>>
>>101523790
All petra
>>
>>101523790
It's over. Local is dead forevermore.
>>
>>101523786
Maybe your opus is just hallucinating more due to being dumb. Try 3.5 with a good creative JB.
>>
>>101523807
>>101523807
How many times do I need to repeat I'm PetrUS not Petra?
>>97062246
>I'm not Petra. Petra's an amateur. I'm something considerably worse.
>>
>>101523786
Opus is retarded, just like you.
>>
>>101523792
ok, now you are just baiting
>>
>>101523819
you brown sharty zoomers really are something
>>
>>101523805
this, they got the secret sauce, of course no one would share that, it's ultra valuable to them
>>
>>101523830
Show proof. What is your favorite futa card and why?
>>
File: 1721271712720313.jpg (16 KB, 200x200)
>>101523773
With how small the gains are at this size, is 90% of 405b really that different than L3 70b...?
>>
For gemma 9b they said they got better performance from distillation than from training from scratch. Training a giant model, then distilling for 90% of the performance at a fraction of the size / running costs, is the way forward.
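For anyone wondering what distillation concretely means here: the standard recipe trains the small model against the big model's softened output distribution on top of the usual next-token loss. A minimal sketch in PyTorch, with made-up hyperparameters; nobody outside these labs knows their exact recipes:

import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL between temperature-softened distributions; T^2 keeps gradient scale comparable
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # ordinary next-token loss
    return alpha * soft + (1 - alpha) * hard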
>>
>>101523866
It's worse.
>>
>>101523865
I haven't roleplayed with futas in a while, I realized I just want sex positive girls and don't really like cock.
>>
File: 10yearslater.jpg (50 KB, 1225x254)
>>101523685
I dunno I start gooning for a night and next thing I know the sun is coming up, I'm 175 messages in and we're doing the 10 years later epilogue after I married the mercenary that was trying to kill me
>>
>>101523737
Just enjoy things instead of trying to convince other anons of anything. Expressing opinions is pointless. As soon as you let a model name out of your mouth you have been designated a retard and nothing will change the other guy's mind.
>>
>>101523866
small gains? are we reading the same benchmarks? Also take into account these are base model benchmarks. You can see the last batch and guess how much these will improve with the instruct tune. New 70B will likely be on par with / slightly better than GPT-4o and a bit worse than 3.5
>>
File: FNOaYJsVUAAWoCv.jpg (33 KB, 597x513)
>>101523866
... No.
>>
>>101523874
>don't really like cock
Gay
>>
>>101523877
>t. Goliath/Midnight Miqu/Wizard/Stheno enjoyer
>>
>>101522868
>deepseek, gemini, mistral, gpt-4 all moe
>"NOOOO they mustve switched to dense right after!"
jewish
>>
>>101523874
Based, I will call you the good petrus.
>>
>>101523893
There is NOTHING wrong with Midnight Miqu
>>
>people already coping that 70B will be practically as good as the 400B
Yes. Just like Furbo is nearly as good as full 4.
>>
>>101523908
It's a retarded merge shilled by Reddit.
>>
>>101523877
mind broken
>>
>>101523919
YOU'RE retarded.
>>
>>101523914
>coping
benchmarks are coping now? We now know distilling instead of training is the way to go. There are papers on this now.
>>
>>101523919
Kill yourself
>>
>>101523886
2 years to finally get gpt at home
>>
>>101523943
We have been past the old original gpt4 for a while now. I'm talking about the latest gpt4-o.
>>
>>101523442
>Truffle-1 will be an agentic, personal AI computer
wasn't it supposed to be a really powerful inferencing server in a box?
this reads like "im grifting and word on the street is that this is the new grift"
>>
>>101523914
what is furbo. Sorry, I don't speak piss drinka
>>
>>101523950
>We have been past old original gpt4 for awhile now
???????????????
>>
>>101523959
New here?
>>
>>101523805
I'm talking about the API details which lists the ctxt length. They listed it for Sonnet-3 and Haiku-3 but not for Opus? Anthropic stated on their FAQ that Opus can ingest 200k+ tokens in one prompt but no one knows if each request is being processed at 200k+ ctxt limit or if they're trying to route each request by some rules they have (i.e. if the preprocessor identifies it's <50k tokens, send it to Opus-3-28k)
God this general has been infested by niggers that do nothing but stir shit and derail discussions. No wonder the actual ppl moved to twitter and discord to discuss stuff.
>>
It is weird how everyone is posting here instead of being busy trying out 405B. Almost like nobody can actually run this thing.
>>
>>101523980
>God this general has been infested by niggers that do nothing but stir shit and derail discussions. No wonder the actual ppl moved to twitter and discord to discuss stuff.
maybe us houd lgo bak,
>>
>>101523980
>if the preprocessor identifies it's <50k tokens, send it to Opus-3-28k
You're lost. aicg is here: >>101522808
But that take might be too retarded even for them.
>>
>>101523990
>Almost like nobody can actually run this thing.
Physically impossible at home
>>
>>101523990
Even if I could if 70B is 90% as good then the speed difference would not be worth it for 405B. I'm waiting for that.
>>
>>101523980
You should leave too.
>>
Has anyone tried the famed fish tts by now?
>>
>>101524007
Even if I could if 8B is 80% as good then the speed difference would not be worth it for 70B. I'm waiting for that.
>>
>>101524003
> what is mac studio ultra 192gb
>>
>>101523980
>God this general has been infested by niggers that do nothing but stir shit and derail discussions. No wonder the actual ppl moved to twitter and discord to discuss stuff
I still have a schizo part of me that believes that maybe, just maybe, it's actual corpo agents trying to persistently kill threads where leaks happen, and to stifle potential growth of ideology and culture that opposes their corporate interests.
>>
>>101523990
the fuck are we supposed to do with a base model
>>
>>101523990
It was revealed to me in a dream that it will be garbage and lmg will overhype it.
>>
File: 8B.png (17 KB, 715x546)
>>101524017
Cept 8B looks far worse.
>>
>>101524018
Under 4bpw...
>>
>>101524039
>>101524039
>>101524039
>>
>>101524033
its 50x smaller
>>
>>101524047
>page 6
>>
>>101524057
>436 posts
>>
>>101524025
no, it's just the usual village idiots and some bots. it's kinda unfortunate bc every once in a while there would be good discussions about technical details but nowadays it's mostly just niggers throwing shit at ppl who want to discuss stuff and bait responses.
>>
>>101524082
Oops all PetrUS
>>
>>101524053
i'm pretty sure if we evaluated those models on hardcore benchmarks, L3-405b would get some 50x better scores than L3-8b
>>
>>101524025
I unironically think there are people on corpo payroll here who shit on anything local and attack all sides of any conversation regarding local models while subtly being positive about closed corporate models in an attempt to frustrate people into leaving the hobby and/or instilling an idea of corporate superiority with local models being "cope".
This is kind of a bad post though because it might cause more of their autism and derail the thread more but I just want people to be aware that it's a real possibility. Even more so if someone shits on this post.
>>
>>101524094
There is clearly diminishing returns when it comes to params.
>>
>>101524094
I'm not, I'm sure it'd be within 40%
>>
>>101524105
>Even more so if someone shits on this post.
Persecution complex much PetrUS
>I know that the people who hate me will most likely try and use said post as a means of getting me banned.
>everyone who attacks him is mindbroken incel scum
>>
>>101524047
>►Official /lmg/ card: https://files.catbox.moe/ylb0hv.png
kill yourself
>>
New Thread
>>101524155
>>101524155
>>101524155
>>
File: Miqu 2.png (8 KB, 411x225)
>>101521762
>Llama 3 405B leaked base model discussion and distribution
>Ah hell yeah, I'll seed that
>Attempt to download it so I can seed it
>Not a single peer in over 45 minutes
The dream is dead
>>
>>101523550
nm, sorted.
>>
>>101524224
fix ur DHT
or append some public trackers to the magnet like &tr=udp%3A%2F%2Fopen.demonii.com%3A1337
>>
>>101524224
That's strange, I'm getting 40 peers



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.