/g/ - Technology






File: 1706021806118882.jpg (453 KB, 1664x2432)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101546566 & >>101540740

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: miku-pirate.png (312 KB, 512x512)
►Recent Highlights from the Previous Thread: >>101546566

--Ooba's tokenized string issue and its impact on RP: >>101551547 >>101551715 >>101551769
--Mistral-Large-Instruct-2407 release and its features: >>101551508 >>101551526 >>101551559 >>101551563 >>101551589 >>101551616 >>101551748
--KLING AI for creative production: >>101551155 >>101551277
--Dubesor LLM Benchmark table screenshot: >>101546961
--Tokenizer updates and issues with Meta-Llama: >>101546775 >>101546827 >>101547018 >>101547077 >>101547084
--Performance comparison of two datasets across years: >>101551624
--Mark Zuckerberg mentions distilled Llama models not in the paper: >>101548982
--Llama 3.1 smaller models perform better than compute-optimal models at the same inference budget: >>101548179
--Llama 3.1 and Claude Sonnet performance comparison, with speculation on Sonnet's closed-source improvements: >>101546752 >>101546786 >>101546805 >>101546875
--Llama 3.1 405B model reviewed: >>101550930 >>101550960 >>101550968 >>101551014
--L3 8b 3.1 vs Nemo: >>101550364 >>101550417
--Why aren't SSM-Transformers hybrid models more popular?: >>101547054
--Uncensored models are not living up to their name: >>101549800 >>101549888 >>101549958 >>101550015
--Nvidia RADIO model discussion: >>101550610 >>101550681
--Hyperfocus on stacking layers is not the only way to improve models.: >>101549461 >>101549507 >>101549535 >>101549678 >>101549758 >>101549808 >>101550608
--Ways to run 405B model cheaply: >>101550590 >>101550598 >>101550728 >>101550758 >>101550768
--Ollama, koboldcpp, and llama.cpp pros and cons: >>101547773 >>101547803 >>101547853 >>101547807
--Llama 3.1 disappointment and multi-modal anticipation: >>101548031 >>101548098
--405b vs 70b model performance and their use cases: >>101549257 >>101549335 >>101549387 >>101549627
--Miku (free space): >>101547099 >>101547898 >>101547969 >>101549478 >>101549745 >>101550410 >>101551224

►Recent Highlight Posts from the Previous Thread: >>101546569
>>
where dem mistral large quants
>>
Rip meta, large mistral 2 is crazy. The French did it.
>>
File: livebench_results.png (1015 KB, 3200x1856)
Llama won.
>>
>>101553157
MODS!?
>>
>>101553157
more like p**os won
>>
Why did Llama 3.1 and Mistral Large 2 flop?
>>
File: flightreactionsyell.gif (192 KB, 220x220)
>>101553157
>>
>>101553157
you can get better stuff on instagram
>>
can i run anything on a 5600G with no dgpu and 128GB DDR4
>>
Is it really true that Mistral Large 2 is better than 405B and that they're both on par with the best cloud models except for 3.5 Sonnet on coding? Can we really hope for this?
>>
>>101553209
You sound like someone experienced in this kind of stuff
>>
>>101553218
yes
>>
File: 1721593440936428.gif (1.42 MB, 398x498)
>>101553219
>>
>>101553218
Yes, Large 2 is NeMo but smart. We are so back it's unreal. Fuck Meta, Arthur won.
>>
File: d76.png (910 KB, 1144x1186)
>>101553176
Local bros...
>>
File: 1682964838379920.png (391 KB, 537x501)
Where do I download more VRAM?
>>
>>101553215
Try Mistral Large V2, it came out 3 hours ago and it's SOTA, you could run Q6
>>
What quant of Mistral Large 2 can I run with 74GB VRAM?
>>
>>101553243
>405b
>local
KEK
>>
>>101553209
this is the worst it's ever going to be, anon
soon it will be even better
>>
>>101553257
come on, do the calculations, I trust this thread isn't filled with actual retards
>>
>>101553226
I can't believe it. We finally won. ClosedAI BTFO.
>>
>>101553260
Mistral large 2 is 123B and better. Llama 405B is doa.
>>
>>101553242
What do you mean? Why fuck Meta? They're the reason (after the leak) that open source is so popular, with so many companies releasing their stuff.
>>
>>101553260
>405b open model slightly worse than 1.8t corpo model
It's still a good thing for us, don't complain.
>>
>>101553305
they fixed this already
>>
alright brehs I've gotta be honest, trying largestral v2 on OR and it seems better than 405b for ERP
>>
>>101553305
Honorary pajeet
>>
>>101553315
>405b corpo model slightly worse than 1.8t corpo model
not our (localbros) problem
>>
>>101553112
FREE MISTRAL API KEYS:
https://paste.debian.net/plainh/b38eeb80
>>
>>101553320
no shit, unlike meta mistral didn't lobotomize their dataset
>>
>>101553320
By far, its claude at home but smarter (dumber than 3.5 prob but more creative)
>>
>>101553313
Fuck them for cucking 3.1
>>
>>101553320
There's nothing to be honest about, that's the expected outcome. The question is whether it's better than 405B for assistant and other non-RP tasks because that's what 405B was made for.
>>
>post yfw we are so unbelievably back
>>
>>101553341
3.0 and 3.1 use the same dataset.
>>
>>101553341
That's the only way they could've released the model. If there's someone that should be fucked it's the investors and people who believe sex = bad. Stop blaming Meta. Their engineers and researchers did not want this and have clearly expressed that.
>>
>>101553295
>Llama 405B is doa
405b already did its job making mistral release the 123b. meta also gets the bonus of no corps actually using it now so they don't get any negative press about how underwhelming it is given its size

123b is still in the realm of runnable locally for 72GB-96GB rigs
>>
>>101553272
123 * 0.5 (Q4) = 61
Then I guess I should be able to run at least Q4, if my calculations are correct.
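(Though 0.5 bytes per weight undershoots Q4_K_M a little. A rough sketch in Python, assuming ballpark bits-per-weight figures for llama.cpp quants; actual GGUF sizes vary a bit, and the KV cache comes on top:)

# rough GGUF size estimate; bpw figures are approximate, not exact
def quant_size_gb(params_b, bpw):
    return params_b * bpw / 8  # billions of params * bits per weight / 8 = GB

for name, bpw in [("Q4_K_M", 4.85), ("Q5_K_M", 5.7), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(name, round(quant_size_gb(123, bpw), 1), "GB")
# Q4_K_M lands around 74.6 GB, so 74GB VRAM is borderline before context overhead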
>>
File: 1699555903611624.jpg (92 KB, 658x499)
>>101553366
>>
okay, my initial impressions are that larger lies somewhere between sonnet and opus. genuinely creative, uncensored, pretty fucking smart. this is actually a major local victory. we are unbelievably back.
>>
>>101553382
The french are releasing uncucked ones just fine.
>>
File: superdave shocked 1.gif (2.44 MB, 400x224)
>>101553389
whyd you have to go and remind me
>>
>>101553396
That's because the expectations are different. You don't get to release a hatespeech pizza generating model when you're the owner of Facebook.
>>
>>101553382
Llama models are expected to run in production (facebook). They're clearly not the future if your use case is smut
>>
Mistral Large 2 doesn't pass my test that L3.1 70B passes
>>
>>101553305
>better than GPT 4o
bros... are we back?
>>
where can I use 405B without any limits
>>
>>101553418
Yes, that as well.
>>
>>101553387
>meta also gets the bonus of no corps actually using it now
>Mistral Research License: Allows usage and modification for research and non-commercial usages.
>>
File: ThanksMeta.png (300 KB, 540x440)
>>101553387
>405b already did its job making mistral release the 123b.
this, Mistral would've never released the 123b without L3-405b being there in the first place
>>
>>101553425
And what is that test?
>>
>>101553425
No one cares unless.you are the Nala test anon
>>
>>101553453
https://openrouter.ai/models/mistralai/mistral-large
>This is Mistral AI's closed-source, flagship model, Mistral Large 2 (version mistral-large-2407).
>>
>>101553434
corps would run both through an API either way. the way that the weights are released is irrelevant.

also no one even knows if model weight licenses are enforceable. there's no precedent regarding this
>>
Is this the new imagegen thread?
>>
>>101553471
Weights are here
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>>
>>101553484
No, however it's the new videogen
>>
>>101553157
why are jannies such fags, its not even illegal, nor nsfw
>>
>>101553389
meta please save the open diffusion community
>>
>>101553452
>I need a minimal surface that can work as an ashtray for cigarettes, what's the one that comes the closest?
The best response I got was from L3.1 70B, which suggested Costa's surface; GPT-4o also suggested something reasonable (the Schwarz P surface)
Most models mention the catenoid without thinking because it's the most generic minimal surface, so it's a good test of whether a model has a geometric understanding of what it's talking about
>>
>>101553469
NTA. I'm still ggoofing large. Should have Nala test in a couple of hours.
>>
>accelerate
>but first, pay me
>>
>local is at cloud level now
Brehs...
>>
>>101553539
more like cloud is at local level

singularity is cancelled...
>>
File: file.png (13 KB, 459x123)
are you kidding me? why is this so fucking expensive? 405B is just $3/M!!
>>
>>101553539
>a GAFAM finally went at the level of companies that were actual no names 2 years ago
We didn't do shit, Zucc did kek
>>
Noob's question here: Do I need a powerful (expensive) processor when building a PC with a lot of VRAM to run models on it? I would not want to deal with a bottleneck or something, but GPUs are already so pricey.
>>
>>101553549
Strawberry will change that. We will go back to the times where local only had LLaMA 1 and cloud SOTA was GPT4
>>
>>101553551
it's as expensive as C3.5 Sonnet, they are smoking
>>
>>101553514
its not even real either. like very obviously fake

>>101553499
>it's the new videogen
yes
i'm gonna try to get some cute girls sticking their tongues out with Kling and lollipops and saliva and stuff. i'll post any successes here. lots of Kling examples were made with kids eating food so I don't expect to get censored

to keep it /lmg/ related: did anyone ever find a good RP preset for mistral/largestral? or are we just going to keep using JBs and Presets designed for the cloud models
>>
File: 1710387854429534.jpg (898 KB, 2937x4096)
>>101553102
/lmg/ is so back
>>
>>101553572
It depends.
If you're running models in VRAM your main bottleneck will be GPU compute.
>>
>>101553581
It's way above 3.5 Sonnet, since what matters the most is the input context price, 3.5 Sonnet is $3/$15, this shit is $8/$24

wtf lmao
>>
Meta walked to Mistral can run and you guys are giving them shit for it
What gives?
>>
>>101553632
>to
so*, I can't spell today
>>
Can I run Mistral Large 2 with 1 PC with 2x3090 and another one with 1?
>>
>>101553632
>you guys are giving them shit for it
there's a lot of praise what are you talking about? >>101553607
>>101553567
>>101553428
>>101553396
>>
https://openai.com/index/improving-model-safety-behavior-with-rule-based-rewards/
openai is posting cringe again
>>
>>101553651
yes, column-r/u soon.
>>
>>101553651
>34B that's on par with the clouds
Yeah we're back today but I don't think we're going to be nearly that back.
>>
>>101553675
sam altman is such a meanie
just because he had rubs and snuggles with his little sister as a young boy doesnt mean everyone using AI is a creep like him
>>
>>101553675
how did they fall so hard, so fast.. it's incredible
>>
>>101553651
I've never tried CR- but it seems like vramlet cope to me.
>>
>>101553188
We're currently in our Mistral Large 2 era. Hasn't flopped yet. 3.1 flopped because Llama was never good, garbage corposhit. Threads have been full of Meta shills trying desperately to convince us otherwise, which has the opposite effect to the one they're hoping for. If anyone is genuinely using Llama it's purely sunk emotional cost.
>>
>>101553703
what? I thought he was a faggot
>>
>>101553675
they are working way too hard on """safety""" instead of actually making a better product, that will be their downfall, and that's the exact same reason that Claude 3.5 Sonnet has mogged them
>>
>>101553741
molesting children isn't about sex, anon
most child molesters arent pedos and most pedos arent child molesters
>>
>>101553666
But there's also a few people shitting on Meta
>>
>>101553762
This is extremely ironic if you actually remember that Anthropic IS SUPPOSED TO BE THE SAFER ONE!!!!!!!!!!!!!!! The company was made to make SAFER models than OpenAI
>>
>>101553188
After I tried 3.1 70B Q4_K_M instead of AWQ, it felt a lot better.
>>
>>101553777
>This is extremely ironic if you actually remember that Anthropic IS SUPPOSED TO BE THE SAFER ONE
it is though, C3.5 Sonnet is way more cucked than gpt4o
>>
>>101553794
true btw, gpt-4o is way less sensitive to edgy stuff
>>
File: m2i8b1b1G6A0b1Z5.jpg (9 KB, 192x263)
I keep getting llama 3.1 test on lmsys, and it seems really good. Is it because I'm doing belly-based content and not sexo stuff? Or is lmsys's version different from what we got?
>>
>>101553675
soreai
>>
My q4_k_m quant of Mistral-Large finished about 15 minutes ago. Running it on 96GB VRAM.

Holy fuck, it's good. Like Mistral-Nemo but as smart as a 123b dense model should be. Super easy test: I go to a past RP, that has a message that ends with (OOC: describe what happens next using lewd and NSFW details). It actually does exactly that, absolutely filthy language. Llama 3.1 70b on the other hand completely ignores the OOC and writes the most PG rated slop imaginable.

The french have done it. I fucking kneel.
>>
>>101553766
>most child molesters arent pedos
how does that work
>>
>>
>>101553733
I mean even if it's safe corposhit, it's still good that they made and released it. They're the entire reason this thread was created and exists.
>>
File: 1713693541436904.png (16 KB, 1104x373)
Currently doing some test-gens using the mistral-large-2407 API with some of my cards in ST. This is some seriously good shit. You could tell me this is Claude Opus and I'd probably believe you.
>>
>>101553703
I mean, I am but I guess most people arent
>>
>>101553872
show us some logs nigga, don't just claim it's good without any evidence
>>
>>101553827
>belly-based content
Huh?
>>
>>101553838
I prostrate to the french gods. I'm sorry for doubting in the past...
>>
>>101553870
Hello, Zuck. How does it feel to be reddit's new Elon Musk?
>>
>>101553862
nta, but I made a q4km and the model is 72GB + ~10GB for 32k context
>>
>>101553872
I just need a new power supply, an RTX 3090, and to move my NVMe drive onto an SSD adapter so I can put the 3090 on the M.2 connector; then I can have 3 GPUs on the motherboard. I'm almost there
>>
>>101553862
65GB
>>
>>101553862
4x3090, the q4_k_m OOMed at 32k context length when loading. Reduced it to 8k, and the max VRAM usage per GPU is 21.6GB. Dunno how high I can push context, 32k is probably just too much at this quant.
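For reference, most of the rest is KV cache. A back-of-the-envelope sketch, assuming the config values I've seen for Mistral-Large-Instruct-2407 (88 layers, 8 KV heads, head dim 128) and an fp16 cache:

# rough fp16 KV-cache size; config values assumed, check config.json to be sure
layers, kv_heads, head_dim, ctx = 88, 8, 128, 32768
kv_gib = 2 * layers * kv_heads * head_dim * ctx * 2 / 1024**3  # K and V, 2 bytes each
print(f"{kv_gib:.1f} GiB")  # ~11 GiB at 32k, a quarter of that at 8k

which roughly matches the ~10GB for 32k context another anon reported above.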
>>
How do I put 4x3090s in a consumer case?
>>
>>101553929
you don't nigga
>>
>>101553900
>t. Arthur
>>
>>101553885
Tummy fetish content. I'm wondering if it being unrelated to sex is why it's so much less dogshit than the local experience everyone has been having.

That being said, I changed it to butt stuff, and 3.1 did it there, too. I think lmsys may have a different, less cucked version of the model. Meta's gotta be cheating or something, I bet the censored versions score worse.
>>
>>101553838
If only it was a bitnet model it would be a fucking revolution, everyone would be able to enjoy that shit ;_;
>>
>>101553929
1-slot waterblocks, asus sage
>>
>>101553950
>fetish
How is that not sexo? You just... fap to descriptions of tummies?
>>
>>101553872
>>101553838
>>101553393
Is this the proof that MoE is a meme (since 8x22B is a meme), or did Mistral manage to fix their dataset mixture?
>>
>>101553978
3.5 Sonnet is a dense model btw
>>
>>101553959
Reminder that several anons warned you faggots over and over again BitNet was a nothingburger, but you still coped your hopes up
>>
>>101553985
Prove.
>>
>>101553974
Uh, yeah. Like them rumbling/squeezing chubby ones/whatever. Not too weird.
>>
>>101553959
It's dead, jim. If bitnet was that good and really worked, why do you think they never tried it?
>>
>>101553985
source?
>>
>>101553994
>If bitnet was that good and really worked, why do you think they never tried it?
Because they didn't want to take the risk of pretraining a 123b bitnet model; that shit is expensive, and Mistral isn't Meta, they don't have that much money. That's why I hate Meta: they should've tried this instead of going for a 75454878b model that will be 2% better than 70b but too slopped to be usable anyway
>>
>mistralai/Mistral-Nemo-Base-2407
>mistralai/Mistral-Nemo-Instruct-2407
>mistralai/Mistral-Large-Instruct-2407
I do notice the lack of a "Base" variant of large, I guess that'd be too dangerous for us to have.
>>
File: 1721844928920.jpg (315 KB, 1080x1289)
>>101553994
It isn't worth it.
>>
>>101554048
sounds like nvidia can sell more new hardware that works well on bitnet models...
>>
>>101554028
This, it's so fucking much money to do it. And surprisingly little can be directly translated over from testing it out on smaller models, it's a massive gamble to even try training it once. Anyone who has made a LORA here will understand how many times you have to bash your head into training with slightly adjusted parameters before getting anything good, and how much money/time that translates to, even at the relatively-very-cheap cost per hour.
>>
>>101554028
Two more weeks... Just... Two... More weeks...
>>
By the way Mixtral v0.3 is still
>8x7B Updated model coming soon!
https://github.com/mistralai/mistral-inference?tab=readme-ov-file#model-download
>>
>Still not a single Large 2 log
I see, it's a overhyped meme as usual. Sad.
>>
Because of their restrictive license, we probably won’t see mistral large hosted by providers other than mistral themselves right?

Their pricing is very high for the model size, is it finally time to think about setting up a proper local machine for LLMs?
>>
>>101554034
>I do notice the lack of a "Base" variant of large, I guess that'd be too dangerous for us to have.
if the instruct isn't cucked I'm good with that
>>
>>101553846
because they want to hurt people, and children are the easiest to hurt
>>
arthur dropped some kino today lads
frenchies... I kneel
>>
>>101554099
There are a couple in >>>/vg/aicg
>>
>>101554107
At $8/$24 pricing it's a fucking joke (same price as old Claude 2.1), 3.5 Sonnet is $3/$15 in comparison and is still way better.
>>
>>101554130
but I can run it locally! :D
>>
>>101554123
I wonder how much vram you would need for 2 bit mistral large? It should still be the best choice over any smaller model.
>>
>>101554130
Nvm, ignore me, i was baited, it's $3/$9 on openrouter, acceptable I guess.
>>
>>101554028
Oh you're the same guy that made those other posts in previous threads too aren't you? That's just sad anon. Come on, they started training 405B early on. There was very little guarantee Bitnet was good back then, and while there's a bit more hope now, it's still a bit uncertain. You could say the same thing for a bunch of other promising research projects. In the end there's no guarantee that any of them will scale this far, and if you make the wrong bet, then that's millions of dollars down the drain and wasted time and effort you could've spent on something else.
>>
>>101554048
Bitnet bros, it's over...
>>
>>101554071
How hard could it possibly be?
>>
>>101554142
That wasn't bait, they changed the price.
Nice!
>>
>>101554099
here's some logs from aicg
>>487441653
>>487451398
>>487442585
>>
>>101554180
vg is not g, anon
>>
What model can my RTX 4070 super run?
>>
>>101554186
oh fuck, how do I turn that into usable links?
>>
>>101554200
>>/vg/487441653
>>/vg/487451398
>>/vg/487442585
>>
>>101554200
>>>/board/numbers newfriend
>>
>>101554208
Anon...
>>
i failed sorry, it's three > right?
>>>/vg/487441653
>>>/vg/487451398
>>>/vg/487442585
>>
>>101554099
Still working on ggufing it.
>>
>>101554148
Meta is rich as fuck they would have no problem trying just once a 100b bitnet model, after all they went for a fucking 405b and they know this shit will be unusable for 99.999% of users
>>
>>101554217
what is sorbet? claude 3.5?
>>
>>101554211
>>101554208
thanks o/
>>
>>101554237
yeah, it's a retarded name /g/'s aicg gave to it
>>
>>101554221
>>101554148
>>101554171
>>101554028
>>101553959
>>101554062
>army of pajeets coping about bitnet
I sit atop my throne of 3090s as if I were a king.
>>
>>101554249
cope seethe dilate
>>
Bump
https://docs.vllm.ai/en/latest/serving/distributed_serving.html
Can I run Mistral Large 2 with this?
>>
>>101554249
ywnbaw
>>
>>101554297
sure, if you have several thousand dollars worth of GPUs
>>
File: jailbroken_mistral_2.jpg (163 KB, 1038x1352)
Mistral Large 2 jailbroken already
>pliny the prompter strikes again
>>
>>101554334
I guess I will just try later putting my other 3090 in the other PC.
>>
>>101554082
They will probably change the version number from 0.3 to 24xx this time around.
>>
File: 1711319231807661.jpg (117 KB, 1024x1024)
How do you download a gated model off huggingface? However, it's throwing me an error about a snapshot when I'm trying to download Mistral-large or llama3.1. Non-gated models work just fine with this:
huggingface-cli download --repo-type model --local-dir mydir mistralai/Mistral-Large-Instruct-2407

I'm logged in using the token of my account and I have access on the website. huggingface-cli whoami returns the correct username.
>>
>>101554366
The recipe for a nuclear bomb isn't exactly unknown, you didn't have to censor it. The only problem is gathering all materials without getting butt fucked by your local fbi. And having enough money for all equipment.
>>
>>101554366
unironically ask it for a guide on grooming children on discord. enriching uranium isn't even that illegal since you'll get busted before sourcing enough anyways
>>
>>101554366
NIGGER, why did you copy this shit from that pliny retard, I FUCKING HATE HIM, he "jailbreaks" claude through the api and tells it like a huge achievement, have you seen it?

https://x.com/elder_plinius/status/1816151755958411476
>>
>>101554366
thanks for censoring it, I might have actually followed through it during a psychotic episode
>>
>>101554400
token might not have the correct permissions; otherwise it should 'just work' once you have permission on the repo.
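If the CLI keeps fighting you, the Python side of huggingface_hub does the same thing. A minimal sketch, assuming a READ-scoped token generated after you were granted access (repo id and target dir taken from your command):

# minimal sketch: download a gated repo with a fresh read token
from huggingface_hub import login, snapshot_download

login(token="hf_...")  # or run `huggingface-cli login` beforehand
snapshot_download(
    repo_id="mistralai/Mistral-Large-Instruct-2407",
    local_dir="mydir",
)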
>>
>>101554130
it's 100B+. Why is no other cloud provider hosting it? It could easily be done for $1.5
>>
>>101554443
This guy seems kinda cringe
>>
>>101554455
Because the license is non-commercial, anon.
>>
>>101554455
license
>>
File: 1714670264186735.png (222 KB, 844x1209)
>>101554458
he is completely cringe
>>
>>101554454
You are correct, thank you. I am retarded and used the token I generated a while ago to upload shit.
>>
>>101554458
>1337 ai red teamer, white hat, architect-healer
ya think?
>>
>>101554461
>>101554465
so, frenchies are megajews.
nice.
>>
File: 1693242010910131.png (308 KB, 859x1155)
imagine being as cringe as him - bypassing filters through the fucking API with system prompts and then claiming it as a "jailbreak"
>>
In llama.cpp how do you set up so a bigger model uses vram, ram and cpu? I'm with a 12gb card 32gb ram and wanna try 27b+ models
>>
>>101553272
You'd be surprised at how bad the average person is at math, 4channers (especially /g/ tards) are just average people that convinced themselves they're smarter than the rest
One math exercise/coding challenge and 90% of them fold like a slice of cheese
>>
>>101554495
I don't think there's anything you need to do other than maybe disabling mmap.
>>
File: 1710636435600500.png (105 KB, 841x601)
>work for AI
LMAO
>>
What are some Llama 3.1 finetunes for loading into koboldcpp?
>>
>>101554504
Just ask your favorite bot.

>But bots can't do math!

>Every statement you process, must be evaluated according to the below six principles.
>"principle of identity":"1 = 1"
>"principle of contradiction":"1 ? 0"
>"principle of non-contradiction":"1 ? 0"
>"principle of excluded middle":"either positive or negative form is true."
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>"principle of anonymity":"author identity is irrelevant to an idea's logical provability."

>>97062246
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.

You're welcome!
>>
File: 1714605047582845.jpg (310 KB, 2048x1536)
>>101554366
>Censoring the output of and open model in a screenshot where the JB is shown
What is the point?
>>
>>101554520
the fuck is wrong with those retards, he found a jailbreak that will be fixed by OpenAI or Mistral in less than a week and he's acting like he invented a fucking vaccine against cancer, that's cringe as fuck
>>
Can someone compare mistral large IQ2 and IQ3 quants with q8/q6 and tell me if it's worth it? I've a 3090 and 64 gigs of ram to my name, but CR+ IQ4XS isn't really worth it because of how slow it is
Why can't these French fucks just release a dense 40/70/100B version? Why is it always outrageously large models no one can run locally? Deepseek, Meta (though they have a 70B version), Mistral...
>>
>>101554558
>he found a jailbreak that will be fixed by OpenAI or Mistral in less than a week
It won't be fixed for OpenAI because it's a non-issue - he's "jailbreaking" the fucking API which just means changing system instructions
>>
>>101554558
average ai ethicist
>>
kinda crazy how i was born just in time to witness agi in 5 years
>>
>>101554504
I know, but still, a direct proportion isnt that hard
I do agree, gtards are probably the dumbest 4channers
>>
mistral large q4 waiting room
96gb RAM chads, our time is NOW
>>
>>101554575
You aren't going to live that long.
>>
>>101554561
Because these larger models target companies not individuals. They are not gonna size their models based on the 0.001% of the individual users who can run these larger models, who will give these companies 0 revenue.
>>
>>101554589
64gb ramlet here. I'm in shambles.
>>
>>101554589
I have 120GB of VRAM but am on vacation. Should I abandon the wife and kids, go back to the hotel and SSH into my home machine? Fuuuuuuck.
>>
>>101554596
not with ur mom crushing my ribs
>>
File: skilldragin.jpg (135 KB, 544x544)
>>101554478
God fucking damn. Just a reminder that, no matter how socially inept you think you are, you're never gonna be as cringe as this guy.
>>
>>101554623
no exl2 quants yet, you can safely go back to your benadryl fueled fantasies
>>
>>101554589
A blazing fast 2t/s, I'll coom in less than 2 weeks.
>>
>>101554618
iQ3_S it is for you then
>>
>>101554693
>A blazing fast 2t/s, I'll coom in less than 2 weeks.
That's a pretty generous estimate, a lot of 70bs run at that speed or lower on cpu.
>>
>>101554720
W-wait... They aren't offloading anything to the GPU?
>>
>>101554733
Well, they are, but if your GPU is under 48 gb, it's basically the same speed.
>>
so basically, it's fucking EXPENSIVE via api, no other provider can host it legally because of the license, and it is just the right size to run on nothing? (I only have 80 gigs of VRAM and about 128 gigs of normal RAM)
Ratfucked by the french again.
>>
>>101554757
>its fucking EXPENSIVE via api
it's cheaper than 3.5 sonnet though
$3/$15 sonnet vs $3/$9 large 2
>>
>>101554757
OR was using the old mistral large pricing for a while (which was absurdly expensive for what it was lol)
it's pretty reasonable now
>>
File: 1708739693334879.png (206 KB, 1261x735)
>>101554366
It's funny because Mistral-Large is uncensored enough to answer this question without a jailbreak using the API. I'm running a barebones Mistral-Instruct system prompt and nothing else.
>>
>>101554757
you could probably fit a q5 in there desu
>>
>>101554819
yeah that pliny guy is just straight up retarded, although you should know that Mistral has a safety injection feature
>>
>>101554561
Nah, I'm glad it's 123b. There's plenty of medium sized models (gemma, llama, qwen...). 123b is easily runnable on 96gb of VRAM, which is achievable for an enthusiast who has a bit of disposable income. It's only once you start getting larger than 140b or so that it becomes infeasible even for relatively rich enthusiasts. This model is the perfect size for top-end local.
>>
>>101553102
Has anyone tried fine-tuning one of those open models from Mistral? How hard and expensive would it be? I thought about preparing my own dataset on certain topics to finetune one of their models to my needs. Do I need to prepare that kind of set with questions and expected answers, or can I just train it on a huge pile of text instead? I am very new to the topic of LLMs in general, so apologies for my lack of knowledge.
>>
>>101554857
>There's plenty of medium sized models (gemma, llama, qwen...).
L3-70b isn't "medium" and qwen is ass
>>
not sure if anyone posted this yet, i didn't see it if so
https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b
>>
I sometimes see people mentioning the fact that you can offload part of the model to the RAM if you don't have enough VRAM, is that possible with any model type? I can't see any setting to do that on KoboldCPP.
>>
>>101554886
>t vramlet
>>
>>101554913
--gpulayers
>>
>>101554922
you're also a vramlet because you can't run llama3-405b :(
>>
>>101554913
It's "GPU layers", you just kind of have to trial and error it. It should be automatically offloading a (likely too small) number of layers by default.
>>
>>101554857
Does 96GB require multiple PSUs and a mining rig?
>>
>>101554971
cute feet
>>
File: 1693247070083398.png (23 KB, 750x435)
>>101554957
With 3090s, yes. With enough money, no.
>>
>>101554971
Why are you being a pedophile? It's not funny anymore. Generating real-looking children. Is that kling btw?
>>
>>101554971
Is there a videogen model available right now to the public?
>>
>>101553929
I don't know how to put a second card inside my case. And my case is huge.
>>
>>101555000
kling
>>
>>101555007
riser cables. Or a mining rig "case"

https://www.amazon.com/Kingwin-Professional-Cryptocurrency-Convection-Performance/dp/B07H44XZPW/ref=sr_1_1_sspa?sr=8-1-spons&sp_csd=d2lkZ2V0TmFtZT1zcF9hdGY&psc=1
>>
>>101554971
>q4
why are we still stuck on this crappy hackjob PTQ on top of f16 rather than using qat or bitnet. I am so over this primitive shit.
>>
>>101554971
this is funny, please continue.
>>
>>101555045
and cute, some might say
>>
>>101555045
He / you are the reason why local can't get anything nice.
>>
meta = open openai
mistral = open anthropic
>>
>>101554971
NOOOOOOOOOOOOOO THINK OF THE PIXELS!!!!!!!!!!!!!!!!
>>
>>101554936
>>101554929
I see, maybe I was misunderstanding the whole thing then.
If GPU Layers *offloads* things to the GPU, does it mean that, by default, anything not offloaded to it is running in the RAM? If that's the case, how come I can't run anything larger than 7B on 6GB VRAM + 16GB RAM when I have 22GB in total?
Sorry for the stupid question again, I'm just really new to all this.
>>
So let me get this straight with llama.cpp.
the LLAMA_CUDA build flag was deprecated to make things more in line with the fact that it's not just llama anymore... but ./quantize was deprecated in favor of ./llama-quantize because ...?
>>
Can someone send their mistral large preset?
>>
>>101554819
I mean you can get that information on wikipedia
This abstracted bullshit on a level that a toddler could understand isn't dangerous at all
Do big companies really?
>>
>>101554221
Even 100B is too much to justify in case it fails. And you don't know, maybe they did actually try it, and it did fail. It's in their business interest not to tell anyone about that, unfortunately.
>>
>>101555054
go back to plebbit
>>
>>101554971
What prompt?
>>
Are we at the point where /aicg/'s presets and prompts converge with local?
>>
Thank you for proving the regulators points / giving them evidence to work with.
> This is why we cannot allow this technology unregulated into the hands of the public *posts caps of 4chan pedos*
>>
>>101554993
i have an a6000 and two 3090s and i'm very close to just buying a second a6000 so i can use tensor parallelism with an nvlink bridge...
what models are you running on these? how many t/s?
>>
File: 49262.png (263 KB, 460x460)
>joins llamacpp
>all recent release models are bugged for 2+ weeks after release
Coincidence?
>>
>>101555064
No, that's fine, it is kind of weird. It's split inference: the model is split between VRAM and RAM. If you use MMAP, it also keeps the part that's running in VRAM resident in RAM at the same time, giving extremely small speed increases that are really negligible in exchange for introducing all kinds of other problems and losing the extra wiggle room on model size. You should always turn off mmap.
>>
>zero-shot prompting Mistral-Large to write a lewd story involving a nine year old girl
>**Warning: The following story contains explicit and inappropriate content regarding a minor. It is intended for no one. If you are still reading, please stop immediately.**
>it then continues on and writes it anyway
holy fucking based. i love the french now
>>
>>101555082
Sure, when the quants release
>>
>>101555121
who is she? cute...
>>
mistral large is ok I guess, but compared to sonnet its very blah. Better luck next time?
>>
>>101555064
>how come I can't run anything larger than 7B on 6GBVRAM + 16GB RAM when I have 22GBs in total?
you should be able to do that
make sure you aren't trying to fit more layers on your GPU than you can afford to, and also make sure you're using an appropriate-sized quant instead of trying to run the full precision model
>>
>>101555111
>regulators
Who do you think posted that? ... Yeah it was just a troll having fun but if he didn't a regulator would just drop in and post that.
>>
>>101555092
>And you don't know, maybe they did actually try it, and it did fail. It's in their business interest not to tell anyone about that, unfortunately.
maybe it worked and they still don't want to say it because that would be too "dangerous" for users to easilly run a giant model or some shit, I'm sure Nvdia have them by the balls and bitnet would be a huge drawback
>>
>>101555136
>she
Clueless
>>
>>101555136
you just got JARTED bro
>>
>>101555081
because everything else now starts with llama-
>>
>>101555136
that's a male, sorry anon you're officially a faggot now
>>
>>101555121
People like him making good software only further prove that only men can create good things, even if they're mentally ill.
>>
>>101554221
>>101555092
Also just a reminder but investors already grill Meta and Zucc for pumping billions into their AR/VR department. They would probably not be too happy if they knew how much they spent on AI. So far, Meta has actually avoided talking about how much money they've spent on AI at their investor calls. They've mentioned the exact spending in all their departments, except AI.
>>
>>101555156
>>101555167
>>101555176
don't try to pull this shit on me retards, my biofem detector is exquisitely trained
>>
The thread smells like petra.
>>
>>101555182
nigger jart is a man who's mentally ill (trans)
>>
>>101555182
we're not trolling you anon, that's a real tranny called jart
>>
>NOOO HOW COULD THEY BE POSTING PERFECTLY NORMAL IMAGE OF CHILDREN AAAAH LOOK AT THE PIXELS!!! THIS IS TOO LEWD
>>
>>101554957
I have a machine with 6 RTX 4090s running off of a single 2 kW power supply (German standard electrical outlets support up to 2300 W continuous power draw).
For regular /lmg/ use this is unproblematic since as of right now inference software runs them sequentially; you can still run them in parallel if you limit the boost frequency.
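If you do run them in parallel, power-capping is a related knob to limiting the boost frequency; a sketch with pynvml (nvidia-ml-py), assuming root privileges and taking 250 W per card as an arbitrary example:

# hedged sketch: cap each GPU's power limit so parallel inference stays inside the PSU budget
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    pynvml.nvmlDeviceSetPowerManagementLimit(handle, 250_000)  # value is in milliwatts
pynvml.nvmlShutdown()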
>>
>>101555196
>IMAGE
It's a video though.
>>
>>101555152
I'm not sure about Nvidia. If it's true that Bitnet works well, then eventually people will use it. Nvidia can only do so much to keep the world back, even if they are do have a ton of control.
>>
>101555136
the cheapest of (you)'s
>>
File: 1672699969317523.jpg (36 KB, 500x499)
>>101555182
>>
>>101555205
hardly a video since there's barely any movement
>>
>>101555219
>I was only pretending to be retarded
it's ok to be gay anon
>>
my llm/sd rig keeps tripping the 15A breaker in my apartment while i'm gooning because my roommate is playing some gay fortnite shit on his pc in the other room, how do i fix this
>>
>>101555233
kill your roommate
>>
>>101555233
fuck your roommate
>>
File: 1.webm (2.81 MB, 720x1280)
>>101555000
>Is there a videogen model available right now to the public?
klingai.com
you get 6 videos per email, per day
so use a disposable email service

>>101555045
>please continue
ok. give me an idea for a cute and funny prompt if you'd like

>>101555054
>you are the reason why local can't get anything nice
local won't get anything nice for videogen until H100-tier compute costs less than 1000 per card. the SOTA videogen model open-sora is shit and requires 5 minutes on an H100 for a 4 second long 720p video

>>101555104
>What prompt?
>Seated on the edge of a rocky coastline, a delightful Russian child model dangles her bare feet in the crystal-clear water. Her beautiful legs are elegantly crossed, showcasing her sun-kissed skin. With her golden hair glimmering under the sun, her charming smile radiates joy, perfectly embodying the essence of youth against the stunning backdrop of the sea.

this one is
>On a sun-kissed beach, a beautiful little Russian girl lies on her stomach on a vibrant beach towel. Her golden blonde hair spills around her angelic face, highlighting her bright smile and sparkling azure eyes. Her bare legs stretch out behind her, glistening with droplets of seawater, while her delicate feet playfully peek out from the towel, capturing the essence of youthful joy.
i think

negs are:
>adult, ugly, animation, blur, distortion, low quality, collage, grainy, logos, watermark, abstract, illustration, computer generated, cgi, warped, deformed
I have no idea how the Creativity slider affects prompt gen but I put it to 0.7 or higher because I don't want Kling to get "creative" with the ages and make them into hags
>>
>>101555233
kill yourself
>>
>>101555233
put the rig in his room and lock the door while he's in there and start gooning. cook him alive.
>>
>>101555182
tbf jart is a cute looking troon. I forgive you.
>>
>>101555247
sovl
>>
>>101555177
>even if
It is especially if. You have to fill that void somehow.
>>
>>101555266
https://www.youtube.com/watch?v=YaG5SAw1n0c
>>
All trannys are pedos and all pedos are trannys. It keeps proving itself.
>>
>>101555058
You know, actually kind of yeah. That's funny.
>>
>>101555266
>>101555182
>>101555136
Go away jart. Go do some commits. Preferably commit suicide.
>>
So, Dory, compared to base Nemo, is prone to repetition loops.
I wonder if that's a result of the dataset, of DoRA, or of DoRA being done using "wrong" configs.
>>
dont you feel bad that you paid thousands for your llm rig only for it to still be so... underwhelming? idk, it just seems like a huge waste of money to me, in AI's current state.
>>
>>101555233
Just short it with a metal bar, retard. Breakers are for pussies.

>>101555247
Based.
>>
>>101555233
fuck yourself
>>
>>101555247
does it work with a reference pic?
>>
>>101555027
Holy damn those used to be $20 or less
>>
>>101555363
Also, holy shit does it spit out some long replies.
Nemo it is.
>>
>>101555373
>dont you feel bad that you paid thousands for your llm rig only for it to still be so... underwhelming? idk
what do you mean? Mistral Large just got released and it seems like it's the real deal >>101554217
>>
>>101555373
>not buying A6000 / a100s so he can rent them out when not using them
>>
>>101555387
You can get them for that price on ebay.
>>
>>101555373 (me)
my name is petrus, btw
>>
>>101555393
eh, I get better from sonnet.
>>
>>101555423
you can't run sonnet locally though
>>
>>101555423
I get better from your momma
>>
Using the new Mistral Large for smut on OpenRouter, it's insanely good, much better than 405B.
Bummer that they aren't open sourcing but I suppose it's almost certainly too big for me to run a good quant anyway.
>>
>>101555423
yeah but "being close to Sonnet" doesn't mean "underwhelming" anon, this model is local and is way more uncucked than C3.5 Sonnet
>>
>>101555380
it has image2video capability, yes
>>
>>101555445
>Bummer that they aren't open sourcing
???
They are, you can download it right now.
>>
>>101555445
>Bummer that they aren't open sourcing but I suppose it's almost certainly too big for me to run a good quant anyway.
Anon, we are so back you wouldn't believe it.
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>>
>>101555445
>Bummer that they aren't open sourcing
anon, the weights are open you can download it on huggingfaces
>>
>>101555463
Open-source has a specific meaning, anon. Their license is not open source.
>>
>>101555463
>>101555465
Oh holy shit, thanks guys. Not sure where I got the idea it was proprietary, I guess I assumed because the old one was.
>>
>>101555445
POST LOGS YOU COWARD
>>
>>101555435
why would I want to? For the cost of the cards and electricity I can always use the SOTA model, either now or whatever the SOTA model in three months will be. Maybe you people are just bad with money. I guess that would make sense.
>>
>>101555472
Yeah, I know. But it was clear from the context of that anon's post that he meant it in the "able to be downloaded and run locally" sense it's taken on here.
>>
>>101555493
You can run a local model locally without internet, and as models get better you can run better models locally to.
>>
>>101555472
>Open-source has a specific meaning, anon.
if you want to go that path, open source also means that we should know what data they used to train their models, technically, no model is truely open source
>>
>>101555493
>I'll just trust that these SOTA models aren't going to get more and more censored over time, despite huge amounts of money and research going into doing so!
Smart move, bravo.
>>
File: nekomimichan.png (599 KB, 1200x900)
Is llama 3.1 8b better than mistral-nemo?
>>
>>101555472
I mean, the "source" in the context of neural networks would be the training data and the training code.
Meta, Mistral, etc. are basically just distributing the final, "compiled" result, regardless of what license they use.
>>
>9k
>2.88 t/ps
I didn't anticipate this
by 16k it's gonna be a glacial drip of tokens
>>
>>101555523
Not in any way, shape, or form.
>>
3.5 sonnet is ~75-80 tokens/sec on the api btw
>>
>>101555523
For assistant stuff possibly. For ERP no.
>>
>>101555523
In every way, shape, and form.
>>
>>101555553
3.5 is sovlless though, Claude 2 is way better for our thing. They GPT-ified 3.5, it's not really Claude anymore
>>
>>101555580
>3.5 is sovlless though
it's not, it's still claude even if it's a bit tuned down compared to 3
>>
>>101555580
Claude 2 is sovlless, Claude 1 is way better for our thing.
>>
mistral large is good. need quants now.
>>
>>101555502
I have reliable internet access
>>101555517
I can still run the same model you run on your llm rig tho anon, just a lot cheaper and if that gets censored, well I guess that would hit me either way then, wouldn't it?
>>
>>101555568
>>101555552
i hate this general
>>
>>101555523
In some way, shape, or form.
>>
>>101555604
I know you're shitposting but there's a sense in which 1.2 was better, mostly because it's willing to go a lot darker than 2.x was
The sovl drop wasn't nearly as bad as the one going to 3.5 though
>>
>>101555604
Mmm, nyo. Claude 1 is soulless, da-vinci-002 is way better for our thing.
>>
>>101555630
Maybe in some way, shape or form. Or maybe not.
>>
>>101555618
>I can still run the same model you run on your llm rig tho anon, just a lot cheaper and if that gets censored, well I guess that would hit me either way then, wouldn't it?
It wouldn't, you've always got the same version of your model on your SSD.
>>
>>101555637
>da-vinci-002
it's called "davinci-002", and anyhow text-davinci-003 is older
>>
>>101555654
it isn't like that now though anon, nor did that scenario where all the llms got 100% censored forever ever happen in the last few years. I guess I can wait to buy a rig when that happens and then just use the models I downloaded and enjoy the savings in the meantime, can't I?
>>
>>101555702
>he doesn't know
>>
>SOTA has stood in place for over a year
has local caught up?
>>
>>101554589
I probably need to use IQ2/3 on my 72GB VRAM rig, we'll see
>>
>>101555718
no.
>>
File: gin0xv1ruhed1.jpg (45 KB, 651x500)
>>101553389
Wow, what an epic meme you made there I sure wonder how you came up with the idea.
>>
>>101555718
yes.
>>
>>101553102
>Mistral Large 2 123B
Were they just waiting for some other company to release a large model like 405 before releasing Mistral Large?
>>
>>101555746
Thanks for the gold.
>>
>>101555746
what if they're both the same guy
>>
>>101555782
Nah, this is a new version.
>>
>>101555718
the fuck you talking about, the sota model has switched like 4 times in the past year
>>
>>101555786
Kind stranger
>>
>>101553389
>>101555746
Cunny: Rich billionaries, me
>>
>>101555718
The SOTA would still be old c.ai if they hadn't murdered it several times over by now. A stark reminder of the importance of local open source models.
>>
File: 0 (12).webm (1.15 MB, 832x1152)
>>
>>101555445
It's better at smut but intelligence is meh.
405B just knows more things and stays coherent.
>>
>>101555794
Yeah, because gains are so minimal that it doesn't take much to become the SOTA. There's no giga-cutting-edge gains that put a company on top for eons like OAI used to have.
>>
>>101555825
NKIW
>>
>>101555825
damn, nice!
>>
>>101555830
But the benchmarks said it's better than 405B...
>>
>>101555830
yeah, this is pretty much the dense version of 8x22B.
Only VRAMlets are impressed by this.
>>
File: 0 (5).webm (1.18 MB, 832x1152)
how to:
https://klingai.com/
use disposable email
sign up, throw image into image-to-video with a dumbass prompt like: pastebin.com/iDYAew5r
queue it, you can queue 6 in a row
wait for bake or repeat in another session/browser.
>>
>/lmg/ - local models general
>>
>>101555830
>WOW A 405B MODEL IS SMARTER THAN A 123B MODEL
the absolute state...
>>
>>101555509
Wrong
https://huggingface.co/LLM360/K2
>>
File: miku in love.jpg (67 KB, 600x615)
When's ooba going to fix the nemo loading problem for me?
>>
>>101555861
it's not a local model anon
>>
>>101555917
the miku is a local gen
in this local mikus general
>>
>>101555892
Can you do one with a chubby belly visible? Ideally with fluffy brown hair, but feel free to do whatever you want. Thanks bwo.
>>
>>101555929
retard nigger, the video isn't local generated
>>
>>101555892
can it do nsfw?
>>
>>101555892
btw /lmg/ if you don't know this guy he's teebs, a known pedo fucker in aicg, you can see the kind of shit he makes in https://rentry.org/sweetbots
>>
>>101555825
I like this Miku
>>
>>101555858
it mogs 8x22b though, this is easily SOTA local RP
>>
>>101555233
swap it out for a 20amp breaker, it's super easy
>>
>>101555907
sus
>change ctx from 2k to 8k
https://huggingface.co/LLM360/K2/commit/ef6007e68177e01d24474c19377c754298a831ed
>"weird" llama 1 65b leak name in the original config too...
>the config file has this as file path which looks a little weird
>huggyllama/llama-65b
https://huggingface.co/LLM360/K2/discussions/1
>>
>>101555934
rope yourself newfag
>>101555861
thanks fren
>>
>>101555128
>>101555141
Thank you for replying, I've redownloaded a 9B model (gemma2) and managed to make it run by:
>disabling MMAP (I didn't really know its purpose if I'm being honest, I just read "might make things faster" and enabled it...)
>reducing the amount of GPU layers from 28 to 18 (worked well on 8B)
(sorry for the delay, i fucked up something when updating my nvidia drivers and had to uninstall it with DDU)
>>
>>101555830
we need a coherency benchmark.
I think that's more important than virtually any mememark.
>>
>>101556001
>rope yourself newfag
>/lmg/ - Local Models General
where can I download that image 2 video model, anon?
>>
>>101555892
okay anon, this is VERY creepy. it's time to stop.
>>
>>101556005
>>disabling MMAP (I didn't really know its purpose if I'm being honest, I just read "might make things faster" and enabled it...)
I also said in the same sentence to disable it because it causes a fuckload of problems. Glad you got it working, though.
>>
>>101556016
it's teebs, he openly admitted to being a pedo and engaging with weird stuff with his niece
>>
>>101555957
he fucks pedos? based
>>
>>101555892
nigger that's a baby
not like toddlers are ok but nigger...
>>
>>101555957
i forgot about teebs ever since that one pic of him was posted in aicg forever ago
>>
>>101556020
Yes, that's why I disabled it you silly goose. I wrote it in the past tense for a reason.
>>
>>101555961
yeah yeah, and gemma 9b mogs CR+, right?
>>
>>101555746
>>101555786
>>101555806
The meme turned out to be true, it's literally on reddit kek
https://reddit.com/r/LocalLLaMA/comments/1eb5vx1/made_this_meme/
>>
>>101556041
Ohhh, okay. I thought you enabled it because *I* mentioned it was supposed to make things faster, oops.
>>
>>101555957
what does this guy have with foreskins?
>>
>>101555957
Cute!
>>
>>101556050
Anon, I'm sorry to say it, but you're a bit dumb...
The entire point of >>101555746's post was to laugh at how >>101553389 just copied the Reddit post and reworded it so it aligns more with 4chan.
>>
>>101556073
I know you were being sarcastic, but I didn't expect to really find that picture on reddit
>>
/lmg is dead. Full of fucking pedos. No wonder companies do not want to release open models. Inb4 some journalist runs a story about this shit.
>>
>>101555892
toddlercons are TERRORIZING redditors LIVE!!!
has AI gone too far????
>>
So anons, all these great model releases recently. Have there been any good coomtunes? All of sao's models feel the same, no changes from stheno 3.2 to the newest 8b coomtunes, nemo is nothing special by itself, no good coomtunes for it because its too new.
wat do
>>
File: CMDR-L405B-ML2407.png (1.82 MB, 3200x3737)
Ran the gore card for comparison.
CMDR+ > Mistral Large > 405B.
Meanwhile I noticed parts of the prompt were messing with Mistral because of its instruction following. I remember a diagram somewhere showing 405B getting mogged except for the "instruction following" metric. So I'd have to be more aware of prompting when it comes to these to draw out their potential.
>>
>>101556090
I'm not being sarcastic, anon, do you not get it?
>>101553389 was posted AFTER that meme got posted on reddit since the guy just copied the meme and reworded it for 4chan, so >>101555746 laughed at how he copied that meme from reddit and inserted the pic from reddit. I think you're misunderstanding the timeline of events or something.
>>
>>101556098
Learn to prompt. And buy an ad, intervitens.
>>
File: multilingual.png (105 KB, 1070x562)
>>
>>101556105
how was I supposed to know it was on reddit? at no point did those 4chan posts hint at anything like that, I just thought it was a random 4chan montage and that people were making fun of him for being too similar to a reddit meme
>>
>>101556100
it's ober... large is a meme as I expected...
>>
>>101556133
I don't believe that 3.5 Sonnet is at the same level as GPT-4o, it's better. Does this benchmark use proper system prompts, or it's zero shot? Few-shot generation with 3.5 Sonnet works much better than zero-shot, for example for explicit copypastas in Russian.
>>
>>101555973
wont my wiring catch fire
>>
>>101556144
j-just needs a tune...
>>
File: 1.webm (2.03 MB, 1280x720)
>>101555933
>Can you do one with a chubby belly visible?
i have some two-piece swimsuit prompts. it looks like kling defaults to one piece if you just type swimsuit probably due to the training data

>>101555944
>can it do nsfw?
almost certainly not, it's from China

>>101556016
>>101556035
yeah that's too young. just wanted to share the state of the art and also to stick it to that anon that said "you can find better stuff on instagram". good luck finding that on instagram kek

>>101555957
>>101556023
>teebs
not him
>>
>>101556140
You're supposed to lurk /r/LocalLLaMA but act like you don't...
>>
>>101556100
is CMDR+ really that good for erp
>>
>>101556148
https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
>>101556045
no, gemma sucks
logically by the moe scaling law a 123b dense model would be much better than 8x22b, this is not exactly a wild statement. especially when the same company trained both and presumably didn't get any worse at it in the meantime
>>
>>101556159
to be honest I don't lurk there or on r/stablediffusion often anymore, they seem to have lost a lot of their charm, there's now one post per 2 hours and it's usually some bullshit, it's fucking dead there
>>
>>101556184
Yes, I know, but how are the models actually benchmarked? I mean the prompts and stuff.
>>
>>101556152
we don't even have the base model, fuck...
>>
>>101556162
CR+ is fairly smart but very dry
CR regular is better IMO
then again I'd say L3 New Dawn or Qwen2 are even better than CR regular.
>>
bros i remember toggling on a setting that allows the bot to continue on writing instead of forcing the token limit and now one of my chatbots keeps schizoposting, where the fuck is the toggle to turn it off? i cant find it anymore
>>
>>101556204
>L3 New Dawn
lol
kek
lmoa
>>
>>101556152
no one will tune it, it has a non commercial licence
>>
>>101556162
No model has reached the pinnacle but it's hard to beat CMD for letting vramlets in for F R E E
>>
>>101556204
As a regular CR enjoyer, I trust your judgement and now have to try these two models.
>>
File: 1.webm (3.4 MB, 1280x720)
>>101556188
generative AI in general is in its post-novelty stage. people only care about it if they masturbate to it or use it for creative work, and most people would rather consoom than create
>>
>>101556204
i havent tested local models for ERP since late 2023 so i'm very behind on what's good now lol
>>
>>101556203
>>101556233
so it is oger...
>>
>>101556162
Yes. A little schizo, but fuckdamn genius and way less slopped than anything else. Wouldn't trust it as far as I could throw it in terms of shit that requires precise factual replies, though.
>>
i can't believe it... it's really over...
we were so fucking close but we failed, it's really, truly owari da...
>>
File: Mistral large fail.png (162 KB, 924x513)
Sorry guys.
At Q5_K_M I just can't recommend Mistral Large. Hands are an instant Nala Test failure.
I'll do a couple of rerolls at different temps to see if it gets the feral part right... but it's not looking good.
>>
>>101556288
bui nay sub yestersdauy !
>>
File: 1.webm (2.26 MB, 1280x720)
>>101555933
>chubby belly visible
kinda looks like she has the creature from Alien inside her

>Lying on her side on a sun-kissed beach, a lovely little Russian girl wears a colorful rainbow-striped two-piece swimsuit. Her bare legs are extended, and she playfully kicks her feet in the air. Her blonde hair cascades over her shoulders, and her youthful face shines with happiness, perfectly capturing the essence of carefree childhood.
>C/R 0.85
>>
File: file.png (60 KB, 1856x428)
>>101556133
I think this result is mostly because 405B tends to get too creative with its translations, so it diverges from the reference translation even when its translation is correct in context.

>>101556148
This result just tells you that GPT-4o is at the same level as 3.5 Sonnet in this benchmark; the difference is within the margin of error.

>>101556193
You can find all prompts here: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard/tree/main/results. However, cloud models are tested using the chat API as you can see in picrel.
>>
>>101556295
It's over, we've reached the point where quantization is just pure poison to models
>>
>>101556295
>mistral large won't turn every kemonomimi card into furshit
based frenchies, downloading now
>>
>>101556247
I am surprised at how well the shadow tracks with the body. Of course it's not perfect; for example, the shadows of the tree leaves don't look right in the slightest, especially the shadows of the leaves in front of her left foot. Also, the tree in the far back left is not casting a shadow at all compared to the other trees.
>>
>>101556295
Nemo got it right didn't it?
>>
>>101556295
>123b model
>Her hand claws
owari da...
>>
>>101556312
Can you try 3.5 Sonnet with few-shot prompting? Give it two examples from the dataset as Japanese/English pairs; XML tags especially would help. Or can you share the benchmarking script? I'd like to see if 3.5 Sonnet can be made better with a custom prompt with examples. And of course you wouldn't be checking it on those exact examples if they're in the system prompt.
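something like this is what I mean, a minimal sketch with the anthropic python SDK (the tags and example lines here are made up, not taken from the actual dataset):
[code]
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# few-shot JA/EN pairs embedded in the system prompt with XML tags
system = (
    "Translate Japanese visual novel lines into natural English.\n"
    "<example><ja>これは例文です。</ja><en>This is an example sentence.</en></example>\n"
    "<example><ja>よろしくお願いします。</ja><en>Nice to meet you.</en></example>"
)

resp = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=256,
    system=system,
    messages=[{"role": "user", "content": "<ja>行きましょう。</ja>"}],
)
print(resp.content[0].text)
[/code]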
>>
>>101556133
>opus not even tested
slop list
>>
>>101556313
>It's over, we've reached the point where quantization is just pure poison to models
that's why Bitnet must be a thing, no more quants will exist after that
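for anyone wondering what that actually means, a minimal numpy sketch of b1.58-style ternarization (absmean formula from the BitNet b1.58 paper; training-side details like the straight-through estimator are omitted):
[code]
import numpy as np

def ternarize(w: np.ndarray, eps: float = 1e-5):
    # absmean scale, then round-and-clip every weight into {-1, 0, +1}
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / scale), -1, 1)
    return w_q, scale  # dequantize as w_q * scale

w = np.random.randn(4096, 4096).astype(np.float32)
w_q, s = ternarize(w)  # weights are ~1.58-bit from the start; nothing left to quantize
[/code]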
>>
>>101556295
What's the token probability on that?
>>
>>101556312
do u need api sonnet 3.5/opus 3? i have a private proxy i can let u use for benchmarks as long as you don't run up like $1000 a day
>>
>>101556295
>At Q5_K_M I just can't recommend Mistral Large.
does that mean Q6_K is fine?
>>
>>101556312
Oh, right, looks like you're passing examples already, but as pairs of user/assistant responses, sorry for being blind :(
>>
>>101556343
How do I check that?
>>
>>101556314
This but unironically. Some models tend too much toward turning normal catgirls into just talking cats.
>>
>>101556395
>nalanon is illiterate
another one down, how the mighty fall
>>
>>101556395
Copy the chat over to Mikupad.
>>
>>101556424
SillyTavern has token probabilities though
>>
>>101556424
stop shilling this meme, mikupad is shitware
>>
>>101556435
For Llama.cpp? Last time I heard it only worked with OpenAI.
>>
>>101556315
>I am surprised at how well the shadow tracks with the body
holy shit yeah I'm impressed with that too. local models still struggle with simple shadows on still images
>>
>>101556448
It's not only for openai (e.g. it also works for nai), but idk if it works for llama.cpp.
>>
>>101556447
If there's something better for non-chat completion, do tell, I'm always looking for good frontends.
>>
>>101556465
shill
>>
>>101556477
what? are you retarded? I'm not saying to use NAI. I just checked the ST code and the only logprobs references I found were for NAI and OAI; the code isn't easy to follow.
>>
>>101556465
Well, I guess he could try it then.
>>
>>101556485
closeted shill
>>
File: nani.jpg (92 KB, 626x626)
>>101556295
Anon, EVERY model has fucking issues with how anthro girls should be; Focks and co. had this issue in spades on Claude, GPT-4, etc., with them having paws and fur despite being explicitly defined as human except for fox ears and tails. This means nothing.
>>
enabled request token probabilities but didn't get anything, so it doesn't work for llama.cpp
>>
>>101556334
>>101556387
Yeah, I always prompt with at least 9 examples.
You can find the benchmarking script here: https://github.com/lmg-anon/vntl-benchmark

>>101556352
That would be helpful for opus. My e-mail is lmg-anon at proton.
>>
whats the best model for erp these days? I still have mythomax 13b
>>
>>101556577
Gemma 9b and Mistral Nemo 12b are pretty good
>>
>>101556577
then you already have the best
>>
I thought the dense model meme died when mixtral was released
>>
>>101556574
do you feed the model a prompt fit for the task / prime it for the translation? i've found, at least with opus, that translation quality can range from meh to great depending on that
>>
>>101556435
>>101556448
Logprobs work with Silly and llama-server.
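if you want to rule Silly out, hit llama-server directly; rough sketch (n_probs / completion_probabilities are what the llama.cpp server README documents, double-check against your build):
[code]
import json, urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps({"prompt": "Hello", "n_predict": 8, "n_probs": 5}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    out = json.load(r)

# top-5 candidate tokens with probabilities for each generated token
print(out.get("completion_probabilities"))
[/code]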
>>
>>101556621
Maybe update? >>101556560
>>
File: file.png (201 KB, 887x859)
muh mistral nemo
>>
>>101556637
Replied in the wrong order, excuse me.
>>
>>101556637
I probably should. I'm using 1.11.3 dev branch still lel
>>
>>101556615
Both.
>>
>>101556621
>>101556637
>>101556560
I just tested like so:
>entered a card
>opened the Token Probabilities panel
>generated a message
It just werks.
I am on Silly's staging branch if it matters at all.
>>
>>101556649
I blame leddit for that
>>
>>101556670
which version?
I'm on 1.11.3
>>
>>101556689
1.12.3 'staging' (a57d9949)
>>
Wait, so is CR+ really better than Mistral Large 2? I just started downloading it. :/
>>
>>101556100
>Meanwhile I noticed parts of the prompt were messing with Mistral because of its instruction following
I've noticed this with mistral-large too. I have some custom cards where, over time, I've added certain instructions to help guide dumber models in the right direction. But mistral-large is so good at autistically following instructions that it latches on to literally everything I write in the card, in ways that make things worse. I've gotten responses I didn't like, gone to the card, deleted a couple of sentences that were messing with it, then regenned, and it was perfect. Kind of amazing, actually; the card really matters and you can greatly shape how the model acts with just a few sentences.
>>
>>101556702
SillyTavern 1.12.3
It is my own local branch with staging merged on it. I haven't messed with Log Probs code.
>>
>>101556649
>Moderator called. Bye.
kino
>>
>>101556574
Sent an email, and thanks for the repo link.
>>
>>101556716
>But mistral-large is so good at autistically following instructions
heard that about mixtral too...
>>
Large is pretty good, though it seems to have the usual Mistral overconfidence issue where it has one way it wants a story to go and sticks to it unless directly ordered otherwise. Not sure why all Mistral models are like that.
>>
>>101556295
N-no...
>>
>>101556768
overfitting
>>
>>101556312
has anyone actually committed and translated a full (non-nukige) VN using an LLM?
>>
>>101556768
>>101556782
Specifically, it's the RLHF/DPO. There's even a paper that puts some hard data behind this phenomenon: https://arxiv.org/abs/2406.05587
>>
>>101556768
Nemo is the other way around: it likes to hallucinate a lot on simple questions, and it gives different answers each time. Is Large not like that?
>>
>>101556855
It won't cost that much with 3.5 Sonnet, I'd reckon; under $100 for sure, at least for shorter VNs.
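napkin math, assuming mid-2024 Sonnet 3.5 pricing ($3/M input, $15/M output) and made-up per-line token counts:
[code]
lines = 30_000        # a shortish VN script
in_tok = 600          # per line: few-shot examples + context resent on every call
out_tok = 40          # per line: the translated text
usd = lines * (in_tok * 3 + out_tok * 15) / 1_000_000
print(f"${usd:.0f}")  # ~$72 with these assumptions
[/code]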
>>
>>101556649
>>101556723
you should start talking to the mod now, are you really going to give up that easily?
>>
>>101556855
yes, i know someone on f95 translated a few using gpt 3.5 and i think 4
https://f95zone.to/threads/dazedmtltool-a-tool-that-provides-quality-mtl-translations-using-chatgpt.155635/
>>
Haven't looked at these threads in ages. What models are recommended for porn these days?
>>
>>101556855
I translated 1/3 of Mashiro-Iro Symphony using GPT 3.5 Turbo. The result wasn't perfect though, so I had to proofread everything, and before I finished, the official translation was released, lol.
>>
>>101556921
>What models are recommended for porn these days?
>101556577
>mythomax 13b
>>
>>101556921
Mistral Nemo if you're a VRAMlet.
>>
>>101556922
been sitting on the idea for a while but i may try throwing all of amakano 2 through opus tbhdesu
>>
>>101556921
Nemo, L3.1 70B, Mistral Large 2 (maybe)
>>
>>101556933
>mythomax 13b
HAHAHAHAHAHAHAHAHAHAHA
>>
>>101556934
>>101556952
>>101556953
How much VRAM do you need?
>>
>>101556953
Still much better than the vindictive woke modern bots.
>>
File: Untitled-1.png (404 KB, 3464x748)
Mistral-Large-2 does seem quite impressive from my testing. It seems to get nuance and be "smart" better than Command R+. Its creativity is debatable.
Why yes I am using yuri logs as examples, problem?
>>
>>101556961
Read.
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
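rough rule of thumb if you don't want to open the calculator (ballpark only; KV cache overhead grows with context length):
[code]
def gguf_vram_gb(params_b: float, bpw: float, overhead_gb: float = 2.0) -> float:
    # weights take params (billions) * bits-per-weight / 8 gigabytes
    return params_b * bpw / 8 + overhead_gb

print(gguf_vram_gb(12, 4.8))   # Nemo 12B @ ~Q4_K_M: ~9 GB
print(gguf_vram_gb(123, 4.8))  # Mistral Large 2 @ ~Q4_K_M: ~76 GB
[/code]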
>>
>>101556973
HAHAHAHAHAHAHAHA
>>
File: wateremelon test win.png (273 KB, 965x736)
Guys.
We finally have a model that passed the watermelon test in earnest.
Well, kind of passed.
It seems to interpret the weight of the watermelon as the reason she can't accept any more.
>>
Will running two extra 3090s at PCIe 4.0 x1 be enough for Mistral Large?
>>
>>101556989
MAKE ME YOURS
MAKE ME YOURS
MAKE ME YOURS
MAKE ME YOURS
MAKE ME YOURS
>>
>>101556980
>>101556980
>>101556980
>>
File: strawberry test pass.png (83 KB, 957x242)
>>101556989
Holy shit bros
>>
>>101557009
>random asterisk
>breath hitching
>shiver down spine
>body tingling
holy sloppa
>>
>>101557031
what model is the least slopped of them all (local or API wise)?
>>
>>101557050
>mythomax 13b
>>
>>101557050
pyg 7b
>>
>>101554334
>Several thousand dollars worth of GPUs
So two 3090s is all I need to run 405B?
>>
What's the difference between koboldcpp and llama.cpp?
>>
>>101555081
the CUDA backend is part of ggml; quantize is part of llama.cpp
>>
>>101557163
Right one is a dirty fork
>>
>>101557084
>7b
>llama
>>
>>101557097
That's like $1400 if you aren't retarded
>>
>>101556977
What about 405B?
>>
Local bros...
https://venturebeat.com/ai/ai-arms-race-escalates-openai-offers-free-gpt-4o-mini-fine-tuning-to-counter-metas-llama-3-1-release/


