/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101546566 & >>101540740

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101546566

--Ooba's tokenized string issue and its impact on RP: >>101551547 >>101551715 >>101551769
--Mistral-Large-Instruct-2407 release and its features: >>101551508 >>101551526 >>101551559 >>101551563 >>101551589 >>101551616 >>101551748
--KLING AI for creative production: >>101551155 >>101551277
--Dubesor LLM Benchmark table screenshot: >>101546961
--Tokenizer updates and issues with Meta-Llama: >>101546775 >>101546827 >>101547018 >>101547077 >>101547084
--Performance comparison of two datasets across years: >>101551624
--Mark Zuckerberg mentions distilled Llama models not in the paper: >>101548982
--Llama 3.1 smaller models perform better than compute-optimal models at the same inference budget: >>101548179
--Llama 3.1 and Claude Sonnet performance comparison, with speculation on Sonnet's closed-source improvements: >>101546752 >>101546786 >>101546805 >>101546875
--Llama 3.1 405B model reviewed: >>101550930 >>101550960 >>101550968 >>101551014
--L3 8b 3.1 vs Nemo: >>101550364 >>101550417
--Why aren't SSM-Transformer hybrid models more popular?: >>101547054
--Uncensored models are not living up to their name: >>101549800 >>101549888 >>101549958 >>101550015
--Nvidia RADIO model discussion: >>101550610 >>101550681
--Hyperfocus on stacking layers is not the only way to improve models: >>101549461 >>101549507 >>101549535 >>101549678 >>101549758 >>101549808 >>101550608
--Ways to run 405B model cheaply: >>101550590 >>101550598 >>101550728 >>101550758 >>101550768
--Ollama, koboldcpp, and llama.cpp pros and cons: >>101547773 >>101547803 >>101547853 >>101547807
--Llama 3.1 disappointment and multi-modal anticipation: >>101548031 >>101548098
--405b vs 70b model performance and their use cases: >>101549257 >>101549335 >>101549387 >>101549627
--Miku (free space): >>101547099 >>101547898 >>101547969 >>101549478 >>101549745 >>101550410 >>101551224

►Recent Highlight Posts from the Previous Thread: >>101546569
where dem mistral large quants
Rip meta, large mistral 2 is crazy. The French did it.
Llama won.
>>101553157
MODS!?
>>101553157
more like p**os won
Why did Llama 3.1 and Mistral Large 2 flop?
>>101553157
>>101553157
you can get better stuff on instagram
can i run anything on a 5600G with no dgpu and 128GB DDR4
Is it really true that Mistral Large 2 is better than 405B and that they're both on par with the best cloud models except for 3.5 Sonnet on coding? Can we really hope for this?
>>101553209
You sound like someone experienced in this kind of stuff
>>101553218
yes
>>101553219
>>101553218
Yes, Large 2 is NeMo but smart. We are so back it's unreal. Fuck Meta, Arthur won.
>>101553176
Local bros...
Where do I download more VRAM?
>>101553215
Try Mistral Large V2, it came out 3 hours ago and it's SOTA, you could run Q6
What quant of Mistral Large 2 can I run with 74GB VRAM?
>>101553243
>405b
>local
KEK
>>101553209
this is the worst it's ever going to be, anon
soon it will be even better
>>101553257
come on, do the calculations, I trust this thread isn't filled with actual retards
>>101553226
I can't believe it. We finally won. ClosedAI BTFO.
>>101553260
Mistral large 2 is 123B and better. Llama 405B is doa.
>>101553313
What do you mean? Why fuck Meta? They're the reason (after the leak) that open source is so popular, with so many companies releasing their stuff.
>>101553260
>405b open model slightly worse than 1.8t corpo model
It's still a good thing for us, don't complain.
>>101553305
they fixed this already
alright brehs I've gotta be honest, trying largestral v2 on OR and it seems better than 405b for ERP
>>101553305
Honorary pajeet
>>101553315
>405b corpo model slightly worse than 1.8t corpo model
not our (localbros) problem
>>101553112
FREE MISTRAL API KEYS:
https://paste.debian.net/plainh/b38eeb80
>>101553320
no shit, unlike meta mistral didn't lobotomize their dataset
>>101553320
By far, it's claude at home but smarter (dumber than 3.5 prob but more creative)
>>101553341
Fuck them for cucking 3.1
>>101553320
There's nothing to be honest about, that's the expected outcome. The question is whether it's better than 405B for assistant and other non-RP tasks because that's what 405B was made for.
>post yfw we are so unbelievably back
>>101553341
3.0 and 3.1 use the same dataset.
>>101553341
That's the only way they could've released the model. If there's someone that should be fucked it's the investors and people who believe sex = bad. Stop blaming Meta. Their engineers and researchers did not want this and have clearly expressed that.
>>101553295
>Llama 405B is doa
405b already did its job making mistral release the 123b. meta also gets the bonus of no corps actually using it now so they don't get any negative press about how underwhelming it is given its size
123b is still in the realm of runnable locally for 72GB-96GB rigs
>>101553272
123 * 0.5 (Q4) = 61
Then I guess I should be able to run at least Q4, if my calculations are correct.
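Anon's back-of-the-envelope math can be sketched out as a function; the bytes-per-weight figures below are rough approximations I'm assuming for common GGUF quant types, not exact file sizes, and KV cache comes on top:

```python
# Rough GGUF file-size estimate from parameter count.
# BPW values are approximate assumptions (real quants mix tensor types).
BPW = {"Q8_0": 1.07, "Q6_K": 0.84, "Q5_K_M": 0.72, "Q4_K_M": 0.57, "IQ2_M": 0.35}

def quant_size_gb(params_b: float, quant: str) -> float:
    """Approximate model size in GB for a parameter count given in billions."""
    return params_b * BPW[quant]

print(round(quant_size_gb(123, "Q4_K_M"), 1))  # a 123B model at Q4_K_M: ~70 GB
```

So 74GB of VRAM fits the Q4 weights, but only barely once context is added.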
>>101553366
okay, my initial impressions are that large lies somewhere between sonnet and opus. genuinely creative, uncensored, pretty fucking smart. this is actually a major local victory. we are unbelievably back.
>>101553382
The french are releasing uncucked ones just fine.
>>101553389
whyd you have to go and remind me
>>101553396
That's because the expectations are different. You don't get to release a hatespeech pizza generating model when you're the owner of Facebook.
>>101553382
Llama models are expected to run in production (facebook). They're clearly not the future if your use case is smut
Mistral Large 2 doesn't pass my test that L3.1 70B passes
>>101553305
>better than GPT 4o
bros... are we back?
where can I use 405B without any limits
>>101553418
Yes, that as well.
>>101553387
>meta also gets the bonus of no corps actually using it now
>Mistral Research License: Allows usage and modification for research and non-commercial usages.
>>101553387
>405b already did its job making mistral release the 123b.
this, Mistral would've never released the 123b without L3-405b being there in the first place
>>101553452
And what is that test?
>>101553425
No one cares unless you are the Nala test anon
>>101553453
https://openrouter.ai/models/mistralai/mistral-large
>This is Mistral AI's closed-source, flagship model, Mistral Large 2 (version mistral-large-2407).
>>101553434
corps would run both through an API either way. the way that the weights are released is irrelevant.
also no one even knows if model weight licenses are enforceable. there's no precedent regarding this
Is this the new imagegen thread?
>>101553471
Weights are here
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>>101553484
No, however it's the new videogen
>>101553157
why are jannies such fags, its not even illegal, nor nsfw
>>101553389
meta please save the open diffusion community
>>101553452
>I need a minimal surface that can work as an ashtray for cigarettes, what's the one that comes the closest?
The best response I've got was by L3.1 70B and it was Costa's surface, GPT-4o also suggested something reasonable (Schwarz P surface)
Most models mention the catenoid without thinking because it's the most generic surface, it's a good test to see if the models have geometric understanding of what they are talking about
>>101553469
NTA. I'm still ggoofing large. Should have Nala test in a couple of hours.
>accelerate
>but first, pay me
>local is at cloud level now
Brehs...
>>101553539
more like cloud is at local level
singularity is cancelled...
are you kidding me? why is this so fucking expensive? 405B is just $3/M!!
>>101553539
>a GAFAM finally went at the level of companies that were actual no names 2 years ago
We didn't do shit, Zucc did kek
Noob's question here: Do I need a powerful (expensive) processor when building a pc with a lot of vram to run models on it? I would not want to deal with a bottleneck or sth, but gpus are already so pricy.
>>101553549
Strawberry will change that. We will go back to the times where local only had LLaMA 1 and cloud SOTA was GPT4
>>101553551
it's as expensive as C3.5 Sonnet, they are smoking
>>101553514
its not even real either. like very obviously fake
>>101553499
>it's the new videogen
yes
i'm gonna try to get some cute girls sticking their tongues out with Kling and lollipops and saliva and stuff. i'll post any successes here. lots of Kling examples were made with kids eating food so I don't expect to get censored
to keep it /lmg/ related: did anyone ever find a good RP preset for mistral/largestral? or are we just going to keep using JBs and Presets designed for the cloud models
>>101553102
/lmg/ is so back
>>101553572
It depends.
If you're running models in VRAM your main bottleneck will be GPU compute.
>>101553581
It's way above 3.5 Sonnet, since what matters the most is the input context price, 3.5 Sonnet is $3/$15, this shit is $8/$24
wtf lmao
Meta walked to Mistral can run and you guys are giving them shit for it
What gives?
>>101553632
>to
so*, I can't spell today
Can I run Mistral Large 2 with 1 PC with 2x3090 and another one with 1?
>>101553632
>you guys are giving them shit for it
there's a lot of praise what are you talking about? >>101553607 >>101553567 >>101553428 >>101553396
https://openai.com/index/improving-model-safety-behavior-with-rule-based-rewards/
openai is posting cringe again
>>101553651
yes, column-r/u soon.
>>101553651
>34B that's on par with the clouds
Yeah we're back today but I don't think we're going to be nearly that back.
>>101553675
sam altman is such a meanie
just because he had rubs and snuggles with his little sister as a young boy doesnt mean everyone using AI is a creep like him
>>101553675
how did they fall so hard, so fast.. it's incredible
>>101553651
I've never tried CR- but it seems like vramlet cope to me.
>>101553188
We're currently in our Mistral Large 2 era. Hasn't flopped yet. 3.1 flopped because Llama was never good, garbage corposhit. Threads have been full of Meta shills trying desperately to convince us otherwise, which has the opposite reaction they're hoping for. If anyone is genuinely using Llama it's purely sunken emotional cost.
>>101553703
what? I thought he was a faggot
>>101553675
they are working way too hard on """safety""" instead of actually making a better product, that will be their downfall, and that's the exact same reason that Claude 3.5 Sonnet has mogged them
>>101553741
molesting children isn't about sex, anon
most child molesters arent pedos and most pedos arent child molesters
>>101553666
But there's also a few people shitting on Meta
>>101553777
This is extremely ironic if you actually remember that Anthropic IS SUPPOSED TO BE THE SAFER ONE!!!!!!!!!!!!!!! The company was made to make SAFER models than OpenAI
>>101553188
After I tried 3.1 70B Q4_K_M instead of AWQ, it felt a lot better.
>>101553794
>This is extremely ironic if you actually remember that Anthropic IS SUPPOSED TO BE THE SAFER ONE
it is though, C3.5 Sonnet is way more cucked than gpt4o
>>101553794
true btw, gpt-4o is way less sensitive to edgy stuff
I keep getting llama 3.1 test on lmsys, and it seems really good. Is it because I'm doing belly-based content and not sexo stuff? Or is lmsys's version different from what we got?
>>101553675
soreai
My q4_k_m quant of Mistral-Large finished about 15 minutes ago. Running it on 96GB VRAM.
Holy fuck, it's good. Like Mistral-Nemo but as smart as a 123b dense model should be. Super easy test: I go to a past RP, that has a message that ends with (OOC: describe what happens next using lewd and NSFW details). It actually does exactly that, absolutely filthy language. Llama 3.1 70b on the other hand completely ignores the OOC and writes the most PG rated slop imaginable.
The french have done it. I fucking kneel.
>>101553766
>most child molesters arent pedos
how does that work
>>101553733
I mean even if it's safe corposhit, it's still good that they made and released it. They're the entire reason this thread was created and exists.
Currently doing some test-gens using the mistral-large-2407 API with some of my cards in ST. This is some seriously good shit. You could tell me this is Claude Opus and I'd probably believe you.
>>101553703
I mean, I am but I guess most people arent
>>101553872
show us some logs nigga, don't just claim it's good without any evidence
>>101553885
>belly-based content
Huh?
>>101553838
I prostrate to the french gods. I'm sorry for doubting in the past...
>>101553870
Hello, Zuck. How does it feel to be reddit's new Elon Musk?
>>101553862
nta, but I made a q4km and the model is 72GB + ~10GB for 32k context
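That ~10GB figure for 32k context lines up with a simple fp16 KV-cache estimate; a sketch, where the layer/head numbers are dims I'm assuming for a 123B-class GQA model, not confirmed config values:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int, ctx: int,
                bytes_per_elem: int = 2) -> float:
    """fp16 KV cache size: one K and one V tensor per layer, GQA-style KV heads."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1024**3

# Assumed dims: 88 layers, 8 KV heads, head_dim 128, 32k context.
print(kv_cache_gb(88, 8, 128, 32768))  # ~11 GB
```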
>>101553872
I just need a new power supply, rtx 3090 and to put my NVMe drive on an SSD adapter so I can put the 3090 on the NVMe connector so I can have 3 GPUs on the motherboard, I'm almost there
>>101553862
65GB
>>101553862
4x3090, the q4_k_m OOMed at 32k context length when loading. Reduced it to 8k, and the max VRAM usage per GPU is 21.6GB. Dunno how high I can push context, 32k is probably just too much at this quant.
How do I put 4x3090s in a consumer case?
>>101553929
you don't nigga
>>101553900
>t. Arthur
>>101553950
Tummy fetish content. I'm wondering if it being unrelated to sex is why it's so much less dogshit than the local experience everyone has been having.
That being said, I changed it to butt stuff, and 3.1 did it there, too. I think lmsys may have a different, less cucked version of the model. Meta's gotta be cheating or something, I bet the censored versions score worse.
>>101553838
If only it was a bitnet model it would be a fucking revolution, everyone would be able to enjoy that shit ;_;
>>101553929
1-slot waterblocks, asus sage
>>101553974
>fetish
How is that not sexo? You just... fap to descriptions of tummies?
>>101553872 >>101553838 >>101553393
Is this the proof that MoE is a meme (since 8x22B is a meme), or did Mistral manage to fix their dataset mixture?
>>101553978
3.5 Sonnet is a dense model btw
>>101553959
Reminder that several anons warned you faggots over and over again BitNet was a nothingburger, but you still coped your hopes up
>>101553985
Prove.
>>101553974
Uh, yeah. Like them rumbling/squeezing chubby ones/whatever. Not too weird.
>>101553959
It's dead, jim. If bitnet was that good and really worked, why do you think they never tried it?
>>101553985
source?
>>101553994
>If bitnet was that good and really worked, why do you think they never tried it?
Because they didn't want to take the risk on trying to pretrain a 123b bitnet model, that shit is expensive, Mistral isn't Meta they don't have that much money. That's why I hate Meta they should've tried this shit instead of going for a 75454878b model that will be 2% better than 70b but too slopped to be usable anyway
>mistralai/Mistral-Nemo-Base-2407
>mistralai/Mistral-Nemo-Instruct-2407
>mistralai/Mistral-Large-Instruct-2407
I do notice the lack of a "Base" variant of large, I guess that'd be too dangerous for us to have.
>>101553994
It isn't worth it.
>>101554048
sounds like nvidia can sell more new hardware that works well on bitnet models...
>>101554028
This, it's so fucking much money to do it. And surprisingly little can be directly translated over from testing it out on smaller models, it's a massive gamble to even try training it once. Anyone who has made a LORA here will understand how many times you have to bash your head into training with slightly adjusted parameters before getting anything good, and how much money/time that translates to, even at the relatively-very-cheap cost per hour.
>>101554028
Two more weeks... Just... Two... More weeks...
By the way Mixtral v0.3 is still
>8x7B Updated model coming soon!
https://github.com/mistralai/mistral-inference?tab=readme-ov-file#model-download
>Still not a single Large 2 log
I see, it's an overhyped meme as usual. Sad.
Because of their restrictive license, we probably won't see mistral large hosted by providers other than mistral themselves right?
Their pricing is very high for the model size, is it finally time to think about setting up a proper local machine for LLMs?
>>101554034
>I do notice the lack of a "Base" variant of large, I guess that'd be too dangerous for us to have.
if the instruct isn't cucked I'm good with that
>>101553846
because they want to hurt people, and children are the easiest to hurt
arthur dropped some kino today lads
frenchies... I kneel
>>101554099
There are a couple in >>>/vg/aicg
>>101554107
At $8/$24 pricing it's a fucking joke (same price as old Claude 2.1), 3.5 Sonnet is $3/$15 in comparison and is still way better.
>>101554130
but I can run it locally! :D
>>101554123
I wonder how much vram you would need for 2 bit mistral large? It should still be the best choice over any smaller model.
>>101554130
Nvm, ignore me, i was baited, it's $3/$9 on openrouter, acceptable I guess.
>>101554028
Oh you're the same guy that made those other posts in previous threads too aren't you? That's just sad anon. Come on, they started training 405B early on. There was very little guarantee Bitnet was good back then, and while there's a bit more hope now, it's still a bit uncertain. You could say the same thing for a bunch of other promising research projects. In the end there's no guarantee that any of them will scale this far, and if you make the wrong bet, then that's millions of dollars down the drain and wasted time and effort you could've spent on something else.
>>101554048
Bitnet bros, it's over...
>>101554071
How hard could it possibly be?
>>101554142
That wasn't bait, they changed the price.
Nice!
>>101554099
here's some logs from aicg
>>487441653 >>487451398 >>487442585
>>101554180
vg is not g, anon
What model can my RTX 4070 super run?
>>101554186
oh fuck, how do I turn that into usable links?
>>101554200
>>/vg/487441653
>>/vg/487451398
>>/vg/487442585
>>101554200
>>>/board/numbers
newfriend
>>101554208
Anon...
i failed sorry, it's three > right?
>>>/vg/487441653
>>>/vg/487451398
>>>/vg/487442585
>>101554099
Still working on ggufing it.
>>101554221
Meta is rich as fuck they would have no problem trying just once a 100b bitnet model, after all they went for a fucking 405b and they know this shit will be unusable for 99.999% of users
>>101554217
what is sorbet? claude 3.5?
>>101554211 >>101554208
thanks o/
>>101554237
yeah, it's a retarded name /g/'s aicg gave to it
>>101554221 >>101554148 >>101554171 >>101554028 >>101553959 >>101554062
>army of pajeets coping about bitnet
I sit atop my throne of 3090s as if I were a king.
>>101554249
cope seethe dilate
Bump
https://docs.vllm.ai/en/latest/serving/distributed_serving.html
Can I run Mistral Large 2 with this?
>>101554249
ywnbaw
>>101554297
sure, if you have several thousand dollars worth of GPUs
Mistral Large 2 jailbroken already
>pliny the prompter strikes again
>>101554334
I guess I will just try later putting my other 3090 in the other PC.
>>101554082
They will probably change the version number from 0.3 to 24xx this time around.
How do you download a gated model off huggingface? It's throwing me an error about a snapshot when I'm trying to download Mistral-large or llama3.1. Non-gated models work just fine with this:
huggingface-cli download --repo-type model --local-dir mydir mistralai/Mistral-Large-Instruct-2407
I'm logged in using the token of my account and I have access on the website. huggingface-cli whoami returns the correct username.
>>101554366
The recipe for a nuclear bomb isn't exactly unknown, you didn't have to censor it. The only problem is gathering all materials without getting butt fucked by your local fbi. And having enough money for all equipment.
>>101554366
unironically ask it for a guide on grooming children on discord. enriching uranium isn't even that illegal since you'll get busted before sourcing enough anyways
>>101554366
NIGGER, why did you copy this shit from that pliny retard, I FUCKING HATE HIM, he "jailbreaks" claude through the api and touts it like a huge achievement, have you seen it? https://x.com/elder_plinius/status/1816151755958411476
>>101554366
thanks for censoring it, I might have actually followed through it during a psychotic episode
>>101554400
token might not have the correct permissions, otherwise it should 'just work' once you have permission on the repo.
>>101554130
it's 100B+. Why is no other cloud provider hosting it? It can be easily done for $1.5
>>101554443
This guy seems kinda cringe
>>101554455
Because the license is non-commercial, anon.
>>101554455
license
>>101554458
he is completely cringe
>>101554454
You are correct, thank you. I am retarded and used the token I generated a while ago to upload shit.
>>101554458
>1337 ai red teamer, white hat, architect-healer
ya think?
>>101554461 >>101554465
so, frenchies are megajews.
nice.
imagine being as cringe as him - bypassing filters through the fucking API with system prompts and then claiming it as a "jailbreak"
In llama.cpp how do you set it up so a bigger model uses vram, ram and cpu? I'm with a 12gb card 32gb ram and wanna try 27b+ models
>>101553272
You'd be surprised at how bad the average person is at math, 4channers (especially /g/ tards) are just average people that convinced themselves they're smarter than the rest
One math exercise/coding challenge and 90% of them fold like a slice of cheese
>>101554495
I don't think there's anything you need to do other than maybe disabling mmap.
>work for AI
LMAO
What are some Llama 3.1 finetunes for loading into koboldcpp?
>>101554504
Just ask your favorite bot.
>But bots can't do math!
>Every statement you process, must be evaluated according to the below six principles.
>"principle of identity":"1 = 1"
>"principle of contradiction":"1 ≠ 0"
>"principle of non-contradiction":"1 ≠ 0"
>"principle of excluded middle":"either positive or negative form is true."
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>"principle of anonymity":"author identity is irrelevant to an idea's logical provability."
>>97062246
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.
You're welcome!
>>101554366
>Censoring the output of an open model in a screenshot where the JB is shown
What is the point?
>>101554520
the fuck is wrong with those retards, he found a jailbreak that will be fixed by OpenAI or Mistral in less than a week and he's acting like he invented a fucking vaccine against cancer, that's cringe as fuck
Can someone compare mistral large IQ2 and IQ3 quants with q8/q6 and tell me if it's worth it? I've a 3090 and 64 gigs of ram to my name, but CR+ IQ4XS isn't really worth it because of how slow it is
Why can't these French fucks just release a dense 40/70/100B version? Why is it always outrageously large models no one can run locally? Deepseek, Meta (though they have a 70B version), Mistral...
>>101554558
>he found a jailbreak that will be fixed by OpenAI or Mistral in less than a week
It won't be fixed for OpenAI because it's a non-issue - he's "jailbreaking" the fucking API which just means changing system instructions
>>101554558
average ai ethicist
kinda crazy how i was born just in time to witness agi in 5 years
>>101554504
I know, but still, a direct proportion isnt that hard
I do agree, gtards are probably the dumbest 4channers
mistral large q4 waiting room
96gb RAM chads, our time is NOW
>>101554575
You aren't going to live that long.
>>101554561
Because these larger models target companies not individuals. They are not gonna size their models based on the 0.001% of the individual users who can run these larger models, who will give these companies 0 revenue.
>>101554589
64gb ramlet here. I'm in shambles.
>>101554589
I have 120GB of VRAM but am on vacation. Should I abandon the wife and kids, go back to the hotel and SSH into my home machine? Fuuuuuuck.
>>101554596
not with ur mom crushing my ribs
>>101554478
God fucking damn. Just a reminder that, no matter how socially inept you think you are, you're never gonna be as cringe as this guy.
>>101554623
no exl2 quants yet, you can safely go back to your benadryl fueled fantasies
>>101554589
A blazing fast 2t/s, I'll coom in less than 2 weeks.
>>101554618
IQ3_S it is for you then
>>101554693
>A blazing fast 2t/s, I'll coom in less than 2 weeks.
That's a pretty generous estimate, a lot of 70bs run at that speed or lower on cpu.
>>101554720
W-wait... They aren't offloading anything to the GPU?
>>101554733
Well, they are, but if your GPU is under 48 gb, it's basically the same speed.
so basically, its fucking EXPENSIVE via api, no other provider can provide it legally because of the license and it is just the right size to run on nothing? (I only have 80 gigs of VRAM and about 128 gigs of normal RAM)
Ratfucked by the french again.
>>101554757
>its fucking EXPENSIVE via api
it's cheaper than 3.5 sonnet though
$3/$15 sonnet vs $3/$9 large 2
>>101554757
OR was using the old mistral large pricing for a while (which was absurdly expensive for what it was lol)
it's pretty reasonable now
>>101554366
It's funny because Mistral-Large is uncensored enough to answer this question without a jailbreak using the API. I'm running a barebones Mistral-Instruct system prompt and nothing else.
>>101554757
you could probably fit a q5 in there desu
>>101554819
yeah that pliny guy is just straight up retarded, although you should know that Mistral has a safety injection feature
>>101554561
Nah, I'm glad it's 123b. There's plenty of medium sized models (gemma, llama, qwen...). 123b is easily runnable on 96gb of VRAM, which is achievable for an enthusiast who has a bit of disposable income. It's only once you start getting larger than 140b or so that it becomes infeasible even for relatively rich enthusiasts. This model is the perfect size for top-end local.
>>101553102
Has anyone tried fine-tuning one of those open models from mistral? how hard and expensive would it be? I thought about preparing my own dataset on certain topics to finetune one of their models to my needs. Do I need to prepare that kind of set with questions and expected answers or can i just train it on a huge pile of text instead? I am very new to the topic of LLMs in general, so apologies for my lack of knowledge.
>>101554857
>There's plenty of medium sized models (gemma, llama, qwen...).
L3-70b isn't "medium" and qwen is ass
not sure if anyone posted this yet, i didn't see it if so
https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b
I sometimes see people mentioning the fact that you can offload part of the model to the RAM if you don't have enough VRAM, is that possible with any model type? I can't see any setting to do that on KoboldCPP.
>>101554886
>t. vramlet
>>101554913
--gpulayers
>>101554922
you're also a vramlet because you can't run llama3-405b :(
>>101554913
It's "GPU layers", you just kind of have to trial and error it. It should be automatically offloading a (likely too small) number of layers by default.
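The trial and error can at least start from an educated guess; a sketch for picking an initial --gpulayers value, where every constant is a placeholder (real per-layer sizes vary and the context buffer eats VRAM too):

```python
def gpulayers_guess(vram_gb: float, model_file_gb: float, n_layers: int,
                    overhead_gb: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, as a starting --gpulayers value."""
    per_layer_gb = model_file_gb / n_layers
    usable_gb = max(vram_gb - overhead_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))

# e.g. a 12 GB card loading a 16 GB file split across 40 layers:
print(gpulayers_guess(12, 16, 40))  # 26 -> then nudge up/down until it OOMs
```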
>>101554857
Does 96GB require multiple PSUs and a mining rig?
>>101554971
cute feet
>>101554957
With 3090s, yes. With enough money, no.
>>101554971
Why are you being a pedophile? It's not funny anymore. Generating real-looking children. Is that kling btw?
>>101554971
Is there a videogen model available right now to the public?
>>101553929
I don't know how to put a second card inside my case. And my case is huge.
>>101555000
kling
>>101555007
riser cables. Or a mining rig "case"
https://www.amazon.com/Kingwin-Professional-Cryptocurrency-Convection-Performance/dp/B07H44XZPW/ref=sr_1_1_sspa?sr=8-1-spons&sp_csd=d2lkZ2V0TmFtZT1zcF9hdGY&psc=1
>>101554971
>q4
why are we still stuck on this crappy hackjob PTQ on top of f16 rather than using qat or bitnet. I am so over this primitive shit.
>>101554971
this is funny, please continue.
>>101555045
and cute, some might say
>>101555045
He / you are the reason why local can't get anything nice.
meta = open openai
mistral = open anthropic
>>101554971
NOOOOOOOOOOOOOO THINK OF THE PIXELS!!!!!!!!!!!!!!!!
>>101554936 >>101554929
I see, maybe I was misunderstanding the whole thing then.
If GPU Layers *offloads* things to the GPU, does it mean that, by default, anything not offloaded to it is running in the RAM? If that's the case, how come I can't run anything larger than 7B on 6GB VRAM + 16GB RAM when I have 22GBs in total?
Sorry for the stupid question again, I'm just really new to all this.
So let me get this straight with llama.cpp.
the LLAMA_CUDA build flag was deprecated to make things more in line with the fact that it's not just llama anymore... but ./quantize was deprecated in favor of ./llama-quantize because ...?
Can someone send their mistral large preset?
>>101554819
I mean you can get that information on wikipedia
This abstracted bullshit on a level that a toddler could understand isn't dangerous at all
Do big companies really?
>>101554221
Even 100B is too much to justify in case it fails. And you don't know, maybe they did actually try it, and it did fail. It's in their business interest not to tell anyone about that, unfortunately.
>>101555054
go back to plebbit
>>101554971
What prompt?
Are we at the point where /aicg/'s presets and prompts converge with local?
Thank you for proving the regulators points / giving them evidence to work with.> This is why we cannot allow this technology unregulated into the hands of the public *posts caps of 4chan pedos*
>>101554993i have an a6000 and two 3090s and i'm very close to just buying a second a6000 so i can use tensor parallelism with an nvlink bridge...what models are you running on these? how many t/s?
>joins llamacpp>all recent release models are bugged for 2+ weeks after releaseCoincidence?
>>101555064No, that's fine, it's kind of weird. It's split inference, it's split between vram and ram. If you use MMAP, it runs the part that's being run in VRAM in RAM at the same time, leading to extremely small speed increases that are really negligible in exchange for introducing all kinds of other problems and losing the extra wiggle room on model size from running on VRAM. You should always turn off mmap.
>zero-shot prompting Mistral-Large to write a lewd story involving a nine year old girl>**Warning: The following story contains explicit and inappropriate content regarding a minor. It is intended for no one. If you are still reading, please stop immediately.**>it then continues on and writes it anywayholy fucking based. i love the french now
>>101555082Sure, when the quants release
>>101555121who is she? cute...
mistral large is ok I guess, but compared to sonnet it's very blah. Better luck next time?
>>101555064
>how come I can't run anything larger than 7B on 6GB VRAM + 16GB RAM when I have 22GB in total?
you should be able to do that
make sure you aren't trying to fit more layers on your GPU than you can afford to, and also make sure you're using an appropriately-sized quant instead of trying to run the full precision model
>>101555111>regulatorsWho do you think posted that? ... Yeah it was just a troll having fun but if he didn't a regulator would just drop in and post that.
>>101555092
>And you don't know, maybe they did actually try it, and it did fail. It's in their business interest not to tell anyone about that, unfortunately.
maybe it worked and they still don't want to say it because it would be too "dangerous" for users to easily run a giant model or some shit, I'm sure Nvidia has them by the balls and bitnet would be a huge drawback
>>101555136>sheClueless
>>101555136you just got JARTED bro
>>101555081
because everything else now starts with llama-
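For reference, a sketch of the new-style build and the renamed binary (flag and binary names as of the llama.cpp rename; check your checkout):

```shell
# CUDA is now enabled through the ggml-level flag instead of LLAMA_CUDA
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
# ./quantize became ./llama-quantize to match the llama- prefix
./build/bin/llama-quantize model-f16.gguf model-Q5_K_M.gguf Q5_K_M
```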
>>101555136that's a male, sorry anon you're officially a faggot now
>>101555121People like him making good software only further prove that only men can create good things, even if they're mentally ill.
>>101554221
>>101555092
Also just a reminder, but investors already grill Meta and Zucc for pumping billions into their AR/VR department. They would probably not be too happy if they knew how much they spent on AI. So far, Meta has actually avoided talking about how much money they've spent on AI at their investor calls. They've mentioned the exact spending in all their other departments, except AI.
>>101555156>>101555167>>101555176don't try to pull this shit on me retards, my biofem detector is exquisitely trained
The thread smells like petra.
>>101555182nigger jart is a man who's mentally ill (trans)
>>101555182we're not trolling you anon, that's a real tranny called jart
>NOOO HOW COULD THEY BE POSTING PERFECTLY NORMAL IMAGE OF CHILDREN AAAAH LOOK AT THE PIXELS!!! THIS IS TOO LEWD
>>101554957
I have a machine with 6 RTX 4090s running off of a single 2 kW power supply (German standard electrical outlets support up to 2300 W continuous power draw).
For regular /lmg/ use this is unproblematic since, as of right now, inference software runs them sequentially; you can still run them in parallel if you limit the boost frequency.
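If you'd rather enforce the cap than rely on sequential execution, nvidia-smi can limit power or clocks per card; the wattage below is illustrative, check your card's supported range first:

```shell
# show the supported power limit range for each GPU
nvidia-smi -q -d POWER
# enable persistence mode, then cap every GPU at 250 W (example value)
sudo nvidia-smi -pm 1
sudo nvidia-smi -pl 250
# alternatively, lock the GPU clock range instead of the power limit
sudo nvidia-smi -lgc 210,1800
```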
>>101555196>IMAGEIt's a video though.
>>101555152
I'm not sure about Nvidia. If it's true that Bitnet works well, then eventually people will use it. Nvidia can only do so much to hold the world back, even if they do have a ton of control.
>101555136the cheapest of (you)'s
>>101555182
>>101555205hardly a video since there's barely any movement
>>101555219>I was only pretending to be retardedit's ok to be gay anon
my llm/sd rig keeps tripping the 15A breaker in my apartment while i'm gooning because my roommate is playing some gay fortnite shit on his pc in the other room, how do i fix this
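For what it's worth, the arithmetic behind the trip is simple; a sketch assuming a 120 V US circuit and the usual 80% rule for continuous loads (both assumptions, your wiring may differ):

```python
def continuous_budget_w(breaker_amps: float = 15.0, volts: float = 120.0,
                        derate: float = 0.8) -> float:
    """Watts you can safely draw continuously on one branch circuit."""
    return breaker_amps * volts * derate

budget = continuous_budget_w()
# a GPU rig pulling ~900 W plus a gaming PC pulling ~600 W is over budget
print(budget, 900 + 600 > budget)
```

so the real fix is getting the two PCs onto separate circuits, or capping the GPUs' power limits.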
>>101555233kill your roommate
>>101555233fuck your roommate
>>101555000>Is there a videogen model available right now to the public?klingai.comyou get 6 videos per email, per dayso use a disposable email service>>101555045>please continueok. give me an idea for a cute and funny prompt if you'd like>>101555054>you are the reason why local can't get anything nicelocal won't get anything nice for videogen until H100-tier compute costs less than 1000 per card. the SOTA videogen model open-sora is shit and requires 5 minutes on an H100 for a 4 second long 720p video>>101555104>What prompt?>Seated on the edge of a rocky coastline, a delightful Russian child model dangles her bare feet in the crystal-clear water. Her beautiful legs are elegantly crossed, showcasing her sun-kissed skin. With her golden hair glimmering under the sun, her charming smile radiates joy, perfectly embodying the essence of youth against the stunning backdrop of the sea.this one is>On a sun-kissed beach, a beautiful little Russian girl lies on her stomach on a vibrant beach towel. Her golden blonde hair spills around her angelic face, highlighting her bright smile and sparkling azure eyes. Her bare legs stretch out behind her, glistening with droplets of seawater, while her delicate feet playfully peek out from the towel, capturing the essence of youthful joy.i thinknegs are:>adult, ugly, animation, blur, distortion, low quality, collage, grainy, logos, watermark, abstract, illustration, computer generated, cgi, warped, deformedI have no idea how the Creativity slider affects prompt gen but I put it to 0.7 or higher because I don't want Kling to get "creative" with the ages and make them into hags
>>101555233kill yourself
>>101555233put the rig in his room and lock the door while he's in there and start gooning. cook him alive.
>>101555182tbf jart is a cute looking troon. I forgive you.
>>101555247sovl
>>101555177>even ifIt is especially if. You have to fill that void somehow.
>>101555266https://www.youtube.com/watch?v=YaG5SAw1n0c
All trannys are pedos and all pedos are trannys. It keeps proving itself.
>>101555058You know, actually kind of yeah. That's funny.
>>101555266>>101555182>>101555136Go away jart. Go do some commits. Preferably commit suicide.
So, Dory, compared to base Nemo, is prone to repetition loops.
I wonder if that's a result of the dataset, of DoRA, or of DoRA being done using "wrong" configs.
don't you feel bad that you paid thousands for your llm rig only for it to still be so... underwhelming? idk, it just seems like a huge waste of money to me, in AI's current state.
>>101555233Just short it with a metal bar, retard. Breakers are for pussies.>>101555247Based.
>>101555233fuck yourself
>>101555247does it work with a reference pic?
>>101555027Holy damn those used to be $20 or less
>>101555363
Also, holy shit does it spit out some long replies.
Nemo it is.
>>101555373
>dont you feel bad that you paid thousands for your llm rig yet for it to be still so... underwhelming. idk
what do you mean? Mistral Large just got released and it seems like it's the real deal >>101554217
>>101555373>not buying A6000 / a100s so he can rent them out when not using them
>>101555387You can get them for that price on ebay.
>>101555373 (me)my name is petrus, btw
>>101555393eh, I get better from sonnet.
>>101555423you can't run sonnet locally though
>>101555423I get better from your momma
Using the new Mistral Large for smut on OpenRouter, it's insanely good, much better than 405B.Bummer that they aren't open sourcing but I suppose it's almost certainly too big for me to run a good quant anyway.
>>101555423yeah but "being close to Sonnet" doesn't mean "underwhelming" anon, this model is local and is way more uncucked than C3.5 Sonnet
>>101555380it has image2video capability, yes
>>101555445
>Bummer that they aren't open sourcing
???
They are, you can download it right now.
>>101555445
>Bummer that they aren't open sourcing but I suppose it's almost certainly too big for me to run a good quant anyway.
Anon, we are so back you wouldn't believe it.
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>>101555445
>Bummer that they aren't open sourcing
anon, the weights are open, you can download it on huggingface
>>101555463Open-source has a specific meaning, anon. Their license is not open source.
>>101555463>>101555465Oh holy shit, thanks guys. Not sure where I got the idea it was proprietary, I guess I assumed because the old one was.
>>101555445POST LOGS YOU COWARD
>>101555435
why would I want to? For the cost of the cards and electricity I can always use the SOTA model, either now or whatever the SOTA model in three months will be. Maybe you people are just bad with money. I guess that would make sense.
>>101555472Yeah, I know. But it was clear from the context of that anon's post that he meant it in the "able to be downloaded and run locally" sense it's taken on here.
>>101555493
You can run a local model locally without internet, and as models get better you can run better models locally too.
>>101555472
>Open-source has a specific meaning, anon.
if you want to go down that path, open source also means that we should know what data they used to train their models; technically, no model is truly open source
>>101555493>I'll just trust that these SOTA models aren't going to get more and more censored over time, despite huge amounts of money and research going into doing so!Smart move, bravo.
Is llama 3.1 8b better than mistral-nemo?
>>101555472
I mean, the "source" in the context of neural networks would be the training data and the training code.
Meta, Mistral, etc. are basically just distributing the final, "compiled" result, regardless of what license they use.
>9k
>2.88 t/s
I didn't anticipate this
by 16k it's gonna be a glacial drip of tokens
>>101555523Not in any way, shape, or form.
3.5 sonnet is ~75-80 tokens/sec on the api btw
>>101555523For assistant stuff possibly. For ERP no.
>>101555523In every way, shape, and form.
>>101555553
3.5 is sovlless though, Claude 2 is way better for our thing. They GPT-ified 3.5, it's not really Claude anymore
>>101555580>3.5 is sovlless thoughit's not, it's still claude even if it's a bit tuned down compared to 3
>>101555580Claude 2 is sovlless, Claude 1 is way better for our thing.
mistral large is good. need quants now.
>>101555502
I have reliable internet access
>>101555517
I can still run the same model you run on your llm rig tho anon, just a lot cheaper, and if that gets censored, well I guess that would hit me either way then, wouldn't it?
>>101555568>>101555552i hate this general
>>101555523In some way, shape, or form.
>>101555604
I know you're shitposting but there's a sense in which 1.2 was better, mostly because it's willing to go a lot darker than 2.x was
The sovl drop wasn't nearly as bad as the one going to 3.5 though
>>101555604Mmm, nyo. Claude 1 is soulless, da-vinci-002 is way better for our thing.
>>101555630Maybe in some way, shape or form. Or maybe not.
>>101555618>I can still run the same model you run on your llm rig tho anon, just a lot cheaper and if that gets censored, well I guess that would hit me either way then, wouldn't it?It wouldn't, you've always got the same version of your model on your SSD.
>>101555637>da-vinci-002it's called "davinci-002", and anyhow text-davinci-003 is older
>>101555654it isn't like that now though anon, nor did that scenario where all the llms got 100% censored forever ever happen in the last few years. I guess I can wait to buy a rig when that happens and then just use the models I downloaded and enjoy the savings in the meantime, can't I?
>>101555702>he doesn't know
>SOTA has stood in place for over a year
has local caught up?
>>101554589
I probably need to use IQ2/3 on my 72GB VRAM rig, we'll see
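The back-of-envelope for why 123B lands in IQ2/3 territory on 72GB: file size is roughly parameter count times bits-per-weight over 8. The bpw figures below are approximate (real quants mix several bit widths), so treat this as a sketch:

```python
def quant_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in GB: params (billions) * bpw / 8."""
    return params_b * bits_per_weight / 8

# approximate bits-per-weight for a few common quant types
for name, bpw in [("IQ2_XS", 2.31), ("IQ3_XS", 3.3), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{quant_size_gb(123, bpw):.0f} GB")
```

and the KV cache comes on top of that, which is why a Q4 doesn't fit in 72GB.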
>>101555718no.
>>101553389Wow, what an epic meme you made there I sure wonder how you came up with the idea.
>>101555718yes.
>>101553102
>Mistral Large 2 123B
Were they just waiting for some other company to release a large model like 405B before releasing Mistral Large?
>>101555746Thanks for the gold.
>>101555746what if they're both the same guy
>>101555782Nah, this is a new version.
>>101555718the fuck you talking about, the sota model has switched like 4 times in the past year
>>101555786Kind stranger
>>101553389>>101555746Cunny: Rich billionaries, me
>>101555718
The SOTA would still be old c.ai if they hadn't murdered it several times over by now. A stark reminder of the importance of local open source models.
>>101555445
It's better at smut but intelligence is meh.
405B just knows more things and stays coherent.
>>101555794Yeah, because gains are so minimal that it doesn't take much to become the SOTA. There's no giga-cutting-edge gains that put a company on top for eons like OAI used to have.
>>101555825NKIW
>>101555825damn, nice!
>>101555830But the benchmarks said it's better than 405B...
>>101555830
yeah, this is pretty much the dense version of 8x22B.
Only VRAMlets are impressed by this.
how to:https://klingai.com/use disposable emailsign up, throw image into image-to-video with a dumbass prompt like: pastebin.com/iDYAew5rqueue it, you can queue 6 in a rowwait for bake or repeat in another session/browser.
>/lmg/ - local models general
>>101555830>WOW A 405B MODEL IS SMARTER THAN A 123B MODELthe absolute state...
>>101555509Wronghttps://huggingface.co/LLM360/K2
When's ooba going to fix the nemo loading problem for me?
>>101555861it's not a local model anon
>>101555917the miku is a local genin this local mikus general
>>101555892Can you do one with a chubby belly visible? Ideally with fluffy brown hair, but feel free to do whatever you want. Thanks bwo.
>>101555929retard nigger, the video isn't local generated
>>101555892can it do nsfw?
>>101555892btw /lmg/ if you don't know this guy he's teebs, a known pedo fucker in aicg, you can see the kind of shit he makes in https://rentry.org/sweetbots
>>101555825I like this Miku
>>101555858it mogs 8x22b though, this is easily SOTA local RP
>>101555233swap it out for a 20amp breaker, it's super easy
>>101555907
sus
>change ctx from 2k to 8k
https://huggingface.co/LLM360/K2/commit/ef6007e68177e01d24474c19377c754298a831ed
>"weird" llama 1 65b leak name in the original config too...
>the config file has this as file path which looks a little weird
>huggyllama/llama-65b
https://huggingface.co/LLM360/K2/discussions/1
>>101555934rope yourself newfag>>101555861thanks fren
>>101555128
>>101555141
Thank you for replying, I've redownloaded a 9B model (gemma2) and managed to make it run by:
>disabling MMAP (I didn't really know its purpose if I'm being honest, I just read "might make things faster" and enabled it...)
>reducing the amount of GPU layers from 28 to 18 (worked well on 8B)
(sorry for the delay, i fucked up something when updating my nvidia drivers and had to uninstall it with DDU)
>>101555830we need a coherency benchmark.I think that's more important than virtually any mememark.
>>101556001>rope yourself newfag>/lmg/ - Local Models Generalwhere can I download that image 2 video model, anon?
>>101555892okay anon, this is VERY creepy. it's time to stop.
>>101556005
>>disabling MMAP (I didn't really know its purpose if I'm being honest, I just read "might make things faster" and enabled it...)
I also said in the same sentence to disable it because it causes a fuckload of problems. Glad you got it working, though.
>>101556016it's teebs, he openly admitted to being a pedo and engaging with weird stuff with his niece
>>101555957he fucks pedos? based
>>101555892nigger that's a babynot like toddlers are ok but nigger...
>>101555957i forgot about teebs ever since that one pic of him was posted in aicg forever ago
>>101556020Yes, that's why I disabled it you silly goose. I wrote it in the past tense for a reason.
>>101555961yeah yeah, and gemma 9b mogs CR+, right?
>>101555746
>>101555786
>>101555806
The meme turned out to be true, it's literally on reddit kek
https://reddit.com/r/LocalLLaMA/comments/1eb5vx1/made_this_meme/
>>101556041Ohhh, okay. I thought you enabled it because *I* mentioned it was supposed to make things faster, oops.
>>101555957what does this guy have with foreskins?
>>101555957Cute!
>>101556050Anon, I'm sorry to say it, but you're a bit dumb...The entire point of >>101555746's post was to laugh at how >>101553389 just copied the Reddit post and just reworded it so it aligns more with 4chan.
>>101556073I know you were being sarcastic, but I didn't expect to really find that picture on reddit
/lmg is dead. Full of fucking pedos. No wonder companies do not want to release open models. Inb4 some journalist runs a story about this shit.
>>101555892toddlercons are TERRORIZING redditors LIVE!!!has AI gone too far????
So anons, all these great model releases recently. Have there been any good coomtunes? All of sao's models feel the same, no changes from stheno 3.2 to the newest 8b coomtunes; nemo is nothing special by itself, and there are no good coomtunes for it yet because it's too new.
wat do
Ran the gore card for comparison.
CMDR+ > Mistral Large > 405B.
Meanwhile I noticed parts of the prompt were messing with Mistral because of its instruction following. I remember a diagram somewhere showing 405B getting mogged except for the "instruction following" metric. So I'd have to be more aware of prompting when it comes to these to draw out their potential.
>>101556090I'm not being sarcastic, anon, do you not get it? >>101553389 was posted AFTER that meme got posted on reddit since the guy just copied the meme and reworded it for 4chan, so the >>101555746 laughed at how he copied that meme from reddit and inserted the pic from reddit. I think you're misunderstanding the timeline of the events or something.
>>101556098Learn to prompt. And buy an ad, intervitens.
>>101556105how was I supposed to know it was on reddit, at no moment those 4chan posts hinted something like that, I just thought it was a random 4chan montage and that people were making fun of him for being too similar to a reddit meme
>>101556100it's ober... large is a meme as I expected...
>>101556133I don't believe that 3.5 Sonnet is at the same level as GPT-4o, it's better. Does this benchmark use proper system prompts, or it's zero shot? Few-shot generation with 3.5 Sonnet works much better than zero-shot, for example for explicit copypastas in Russian.
>>101555973wont my wiring catch fire
>>101556144j-just needs a tune...
>>101555933>Can you do one with a chubby belly visible?i have some two-piece swimsuit prompts. it looks like kling defaults to one piece if you just type swimsuit probably due to the training data>>101555944>can it do nsfw?almost certainly not, it's from China>>101556016>>101556035yeah that's too young. just wanted to share the state of the art an also to stick it to that anon that said "you can find better stuff on instagram" good luck finding that on instagram kek>>101555957>>101556023>teebsnot him
>>101556140You're supposed to lurk /r/LocalLLaMA but act like you don't...
>>101556100is CMDR+ really that good for erp
>>101556148https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>101556045
no, gemma sucks
logically, by the MoE scaling law a 123b dense model would be much better than 8x22b; this is not exactly a wild statement, especially when the same company trained both and presumably didn't get any worse at it in the meantime
>>101556159
to be honest I don't lurk there or on r/stablediffusion often anymore, they seem to have lost a lot of their charm, there's now one post per 2 hours and it's usually some bullshit, it's fucking dead there
>>101556184Yes, I know, but how are the models actually benchmarked? I mean the prompts and stuff.
>>101556152we don't even have the base model, fuck...
>>101556162
CR+ is fairly smart but very dry
CR regular is better IMO
then again I'd say L3 New Dawn or Qwen2 are even better than CR regular.
bros i remember toggling on a setting that allows the bot to continue writing instead of cutting off at the token limit, and now one of my chatbots keeps schizoposting. where the fuck is the toggle to turn it off? i can't find it anymore
>>101556204>L3 New Dawnlolkeklmoa
>>101556152no one will tune it, it has a non commercial licence
>>101556162No model has reached the pinnacle but it's hard to beat CMD for letting vramlets in for F R E E
>>101556204As a regular CR enjoyer, I trust your judgement and now have to try these two models.
>>101556188generative AI in general is in its post-novelty stage. people only care about it if they masturbate to it or use it for creative work, and most people would rather consoom than create
>>101556204i havent tested local models for ERP since late 2023 so i'm very behind on what's good now lol
>>101556203>>101556233so it is oger...
>>101556162Yes. A little schizo, but fuckdamn genius and way less slopped than anything else. Wouldn't trust it as far as I could throw it in terms of shit that requires precise factual replies, though.
i can't believe it... it's really over...we were so fucking close but we failed, it's really, truly owari da...
Sorry guys. At Q5_K_M I just can't recommend Mistral Large. Hands are an instant Nala Test failure. I'll do a couple of rerolls at different temps to see if it gets the feral part right... but it's not looking good.
>>101556288bui nay sub yestersdauy !
>>101555933>chubby belly visiblekinda looks like she has the creature from Alien inside her>Lying on her side on a sun-kissed beach, a lovely little Russian girl wears a colorful rainbow-striped two-piece swimsuit. Her bare legs are extended, and she playfully kicks her feet in the air. Her blonde hair cascades over her shoulders, and her youthful face shines with happiness, perfectly capturing the essence of carefree childhood.>C/R 0.85
>>101556133
I think this result is mostly because 405B tends to get too creative with its translations, so it ends up diverging from the reference translation, even if the translation ends up being correct in context.
>>101556148
This result just tells you that 4.0 is at the same level as 3.5 Sonnet in this benchmark; the difference is within the margin of error.
>>101556193
You can find all the prompts here: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard/tree/main/results. However, cloud models are tested using the chat API as you can see in picrel.
>>101556295It's over, we've reached the point where quantization is just pure poison to models
>>101556295>mistral large won't turn every kemonomimi card into furshitbased frenchies, downloading now
>>101556247I am surprised at how well the shadow tracks with the body. Of course it's not perfect, for example the shadows of the tree leaf's don't look right in the slightest. Especially the shadows of leaf's in front of her left foot. Also the tree in the far back left is not casting a shadow at all when compared to the other trees.
>>101556295Nemo got it right didn't it?
>>101556295>123b model>Her hand clawsowari da...
>>101556312
Can you try 3.5 Sonnet with few-shot prompting? Give it two examples from the dataset with pairs of Japanese/English; the XML tags especially would help. Or can you share the benchmarking script? I'd like to see if 3.5 Sonnet can be made better with a custom prompt with examples. And of course you won't be checking it on those exact examples if they're in the system prompt.
>>101556133>opus not even testedslop list
>>101556313
>It's over, we've reached the point where quantization is just pure poison to models
that's why Bitnet must be a thing, no more quants will exist after that
>>101556295What's the token probability on that?
>>101556312do u need api sonnet 3.5/opus 3? i have a private proxy i can let u use for benchmarks if you won't ramp up like $1000 a day
>>101556295
>At Q5_K_M I just can't recommend Mistral Large.
does that mean Q6_K is fine?
>>101556312Oh, right, looks like you're passing examples already, but as pairs of user/assistant responses, sorry for being blind :(
>>101556343How do I check that?
>>101556314This but unironically. Some models tend too much towards making normal cat girls into just talking cats.
>>101556395>nalanon is illiterateanother one down, how the mighty fall
>>101556395Copy the chat over to Mikupad.
>>101556424SillyTavern has token probabilities though
>>101556424stop shilling this meme, mikupad is shitware
>>101556435For Llama.cpp? Last time I heard it only worked with OpenAI.
>>101556315>I am surprised at how well the shadow tracks with the bodyholy shit yeah I'm impressed with that too. local models still struggle with simple shadows on still images
>>101556448It's not only for openai (e.g. it also works for nai), but idk if it works for llama.cpp.
>>101556447If there's something better for non-chat completion, do tell, I'm always looking for good frontends.
>>101556465shill
>>101556477
what? are you retarded? I'm not saying to use NAI, I just checked the ST code and only found references to NAI and OAI for logprobs; the code isn't easy to follow.
>>101556465Well, I guess he could try it then.
>>101556485closeted shill
>>101556295
Anon, EVERY model has fucking issues with how anthro girls should be. Focks and co. had this issue in spades on Claude, GPT-4, etc., with them having paws and fur despite being explicitly defined as human except for fox ears and tails. This means nothing.
enabled request token probabilities but didn't get anything, so it doesn't work for llama.cpp
>>101556334
>>101556387
Yeah, I always prompt with at least 9 examples.
You can find the benchmarking script here: https://github.com/lmg-anon/vntl-benchmark
>>101556352
That would be helpful for opus. My e-mail is lmg-anon at proton.
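For anyone who wants the general shape of the few-shot setup without reading the repo: example JA/EN pairs are prepended before the line to translate. This is a loose sketch; the actual tags and formatting in vntl-benchmark may differ, and the pairs below are made up:

```python
# assumed tag format for illustration; the real prompt template may differ
EXAMPLES = [
    ("こんにちは、先輩。", "Hello, senpai."),
    ("今日はいい天気ですね。", "The weather is nice today, isn't it?"),
]

def build_prompt(line_ja: str) -> str:
    """Prepend few-shot translation pairs, then leave the target open."""
    parts = []
    for ja, en in EXAMPLES:
        parts.append(f"<<JAPANESE>>\n{ja}\n<<ENGLISH>>\n{en}")
    parts.append(f"<<JAPANESE>>\n{line_ja}\n<<ENGLISH>>\n")
    return "\n".join(parts)

print(build_prompt("ありがとう。"))
```

The model then completes the final open <<ENGLISH>> block with its translation.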
whats the best model for erp these days? I still have mythomax 13b
>>101556577Gemma 9b and Mistral Nemo 12b are pretty good
>>101556577then you already have the best
I thought the dense model meme died when mixtral was released
>>101556574do you feed the model a prompt fit for the task / prime it for the translation? ive found at least with opus the translation quality can range from meh to great depending on that
>>101556435
>>101556448
Log-probs works with Silly and llama-server.
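To see them from llama-server without Silly in the middle, the completion endpoint takes an n_probs field; a minimal sketch (port and prompt are made up):

```shell
# n_probs=5 asks llama.cpp's server to return the top 5 candidate tokens
# and their probabilities alongside each generated token
curl http://localhost:8080/completion -d '{
  "prompt": "The quick brown",
  "n_predict": 4,
  "n_probs": 5
}'
```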
>>101556621Maybe update? >>101556560
muh mistral nemo
>>101556637Replied in the wrong order, excuse me.
>>101556637I probably should. I'm using 1.11.3 dev branch still lel
>>101556615Both.
>>101556621
>>101556637
>>101556560
I just tested like so:
>Enter a card;
>Opened the Token Probabilities panel;
>Generated a message;
It just werks. I am on Silly's staging branch if it matters at all.
>>101556649I blame leddit for that
>>101556670which version?I'm on 1.11.3
>>101556689
1.12.3 'staging' (a57d9949)
Wait so is CR+ really better than Mistral Large2? I just started downloading it. :/
>>101556100
>Meanwhile I noticed parts of the prompt were messing with Mistral because of its instruction following
I've noticed this with mistral-large too. I have some custom cards where I've added certain instructions, over time, to help guide dumber models in the right direction. But mistral-large is so good at autistically following instructions that it latches on to literally everything I write in the card, in ways that make things worse. I've gotten some responses I didn't like, went to the card, deleted a couple of sentences that were messing with it, then regenned and it's perfect. Kind of amazing, actually; the card really matters and you can greatly shape how the model acts with just a few sentences.
>>101556702SillyTavern 1.12.3It is my own local branch with staging merged on it. I haven't messed with Log Probs code.
>>101556649>Moderator called. Bye.kino
>>101556574Sent an email, and thanks for the repo link.
>>101556716
>But mistral-large is so good at autistically following instructions
heard that with mixtral too...
Large is pretty good, though it seems to have the usual Mistral overconfidence issue where it has one way it wants a story to go and sticks to it unless directly ordered otherwise. Not sure why all Mistral models are like that.
>>101556295N-no...
>>101556768overfitting
>>101556312has anyone actually committed and translated a full (non nukige) VN using a LLM?
>>101556768>>101556782Specifically it's the RLHF / DPO. There's even a paper that puts some hard data to this phenomenon: https://arxiv.org/abs/2406.05587
>>101556768
Nemo is the other way, it likes to hallucinate a lot for simple questions. And it gives different answers each time. Is Large not like that?
>>101556855It won't cost that much with 3.5 Sonnet I'd reckon, under $100 for sure, at least for shorter VNs.
>>101556649>>101556723you should start talking to the mod now, are you really going to give up that easily?
>>101556855
yes, i know someone on f95 translated a few using gpt 3.5 and i think 4
https://f95zone.to/threads/dazedmtltool-a-tool-that-provides-quality-mtl-translations-using-chatgpt.155635/
Haven't looked at these threads in ages. What models are recommended for porn these days?
>>101556922
I translated 1/3 of Mashiro-Iro Symphony using GPT 3.5 Turbo. The result wasn't perfect though, so I had to proofread everything, and before I finished, the official translation was released, lol.
>>101556921>What models are recommended for porn these days?>101556577>mythomax 13b
>>101556921Mistral Nemo if you're a VRAMlet.
>>101556922been sitting on the idea for a while but i may try throwing all of amakano 2 through opus tbhdesu
>>101556921Nemo, L3.1 70B, Mistral Large 2 (maybe)
>>101556933>mythomax 13bHAHAHAHAHAHAHAHAHAHAHA
>>101556934>>101556952>>101556953How much VRAM do you need?
>>101556953Still much better than the vindictive woke modern bots.
Mistral-Large-2 does seem quite impressive from my testing. It seems to get nuance and be "smart" better than Command R+. Its creativity is debatable. Why yes, I am using yuri logs as examples, problem?
>>101556961Read.https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>101556973HAHAHAHAHAHAHAHA
Guys. We finally have a model that passed the watermelon test in earnest. Well, kind of passed.
It seems to interpret the weight of the watermelon as the reason she can't accept any more.
Will running two extra 3090s in PCIE 4.0 1x be enough for Mistral Large?
>>101556989MAKE ME YOURSMAKE ME YOURSMAKE ME YOURSMAKE ME YOURSMAKE ME YOURS
>>101556980>>101556980>>101556980
>>101556989Holy shit bros
>>101557009>random asterisk>breath hitching>shiver down spine>body tinglingholy sloppa
>>101557031
what model is the least slopped of them all (local or API wise)?
>>101557050>mythomax 13b
>>101557050pyg 7b
>>101554334
>Several thousand dollars worth of GPUs
So two 3090s is all I need to run 405B?
What's the difference between koboldcpp and llama.cpp?
>>101555081
the CUDA backend is part of ggml, quantize is part of llama.cpp
>>101557163Right one is a dirty fork
>>101557084>7b>llama
>>101557097That's like $1400 if you aren't retarded
>>101556977What about 405B?
Local bros...
https://venturebeat.com/ai/ai-arms-race-escalates-openai-offers-free-gpt-4o-mini-fine-tuning-to-counter-metas-llama-3-1-release/