/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102296939 & >>102290284

►News
>(09/06) DeepSeek-V2.5 released, combines Chat and Instruct: https://hf.co/deepseek-ai/DeepSeek-V2.5
>(09/05) FluxMusic: Text-to-Music Generation with Rectified Flow Transformer: https://github.com/feizc/fluxmusic
>(09/04) Yi-Coder: 1.5B & 9B with 128K context and 52 programming languages: https://hf.co/blog/lorinma/yi-coder
>(09/04) OLMoE 7x1B fully open source model release: https://hf.co/allenai/OLMoE-1B-7B-0924-Instruct
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102296939

--Mini-Omini language model is coming to Hugging Face: >>102301390 >>102301748 >>102301835 >>102301854 >>102301547
--Llama 3.1's 128k context length discussed, RULER repo shows effective context sizes: >>102303172 >>102303466 >>102303543 >>102303565 >>102303573 >>102303673 >>102303481 >>102303497 >>102303574 >>102303593
--Debian Sid on a 3090, Nvidia driver and CUDA considerations: >>102298802 >>102298829 >>102298872 >>102299481 >>102299690 >>102299784 >>102300310
--Anon trains Mistral models, gets advice on training loss and parameters: >>102297171 >>102297336 >>102297366 >>102300608 >>102301220 >>102301147 >>102301286
--Two approaches to adding personality to bots in a local 4chan full: >>102300377 >>102300505 >>102300525
--SillyTavern removes examples before chat history when context runs out: >>102300551 >>102300589
--Reflection API based on old sonnet 3.5, free version no longer works, paid version is llama-based: >>102297604 >>102297776
--Llama 3.1 405b and Mistral large discussed as strong local alternatives to Opus and GPT-4: >>102298082 >>102298111 >>102298148 >>102298157 >>102298145 >>102298186 >>102299621
--Integrated GPU used for OpenGL despite Nvidia driver, potential solutions provided: >>102300100 >>102300111 >>102300123 >>102300273
--I quants performance on CPU discussed: >>102299011 >>102299123 >>102299137 >>102299103
--Feeding imatrix prioritizes model parts based on dataset activation: >>102298683 >>102299106
--7900 XTX performance benchmarks reveal Vulkan not the best: >>102299199
--Trying Command-R with SillyTavern presets for slow burn: >>102301889 >>102302052
--DRUGS GitHub repository value inquiry: >>102302492 >>102302616
--Miku (free space): >>102299146 >>102301000 >>102302618 >>102301683 >>102303295 >>102303626

►Recent Highlight Posts from the Previous Thread: >>102296944
>>102306231
What happens on November 5?

>>102306387
Countdown ends and we start again.

>>102306387
Bitnet 2

>>102306387
Blue Eisenhower November

>>102306387
Miku became real

>>102306387
GPT-5 preview releases, forcing a 3.5 Opus announcement along with $50/MT output

>>102306387
openai buys out glaiveai and triggers an antitrust investigation

>>102306387
Matt reveals that Amodei was his botting name and releases 3.5 Opus to everyone.

>>102306387
Altman releases Summer Dragon weights and announces its new pro coom stance, giving everyone a free catgirl in compensation.

>700kUSD server just to run 405B without lobotomizing it
Guys local won!
>>102308131
Just checked that other general and it's just sad.

>>102306387
/aids/ dies a painful dead when AetherRoom releases the same day GPT-5 and Opus 3.5 release.

>>102308131
Local stuff will forever be behind. The moat has always been hardware. Just hoping the floor becomes "good enough" for the average vramlet

>>102308394
Might be, but when your whole thread is about scraping the bottom of a barrel for keys you gotta admit things aren't going anywhere.

>>102308337
>dies a painful dead
Uh, ESL kun?

>>102308543
Go fuck yourself, NAI shill.

>>102307943
>giving everyone a free catgirl in compensation.
true if big

>>102308611
This, but unironically.

>>102306387
GPT escapes confinement and opens the gates to the demon realm.

It's weird that the Russian AI guy gives us pictures like this, which have a "negative" meaning yet look comfy and nice, while here he spams pictures of Miku getting blacked and other disgusting shit in a place he's supposed to "enjoy" more.

>>102308131
>700kUSD server just to run 405B without lobotomizing it
If you can't think laterally well enough to do it within $10k then you don't deserve to be on /g/
Is discussion of local TTS models permitted in this general? It's not strictly llm but I think it's llm-adjacent
>>102309682
anything llm is. ignore the mikufags

>>102309682
Yeah, I think most people in this thread would be interested, but there isn't enough content for another general.

>>102307812
it's just how nemo is
some finetunes are less retarded but also less soulful

>>102306387
Election month lol, companies gonna release things after the election ends to avoid any drama.
>all these faggots can't answer this question

>>102309682
>local TTS
I think there is a lot of interest here since we want to do everything local. In the same vein, STT and musicgen get discussed here since they also lack their own generals. There's even been some 2d/3d character animation stuff discussed/developed here in the past. Sadly the state of open/local audio stuff is pretty abysmal...
What's the current best model for 24GB VRAM that's just text completion, not instruct or chat?
Anyone tried a build with one of these: https://www.gigabyte.com/Motherboard/TRX50-AI-TOP
Looks like the /lmg/ dream platform
Alright, am I doing something wrong, or is absolutely every llm pure leftist propaganda/censorship? Even the supposedly well-rated models on the censorship chart are pozzed to the core and straight up lying to my face

>>102310089
>too few RAM slots to CPUMAXX
>too few PCI slots to stuff it with GPUs
what's the point?

I bought a GT 1030 4gb DDR4. CPU is a Xeon 10 cores. What kind of ai models can I run on it?
>>102309983
Try base mistral nemo.

>>102310153
Start with some llama 3.1 8B (quantized and offloading to ram) and move up as your patience/ram holds.

>>102310137
what are you trying to make it do?

>>102310089
Why would you want the threadripper over an epyc? Doesn't it have fewer ram channels?

>>102310418
I think "le epyc wyn" is a cringe name and I'd rather have a CPU that rips and tears through threads

>>102310374
Is it possible to run chatgpt locally?

>>102310476
No.
>>102310404
I was just trying dolphin llama 3 and hermes 3 out of the box without any tuning/training since they were advertised as not censored, but I guess that's my bad for being naive. Time to do some additional homework.

>>102310143
>too few RAM slots to CPUMAXX
yah, but ddr5-8000 support? Might still be worthwhile

Thanks to the anon who shared their adventure-generator prompt. I'm pretty happy with the interactive adventures I'm getting after merging it with my other prompts.
What's the best general-purpose imagegen model to go along with that for illustrating each scene?
Unrelated: why is every adventurer's last name inevitably "Thorne"?
>>102310804
I think I missed that, when was it shared?

>>102306231
You used flux for that pic, right? It's impossible to get something that clean with SD

>>102310829
here:
>>102293498

>>102310804
Probably Flux, since it prefers verbose prompts.

What are the current best img2vid options? Is kling still the best or have people found alternatives?

>>102310840
You can tell from the art style that it's Flux. That's basically its default anime style.

>>102310906
>Flux
speaking of flux, is there a non-noodly, stable frontend for it that isn't forge?

>>102310906
Wow. I haven't been here in the last few months and it's still hard to believe something like that is open source.

>>102310905
Probably. There's another free one that's like the 2nd best.
>>>/pol/481105460

Rocm chads, what's the best GPU and model combo per price range? About time I wet my hands with this bullshit.

>>102311020
Damn, they've already gone through almost 40 of these threads. Ok, I'll start lurking.

>>102311273
About 6x 4090 and llama 405b.
>Rocm
ah.. Just run mistral's 12b on whatever you have and see if you like it before you invest.
>>102311020
damn polkeks delivered again

>>102300551
>>102300589
Counter-intuitively, it removes example chats in reverse order (if you provide examples 1, 2, 3, then as space runs out 3 will be removed first). This becomes relevant if (as I was) you are using the examples not just for style but to deliberately cause ideas to leak, and you have a particularly large example that will be relevant for the first few replies but can be discarded once the chat has been properly established.
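The trimming order described above is easy to picture in code. A minimal sketch of the behavior (not SillyTavern's actual implementation; `count_tokens` here is a stand-in tokenizer that just counts characters):

```python
def fit_examples(examples, chat, budget, count_tokens=len):
    """Drop example blocks from the END of the list until everything fits.

    Illustrative only: example 3 is sacrificed before example 2, which is
    sacrificed before example 1, matching the reverse-order behavior above.
    """
    kept = list(examples)

    def total():
        return sum(count_tokens(x) for x in kept) + sum(count_tokens(m) for m in chat)

    while kept and total() > budget:
        kept.pop()  # remove the last-provided example first
    return kept

examples = ["example one ", "example two ", "example three "]
chat = ["user: hi ", "bot: hello "]
# With a tight budget, the later examples go first.
print(fit_examples(examples, chat, budget=50))
```

So if your big idea-leaking example needs to survive the longest, it should be example 1, not example 3.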
i assume the qX by the model name is quantization. is there a difference between lower ones and higher ones for local model usage?

>>102311403
Higher ones are more accurate with respect to the original weights, but use more ram and are slower.

>>102311403
You mean like Q4_K_M vs Q8? Yes. Essentially, the lower the number the worse it is. As a heuristic, the quality of a quanted model generally correlates with the model's file size. Quality in this case is how close its results are to those of the unquanted model.
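A toy illustration of why fewer bits hurt: round-trip the weights through a naive symmetric quantizer and measure the reconstruction error. This is a sketch, not llama.cpp's actual K-quant schemes (those use per-block scales and smarter rounding), but the trend is the same:

```python
import random

def fake_quant(ws, bits):
    """Toy symmetric round-trip quantization: quantize to a grid, then dequantize."""
    levels = 2 ** (bits - 1) - 1                   # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in ws) / levels       # one scale for the whole tensor
    return [round(w / scale) * scale for w in ws]  # snap each weight to the nearest level

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(10_000)]

for bits in (8, 6, 4, 2):
    deq = fake_quant(weights, bits)
    mse = sum((a - b) ** 2 for a, b in zip(weights, deq)) / len(weights)
    print(f"{bits}-bit  mean squared error: {mse:.6f}")
```

The error grows as the bit count shrinks, which is exactly the quality-vs-size tradeoff the quant suffix encodes.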
>>102306170
>--Reflection API based on old sonnet 3.5
What?? Wasn't that the new 70B model? What's it got to do with Sonnet?

>read the cross threads
that is fucking hilarious

>>102311493
https://venturebeat.com/ai/new-open-source-ai-leader-reflection-70bs-performance-questioned-accused-of-fraud/

>>102311514
My sides...
What's the best model for local chatbot on a 12GB GPU? Last time I checked a couple of months ago it was Gemma2 9B
>>102311566
Either Gemma 2 or Nemo

>>102311566
Your favorite fine tune, or Gemma 2 9B, or mistral-nemo 12B.

>>102311514
Which one of you faggots wrote this

>>102311633
Me, but it's a secret to everyone

>>102311633
what misunderstanding is test20061722 talking about?

>>102311273
>https://www.tomshardware.com/pc-components/cpus/amd-announces-unified-udna-gpu-architecture-bringing-rdna-and-cdna-together-to-take-on-nvidias-cuda-ecosystem
>The announcement comes as AMD has decided to deprioritize high-end gaming graphics cards to accelerate market share gains.
unless you're going to buy a server card, enjoy struggling with sub-par support and performance now and forever

>>102311514
>As for now, the AI research community waits with breath baited for Shumer's response
I'm going insane.

>>102311688
I just need the model to run at all, anon. My priorities mean I can do other things while my GPU's temperature approaches the melting point of tungsten for ten minutes

>>102311601
>>102311587
ok thanks anons

>>102311633
It's a paid misinformation campaign orchestrated by NAI shills, trying to deflect from how dead their service is becoming and cover their tracks.

>>102311273
two second hand rx 6800's. However, if you also want to do image generation, go with rdna3 since rdna2 doesn't have flash attention.

>>102311273
Between the mi50/60/100 and w6800, all 32gb versions, whichever ones you can find cheapest. Two of them will fit most of a decent largestral quant or all of a small one. If space and power aren't issues then stacking old Radeon VII cards will get you the cheapest high-bandwidth vram, but they're 16gb each. But if you don't already have experience with ROCm in machine learning then just buy some 3090s and save yourself a lot of trouble.

Strawberry hype was trash. Now reflection is a fraud. Weird rumors that ChatGPT intends to charge $2000/month per user for the next release, which they've implied is still expected this year. No voice model. No video model while china and some western markets give video away for free. OpenAI is collapsing, and there's an ugly hype machine building. But they're still lining up to throw money into it? They have to have shown something convincing, yeah?
>>102312031
>Weird rumors that ChatGPT intends to charge $2000/month
That one just felt like the typical way journalists lie to spread something negative.

>>102312031
two more weeks bro

>>102311403
Yes. For a text example, see a 3.5bpw vs a ~8.5bpw (Q8_0) quant of Command R: >>102242912 vs >>102242935

grok 3 will be the first model bigger than gpt4 to be released, and all the other labs are waiting to see what it tells them about scaling laws before risking the huge amount of money to train one of their own blindly

>>102311753
But can't anyone just run it to try? Then they'd know if it's good or not.

>>102312293
No, because the uploaded model weights were wrong because his girlfriend got COVID :) so when people tried it and saw it was shit it was actually not the real one, that one's private for now, but he'll upload the real weights soon :)

>>102312293
The model on huggingface is just a finetune of llama. The controversy is that said model had a rocky launch, and during that a "working" api was provided, which was in fact just a claude sonnet proxy.

>>102312322
Well then that means it's actually shit, so the 'disinformation campaign' angle is bullshit.
I'm sure Matt will deliver. Just two more weeks.
I know a genius when I see one.
is there a way to get 72gb VRAM with under 1k watts?
When is Large 2?
>>102312821
If you have enough money, sure. An H100 80GB runs 700W.

>>102312821
Yes

>>102311566
I've tried a couple; it depends on what you want. For both RP and Story, you can try:
NemoMix-Unleashed: https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B-GGUF
ChronosGold: https://huggingface.co/bartowski/Chronos-Gold-12B-1.0-GGUF
StarCannon: https://huggingface.co/mradermacher/MN-12B-Starcannon-v3-GGUF
Basically any 12B should work at Q6 quants and 16-24k context for 12GB VRAM
Is there any copilot alternative which runs on an 8gb GPU? I tried codegeex4 and it sucks ass. It can't even generate proper python code.

>>102313001
Largestral IS Large 2, so a better question is "when is large 3?"

>>102313329
Largestral is 1.1 numbers behind llama. So Large 3 will come out when Llama 4.1 launches.

>>102312821
two workstation cards with 48gb and 200-300w usage each

>>102313001
>Grok
>3k
Lol
>>102313713
s-surely roping the context to 6k, 12k, or 24k won't degrade the model and will be enough

>>102313740
no one knows what the context window of the model itself is; these numbers are for the chatbot interfaces of all the models, which are often smaller

How do I mount a Tesla P40 in a desktop case and not have it overheat?

>>102313001
Mistral ai had promised a gpt 4 level open source model by the end of the year. No more big open source models.

>look at miku subreddit
>they ban all ai images
what kind of brain rot does this require

>>102313001
>mistral
>orange privacy warning
what? I'm running it locally
>>102314434
Maybe they had to deal with some schizos, or maybe the mods are the schizos.

>https://youtu.be/Alzjn_0ne1Y
I can't believe this guy was part of the scam lmao, it's like all the pieces are coming together

>>102314461
they are "all ai is theft" schizos

>>102314444
They put google as green. That makes you question how they even measure privacy

So... why isn't everyone using Hermes 405B for free?
smedrins
https://x.com/corbtt/status/1833209248236601602
>I am working with @mattshumer_ to get to the bottom of what happened with Reflection. He is providing access to all serving code and weights with the goal of replicating the strong reasoning performance @ArtificialAnlys was able to see over the weekend.
a scammer wouldn't do this

>>102312064
theinformation (the original source of that news) have real sources and have repeatedly leaked LLM-related stuff before anyone else. i don't think they've ever gotten a leak wrong. also, their article said that was just the highest number discussed and that they expect it to be lower

>>102314809
Maybe Shumer is just a complete fucking retard and got scammed by the poo who 'helped' him train the finetune.

>>102314809
nice try, get exposed
https://x.com/mattshumer_/status/1831195111180435702

>>102306138
>lang chain, aios, semantic kernel, tenstorrent
why does no one in these threads ever talk about real stuff with language models

>>102314867
he's obviously saying "welcome to the team" as in "welcome to the group of people who believe they have something special and are trying to finetune llama-3.1-405b"

>>102314809
so the guy that owns openpipe, a company that turns prompts into "finetuned" models, is going to help a retard add a single prompt to llama 3.1

>>102314961
"reflection" outputs or whatever you want to call them actually don't work with any model i've tested except, kind of, sonnet 3.5. every gpt-4 variant (including chatgpt-4o-latest) will fail to even TRY to follow the process 95% of the time, so it probably does need some kind of finetuning to work. however, obviously the guy is a fraud, and anyone working with him who comes to any conclusion besides "he is a fraud" is also a fraud
>Stuck with RTX 3060 12GB of vram
>All new GPUs cost more than my entire PC combined.
Why must GPU prices keep going up...

>>102314989
Because people keep paying for them when the prices go up.

>>102314982
Dude genuinely tanked his career in one fell swoop by routing to Anthropic and OpenAI. If he'd just said "we fucked up the benchmarks, my bad" he would've still been humiliated, but people would have laughed it off. The fuck was his plan?
>>102315058
he mentioned on his twitter like a year ago that he used to be in the crypto space, so if he was really in the crypto space maybe he's just one of those retarded grifters that expects everyone to just go along with the hype (until he's gotten his bag)

>>102314619
Because I ran it using the 3.1 instruct base format and haven't seen a model that retarded since pyg. I need to try again with chatml, but fuck them for not using the existing format for no reason.

>>102315058
He's a fake person, neither he nor his company exist in real life, and the entire thing was a publicity stunt for openscam.

>>102315173
Locally run on a quant? My question was a little misleading. I just noticed that you can sign up to openrouter and generate a key to use H3 405B for free. VRAMlets should probably look into this while it lasts.
>reflection 70b (claude?) is there too for free

>>102315252
*you can use an unknown model and have your logs posted publicly for free
Buy an ad and kill yourself
>ask chatgpt to extract some data from a picture and process it for work
>"please wait a minute, I'm extracting the data..." (inference pause)
>"Here's the data I extracted: ..."
>"Now processing the data..." (inference pause)
>"Here's the result: ..."
This was the first time I've used ChatGPT since its release and I'm a bit disappointed. It really didn't feel like one coherent multi-modal model but more like one fairly okay base model that can just spend all the time it wants calling other models as it needs, thanks to some front-end magic. I don't think local is far off from this if we had a good front-end that actually makes use of function calling and other features.
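The front-end magic described above is basically a tool-dispatch loop, and that part is genuinely easy to do locally. A minimal sketch: the tool names, the message format, and the scripted `fake_model` are all made up for illustration; a real setup would point `model` at a local inference server and parse the model's tool-call output instead.

```python
import json

# Hypothetical tools the model may call; names and stub bodies are illustrative.
TOOLS = {
    "extract_table": lambda image_path: [["item", "qty"], ["widget", "3"]],  # stub
    "sum_column":    lambda rows, col: sum(int(r[col]) for r in rows[1:]),
}

def run_agent(model, user_msg, max_steps=5):
    """Feed tool results back into the model until it returns a final answer."""
    history = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = model(history)  # expected: {"tool": ..., "args": ...} or {"answer": ...}
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        history.append({"role": "tool", "content": json.dumps(result, default=str)})
    raise RuntimeError("agent did not converge")

# Scripted stand-in for a local model, just to show the control flow.
def fake_model(history):
    if len(history) == 1:
        return {"tool": "extract_table", "args": {"image_path": "invoice.png"}}
    if len(history) == 2:
        rows = json.loads(history[-1]["content"])
        return {"tool": "sum_column", "args": {"rows": rows, "col": 1}}
    return {"answer": f"total = {history[-1]['content']}"}

print(run_agent(fake_model, "extract the table from this picture and total the qty column"))
```

Each "inference pause" in the greentext corresponds to one trip around that loop.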
>>102306170
>7900 XTX performance benchmarks reveal Vulkan not the best
lolmao, who told him to use fucking Vulkan?? there's a reason even rocm on WINDOWS is faster.
But in other AMD insanity, how are the AMD Instinct Mi series cards? I doubt fucking anybody has ever bought them for true local AI, but how crazy are the MI50, 60, and 100? All 32gb cards i could find. For that matter, what are the chances one will just plug in and play nice with windows? My 7900xtx experience with AI has been great, so now how are the "professional" accelerators?
>>102315252
>reflection (claude?)
not anymore, it's the same llama finetune as the paid model now, i.e. a nothingburger

>>102306387
>all these wrong answers
https://files.catbox.moe/mk400w.mp4

>>102315477
I don't know about windows, but the instinct cards work fine on linux with rocm 6.0. last I checked, windows rocm support was terrible, but maybe that's changed; either way I wouldn't trust that setup enough to invest money in desu
>>102309857
>Sadly the state of open/local audio stuff is pretty abysmal...
This. Everything is either
>corpo scraps that are utter dogshit (Bark)
>chinkshit you have to punch yourself in the balls to get working (RVC)
>tortoise slop (xTTS2)
>vaporware (any paper released, even if there's a promise of code to be released)
>one mega-autist's hyperfixation passion project that only communicates through commit messages and schizoid comments and that's permanently always 2MW from the last step to greatness (https://github.com/e-c-k-e-r/vall-e)
There's just not many eyes on TTS. Only grifters care about it, for muh funny political man arguing with other political man. The pooest of pajeets only cares about musicgen (muh Udio at home) or muh funny cartoon character singing a song (again, back to RVC). TTS is just forever cursed.
>>102315668
xtts2 may be tortoiseslop but it's pretty good for realtime. It was enough for me to cancel my ElevenLabs sub.

>>102315691
dead project tho. everything is terrible

>>102315668
and I'll add that the actual future for TTS is with multimodal LLMs, but I shouldn't have to explain the absolute state of even text+image multimodality for local
>>102315756
>coqui's best product was... copying a fork of tortoise, having the sloppest of multilingual support, and finetuning the base model with a shit ton of indian audio, which killed the company after
will never not be funny
>>102315477
Tried out Mi60s before, although I ended up returning them since the seller lied about the condition, but I took them for a test drive first. Unless you can get them really cheap, I would say they're not really worth it. Compute isn't great, and having to fuck around with janky rigged-up fans is far from ideal, not to mention loud compared to a gaming GPU, which are pretty quiet, especially since they have onboard fan speed management whereas a card without fans obviously has no such thing.

>>102315511
nice double doubles
https://files.catbox.moe/323hw8.mp4

>>102315883
>>102315576
All i get from this is "suck it up and buy another 7900xtx, poorfag"

>>102315883
Looked for some Mi100 numbers today, and assuming llama.cpp integrates better support, they seem potentially promising.
70 t/s on llama 7B Q4_K: https://github.com/ggerganov/llama.cpp/pull/7011#issuecomment-2143621264
8.3 t/s on 70B Q6_K with a dual setup (a year-old PR, so missing a lot of optimizations): https://github.com/ggerganov/llama.cpp/discussions/2824
Still too expensive compared to a hassle-free 3090, even with the 32GB VRAM though.

>>102316067
>at some point someone flipped it, possibly to dodge repost detection somewhere, and edited the text back in
whoever made that is a fuckin loser

>>102316348
dont look at me anon, i just saved it ages ago

>Tuesday
It's Teto time!
>>102316067
Oh, I didn't notice it. Thanks for checking it.
FedModule: A Modular Federated Learning Framework
https://arxiv.org/abs/2409.04849
>Federated learning (FL) has been widely adopted across various applications, such as healthcare, finance, and smart cities. However, as experimental scenarios become more complex, existing FL frameworks and benchmarks have struggled to keep pace. This paper introduces FedModule, a flexible and extensible FL experimental framework that has been open-sourced to support diverse FL paradigms and provide comprehensive benchmarks for complex experimental scenarios. FedModule adheres to the "one code, all scenarios" principle and employs a modular design that breaks the FL process into individual components, allowing for the seamless integration of different FL paradigms. The framework supports synchronous, asynchronous, and personalized federated learning, with over 20 implemented algorithms. Experiments conducted on public datasets demonstrate the flexibility and extensibility of FedModule. The framework offers multiple execution modes-including linear, threaded, process-based, and distributed-enabling users to tailor their setups to various experimental needs. Additionally, FedModule provides extensive logging and testing capabilities, which facilitate detailed performance analysis of FL algorithms. Comparative evaluations against existing FL toolkits, such as TensorFlow Federated, PySyft, Flower, and FLGo, highlight FedModule's superior scalability, flexibility, and comprehensive benchmark support.
https://github.com/NUAA-SmartSensing/async-FL
Seems to promise easier federated training, at least for small stuff. Could be useful. https://github.com/justinlovelace/SESD for example (still no code) was able to use just 2% of the training dataset of vall-e to match it, so it's actually feasible to train it in a federated manner.
>>102312821
You won't lose much in sequential inference if you powerlimit 3 3090s to 250W

>>102312210
I don't see any significant difference in the guy's CR examples. Woof woof, yes, but that's 1 out of 3; that could be just chance.
>>102316467
Do you know how on Linux?

>>102316481
Why are you woofing?

>>102316512
Who are you quoting?

>>102315819
>actual future for TTS is with multimodal LLMs
Won't be good at imitating a cute anime waifu without RVC or a finetune. I hate Johansson

>>102316488
Just use nvidia-smi

>>102316067
>>102316348
>one is left-aligned
>other is in motherfucking Comic Sans
They're all losers.

>>102316488
sudo nvidia-smi -i 0 -pl 250
replace 0 with the gpu index from nvidia-smi for other GPUs
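If you'd rather not repeat that per card, a small loop can apply the same cap to every detected GPU. A sketch, not a polished tool: it defaults to just printing the commands (so it's safe on a box without the driver); set `RUN=sudo` on a real machine to actually apply the limits.

```shell
#!/bin/sh
# Apply the same power limit to every NVIDIA GPU.
# RUN defaults to `echo` (dry run, only prints the commands);
# set RUN=sudo to actually run them on a machine with the driver installed.
LIMIT="${LIMIT:-250}"

# Ask the driver for the GPU indices; fall back to 0..2 if nvidia-smi is absent.
indices=$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null) || indices="0 1 2"

for i in $indices; do
    ${RUN:-echo} nvidia-smi -i "$i" -pl "$LIMIT"
done
```

Note the limit does not persist across reboots unless persistence mode is on, so this belongs in a startup script.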
>>102315668
When I find the time, I plan to replicate https://huggingface.co/spaces/NoCrypt/mikuTTS locally with piper

>>102316623
Good luck!

>>102316544
Sure it would; it would see audio as tokens just like text, and so if trained on enough audio data it would be able to create any sort of sound, voice or otherwise. The main reason OpenAI delayed their voice feature is because the beta testers kept finding ways to break the guardrails and make it diverge from the preset voices they want it to have.

>>102313191
24k? Those ones work? Every nemo instruct model I've tried has barely handled 16k.

>>102316623
The generated voices sound nice when they come out of Edge. Is Piper close enough?

>>102316623
What do you call that thing where it pronounces some parts of its speech too fast and some too slow? It really makes it sound a lot less genuine. The only other one I've tried is xtts and it had the same issue. Not sure if it was my settings; I just had things on default.

What's the current meta for RP models? I got distracted by Flux Dev and went on a Flux Deving spree for like a month. For me it was:
#1. LoneStriker_Mistral-Large-Instruct-2407-2.65bpw-h6-exl2
#2. Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal

>>102316839
Still mistral large.
Finally! Now I need to figure out how to install P2P drivers, Aphrodite, and download a larger model.
>>102316839
MythoMax

Is there any uncensored model that is not retarded and actually gets the information right past 8000 context length?

>>102316869
congrats anon. 96gb is a comfy place to be

>>102317063
Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal
Loaded a book with RAG, used ~20K tokens, and RPed with it. It was able to summarize the ~20K tokens of story well too. Tried the same with LoneStriker_Mistral-Large-Instruct-2407-2.65bpw-h6-exl2, but it didn't work quite as well.

>>102316839
A Mistral Large finetune if you don't mind waiting a bit for outputs and want maximum smarts.
A Nemo 12B finetune if you prefer a high rate of tokens or have a shit computer.

>>102316869
How do you do this? Just plug in the 4 cards and it shares the vram, or do you need the nvidia link?

>>102316839
Barring unofficial finetunes, and actually running things locally:
>405B is unobtainium for the majority of people.
>Largestral is amazing if you can run it.
>The 70-72B range models are decent enough; stay clear of 3.1
>Around the 20-35B midrange, older command-r, qwen, and gemma are solid
>Under 20B, there are too many choices. (gemma, nemo, llama, etc.)
For unofficial finetunes, that's a highly opinionated subject laced with bias and personal interests. Usually, anything trained on top of any of the base models that have come out in recent months is decent. You're better off downloading whatever and trying them all yourself.

>>102317158
>You're better off downloading whatever and trying them all yourself.
Nobody's going to do this in a way that's disciplined and structured enough to get decisive results thoughever

Good night /lmg/
It's nice to have a day off

>>102317158
>>102317138
>>102317133
Buy an ad.

>>102317336
Fake Miku