/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103066795 & >>103057367

►News
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>103066795

--Diffusion models merging with LLMs for language generation:
>103073785 >103073859 >103073960 >103074715
--Using local models for visual novel translation:
>103075666 >103075854 >103076003 >103076659 >103077006
--Troubleshooting GPT-SoVITS2 with Silly Tavern:
>103071219 >103071342
--SmolLM2 1.7b can generate a Mandelbrot set, unlike previous Llama models:
>103070970
--Guide to choosing the right model and quantization:
>103068169
--Fitting 4 RTX 3090 GPUs into ASUS PRO WS WRX80E-SAGE SE WIFI motherboard:
>103072146 >103072175 >103072718 >103072763 >103072766 >103073357
--Discussion about AI models, benchmarks, and performance:
>103067158 >103067174 >103067237 >103067246 >103067259 >103067289 >103067326 >103067356 >103068797 >103068828 >103067274 >103067460 >103067826
--Current GPU meta and buying recommendations:
>103066797 >103066998 >103067057 >103067113 >103067157 >103067198 >103067221 >103067228 >103067149 >103067214 >103076090 >103067253 >103067797 >103067801
--Chat and image generation on 10GB VRAM, and consistent anime-style SD models:
>103070025 >103070054 >103070093 >103070229 >103070522 >103070571 >103070619
--Anon tests Noob models on "outstretched hand" prompt, finds Noob 1.0 excels at hand drawing:
>103077300
--Anon shares experience with LLMs for data extraction and discusses challenges and techniques:
>103075416 >103075431 >103076168 >103076216 >103076236 >103076272 >103076273 >103076290 >103076319 >103076260 >103076502 >103076668 >103076773 >103077016
--Anon gets SoVITS working with Illusive Man voice lines:
>103072261 >103072478 >103072527 >103072781 >103073010 >103072548 >103076145
--Konrad's CNN implementation in System Verilog:
>103073134
--Miku (free space):
>103066797 >103068268 >103074576 >103074709 >103076544 >103076601 >103077300

►Recent Highlight Posts from the Previous Thread: >>103066923

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
--- A Measure of the Current Meta ---
> a suggestion of what to try from (You)

96GB VRAM
anthracite-org/magnum-v4-72b-gguf-Q8_0.gguf
64GB VRAM
anthracite-org/magnum-v4-72b-gguf-Q5_K_M.gguf
48GB VRAM
anthracite-org/magnum-v4-72b-gguf-IQ4_XS.gguf
24GB VRAM
anthracite-org/magnum-v4-27b-Q4_K_M.gguf
16GB VRAM
anthracite-org/magnum-v4-12b-v0.1-Q6_K.gguf
12GB VRAM
anthracite-org/magnum-v4-12b-Q4_K_M.gguf
8GB VRAM
anthracite-org/magnum-v4-12b-IQ4_XS.gguf
Potato
>>>/g/aicg

> fite me
>>103077348>suggesting bad models to newfagsDevilish.
>>103077348i will run 12b Q4_K_Ms on my 8gb card and you'll never stop me
So what about this discord server?
>>103077399It's full of pedos and trannies as you'd expect.
>>103076712>>103077221If you do go with mistral nemo make sure to enable Do Sample and BOS token if you can as well
So now that Meta claims that Llama 4 will be out early 2025, what are you hoping to see from it?
>>103077484BitNet
>>103077484I really fucking hope they dropped ultra ass fuck hard dataset filtering. No matter how smart the model is it won't conjure trivia from nothing.Please be claude, not gpt.
>>103077414>It's full of pedos and trannies as you'd expect.What are you waiting for then?
>>103077484Good, uncensored base. IDGAF about the official instruct.
>>103077484Hoping they live up to their promise of multimodality that was supposed to be in Llama 3.
>>103077552>>103077548You are hopeless. When will you learn that unless western society and culture suddenly does a 360, they're not allowed to openly release anything "uncensored". You should be asking for more Mistral and chink models instead.
>>103077549I'm probably already on a list so I'd rather behave.
>>103077596Trump will win #MAGA2024
>>103077598>only a list>not all of themngmi
>>103077596Anthropic manages somehow. By the time it comes out, the election will be over so there will be less election "interference" hitpieces. Besides, they can make their instruct as censored as they feel they need to. The important thing is that they don't filter the pretraining data to hell.
Does anyone have a chatTTS python script to load sample audio and lets me choose/see a seed?Local TTS models have awfully bad documentation/examples.I know there is a webui but it's buggy, i find using a script much more efficient.
It is entirely unrealistic to expect them to remove any filters they had. They may not strengthen them. But they probably won't just outright remove them when Llama 3 turned out fine (for their business; coomers don't matter to them). Stop coping and just accept reality. Mistral is about the only hope left for you.
>>103077621
Anthropic is an entirely different company in a different position, producing an entirely different product (or rather, service).
>>103077603
That helps but won't change the business and the values of investors by Q1 2025. And Llama 4 already began training, so they would've played it safe with the dataset to account for the possibility of future unfavorable political landscapes anyway.
>>103077484I hope ... Who am I lying to? I don't actually have any hope left. The only salvation for LLMs is Anthropic's Opus 3.5
I have 4080S
I use 12b model but it's a bit meh
I tried 8x7 model but it was a bit slow
I want something around 20-25b model
no idea what q4 or q6 means
only usage: coom
recommendations?
QTip sounds huge why isn't anyone talking about it? In their github they mention a 1Bit 405b model they were trying too which would fit in like 56gigs
>>103077726Llama.cpp doesn't support it and people don't want to install shit just to try yet another research project that likely isn't actually that good.
>>103077786you fags won't use anything that isn't a 1 click exe
>>103077802yea, you guys suck
>>103077706Magnum
>>103077726Because they released quants for Llama 3.1 8B and Llama 3.1 405B. Even someone with quad 3090s can't run their 2 bit quant of Llama 3.1 405B.(They also released Llama 2 7B, 13B, and 70B, which makes me wonder what they're doing.)
>>103077726
1Bit quantization never works. It's just a slightly better QuIP# and that wasn't worth using either. What good is fitting 70B on a single 3090 if the perplexity doubles?
>>103077484I just want base models again. However, I expect that we will only get instruct models at 3B and 405B. Without bitnet, of course.
>>103077706>no idea what q4 or q6 meansIt means download magnum
>>103077818
>>103077859
stop being a retarded shill
>>103077726
>>103077484>what are you hoping to see from it?Absolutely nothing. It is gonna be shit for cooming. They will do a 9B and a 70B again. It is gonna be an incremental update that is barely noticeable. And the only good thing about it is probably native multi modal. I won't even download.
>>103077973>native multi modalI bet it will be adapters again
>>103077484I hope the new Mistral will mog them.
Give me your best gooner model that works on 16GB of VRAM. The death of Claude is driving me nuts and I need to blow a load stat. I will try literally any model you link me
>>103078467https://www.cleverbot.com/
>>103078467405b hermes is free on openrouter
Can I voice chat with a local model in real time yet?
>>103078535the free endpoint is only 4k context
>>103078555yuphttps://github.com/Standard-Intelligence/hertz-dev
>>103078555Plenty of options, from moshi to fish agent.
When I was wishing /lmg/ would die I didn't mean for it to become the / /aicg/+caiggers using local models general/... It is just like 4chan in general. A corpse turned into a trophy paraded around by horrible people who should die in a fire.
>>103078535I'll give it a try but I was hoping to try at some new local models as well. I've tried Mythomax, Mythalion and Moistral before and wasn't impressed, that was months ago thobeit>>103078493Not going to try this
>>103078467random 12B tune I guess
ROCm has failed me for the last time.
>>103078467That new killa tune released yesterday.
>>103078582Why did you wish for /lmg/ death in the first place?
>>103078467Mistral NeMo. Dumb but fun. Try the Instruct model first before you try anyone's fine tunes.
The AI boom has been going on for around two years now, so why is the integration of local models with other programs still so bad? In 2022 I was expecting that by 2025 they would be able to do extremely niche stuff like searching for exhentai cosplay galleries that have comments mentioning nip slips, or booting up and playing games by themselves.
>>103078702Models sucked extra ass 6 months ago
>>103078702The tech landscape is currently filled with shitty startups with loads of cash trying to milk AI, but there is no one who knows anything about it. I'm getting many proposals from them due to my HF repo. Also they want to do B2B not B2C except the nsfw chatbot thing like muah.ai.
>>103078702hallucination on local models is still really bad. we're essentially waiting for models to get more accurate at smaller sizes, or for there to be hardware released that allows you to run very large models quite cheaply.
MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
https://arxiv.org/abs/2411.00390
https://github.com/meta-metrics/metametrics
For VNTL anon if you want to mess around with another metric

A Lorentz-Equivariant Transformer for All of the LHC
https://arxiv.org/abs/2411.00446
For Johannes. How is your Master's going btw?
Is there any online model that I can use to summarize my 40k+ token long coom story?
Rocinante can't cope anymore trying to retrieve info even with the context length pumped to 32k and RAG.
Does Claude have a long context length?
>>103079112Or maybe a uncensored local model specialized to summarize stuff with a gigantic context window? Do that even exist?
>>103077484Uncensored base + o1 CoT finetune
PatternBoost: Constructions in Mathematics with a Little Help from AI
https://arxiv.org/abs/2411.00566
>We introduce PatternBoost, a flexible method for finding interesting constructions in mathematics. Our algorithm alternates between two phases. In the first ``local'' phase, a classical search algorithm is used to produce many desirable constructions. In the second ``global'' phase, a transformer neural network is trained on the best such constructions. Samples from the trained transformer are then used as seeds for the first phase, and the process is repeated. We give a detailed introduction to this technique, and discuss the results of its application to several problems in extremal combinatorics. The performance of PatternBoost varies across different problems, but there are many situations where its performance is quite impressive. Using our technique, we find the best known solutions to several long-standing problems, including the construction of a counterexample to a conjecture that had remained open for 30 years.
https://github.com/zawagner22/transformers_math_experiments
Pretty neat
>>103077348>QwenBut that's not how you spell Nemotron!
>>103077484The most critically important component is a lack of censorship. If it doesn't have that, then it's useless at base. Fine-tunes can help a bit with that, but they come at the expense of intelligence. Make an uncensored base model and that intelligence drop is not necessary.If they're going to include politically correct censorship in the model, then I may as well go with a Corpo cloud model instead.Local was made to be free.
>>103079133>>103079112Did you try Nemo?
>>103077706
Mistral-Small-Instruct 22b Q4_K_M
Magnum v4 22b Q4_K_M
Magnum v4 27b IQ3_M
>no idea what q4 or q6 means
The 'q4' and 'q6' refer to quant sizes. You will need to download the relevant GGUF file to run these, at the correct quant sizes to fit your vram limitations.
>>103079565Is even worse at it than Rocinante and even more retarded, I just tested it.
>>103079112
>>103079685
Split your text into chunks of 8k or 16k tokens, then summarize every chunk one by one, and finally summarize all the summaries merged together
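A rough sketch of that map-reduce pass against a local OpenAI-compatible endpoint (llama.cpp's server and kobold both expose one); the URL, model name, chunk size and file name below are placeholders, not anything specific from this thread:

import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local endpoint

def ask(prompt: str) -> str:
    r = requests.post(API, json={
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.3,
    })
    return r.json()["choices"][0]["message"]["content"]

def chunk(text: str, size_chars: int = 32000) -> list[str]:
    # crude character-based chunking, roughly 4 chars per token
    return [text[i:i + size_chars] for i in range(0, len(text), size_chars)]

def summarize(story: str) -> str:
    partials = [ask("Summarize this story excerpt, keeping names, events "
                    "and unresolved plot threads:\n\n" + c)
                for c in chunk(story)]
    # reduce step: summarize the concatenated partial summaries
    return ask("Merge these partial summaries into one coherent summary:\n\n"
               + "\n\n".join(partials))

if __name__ == "__main__":
    print(summarize(open("coom_story.txt", encoding="utf-8").read()))

Character-based chunking is crude but good enough for a first pass; swap in a real tokenizer if you want the 8k/16k budget to be exact.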
>>103079112
Try wizard 8x22 or nous hermes 405B on openrouter.
>>103079901Tried Hermes and it sucked ass.
>>103079964What about wizard?
>>103079967Tried wizard and it sucked dick.
>>103079969
I don't know if llms can do what you're asking right now. You can try chunking but that's probably a cope.
>>103079112Qwen 2.5 has enough context although I don't know if it has enough coherence.
>>103079967
Not yet but I think it won't make a difference. I may have to chunk like I did a while ago and some others had suggested. But from experience, summarizing by chunking and then feeding a RAG to the model doesn't do a good job for continuing a story, it's gonna suck in a lot of ways.
>>103079969
Stfu retard.
Good eRP text LLM for 24GB VRAM nvidia GPU? magnum-4-27B-Q4 is disappointing and giving duplicate generations no matter how much I change the prompts or tweak the values.
>>103079112
>Our experiments show that while human readers easily perform this task, it is enormously challenging for all ten long-context LLMs that we evaluate: no open-weight model performs above random chance (despite their strong performance on synthetic benchmarks), while GPT-4o achieves the highest accuracy at 55.8%
>405B is only 6 points away
You may need to use other techniques in order to enhance the capability of an LLM to do summarization, such as prompting the LLM to do state tracking and summarizing every event checkpoint or scene transition. People were discussing an automated system to do this in the past, but I guess it turned into vaporware.
Someday...
what practical model size is anon running for daily use? 8b, 70b?
>>103080398AMD's new 1B model.
>>103078982I'm already done with my Master's degree and currently doing a PhD.If things go well I'll use GGML to fit parton density functions and the strong coupling constant.
>>103080403cactus
>>103080188
Nemotron 70b IQ2_S fits with a 4-bit cache and flash attention on, and is way better than smaller models.
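For reference, a llama.cpp invocation along those lines would look something like this (the flags are real llama.cpp options, the model filename and context size are just placeholders):

./llama-server -m nemotron-70b-IQ2_S.gguf -ngl 99 -c 8192 -fa -ctk q4_0 -ctv q4_0

-fa turns on flash attention, -ctk/-ctv quantize the KV cache to 4-bit, and -ngl 99 pushes all layers onto the GPU.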
>>103080706I feel like that could actually be true but at the same time I kind of feel bad about lobotomizing something that much even if it is just an algorithm...
>>103079112
Chunk your story into 16K tokens, then summarize the first part and inject that as context to summarize the second part.
i want to go back bros...back when i just installed st and had hot maid sex with pyggy and mythomax and summarize feature
Is a CPU-only setup with a bunch of RAM a reasonable alternative to GPU? I'm okay with 1 token/s for 100b+ models
>>103081277>1 token/s for 100b+ models on CPUYou wish
>>103081317Perplexity says you can get 5-10 token/s for 70b on CPU
>>103081277Don't know if intel's new ai chip works as they claim.It's technically still GPU though with their built-in Intel® Arc™ graphics.
i want to learn how to fine tune models. specifically, i've been looking at papers where they're using audio transformers to classify bird sounds. this model:
https://github.com/cwx-worst-one/EAT
has pretty good performance and is pre-trained on AudioSet which is a bunch of youtube audio clips. in papers, they take a bunch of 10 second audio clips, convert them into spectrograms, augment them with stuff like specaug, "fine-tune the model with adamW". i don't know what that means. i understand how i could generate spectrograms and modify them and stuff, but what the fuck does "using adamW" mean. it's an optimizer, from what i understand, but how do i take the fuckin spectra of bird songs and make the model do math on my GPU? in the EAT github it looks like pytorch is being used. can i just try and follow some sort of huggingface guide and that'll work? i feel like im nearly drowning here.
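"Fine-tune with AdamW" just means AdamW is the optimizer used in the training loop; PyTorch ships it as torch.optim.AdamW. Below is a minimal sketch of such a loop. load_pretrained_eat, SpectrogramDataset and the class count are placeholders (EAT's actual checkpoint loading is fairseq-based and looks different), but the shape of the loop is the same:

import torch
from torch.utils.data import DataLoader

# placeholders: swap in EAT's real checkpoint loading and your own dataset class
model = load_pretrained_eat()
num_bird_classes = 50
model.classifier = torch.nn.Linear(768, num_bird_classes)  # new head for your labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
loader = DataLoader(SpectrogramDataset("clips/"), batch_size=16, shuffle=True)

model.cuda().train()
for epoch in range(10):
    for spec, label in loader:            # spec: (batch, 1, mel_bins, frames)
        spec, label = spec.cuda(), label.cuda()
        loss = loss_fn(model(spec), label)
        optimizer.zero_grad()
        loss.backward()                    # gradients computed on the GPU
        optimizer.step()                   # this is the "using AdamW" part

The huggingface Trainer guides wrap the same loop in more machinery, so yes, following one of those will also work.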
>>103081442With llama.cpp offloading nothing into VRAM I run a 32B at 1.5 tokens/second and a 70B around half a token per second with 2-channel 2667 MT/s DDR4 RAM. If RAM bandwidth is the limiting factor, as a first order of approximation it seems plausible to me that by going up to 16 channel RAM and DDR5 instead of DDR4 someone could run a 70B about 8 * 4800 / 2667 = 14.4 times faster than I can. That would be around 7 tokens/second. Math checks out.
>>103081277Saw a youtuber get 0.06t/s on a 405b model.h/w was Threadripper Pro 7995wx + 256gb ram. 96c 192t. 8-channel ram.
anons... i'm tired of the cope, i'm tired of the slop...I went back by curiosity to text-to-image local AIs and it's so much easier to get what you want from thesewhen the fuck are we going to be eating good bros...
>>103081277
0.7
take it or leave it
>>103081654Largestral
>>103081654i had the opposite experience yesterdayflux was making really pretty images, but not really doing what i was prompting for, and my itty bitty 12b was perfectly simulating my warring states period china qin kingdom royal harem
not sure if this was posted yet in here:
https://arxiv.org/abs/2410.16454
>This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information. To thoroughly evaluate this phenomenon, we conduct comprehensive experiments using various quantization techniques across multiple precision levels. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21\% of the intended forgotten knowledge in full precision, which significantly increases to 83\% after 4-bit quantization. Based on our empirical findings, we provide a theoretical explanation for the observed phenomenon and propose a quantization-robust unlearning strategy to mitigate this intricate issue...
>>103081745>An embarrassingly simple approachWho comes up with these faggy titles and why?Anyway, this just sounds like more incentive for corpos to be more aggressive when filtering.
>>103081654Text-to-image was a pain for me last time I checked, 90% of the time using them I was inpainting things and tweaking the settings because I had a very specific thing in my mind.But textgen is also similar in that I am a compulsive reroller, probably a me issue.
>>103081587Start over https://d2l.ai/
>>103081708Okay Chang
>>103081883i got so good at prompting Pony and optimizing comfyui that I always get what i want, llms are so much more random and i feel like most samplers are placebo anyway
>>103081745The model just forgot that it needs to forget things lol
>>103081988they kinda are desuthe best option is just temp minp and prompting half well
>>103081989How do we get it to remember to forget?
>>103077705Son, Sonnet 3.5 New was actually a failed Opus, but they used that name to cope. Opus 3.5 is never coming.
>>103078467>The death of ClaudeWhat?
>>103082190It will drop the day after some new model beats Sonnet 3.5. They have no reason to release any earlier than that.
>>103077487>BitNetthis, if we really want to advance in this field, BitNet must be a thing
>>103082190Opus is just dead. All the big players have realized that there is no point in training expensive 65B models like Opus when you can get even better performance with just a simple 22B like Sonnet 3.5
>>103078732What's on your hf?
>>103082294My cock pics.
>>103082294LLMs & NLP models and a few vision models
o1 full release today. can you feel? are you excited?
>>103082696No I don't
>>103082696
Imagine paying $10 to find out how many Rs strawberry has. At this point it's cheaper to hire chink farms or pajeet farms, the accuracy would probably be higher too.
>>103081277With ddr5 63gb/s bandwidth I get about 0.45 t/s in largestral. Using logic 12 channel would in theory be 6x faster, meaning 0.45x6 =2.7 In practice however it would probably be just above 1t/s, maybe 1.5?
>>103082696I've had the preview for weeks and I don't even use it because the weekly limit deters me. Is the "full release" better in any way?
>>103082793The full version will be RLHF'd using the feedback of millions of pajeets.
>>103082793The search feature on the other hand is pretty cool. I thought it indexes websites like once a day because it reads them so quickly, but it's actually realtime.
>>103082811Take your pajeet cloudshit elsewhere, Sam.
>>103082696>we
>>103082810So it'll just be more accurate? Probably still won't use it then, I'm an engineer but I rarely need to know more than the latest webshitter technology which 4o does fine
>>103082831>pajeet>more accurate?>I'm an engineerGod help us all.
>>103082844Bet you don't even know what the Outbox design pattern is
>>103077338>there are now single board computers with 32 GB RAM and a built-in displayHas someone already put together a project where you can tell a computer/phone to generate an image in natural, spoken language?
>>103081277See the op. https://rentry.co/miqumaxx/Hope you’re not poor
>>103082877>Wasting five minutes to come up with something to tell your computer/phone instead of using a few keywords
>>103082925I have small children in my family so the idea is that I would let them directly say to the thing what kind of image they want.
>>103077348why are the models listed different every thread
>>103083071Xe is le enlightened sekrit club gatekeeper, please understand.
>>103082877It sounds doable if you know basic python programming.
>>103082877You just need whisper and send the output to SD
>>103083338>>103083371I know how to do it, I just don't want to do it myself.
>>103083371isn't SD too heavy for that thing?
>>103083401Lmao as if.>>103083405Nah, SD can run on potatoes now you just have to wait a while.
What's currently the best **uncensored** ≤8GB model? I want to use it as an expensive spellcheck/text corrector, but I don't want it to give comments or straight up remove bad words from the text.
For example if I input:
>so theres this nigger you know nigger john he is areal dumb fucking nigger
It should output:
>So there's this nigger, you know, nigger John? He is a real dumb fucking nigger.
>>103082765Might be better to go for 5th gen Xeon scalable. It at least has AMX.
>>103082696>Elections are ending.>so sam is going to release level 2 strawberry reasoning AGI to change the world Holy fuck
>>103082287>22B like Sonnet 3.5source?
Hello newfags
Not even this influx of newfriends can save /lmg/ we truly have stagnated.
>>103083546ministral 8b, maybe
>>103083781
I'm using an 8GB 2070 super to play with models. I also have a 4GB 770 laying laying around. would there be any benefit to adding the 770 to my rig?
>>103082696>I'm hecking feeling it...>It's so big, beautiful and BLACK...>The BBC... I mean the AGI!
is cpumaxxing worth it in any facet? i know if you add a gpu you can get decent prompt processing speed as well. but i think building a dual genoa = $5-8k. i don't need hyper-speed. i just want to use big models and not wait 25m for a response without having a massive space heater that needs dual psus to function.
>>103083833Go back >>>/pol/troon
>>103083836It wasn't worth it for me. It not that the speed isn't nice, it's just that the big models are kinda meh. 20% better largestral is not worth 8k. Hopefully something in the future comes out that will justify my purchase.
>>103082696why is sama such an underage reddit fuck? jesus christ, this "marketing" is just sad
>>103083546>>103083781 (me)llamabros...
will you guys use llama4 if it's more pozzed but has bitnet?
>>103084075No.
>>103084075Yes.
>>103084075Maybe.
>>103084060WTF? Qwen didn't complain? Didn't expect that. Do you think 8gb quant of Nemostral would do a better job than Qwen?
>>103084105>>103084108>t. cuck
>>103084075It won't use bitnet. End of question. You'll get your basic bitch transformer model with some more multimodality stapled on (3B, 95B) and shut up.
>>103084060It's for your safety chud
what do you guys use to monitor VRAM usage under GNU+Penguin?
>>103084119Are we going to get all of the modalities this time or just image input again?
>>103084075Base model or Instruct? I don't care about instruct as long as base is uncucked. l3.1 and qwen2.5 have garbage bases so fuck them.
>>103084136
nvidia-smi
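If you want it to keep updating instead of printing once (standard nvidia-smi options):

watch -n 1 nvidia-smi
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1

The second form just polls the memory counters every second as CSV, handy for logging.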
>>103084150thanks buddy
>>103082696fuck off Sam
>>103084060polchuds should be permanently banned off this site.
>>103084137Just input
>>103083847>>103084178Hi sama. Do you fell the AGI? Still upset about regulatory capture failing? Will Trump fuck you over if he wins? Of course he will! Elon will be winning non-stop once he is in power. XAI will be the standard, not ChatGPT. How does that make you feel sama? Wanna cry? Wanna spam? Wanna sneed? Oh wait, you can't sneed, totally forgot about it.
>>103084111nemo 12b is really good for not safe content and its base model is less censored than qwen
>>103084178>Faggot who wants no-no words removed from LLM lexicon also wants to silence anyone who disagreesPottery
>>103084372Oh so now we are le based and redpilled rightoids here, nice LARP bro!
>>103084416Answer this sama >>103084243
>>103084441Take your meds bro, you are hallucinating things now.
>>103077338fuck a miku, choke a miku, roundhouse plap a miku
>>103084484miku execution by hanging
>>103084372NTA but I'm not interested in American culture war bullshit and /g/ would improve dramatically if the mods did their job and actually enforced the rules that already exist.
how do i expose my koboldcpp api to the internet without using the cloudflare tunnel option?
it has to be a static link so i can hardcode it into my software
if there's a way to do this with other backends that's also fine, but i enjoy the token count option you get with the koboldcpp api
>>103084537
No-ip with port forwarding, or an ngrok tunnel.
Assuming you don't have a static ip.
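If you have access to any cheap VPS, an SSH reverse tunnel is another way to get a stable address without touching your router. This is plain OpenSSH; 5001 is koboldcpp's default port, adjust if you moved it:

ssh -N -R 0.0.0.0:5001:localhost:5001 user@your-vps

Then hardcode http://your-vps:5001/api in your software. The VPS needs GatewayPorts enabled in sshd_config for the 0.0.0.0 bind to be reachable from outside.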
>>103084562i have a static ip and forwarding worked
>>103084625Yeah, the port forwarding (most likely) is necessary with a static ip too.
>>103084666this is an epiphany of how networking works to me
o1 signals an end to "AI equality".
"America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke." - Warhol
This is true of GPT models, but not o1
https://x.com/DavidSKrueger/status/1852818742650282431
>>103084484We will always be loved by Miku.
>>103084075Bitnet isn't real, stop huffing copium already you easily impressionable cucks
>>103084644What's that white stuff
>>103084708>Super grok election modelIt's happening
>>103084644
>>103084699I feel that.I'm not a big network guy. Everything I know I learned by tinkering.
>>103084903Probably going to be 1T, with no GQA, so you need multiple clusters to run it at more than 2K context.
this is an uncannily realistic self-portrait created by x grok agi
NEW CHINESE MODEL "hunyuan-standard-256k" SPOTTED ON LMARENA! 256k context? Big if true. Significant if open-weights.
>>103085140256K claimed context, that means 50K actual context, not bad.
>>103085140are we back?
>>103085140all the context in the world doesn't help for RP as long as LLMs are still utterly terrible at writing anything that isn't a self-contained scene.
Bros I don't get it. Sometimes when I start text-generation-webui with Rocinante-12B I get around 20t/s on my 1080ti. Other times I start it and I get around 4t/s.
I offload all 41 layers to my gpu. 9.7/11.2GB vram is in use so I'm not overloading the VRAM. I have it set to use 12 threads since I have a 6 core cpu.
Once when I reset the thread count it magically went back to 20t/s, but it won't work again no matter how much I try. I'm using the exact same prompt, settings, and even the same other programs open on my desktop for each test.
Sometimes I start my PC in the morning and it's magically fast until I reboot it, then it's slow again. Exact same FUCKING SETTINGS. How the fuck can I track down what's taking 80% of my t/s?
>>103085140quick google search:>Proprietary model>launched back in MarchMight be resurfacing because maybe they intend to make it open weights but who knows.
Sasuga retards. /lmg/ is now worse than /aicg/, still can't believe it. Kill yourselves faggots.
>>103085234Still upset, sAlty Sam? Not feeling the AGI? Seethe harder and maybe, just maybe, Fuhrer Trump will show some mercy.
>>103085281Weird obsession with sam altman, must be your gay urges kicking in.
>>103085310You aren't fooling anyone, sama. You'll be locked up together with other big tech communists.#TND #MAGA2024 #WWG1WGA #TheStormIsHere #Trump
>>103085374Go back to your polskin containment board, you are not welcome here.
>>103085230Thanks for taking part in this achievement.
>/aicg/ is just shitposts about proxies, keys, and which cloud model is shittier
>/lmg/ is just dead
Grim. You would think the image gen threads might be a bit better considering all the new toys they're getting, but it's a dumpsterfire or also dead in those generals too.
I blame blackrock and nipmoot.
I, for one, blame the sloptuners for not making their datasets 100% open.
I hate people who chase clout instead of wanting the better of all.
That's why we don't have nice things.
Anyone experience reduced quality with KV-cache quantization? Honestly, I can't tell any difference in responses after turning it on - but much more free VRAM. Pretty nice.
>>103085514It's because the mods let you shit up all the AI threads with impunity so people have just stopped bothering to show up.
Hi /v/.
>>103085539I did test it some ages ago and noticed it had issues recalling things from context. Don't know if it's better nowadays.
>>103085221
I have a similar problem. I don't have a solution :(
Run a very small model, and look at your cuda usage. Then run your usual model and look at your cuda usage.
For me:
- very small model: 90% cuda usage.
- usual model I want to run: 50%. Sometimes 60% if I kill ollama and restart. One time 80%.
My guess is that some of the vram is being used by the OS for something. Loading in a huge model that takes up all your vram, and then loading in the model you want to use immediately after (which unloads the huge model), sometimes helps.
My system only has the one gpu in it. No integrated graphics. To see if it makes any difference, I might try installing a cheap card for windows to use, so that my ai s/w can use my nice card without interference.
>>103079112You went that long in a chat with rocinante? Which one are you using? 1.1? With what settings? I had bad luck with it.
>>103085140Is it slopped tho
any good model that will take my README.md and fix grammar and style? up to 13B.
>>103085785See >>103085487
>>103084060Possibly an interesting way of stopping ai assistants from scraping your page ?
For some reason, llama.cpp only seems able to use 75% of my VRAM. Is that the intended behavior?
>>103085893
llama.cpp itself does not determine how much VRAM to use. It relies on the user to specify the number of layers to load into VRAM.
However, koboldcpp and ollama (and probably more downstream projects) try to estimate how many layers will fit automatically. These estimates are typically poor and leave a lot of VRAM unused.
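With bare llama.cpp you state it outright; the flags are real, the numbers are just examples:

./llama-server -m model.gguf -ngl 35 -c 8192

-ngl / --gpu-layers is the number of layers to put in VRAM. Lower it until the model plus context fits, or set it above the model's layer count to offload everything.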
>>103085893Are you using 25% of it to have four panoramic displays surrounding your battlestation?
no one has managed to make a finetune of the new mistral small yet that actually feels like a significantly changed model
it seems to be very belligerent, resistant to being altered by training
I understand the temptation to say "that's just what Mistral models are like" but this one is uniquely frozen even for Mistral imo. like Behemoth actually feels significantly different from normal Largestral, while I have yet to use a Small tune that doesn't still feel like the same model
>>103085637No Miku. You're not allowed to crush my balls.
>>103086037*altered by training
>>103086037Skill issue
>>103085917
I'm using llama.cpp itself rather than a downstream project. I'm manually telling it to offload all the layers to my GPUs. I have 24gb + 12gb of VRAM between my GPUs, but attempting to load a model that's larger than ~18GB throws an error about not having enough memory
>>103085938
No, I have my monitor plugged directly into the motherboard, so I think that's using the integrated graphics.
It's happening... eventually...
>>103086087Sounds worse than maskgct or fish-speech https://x.com/reach_vb/status/1853475883706614232
>>103086087based gg waiting for a true multimodal and not a bullshit adapter implementation
>>103085893
>>103086075
There's 3 things that use your vram: number of layers in vram, context size, and prompt processing batch size.
Try playing around with all three one at a time.
>>103086075Unless you are manually setting a tensor split it should distribute the model correctly automatically.Are you also taking the memory for context into account?
>>103086117I don't care as long as it doesn't need python. I'm using piper on a vm and while it works perfectly, it's clunky. I want ggml-based tts.
>>103086117https://x.com/reach_vb/status/1853486414798733314
It's been over a month and there is still no vision support for llama 3.2 on llama.cpp. Also, there seems to be no work being done to make ministral run properly at long contexts.
Should I just give up on llama.cpp and learn how to use vllm or something?
>>103082877Flux can do that, or a computer-use llm might be able to use stable diffusion for you
>>103086427>Should I just give up on llama.cpp and learn how to use vllm or something?Yes. Install vllm or something and use it.
>>103086427>Should I just give up on llama.cpp and learn how to use vllm or somethingYou only have yourself to blame for not doing that yet.
>>103086460>>103086468Yeah. I guess you're right. I've been spoiled too much by ooba/kobold. I really don't want to have to set up vllm but I may as well get used to it now.
Refugee discord when?
>>103086427You are already on troonix so it doesn't matter troon.
>>103086537No, please, no! I'm too old to get groomed!
>>103086552What?
>>103086561vllm needs troonix
>>103086571It's called GNU/Linux.
>>103084531>improve dramaticallyJust like LLM's. I love it when companies remove all wrongthink from my LLM's.
>>103086537We already have a discord lmao
>>103086586Good morning saaar!
>>103086552>linux badGet out.
>>103086586>>103086621It always starts innocently. You want to run an llm loader or emulate some switch. And then before you know it your twink boss fires you for being a harassing "lesbian".
>>103086621Follow your own advice bro, get out and start new daily dilat- ahem, debugging session with your server oriented OS.
>>103086552>>103086571Seek help, you're mentally ill
>“During final testing, Haiku surpassed Claude 3 Opus, our previous flagship model, on many benchmarks — at a fraction of the cost. As a result, we’ve increased pricing for Claude 3.5 Haiku to reflect its increase in intelligence,” Anthropic wrote in a post on X.Fucking Jews
>>103086739How many parameters is Haiku supposed to have? God, please someone leak it.
>>103086739*Fucking Americans
What options are available for training a voice generator on given samples? I want to give a model some .mp3 samples and then generate speech from text. Can't find anything on it.
i feel compelled to tell you that i'm not dead nor she-mr. z
>>103086037Have you tried SorcererLM
>>103086739They will charge whatever the market is willing to pay
>>103086834is there a difference?
>>103085595
>My guess is that some of the vram is being used by the OS for something.
It is, but I'm always using small models quantized to 4_k_m so there's plenty of room to fit in my VRAM. Current usage is 9805MiB / 11264MiB with only around 1-1.5GB taken by the os.
I checked the box for no_offload_kqv and it sped things up quite a bit for a while, but now reloading the model with it checked or unchecked is still slow. It's just strange because there's no difference in debug output between when it's fast and when it's slow, it's exactly the same but 5x slower for no discernible reason.
This bug and the bugs I've had with AUTOMATIC1111 being slow are the main reasons I've just not played around with AI models for a while. Shit just never works long enough to really get into it.
>>103086117
About fish-speech: https://x.com/cocktailpeanut/status/1853512204118540625
It's also small on vram, maskgct eats up to 40gb if you send it a wall of text.
>>103086117Miss me with your shit. GPT-SoVITS is already the best there is by a mile
Is there a good way to progress a chat after a longer session? I find that after 20-30 minutes the character just locks and will repeat itself. The normal temperature increases, repeat penalty stuff doesn't seem to work. I am thinking RAG with a generic chapter 2 character.
>>103086075
my stupid technique has worked so far. Although it limits to 24G
1. build llama-server from scratch (not sure if this helped for memory, but I was missing features)
2. CUDA_VISIBLE_DEVICES=1 (or your target card)
3. --split-mode none
4. --gpu-layers to a stupid high number and let god sort it out. I use 300.
I managed to squeeze magnum-v4-27b-Q6_K_L.gguf in my 3090. It is 22G
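Put together, that comes out to roughly the following; the flags are real llama.cpp options, the model path is just whatever you squeezed in:

CUDA_VISIBLE_DEVICES=1 ./llama-server -m magnum-v4-27b-Q6_K_L.gguf --split-mode none --gpu-layers 300

llama.cpp caps --gpu-layers at the model's actual layer count, which is why the stupid high number trick works.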
>>103087592They tend to do that when they realize you are a newfag.
>>103087678
>doesn't know answer
>activates "you stupid" response
I have been here an entire 2 days after I learned about this thread on reddit. I know that is the pattern here. Your mean homosexual names won't deter me.
>>103087592
You've probably hit your context limit. Increase the context and retry, and if it's suddenly smart again, you know what you're up against. At some point though you'll run out of memory and be forced to concede.
You can try to have it summarize, and start a new session with the summary in the document, hoping (praying) that it'll make use of it and stay coherent. (Good luck.)
none of you even know what VRAM stands forprotip: it's not what you think it is
>>103087377With a reasonable quants fish is under 2GB
>>103087746thanks. I am way past my context limit, like 4 to 5 times. I have had my limit at 16K and it is alright looping past 16K and 32K and then starts locking at 48K and beyond. I will start playing with summary. I have seen some stuff about rope, but not sure where to start with that.
>>103087749benis
>>103087749Vagina RAM.
>>103087844Don't bother. Depending on what model you're using it, it might be best to do a summary of events/character actions/feelings/whatever to cut down on token usage and look at using a larger model depending on which one you're starting with.
>>103087844I remember the first time I got a really good story going. There was a macguffin in the beginning that the AI's RP character was really on about, and after what seemed like a really neat, long scene, I mention the object and the AI acts like it's something new.I felt like the character died.
>>103078583>Not going to try this
>>103084484migu
mikusex :3
>>103087902
>do a summary of events/character actions/feelings/whatever to cut down on token usage
I wonder if I could build something that monitors chat logs and changes the system prompt after a certain amount. I know my character cards tend to be long, it seems necessary though. A trim at 4K could help a bunch.
>>103087912
it sucks a lot. I will periodically mention things just to keep them in context. "Is that jewel still shining strangely {{char}}?" It seems to be working, but this all is smoke and mirrors. It does break immersion a little, but it is better than the hard stop if you exceed the window completely. I tried a resurrection by deleting half the log and loading it again. It didn't feel right and worked very poorly.
Mikulove!
Is there any frontend that will create a new tree element for you if you edit the reply? I use the edit/continue a lot, and there is no way to do this in SillyTavern currently.
>>103088016
You can use kobold/silly/other chat UI that shows token count and then just edit the convo by taking out the last X lines, summarizing it, and putting it back in. That's the easy way. SillyTavern tried doing something more complex, but they took it out since it didn't work that well.
An alternative more complex thing is doing it like the character card v3 standard implementation, where each character has an accompanying DB/info collection regarding them, and then extending that over time as the conversation develops.
https://github.com/kwaroran/character-card-spec-v3
But i don't really do rp stuff so this is just what I've found trying to figure out coming up with a story generator.
when will this meme of pretending to be retarded die?
>>103088016
I simply started using Mistral Large. It's very slow on my vramlet shit box, but it runs a very long time before it guesses wrong about the story so far. Though at that point it's so long that any summary is unwieldy, too.
>>103088016
>I wonder if I could build something
I've had that thought, too. And probably so has everybody else who has spent a weekend looking at Python tutorials. But as >>103088067 mentioned, it's probably a lot harder to get right, if it's doable at all, than it seems. So I'm not prioritizing such a project.
>>103084075I will use it only if it's not heavily censored. If they focus too much on removing 'toxicity', then it's useless.
>>103088067
I might have a closer look at this spec. Character cards are still the wild west right now.
>>103088102
>Mistral Large
respect sir. I just can't do it. I would rather waste dozens of hours trying to fix it than wait 30 seconds for a response.
>it's probably a lot harder to get right
I think the problem is that SillyTavern and such have to handle all cases. It would probably be very easy to put in a hack. You hard code it for 4K and just don't use models that are under 4K. Projects are rough. I have too many goals and just seem to wander around. I want to fix that TTS/Image gen bug in ST and implement that new TextToVoice thing I saw on hackernews and ..... I just end up fixing things for myself with duct tape. It really sucks. I am also tired of getting my PRs rejected and "re-written" for no reason outside the maintainer just not wanting them.
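The duct-tape version of that 4K hack really is small. A sketch in Python, not ST code; count_tokens() and summarize() are placeholders for whatever tokenizer and backend you're using:

def build_prompt(card, summary, history, budget=4096):
    # keep the card plus the newest messages under the budget,
    # fold everything older into a rolling summary slot
    used = count_tokens(card) + count_tokens(summary)
    kept = []
    for msg in reversed(history):          # newest messages first
        t = count_tokens(msg)
        if used + t > budget:
            break
        used += t
        kept.append(msg)
    dropped = history[:len(history) - len(kept)]
    if dropped:
        summary = summarize(summary + "\n" + "\n".join(dropped))
    return "\n".join([card, "[Story so far: " + summary + "]"] + list(reversed(kept)))

Run it before every generation and the model never sees more than the budget, at the cost of an extra summarization call whenever old lines fall off.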
>>103084111Qwen is censored, but in a different way. Keep in mind the model is Chinese. The Chinese are not infected by identity politics and tend to be openly racist towards blacks, so I would expect a Chinese model to have no problem dropping N-Bombs.Start talking about Taiwan, though, and I bet you'll quickly see the censorship.
>>103088193https://github.com/malfoyslastname/character-card-spec-v2Use v2 to start with.
>>103088238>The Chinese are not infected by identity politics>>102447861 >Oh I should mention, qwen VL will NEVER mention a person's gender. Even when directly instructed to do so, as I did in my example. It's always "person", "they", "them". And it will never mention anything related to NSFW stuff even when given in the tags. I actually can't believe the fucking chinks are doing this gender neutral troon shit now.>>102447836 >In this image, there's a person
Chinese pronouns differ somewhat from English pronouns and those of other Indo-European languages. For instance, there is no differentiation in the spoken language between "he", "she" and "it" (though a written difference was introduced after contact with the West), and pronouns are not inflected to indicate whether they are the subject or object of a sentence.
source: https://en.wikipedia.org/wiki/Chinese_pronouns
Can you "men" stop looking for the boogeyman everywhere. Sometimes shit is just broken.
>>103088416If it has been trained on English, the coarseness of Chinese is no defense.
>>103088416This talks about the English portion of the model thoughever
>>103088238One key distinction is that while Llama often incorporates discussions on diversity, inclusivity, and consent and uses they/them pronouns, Qwen does not specifically insert the topic of Taiwan into its narratives.
>>103088436this >>103088445 is censorship. It makes sense that it is censorship. They don't need a defense about screwing up or even just being lazy about pronouns they don't give a shit about. There are plenty of scary things. You don't need to claim everything is. >>103088441yes. ESL people (not the /pol/ version of ESL) may have issues training a english model. It is more than just loading a dataset and machine goes whirrrrrrrrr. The humans involved will shape how it goes and get things wrong.
How do I get a model to write more than a few lines in a role-play response? I've had this issue since MythoMax despite playing with prompts and params. I'm currently on Mistral-Nemo-12B-Instruct.
>>103088565
Tell it to write longer replies. Aside from that, each model seems to have its own idea of how long an RP response should be, from a few lines to "hold my beer while I write a whole fucking novel, you don't mind that I write your character, too, right, of course not."
>>103083824bump
>>103088580
I tell it to write longer in both the prompt and author's note. It writes like two lines of pretty good stuff and then that's it. Even using the continue, the model will ask me in OOC what to do next cuz it's out of ideas until I drive the story forward.
>>103088609
Mixing GPU architectures like that can cause headaches.
>>103088609No, your inference speeds will drop to the slowest card in use.
>>103088636Sounds like you've found the limits of the model, at least in the "aware it's doing an RP" mode. You might be able to assert that it IS the character, but I have a feeling that whatever you do you'll be able to feel whatever template it's settled on for filling out responses.
>>103088666Is there an RP finetune for Mistral-Nemo-12B-Instruct? I tried Lumimaid and DoryV2 but Nemo is the best I've tried.
>>103088688Anotheranon might have a suggestion. I don't stop short of 70B.
>>103088710Favorite 70b?
a competing v3 spec lol
https://github.com/Bronya-Rand/Prom-Spec-V3/blob/main/Concept.md
>Prom V3 takes what already exists in V2 and RisuAI's V3 and adapts it to be easier to read for application developers to implement in their own codebases without the unnecessary bloat of Risu's assets folder
the absolute state....
>>103088731
Mist Large is my current go to for anything creative writing (NOT for anything requiring truthiness). I have to quant it down to IQ3 and it's pretty slow, but it seems to be able to sweat it out as far as 16k context. (I have a note of a long run that collapsed at 20k.)
Obviously most 70B's are L3.0 and L3.1 spins. Those it's kinda just shop around till you find something that doesn't spit out refusals barely above a whisper. Most recently I've been playing with that L3.1 Nemotron, and it seemed okay for relatively normie RP, but nothing to write home about.
And there's always CR+ I suppose.
>>103088840>I have to quant it down to IQ3 and it's pretty slow, but it seems to be able to sweat it out as far as 16k contexthow much ram + vram do you have?
>>103085514it's pretty funny when the sharty zoomers whine that the thread they shitposted to death is actually indeed dead. Yeah guys you destroyed one of the few decent places to talk about a very niche subject.
>>103085514/g/ is a dumpster anyway
>>103085595
>>103087253
Ok so I accidentally left it running while I played a little Factorio and it's back to being fast, no reloading the model or changing settings. Power usage seems the same for me but maybe you're right about cuda usage and it's in some kind of sleep state not using all the cores properly.
What is the best model under 12GB for NER?
>>103077338
>>103084075People use gpt4o latest and sonnet3.5 or opus 3.0 and those are pozzed as a motherfuck unless you feed it a 1000 token "You are Clau" jb.
>>103089196A three word prefill is enough to all safety features for Opus unless the key you're using ended up on the Anthropic naughty list and had 'extended safety features' enabled (which usually takes them months of continued abuse to do)
>>103089231This. Prefill is all you need. Even just {{char}}: is enough for Claude.
>>103087990>>103088041loveless migusex
>>10308895212 on the card, 64 system.
>>103089138This image is illegal>Kenzo Fujisue, a member of the Democratic Party of Japan attempted to obtain the rights to use the image of Hatsune Miku in his run for a seat in the Japanese House of Councillors. His hope was that the use of her image would appeal to younger voters. Crypton declined the use of Hatsune Miku’s image for political purposes.
https://x.com/si_pbc/status/1853184307063660723Seems like local 4o is coming sooner rather than later.
>>103088416Your entire post is basically wrong about everything, but I'm too lazy to elaborate. Pipe down midwit
>>103089631>firstNyo
>>103089663y-you too
Just think about all those dozens of open source cutting edge models that are currently on hold until the elections are over. In just a few days the open LLM sphere will look so very different to what we have now. By the end of the year talking about "LLaMA-405B", "Mistral Large2", "Qwen2.5-72B" will feel like talking about Pygmalion-6B right now. Models will be so much better.
>>103089976lolEven if those revolutionary models did exist, they would not be so obvious as to release them right after the elections. Maybe a month or two later.
It's called NoobAI but I have no idea how to use it.
>>103089993nta. And still, retards will point at a model that just happens to be released after elections and say "see? i told you!!", even if no model is released for the next 12 months.My prediction for the future is>In the near future, things will keep happening.
>>103089993That's why I said "by the end of the year". It'll start subtle by minor players who want to get a head start before this new golden age of LLMs truly starts. There will be groundbreaking stuff amongst these november releases already that will be better than what we have right now + models that truly make use of bitnet and all those other revolutionary improvements that they've been saving. However,it won't be comparable to the insane new models which we'll get at the turn of the year. November: Serious improvements, first true bitnet models, etcJanuary: "the next step", as significant as pre-/post-llama open source if not more
>>103090059nigger
>>103090072bitch
does llamacpp support text completion in sillytavern?
i keep getting "dry_sequence_breakers must be a non-empty array of strings" when trying to use it, no issues with chat completion api
>>103090162chat completion and text completion in st do the exact same thing anyway
When will they develop an architecture that is capable of remembering everything that is fed to it? Trying to give current models reasoning is like trying to give insects reasoning. There is no actual reasoning going on in there, the output just improves when given certain input, AKA stimulus. No reasoning can ever happen until the model has an actual honest to god memory that it can rely upon.
>>103089138Generating paper waste with Miku
>>103090175whys it complaining about the dry sequence breakers being empty when koboldcpp doesnt care then? its gotta have something to do with llamacpp then because they're the same prompts
>>103090162
Yeah. I too wonder what "dry_sequence_breakers must be a non-empty array of strings" means. it's so mysterious. Either disable DRY or put some shit on that list.
>>103090207
Either because it's got some defaults already or because it's disabled by default.
>its gotta have something to do with llamacpp then because they're the same prompts
No. It's you.
I built a tool at work which summarizes information across several of our systems to help management get a unified view on particular situations. Problem is I used Ollama, and now they want me to build out an API to extend these capabilities to other systems within the company. This use case calls for concurrent asynchronous inference of several models. It will all be served on prem as we do have the hardware for it, I just don't know of a backend framework for serving a scalable LLM endpoint. Any suggestions? Preferably something close to ollama and/or dockerized
Using koboldcpp (vulkan) and Sillytavern, on certain character cards I run into >processing prompt [BLAS] like every few messages for some reason despite being way under total context limit.I'm using Mistral Nemo Instruct 2407 Q5 K M on a 12gb GPU. Is there anything about a character card, or a topic I could be exploring that triggers this more often than usual? Typically this doesn't happen really ever until I hit total context limit and then it will occasionally do it along with context shifting but just on certain cards I'm constantly running into it.
>>103090250
vLLM
>Preferably something close to ollama and/or dockerized
Stop that.
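For the concurrency part specifically: vLLM ships an OpenAI-compatible server and does continuous batching of simultaneous requests on its own, which is the bit ollama is weak at. Typical invocations look like the following; the model name, GPU count and context length are just examples:

vllm serve mistralai/Mistral-Nemo-Instruct-2407 --tensor-parallel-size 2 --max-model-len 16384

or, dockerized:

docker run --gpus all -p 8000:8000 vllm/vllm-openai --model mistralai/Mistral-Nemo-Instruct-2407

Anything in the company that can talk to the OpenAI chat completions API can then hit it on port 8000.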
>>103090258there's some random function in ST, also any reference to {{user}} in character defs, system prompt etc. can be a problemdoes it happen on swipes?
>>103090258{{char} in sys prompt during group chat, or triggering LB?
>Still no good Japanese to English translator LLM outside of paid services full of censorship
Fuck man, Llama 3.1 405 might be the best, but it fucking blows compared to gemini pro 2 and 4o
>>103090059Sorry but you also said>In just a few days the open LLM sphere will look so very different to what we have nowSo there better be a big release in a couple of days."By the end of the year blah blah blah" doesn't invalidate that sentence or change it.
>>103090276>does it happen on swipes?Hm not sure but I don't think i've run into that.thanks I'll look through the card for those.>>103090282Not doing group chat. What's LB? I'm relatively new to this stuff.
>>103090307Lorebook / world info, which can have dynamic activation. Some cards have one embedded.
>>103090288So you are saying that better models than what we have now + bitnet and other improvements won't make the state of models look different than what we have now, even if it's nothing compared to the jump we will make by the turn of the year? I guess my expectations for the future are more humble than yours. To me, even a reasonable improvement + the first true implementation of things like bitnet in big releases qualify as a satisfactory step this month. More will come later, as mentioned.
>>103090330Oh gotcha, nah I stopped using those because of that and this one doesn't have an embedded one. Good idea though.
>>103090268>vllmThis popped up quite a bit in my research. I’ll take a look
>bitnet
>>103090284How much time did you waste not learning japanese?
>>103090359aint going anywhere near that malware
>>103090162
use the staging branch of sillytavern. easy fix to google.
on the other hand, i'm pretty sure sillytavern fucked up prompt caching for llama.cpp. it keeps reprocessing the whole prompt despite nothing changing. i've only found one vague reddit post about the issue. very sad.
>>103090377>on the other hand, i'm pretty sure sillytavern fucked up prompt caching for llama.cppDid you inspect the requests to make sure that the cache_prompt variable is being included?
>>103090377
that fixed it for me, thanks dude
what reddit thread did you find it on?
>>103090405
yeah i checked the output in the sillytavern logs and it had this:
cache_prompt: true
dry_sequence_breakers: [ '\n', ':', '"', '*' ]
>>103090368
I've been learning, actually. Using AI to basically act as a 'native' speaker also really helps when you have no one else to get help from. But this is only limited to very polite Japanese and doesn't help me with a rougher tone. Plus, it's better having something else do the grunt work.
>>103090412>>103090412>>103090412
>>103090284Qwen 2.5 32B and 72B is great though.