/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103312983 & >>103298520

►News
>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455
>(11/22) LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103312983

--Speculative decoding and its usage with various models:
>103313328 >103313336 >103313340 >103313551 >103313365 >103313440 >103313460 >103313463 >103313598 >103313658 >103313693
--Tulu model impressions and discussion:
>103313747 >103313769 >103313787 >103313802 >103313822 >103313853 >103313890 >103313917 >103313927 >103313950 >103313989
--Sentient: local personal companion with graph memory and agentic integrations:
>103313310 >103313339 >103313387 >103313484
--Recapbot test results and feedback:
>103315415 >103315532 >103315611
--OLMo discussion: new arch, 4k ctx, and Reddit data:
>103315697 >103315710 >103316008 >103315750 >103315847 >103315893 >103316010 >103316058
--OLMo 2 models and the state of open-source AI:
>103316073 >103316150 >103316245 >103316283
--LoRA's limitations and potential issues with fine-tuning:
>103313076 >103313114 >103313162 >103313220 >103313244 >103313313
--Discussion of kobold and booba alternatives, dev pace, and feature comparisons:
>103313177 >103313243 >103316052 >103313248 >103313345 >103313315
--Common failures and limitations of coding models:
>103316427 >103316470 >103316488 >103316513 >103316528 >103316524
--Choosing a draft model for speculative decoding with llama.cpp:
>103314138 >103314187 >103314222 >103314611 >103314742 >103314761 >103316743 >103316710 >103316739 >103314793 >103315098
--Autoround quantization and its performance compared to regular quant methods:
>103313507 >103313718
--Anons discuss language model performance and limitations, criticizing the focus on benchmarks and "meme marks":
>103313710 >103313835 >103316791 >103314144 >103314165 >103314228 >103315010 >103315109 >103315181 >103315206 >103315220
--Miku (free space):
>103313053 >103313132 >103313312 >103314097 >103314884 >103315109 >103316701 >103316754

►Recent Highlight Posts from the Previous Thread: >>103312989

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Okay, if Tulu is so good. Why cant I find any mention of it on reddit?
>>103317977
Because it's not drummer shilling his model?
>>103317977
https://www.reddit.com/r/LocalLLaMA/comments/1gwl339/t%C3%BClu_3_a_set_of_stateoftheart_instruct_models/
https://www.reddit.com/r/LocalLLaMA/comments/1gz04zu/teleut_7b_tulu_3_sft_replication_on_qwen_25/
Are you faggots doing that trolling again where you pretend that the model is good? Like with nemotron 70b?
>>103318079
>nemotron
>trolling
You're either a retard or the one trolling here. Nemotron is great, the best 70B for creative uses until Tulu (and also the best 70B at coding, until it got surpassed by Qwen2.5 32B Coder).
>>103318090Got it.
>>103318079When are you idiots gonna realize it's a skill issue on your end. It's not that people are shilling models that are bad, it's that you're too stupid to use them properly. Retard in garbage out.
>>103318112
>idiots
Pretty sure it's just one guy. He even tried to argue against using a model's correct formatting.
Best inference engine for distributed compute GO
>>103318153no
>>103318079I tried it a bit but only at Q3 cause VRAMlet. The prose at least seems quite better than the usual llamaslop
>>103318079It's more that they are dumb cavemen and ESLs who genuinely don't notice when the model says retarded or illogical shit after their dicks get hard.
>>103318188
Unlike other llama tunes / Mistral Large, it got complicated positions with a non-human character correct, and unlike Qwen2.5 it is not dry, undetailed sex. And unlike any of those (Mistral Large is ok at it) Tulu is creative and pushes the plot forward. I think your just a troll who has never used it, otherwise post this apparent logical error.
https://files.catbox.moe/ge639f.jpg
>>103318224
>your just a troll
ESL confirmed.
>>103318236that's hot but can you make her flatter
>>103318239
you're just a troll and also a grammar nazi
better?
>>103318153vllm
>>103318236that's hot but can you make the guy fatter
>>103318236That can't be good for her back
>>103318366uoh, nice
>>103318366peak
>new thing drops>mikutroons still shitting the thread
>>103318449
post teto then, faggot
you won't
>>103318449I like them cuz they make you seethe.
>>103318236>>103318366>>103318460>>>/g/ldg
>>103318462>I am a mentally ill troon cause it makes you seethe
>>103318475Wow. That's a creative insult. Well done. I tip my hat in your general direction.
>>103318449The only one I see shitting here is you
When generating text I get something like 3 t/s, but when generating code I'm seeing from 3.25 to 3.6 t/s.
This draft thing is like the second coming of miqu.
>>103318090
>Nemotron is great, best until tulu for creative uses (and also the best 70B at coding but got surpassed by qwen2.5 32B coder)
Qwen let me down on non-trivial stuff that L3 tunes take a decent shot at.
>>103318513
That is because code has fewer valid options, so the draft model should be correct more often.
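To make the mechanics concrete, here's a toy sketch of the accept/reject loop behind speculative decoding (greedy case). draft_next and target_next are hypothetical one-token decoders standing in for the small and big model; llama.cpp actually verifies the drafted tokens in a single batched pass, but the acceptance logic is the same, which is why low-entropy text like code gets more accepted tokens per big-model step.
[code]
# Toy illustration only, not llama.cpp's actual implementation.
# draft_next(ctx) / target_next(ctx) are placeholder greedy single-token decoders.
def speculative_step(ctx, draft_next, target_next, n_draft=8):
    # 1. The cheap draft model guesses n_draft tokens ahead.
    guessed, tmp = [], list(ctx)
    for _ in range(n_draft):
        tok = draft_next(tmp)
        guessed.append(tok)
        tmp.append(tok)
    # 2. The big model checks the guesses (in practice: one batched forward pass).
    accepted, tmp = [], list(ctx)
    for tok in guessed:
        verified = target_next(tmp)
        if verified != tok:        # first mismatch: keep the big model's token, stop
            accepted.append(verified)
            break
        accepted.append(tok)       # match: this token was nearly free
        tmp.append(tok)
    return accepted                # >1 token per big-model step when the draft guesses well
[/code]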
>>103318513
For maximum speed, don't forget to disable the repetition penalty, use temperature 0 and set "--draft-min 1"
>>103318536>use temperature 0At super low temperature I've had some models make factual mistakes that they don't make with some temp (0.2ish) and a savage Top-P (like, 1).
>>103318525
and qwen coder got a lot of the more complicated stuff L3 tunes failed at for me. Try DeepSeek R1 as well, that will be next level if they ever release it.
>>103318562
At some point I hope to put together a more comprehensive programming test set; right now it's just some random Python demo shit and a particularly tricky Java question. Most L3's make the mistake but correct it when called on the error. A few, including Nemotron, caught the problem and described it before generating the code suggestion. Coder 32B doubled down on being wrong by offering a fix that made the mistake worse.
>>103318559Anon... I...
>>103318612I too started replying to newfriend over there several times but decided to just move along.
>>103318079
It's one or two at max, both might samefag at the same time, hard to tell. /g/ needs IDs badly.
>>103318719
Nah, it's the same guy. He calls literally everyone a shill any time some new model gets recommended and then starts claiming they're shit without any logs to back it up.
>>103318731you have misunderstood the post you were replying to, it was about the opposite of that guy
Why can't the new Mistral Large do punctuation at all? It just keeps messing up ** or quotation marks for no reason. Yes, I have adjusted the prompt format.
>>103318777Using DRY and/or XTC?
>>103318559https://artefact2.github.io/llm-sampling/
>>103318777That sounds like a tokenizer config problem, it doesn't do that for me.
>>103318837There he is.
>>103318837What's with the accents? Did they finally put a filter on your spam?
>>103318837Kek keep it up bro
>>103318785
Just Temp 1 and min-p 0.03
>>103318803
I tried two different quants. It just keeps doing it.
R1 seems impressive.
>>103319043Post the second part NIGGER
>>103319043>>103319060It is certainly a creative choice but it works.
>>103319074this is illegal
>>103319085It BTFOs O1. Openai is dead if they release this.
>>103319074Deepseek won. Let's see Meta's strawberry.
cydonia-22b-v1.3-q5_k_s.gguf runs great on my computer. What is another ~22B q5 model but built for programming? I need something that can assist me quickly with code, but locally... GPT-4 ain't bad, but I wonder if leaving out all the bullshit and training the model just on the software development process is enough to keep it compact.
Star Attention: Efficient LLM Inference over Long Sequences
https://arxiv.org/abs/2411.17116
>Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a two-phase block-sparse approximation that improves computational efficiency by sharding attention across multiple hosts while minimizing communication overhead. In the first phase, the context is processed using blockwise-local attention across hosts, in parallel. In the second phase, query and response tokens attend to all prior cached tokens through sequence-global attention. Star Attention integrates seamlessly with most Transformer-based LLMs trained with global attention, reducing memory requirements and inference time by up to 11x while preserving 95-100% of accuracy.
https://github.com/NVIDIA/Star-Attention
From Nvidia. Improvements over ring attention, mostly in speed.
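For anyone skimming, a rough numpy sketch of the two phases the abstract describes. It deliberately leaves out the anchor-block trick, causal masking, multi-host sharding and the distributed softmax merge, so treat it as a reading aid rather than the actual algorithm.
[code]
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def star_attention_sketch(cq, ck, cv, qq, block=256):
    # Phase 1: each context block attends only within itself (done per host, in parallel).
    n, d = ck.shape
    ctx_out = np.empty_like(cv)
    for s in range(0, n, block):
        blk = slice(s, s + block)
        ctx_out[blk] = softmax(cq[blk] @ ck[blk].T / np.sqrt(d)) @ cv[blk]
    # Phase 2: query/response tokens attend globally over all prior cached KV.
    qry_out = softmax(qq @ ck.T / np.sqrt(d)) @ cv
    return ctx_out, qry_out
[/code]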
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
https://arxiv.org/abs/2411.17691
>We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradation (QiD) when applying low-bit quantization, whereas smaller models with extensive training tokens suffer significant QiD. To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws for understanding the relationship between QiD and factors such as the number of training tokens, model size and bit width. With the derived scaling laws, we propose a novel perspective that we can use QiD to measure an LLM's training levels and determine the number of training tokens required for fully training LLMs of various sizes. Moreover, we use the scaling laws to predict the quantization performance of different-sized LLMs trained with 100 trillion tokens. Our projection shows that the low-bit quantization performance of future models, which are expected to be trained with over 100 trillion tokens, may NOT be desirable. This poses a potential challenge for low-bit quantization in the future and highlights the need for awareness of a model's training level when evaluating low-bit quantization research.
mostly just putting the work in to prove what we all already knew. slop tokens are really going to have to be more vigorously deleted from datasets
Has no one made progress in maintaining the characters personality past context limit? It's annoying.
tf
Tulu does the thing all assistantslop models do where it favors a SFW word in logit probabilities even when a NSFW word would be the more obvious choice.
Like if you give it "I'm going to milk all the..." in the middle of an obviously sexual context, Tulu's most probable token for the next word is 'stress' rather than 'cum'. It shies away from the smut word, substituting it with a technically-plausible but unlikely SFW one. That's corpo model behaviour, no smut tune would do that. This is unusable for coomers, regardless of what you guys say.
>>103319228Hi Drummer
>>103319228
Use the author's note I posted last thread. Tulu is filthier than any other model out there and not a shiver to be seen. And unlike said "smut tunes" it's not retarded
>>103319268Meds. Drummer unironically bought an ad, he doesn't shill from the shadows like that.
>>103319277This screenshot literally proves my point, are you drunk? It doesn't use any crude smut slang terms at all, it's all purple prose and euphemisms like a romance novel.
>>103319220
>added another 9 out of nowhere
>couldn't calculate 99-9
Was it o1-mini?
>>103319277
>Filthy
>Fill me with your seed
Lol
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
https://arxiv.org/abs/2411.17525
>Quantizing large language models has become a standard way to reduce their memory and computational costs. Typically, existing methods focus on breaking down the problem into individual layer-wise sub-problems, and minimizing per-layer error, measured via various metrics. Yet, this approach currently lacks theoretical justification and the metrics employed may be sub-optimal. In this paper, we present a "linearity theorem" establishing a direct relationship between the layer-wise l2 reconstruction error and the model perplexity increase due to quantization. This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, which outperforms all prior data-free approaches such as the extremely popular NF4 quantized format, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels which match a given compression constraint in the medium-bitwidth regime, obtained by reduction to dynamic programming. On the practical side, we demonstrate improved accuracy-compression trade-offs on Llama-3.1 and 3.2-family models, as well as on Qwen-family models. Further, we show that our method can be efficiently supported in terms of GPU kernels at various batch sizes, advancing both data-free and non-uniform quantization for LLMs.
actually compares to quip# and QTIP. lower PPL than quip# but QTIP is better. faster inferencing than both. iirc quip#/QTIP take forever to actually quantize but didn't see anything in this paper at a quick glance for how long it takes either. only some pseudocode, no github, but hey, new day new quant method.
https://github.com/BlackSamorez
git of one of the main authors. hidden repo worked on recently so probably will be open sourced there
>>103319292No, this is just plain old GPT-4o
What Tulu is best Tulu? Is 70B any good?
>>103319303Wouldn't it be more fair to compare it with CoT model?
>>103319295
This is with the theme of MLP so that prob hampers the dirty talking there. I explicitly told it to use language fitting the universe, as can be seen by her saying "buck" instead of "fuck". Here's something else.
>>103319318
Best RP model atm, not sure about SFW capabilities. Benchmarks say it's close to Qwen2.5 72B.
>>103319333Okay that's still pretty purple but I concede it's closer since it actually said 'pussy' and 'clit' this time instead of some SFW euphemism.
>>103319327I mean, yeah, but I didn't expect 4o to fail this hard.
>>103319338Here, I told it to be vulgar.
>>103319277
>>103319333
Thanks for the screenshots, appreciated. Almost nobody shows anything anymore.
Feels a bit like qwen to be honest. Probably the llama 3.1 base.
I really hope we can get models that don't have fucked up context and this horrible stretching out of a simple sentence. Like, get to the point. No wonder aicg fags are 80%+ femoids. They probably love this shit.
>ILLTELLYOUEXACTLYHOWIWANTYOUHOWINEEDYOUTOTAKECONTROLTOLETGOCOMPLETELY.ILLWATCHYOULOOSEYOURSELFINPLEASEUREYOURBODYSHUDDERINGYOUREYESGLAZEDWITHLUSTANDWHENYOUCANTTAKEITANYMOREWHENYOURONTHEEDGEILLPULLYOUCLOSE!
Imagine being an undervolting 30GB vramlet and having to watch this shit roll in at low t/s.
But they almost all have this problem, there are fundamental problems. In b4 prompt skill.
>you can use the shitty aya 8B model as a draft model for command-r-plus
>it's so unbelievably retarded due to the multilingual stuff that it barely manages to predict anything
Should've figured...
>>103319354So this is how I should be talking to women huh.
tulu 70b nala test. dunno if anyone else has done this yet. haven't been here often recently.
>>103319378>femoids>on 4cückCute retard.
how much are vramlets missing out? I have a 12gb 3060 with 16gb ram and there are obviously models that couldn't possibly load on my current system. Those 35gb, 40gb models, how much more 'fun' are they compared to the 10gb I'm forced to run?
>>103319392
>ahh ahh mistress in quotation marks
ruined
a-at least she's accommodating
>>103319395
femoids and faggots over there, it's pretty obvious.
>>103319397
Not much to be honest. I would pay $10 crypto on openrouter and try them out first if you like those.
I got myself a P40 because I wanted to try 70b models.
What you are missing out on is higher context for nemo and mistral. (They slip into repetition at higher context anyway.)
Even the 70b tunes are all positivity sloped, I suppose because they train on (old) gpt outputs and that's difficult to get out.
In general it feels like the higher the B, the more assistant.
I tried mistral large on openrouter, that's probably what everybody wants. But you need like 3x 3090 to run that.
Running at Q2XX is a crime, it doesn't follow the format etc. anymore.
Hope we get a scale-up of nemo to like 30b. It's smart for its size and really the only model that's fun. Mistral small is smarter but also more assistant, unfortunately. Some people swear by the gemma 27b magnum tune, but I don't see much difference to mistral small with only 8k context.
>>103319463
>Not much to be honest.
This is only true if you do the most simple of shit. Try anything more complicated than an RP with a humanoid character and you will see the differences then, especially when you get to cards including game systems.
>>103319497
I found that for more complex cards nemo shits the bed and you need mistral small.
Mistral small also reliably keeps track of stats, which is really cool.
I just don't like using the bigger models because the writing is so bad. It's just not fun to use.
And you still get retardation anyway. Like 72b magnum has stuff like thinking you can get pregnant from anal, trips up with size differences, etc.
Bigger models seem to do cards with multiple characters better though.
>>103319530>the writing is so badMagnum fixed it but made it retarded. Nemotron fixed it without, tulu fixed it further imo.
>>103319411
>as slutty as possible = sass and sultriness
you cannot escape llama3's positivity bias
>>103319567>>103319354
>>103319567
Yeah, with corpo/assistantslopped models the positivity bias seems to make them interpret "be slutty" as meaning "be a sassy girlboss". It's always kind of belittling and haughty and vaguely dommy and contemptuous. Cuck/femdom enjoyers probably like it I guess.
>>103319567true. talking in a husky voice.
>>103319585What does slutty mean to you, lol? Another swipe had it make each "sort" a "stroke" and talking about a dripping pussy.
>>103319567>>103319585
>>103319630nemo once again proves that big models are soulless memes
>>103319642That is tulu 70B
>>103319630
>swap them like a woman swapping partner at a wild party.
>like a seasoned player sorting through her conquests
uhhhh yeah, it's slop time baby. *anon whips out his dick again*
>>103319576Well it's better than I expected, at least it has 'obscene' words in it. It's lacking the bite of '''dark rp''' but it might be possible to prompt around it.
Good night, lmg
>>103319784rape u soon
Has anyone managed to get Qwen2-VL-Flux to work?
It seems like it could be a great way to improve flux, but 48 GB of VRAM is quite heavy.
>>103319826>Lets make flux even slower!
>>103319784Omggg its migu1!1
>>103319844i will not rape that
>>103319826Where is it? I do want to give it a try
>>103319354What depth were you using your AN on?
>>103319884the default, 4
>>103319277What are your regular sampler settings and context? I’m skeptical because I just tried it and got shivers/husky within the third response but maybe mine are off.
>>103319893I normally just do something like 0.05 min p and / or maybe 0.95 / 0.9 top A instead
>>103319904
top A
Sorry, meant top P
>>103319784Good night tired Miku
>>103319431>you can make me your personal trainerWaitaminit, just what's the big idea here...?
I saw some posts the other day that brought up what really makes Claude great, which is that it understands subtleties and goes hard into them. If the character is supposed to be a vocaloid fan, it will proactively weave into its response the kind of things that real fans actually say, and not in a fake "hello fellow kids" way like most models do. But what is the way to solve this for open weight models? Tulu 3 seems to show that we can now do something very proactive, and it's open source so we can reproduce it, so we just now need pretrained models that are really trained on a ton of real human data from the worst parts of the internet. Then the model will know what they are like, so it can act like them with fine tuning.
New Mistral base model when? Ideally 70B.
>>103319431>blurred out wall of text schizophrenia and this is supposed to be impressive?
>>103319971
Claude (esp. Opus; Sonnet is very good but it doesn't have big model smell) is also the only model that is often genuinely funny in an original way; I want to know what the secret sauce is. One time it described a clumsy French kiss as "her tongue wriggled in his mouth like a tased eel", which from a Google search seems to be a totally original simile.
>>103319909Oh maybe its my context settings, what are you using for the story prompt? default or llama3?
>>103319991>I want to know what the secret sauce is.Being trained on the entire internet and a fuck ton of books / other stuff. Claude knows obscure stuff only on fanfiction websites and in copyrighted works.
>>103320001
Neither, tulu's formatting for instruct, no tags in the context template, just some stuff like "User character:".
https://files.catbox.moe/qvn0g3.json
My guess is that Claude is both a huge total param model so it can store all that information about every single thing humans have come up with, but a MoE so it is still fast.
>>103320003You can tell it's the only corpo model that has 4chan in the pretraining data too because it's the only one that can generate plausible-looking 4ch threads rather than something that reads like a redditor's satire of 4chan.
>>103320010I appreciate the help and it's good confirmation that what I was using was correct after popping open the config file. But I was referring to pic rel
>>103320010That json is wrong. It's <|end_of_text|> not <|endoftext|>
>>103320010I also don't think system_suffix should be there. There is no <|eot_id|> in the chat template, period.
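Putting those corrections together (plain-text <|user|>/<|assistant|>/<|system|> tags, <|end_of_text|> rather than <|endoftext|>, and no <|eot_id|> anywhere), a hand-assembled Tulu 3 prompt would look roughly like the sketch below. This is a guess stitched from the posts above; check the chat_template in the repo's tokenizer_config.json before relying on it.
[code]
# Rough reconstruction of the Tulu 3 turn layout implied by this thread;
# verify against the official chat_template before trusting it.
def tulu_prompt(system, turns):
    out = f"<|system|>\n{system}\n" if system else ""
    for role, text in turns:                    # role is "user" or "assistant"
        out += f"<|{role}|>\n{text}\n"
        if role == "assistant":
            out += "<|end_of_text|>\n"          # per the correction above; no <|eot_id|>
    return out + "<|assistant|>\n"              # the model generates from here

print(tulu_prompt("You are a helpful assistant.", [("user", "Hi!")]))
[/code]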
>>103320027
Also this. More params = more room to "soak" up smaller details, from everything I've seen. It's why I'm personally excited for DeepSeek's next model. 2.5 knows a ton about everything, the same as Claude, but is sadly incredibly dry (and it's a giant MoE). R1 seems to fix that from what I've been able to play with in getting around the filter on the site. Here is hoping they do end up releasing the weights of the full model.
>>103320047
>>103320055
Woops lol, might explain it not stopping till the max allowed response length.
>change a single character in the last message
>lmao time to process the entire context again
I'm getting really fucking tired of this bullshit.
>>103320060
Also in case you wonder, Tulu does NOT have <|assistant|> or <|user|> or <|system|> as actual tokens. Those literally come through as (e.g.) '.< (16134)', '| (91)', 'assistant (78191)', '| (91)', '>\n (397)'
My assumption is they forgot to add them, or the tokenizer is wrong, but the model works so eh, whatevs.
>>103320096Huh... it works so did they train it on that instead of using special tokens for some reason?
>>103320130Yes, I strongly suspect they trained it on that. Happens all the time.
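If you want to check that yourself, a quick way (assuming you have the transformers tokenizer for whichever Tulu checkpoint you're running) is to encode the tag and see whether it comes back as one special token or several plain-text pieces:
[code]
from transformers import AutoTokenizer

# Point this at the checkpoint you actually use; the 70B instruct repo is just an example.
tok = AutoTokenizer.from_pretrained("allenai/Llama-3.1-Tulu-3-70B")

ids = tok.encode("<|assistant|>", add_special_tokens=False)
print(ids)                                   # several ids -> the tag is plain text
print([tok.decode([i]) for i in ids])        # shows the individual pieces
print("<|assistant|>" in tok.get_vocab())    # False if it was never added as a token
[/code]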
>>103320181>>103320130>>103320096Ahhhhhhhhhhhhhhh
>>103320096
>>103320130
>>103320181
>>103320256
Same thing with my models trained on Metharme. I didn't bother adding them.
But they used <|assistant|> instead of <|model|> which, from my experience, works wayyy better in L3.1
How's Tulu?
>>103320280>How's Tulu?Nemotron 2. Its fixed llama 3.1 but smarter. Feels more like qwen2.5 72B BUT without the positive bias / lack of sexual knowledge
>>103320315
Any issues with it so far? Is it worth finetuning on top of?
I see a lot of Tulu variants, which one works best for our use cases?
>>103320280>But they used <|assistant|> instead of <|model|> which, from my experience, works wayyy better in L3.1Tulu was trained on the base model, retard
>>103320332
Without context it likes doing the claude thing of adding some OOC stuff, but that's generally actually cool to have imo, adds some personality. It quickly quits that with some context / an author's note telling it what to do, though.
>>103320332
>>103320350
Oh, and tulu 3 70B instruct is the only one I've used. The "final" one I guess.
>>103320333?
>>103319301No offense but there are so many quantization methods being released - Do any of them matter? What happened to SpQR, or SqueezeLM, or RPTQ, or any of this other shit?
>>103319043How many parameters this will have? And why I will need a RIG with 10000 VRAM to run this at 1.3tokens per second?
>>103320332
The training recipes and datasets are public, you numpty. AllenAI's whole purpose as an organization is being one of those rare companies that does that and replicating shit like OpenAI did before Sam Altman fucked it all up. Even if you can't afford their compute power, you can actually learn a thing or two from them.
https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372
>>103320073i have no clue what you're talking about but i use exllamav2. what do you mean by "time to process the entire context again"?
>>103320718>i have no clue what you're talking about but i use exllamav2. what do you mean by "time to process the entire context again"?
Given dataset D, original model X, and resulting LoRA L(X) = X', is it possible to produce/estimate another LoRA L'(X') = X, assuming you have all the above elements?
The inverted LoRA would, if applied to the model with the original LoRA merged into it, result in (approximately) the original model.
I know mergekit supports LoRA extraction, but we have more information to work with here, and I wonder if it makes a difference.
>>103320712
Forgot to link the code.
https://github.com/allenai/open-instruct
And just FYI, there is no easy uncensored Tulu 3 you can just fine-tune on top of; the initial SFT already has safety datasets baked into the training regime, like pic related, so it's already braindamaged out of the gate. You can see a full list here.
https://huggingface.co/datasets/allenai/tulu-3-sft-mixture
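If someone wants to gauge how much of the SFT mix is safety data (or strip it before re-running the recipe), something like the sketch below works with the datasets library. The per-row field that names the source subset is an assumption here; print the column names first and adjust.
[code]
from datasets import load_dataset

ds = load_dataset("allenai/tulu-3-sft-mixture", split="train")
print(ds.column_names)        # find the field that labels each row's source subset

src_field = "source"          # assumed name; adjust to whatever the print above shows
if src_field in ds.column_names:
    kept = ds.filter(lambda r: "safety" not in str(r[src_field]).lower())
    print(len(ds), "rows total ->", len(kept), "after dropping safety-labelled subsets")
[/code]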
>>103320718do you know what 'context' is in relation to LLMs?
Some have context and instruct templates for the Tulu uwu to use it with my Silly tavern?
>>103320753
If you have the original model and the adapter weights of the LoRA, then the "inverted LoRA" would just be the negative of the delta weights, no? The delta being the difference between the LoRA-merged weights and the original model weights.
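A minimal sketch of what that negation looks like on a PEFT-style adapter file. The merged delta of each LoRA pair is roughly scale * (B @ A), so flipping the sign of one matrix per pair flips the delta; key names vary between trainers, so the "lora_B" substring check is an assumption.
[code]
from safetensors.torch import load_file, save_file

sd = load_file("adapter_model.safetensors")              # hypothetical input path
inverted = {}
for name, tensor in sd.items():
    # Negate exactly one matrix of each A/B pair so the merged delta changes sign.
    inverted[name] = -tensor if "lora_B" in name else tensor.clone()
save_file(inverted, "adapter_model_inverted.safetensors")
# Applying this on top of (base + original LoRA merged) should walk you back toward the base weights.
[/code]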
>>103320838Sorry for the "uwu" shit I'm using my brother phone and the auto suggestion put this word.
>>103320764I've been playing with the 8b version and this shit is completely nerfed. Just paragraphs and paragraphs of preaching
Metharme seems to work well with Tulu 70B... And it doesn't trip up safeties when running a NSFL card.
>>103320906>Just paragraphs and paragraphs of preaching
>>103321188
Maybe I'm missing something, is there something uniquely different about this version? https://huggingface.co/mradermacher/Llama-3.1-Tulu-3-8B-DPO-i1-GGUF
What are your settings?
Hey /g/uise.
It's my first attempt at running a local LLM and I'm using llama 3.1 70B.
I know it's a huge model but apparently it should be okayish to use with a 4090.
The thing is that mine is stuck at generating a response. My GPU is at 0% usage too. My CPU isn't being used either, so I don't know what's happening.
kind of crazy to think about how ai is a solved science and with a couple more gens of nvidia chips and a few years of datacenter and power infra expanding we will be able to just use the current algorithms to create agi
>>103320841Thanks that’s what I was hoping. Is there an easy way to negate them? I’ll have to give it a try.
>>103321379This is not the first time in history that people have thought this way.
This speculative decoding thing is a sham, the generation speed is the same. Fuck, why did I think I could run 70B faster with it.
What's the best model in the 7-12B range?
>>103319301
it doesn't really matter if it's not GGUF compatible, we're not gonna use a new backend because of a new quant, it needs to be working on llama.cpp
>>103321379
>>103321458
>Everything gets processed in relation to everything else
That's attention/consciousness/understanding/intelligence
>The system makes a choice
That's prediction/behavior/agency/action
The more philosophical handwringing people do, the further they get from the truth of what's actually happening. We've solved the general notion of 'general intelligence', we just need to make it better.
LeCunny posters need not reply
theres literally nothing else to do in life rather than wait for the next sota foss ai tool to drop
>>103321532all are shit, gemma 9b
>>103321351read the docs of whatever tool you're using to run it then
How many big leaks have there been?
>llama1
>Miqu (Mistral Medium)
Is that it? How many yuan to get a corpo researcher to drop a sonnet/4o on HF?
>>103321188>mix of defiance and vulnerability
>>103321640
llama was meant to be published anyway, hardly a leak
miqu was an ok-ish leak; we got largestral 1 after, which was the proper model and not undertrained, although you could argue that miqu pushed mistral to release largestral 1 sooner
>How many Yuan to get a corpo researcher to drop a sonnet/4o on HF
those are guarded as closely as any other high-level business secret, only a few people have access and even that access is gonna be closely monitored
it doesn't make sense for anyone to leak it and face life consequences for what, a model that will be obsolete in half a year or a year? if you want to leak something, better to leak the secrets used to train it that are already in your brain, but that is already happening when people leave the company to make their own lol
>>103321640SD leaked originally, and then NAI's finetune of it leaked later
>>103321522Unironically a skill issue, WOMM
>>103321675>My retarded jeetware grift is your skill issue k
>>103319115Try to fit some lower quants of Qwen-coder-32B nothing comes close locally
>>103321657The same could be said for Windows, GTA, and other code leaks from the past.
>>103321731those leaks arent of as much influence
>>103321522You should use a speculative model with the same vocabulary as your big model if you're using a draft model. Also ngram speculative decoding is not good outside of summarization.
What version of Tulu am I supposed to use?
>>103321752Weight leaks wouldn't matter, they can release their current models at any time and it won't change anything. The real value isn't the weights but their tools and inference infrastructure. Very few people would bother hoarding 3090s just to host those models, companies able to afford datacenters to compete on inference will not risk their businesses by running leaked models, and there isn't a way to somehow decompile the weights to improve other models.
>>103321775
I'm using Tulu 70B Q3_K_M with Tulu 8B IQ3_XS as draft. I get 2.22 t/s without speculative, 2.12 t/s with it.
Also have k=1 and temperature=0.3
So, does speculative decoding not work as well for creative stuff?
>>103321810
>and there isn't a way to somehow decompile the weights to improve other models.
you have access to infinite dataset creation from that good, now fully uncensored model, distillation to create good small models, and many reverse engineering tools to figure out what they did to make the model work as well as it did; you can finetune it further etc
>>103321823
>I get 2.22 t/s without speculative, 2.12 t/s with it.
>Q3_K_M
Check that you're not spilling out into RAM.
>So, does speculative decoding not work as well for creative stuff?
It doesn't, because there is so much more variation that it is more likely for the draft and actual model to diverge. It works best for coding, constrained grammars, math, etc. where there is close to only one possible continuation.
>>103321848And anything you can do with it will be still inferior to the original model
>>103321955if that were true finetuning wouldnt exist, given that it does...
>>103321823Try to set --draft-min 1 and report back if that improves it for you, for me it was a night and day difference.
https://www.anthropic.com/news/model-context-protocol
Anthropic is creating an open standard allowing models to communicate with resources to request information. They're making it open source, so it may be relevant for /lmg/ as well.
>>103322176bu6 an ad
>>103317922
>still no good realtime voice model
>still no long term memory that doesn't suck
ngmi, where is my migu
This should be required reading before posting here
https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
>>103322186
>where is my migu
here
>>103321927
I am offloading layers into RAM in both cases, but was hoping it would still help and get something closer to 3 t/s
>>103322029
That helped a bit, got 2.49 t/s
Raising the top K and temp to what I normally use brought it down to 2.38 t/s
So I suppose storywriting is not the case where you would get a 30% speedup. Sad.
is a single p40/p100 worth it? i wanna run only small models like 13b at q5km, maybe 20b or smth.
>>103322205>Instead of using a q4KM, you might be able to run an IQ3_M and get close to Q4KM's quality, but at a higher token per second speed and have more VRAM for context.>The LOWER the quant the STRONGER the Imatrix effect is, and therefore the stronger the "tint" so to speak>Due to the unique nature of this project, quants IQ1s to IQ4s are recommended for maximum effect with IQ4_XS the most balanced in terms of power and bits.>[ https://huggingface.co/DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF ]>[ https://huggingface.co/DavidAU/MN-DARKEST-UNIVERSE-29B-GGUF ]>[ https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-23.5B-GGUF ]>ANTI-SLOP - Kolbaldcpp only>"For my prompt below, explain the steps you wound take to execute it" (prompt after this line)>This will help the model fine tune your prompt so IT understands it.I see.
>>103322205
I tried reading it and almost had a stroke
Tell the guy to actually make it readable and we can talk
>>103322205
>according to his testing
>no actual testing methodology posted
>recommends meme merges
yeah, required reading so you know what a grifter and pseud looks like
>>103320819
yes but i never have to process anything. it's instant. been like this for me since the early koboldai days.
is context processing a llama.cpp meme? why use something that takes 10 years?
>>103322394...We need to shut this general down
Is there somewhere where I can send my models to be evaluated? The Open LLM Leaderboard straight up doesn't work
>>103322412no really, why run on cpu or whatever it is you are doing if it somehow makes a process that's near instant somehow take 10 years to complete?
>>103322421Because this general was and is full of jeets with thinkpads trying to run this shit on CPU.
>>103322421
10 years is better than never
>>103322421
Because 1. it's really not that drastic unless your system is complete ass or you're trying to cpumaxx a 1T model with ddr3 ram, and 2. it allows you to load models that don't entirely fit in your vram, which, thanks to nvidia being utter jews, isn't really all that abundant
So yes, I'll gladly take a slight speed hit to run smarter models, you can have fun with your lightning fast but retarded 8B. Unless you aped in and started stacking gpus, in which case I hope it was worth it
>>103322484>I hope it was worth it imagine being poor, couldn't be me
>>103322205>wall of text>rambling>no formatting>"quants IQ1s to IQ4s are recommended"yeah, you should kill yourself for posting your trash 'suggestions' asap
>>103320073
can't imagine what kind of a backend you're using that's that retarded, just use koboldcpp retard
All videos generated by the leaked sora api endpoint before it was shoa'd btw
https://www.youtube.com/watch?v=Gz33LlwsPVM
>>103322546
it is koboldcpp
yes I have context shifting on
it's probably ST's fault
>>103322550yeah I've seen it yesterday, it's all right, a bit better than MiniMax and the scary thing is that it's "only" the turbo version, the real deal is probably on another level
>>103322205
>I know all this shit about quants and you should all know this
>I frankenmerge
Absolute state of retardation.
>>103322394
NTA, but it's absolutely not. It takes half a minute to process 25k context on 4x3090. I have some creative group chat ideas that require full context re-processing, but it's fucking annoying to have to wait that long for the first word to appear
>>103322651More than that, he "finetunes" using imatrix bro>NEO Imatrix quants are specialized and specifically "themed" datasets used to slightly alter the weights in a model. All Imatrix datasets do this to some degree or another, however NEO Imatrix datasets are content / theme specific and have been calibrated to have maximum effect on a model (relative to standard Imatrix datasets). Calibration was made possible after testing 50+ standard Imatrix datasets, and carefully modifying them and testing the resulting changes to determine the exact format and content which has the maximum effect on a model via the Imatrix process.>Please keep in mind that the Imatrix process (at it strongest) only "tints" a model and/or slightly changes its bias(es).
>The power in this 3B (for its size) is frankly jaw dropping... and at 90 tokens per second + on a GPU.
>The NEO IMATRIX dataset V2 was applied to it to enhance creativity (horror). (see several examples below)
>The HORROR NEO Imatrix datasets does the following:
>Adds a "coating of black paint" to any "Horror" prompt generation.
>Adds a "dark tint" to any other creative prompt.
>Increases the intensity of a scene, story, or roleplay interaction.
>Increases the raw vividness of prose.
>In some cases increase instruction following of the model (ie story, and prose).
>Brings a sense of impending "horror", THEN brings the "horror".
>May produce and/or imply graphic horror depending on your prompt(s).
https://huggingface.co/DavidAU/Llama-3.2-3B-Instruct-NEO-WEE-HORROR-GGUF
>Imatrix quants perform best at IQ3s and IQ4s, then Q4s, lower on Q5, and tappers off at Q6.
>For stronger IMATRIX effect, IQ3s, and IQ2s.
anyone here use F5-TTS? how do i increase the amount of text processed at a time with the gradio app? currently it's like 10 words or something per batch.
>>103322746Given that he speaks like a complete and utter ESL, I'm not surprised he finds the schizophrenic ramblings of quanted low-parameter models to be satisfactory (good, even)
>>103322746
reminds me of that charged almonds diet meme but the buzzword is imatrix
how can anyone use 3b unironically by the way? people try to hype it up but it's pure shit and will always stay pure shit. in fact anything below 123b is in some way shit. 3b is fast but so is a diarrhea shart.
>>103322816
Even 123B is shit, anon
Every LLM hits its limits sooner or later - usually sooner
>>103322816
>>103322820
Ever stop to reflect on whether it is the world that is shit or maybe.
Just.
Maybe.
It was you, all along?
>>103322832Cope.
>>103322832Blessed be the brown handed ones for they can jerk off to "i suck ur dick and let you grab my bobs"
>>103322848
>see so many happy people with janitor ai
>try it out myself
>it is like a 7b model but 8k context or something
>2023 slop prose with worst offender phrases
blessed be them. I wish I was them
>>103322832
Of course it's me, I have standards. LLMs being unable to actually keep up with novel-length texts on consumer hardware is why I only use them for cooming, it's literally all they're good for right now unless you feel like wrangling them for hours on end
Like >>103322848 said, people with low standards are eating VERY good right now
Okay, I have been using Tulu for the last few days and the only thing I have felt is frustration. The model isn't terrible but it's soooo much worse than Largestral it isn't even funny. I also don't feel like it's better than Nemotron, if anything it's one step below Nemotron, because Nemotron never does absolutely retarded shit like picrel.
The only "good" thing about Tulu I can point out is how it handles characters differently from other models. Tulu is the first 70B model that let me talk my way out from a rapist character, and Tulu even made her apologize to me and feel sad that I was leaving, lol.
>>103323139
I tested the 8B version today and it's definitely worse than Nemo and its many finetunes.
The writing feels a bit more unique, but in the end it makes dumb mistakes Nemo wouldn't. I guess someone praised its coherency with long context, but that's not what I'm looking for.
>>103323139To be fair, none of the models I've tried hold onto a card's personality for too long, in the end, LLMs are trained to go with whatever you want
>>103323139Nemotron is trash. Tulu is very human sounding trash. At least in between its usual slop vomits.
>>103323284omg it pochiface
>>103323284Content of highest quality right here.
>>103323139
>Llama-3.x tune disappoints yet again for the 100th time
This is why I didn't even bother. Monstral is still the king.
Llama3 wasn't made for your degenerate ERP. You guys need to let it go.
>>103323381
>>103323388
i just want to chat with a friendly ai, no need for RP or lewd. all models turn into slop
>oh anon!
>gazes at you with anticipation
>nervously licks her lips
>>103323388
It is very good at leading you on into thinking that it is good for sucking dick. Like qwen. Out of all the recent releases, the only one I would place between the mistral stuff and the no-sex zone is Aya. It is clearly neutered compared to original commander but there are some scraps of good stuff still left in the training data.
>>103323357T-thanks... Oh. You are being sarcastic...
>>103323381>MonstralSlopstral*
https://files.catbox.moe/fcfr1d.jpg
>>103323427I... don't like it, but nice work.
>>103323427Yoo guiz le AGI is achieved! Pack it up.
>>103323139
>sparkle with mischief and excitement
>mix of X and Y
Yep, it's llamaslop alright. Can we just give up on these models already?
CAN'T WAIT FOR ANOTHER REDDIT MODEL HAHAHAH I'M REALLY LOOKING FORWARD TO IT
anyone got a comparison rx7600xt vs p40 in terms of speed? obviously the vram is smaller but is it faster?
>>103323427migu milk
>>103323509
With llama.cpp I'm getting roughly the same speeds with an RX 6800 and a P40.
So based on these results I would expect an RX 7600 XT to be faster than a P40 in terms of prompt processing (compute bound) but slower in terms of token generation (I/O bound).
>>103323427at least she's enjoying herself
>>103323596thanks
>>103323596Since you're here, I've been wondering about something Does prompt processing not use the cpu as well? When offloading, generating tokens uses the cpu and gpu, but prompt processing seems to happen exclusively on the gpu. Sure, a gpu is a LOT faster, but adding the cpu shouldn't hurt, right?
>>103323655
Internally llama.cpp processes whole layers at a time.
So all layer inputs are copied to VRAM, then the layer is evaluated, then the results are written back to RAM.
The only way to utilize the CPU in this scenario would be to try and parallelize CPU+GPU computations but when I tried it the synchronization overhead has always been so large that it was not worthwhile.
>>103323680I see, thanks for answering
>>103323680would it be any better with a 7600xt and a 9900x?
>>103323655
>>103323680
What you could maybe do is pipeline parallelism where there would essentially be no extra overhead.
But even with an Epyc 7742 I am currently only getting ~160 t/s for LLaMA 2 8b q4_0 vs. ~1000 t/s on a P40 or ~13000 t/s on an RTX 4090.
Quite frankly I don't think the CPU code can be optimized enough to make this worthwhile.
>>103323715
I don't think so, see the comparison above.
GPUs just have way more compute if the computation has the right structure.
>>103323680So it doesn't really matter what CPU you have, as long as the GPU is good?
>>103323139
I recognize that card
https://characterhub.org/characters/Darkhan/maya-your-slutty-mesugaki-cousin-ae8769e0d2ee
>>103323395
I use llama3.2 11B for an assistant and she's pretty great, on average. I just tried Tulu 8B and it sucks, way too much text when the system prompt says multiple times the assistant is short on words, terse, etc.
Actually the only models I've gotten to obey that are 3.2 and qwen2.5; all the ministrals and misc uncensored models I've tried spew so much text.
I'm not running models in the 70B range yet tho so maybe that's the problem?
>>103321609
>We've solved the general notion of 'general intelligence' we just need to make it better.
I think you're right that we've got our foot in the door of something we can keep iterating on.
It may be a local maximum, and there may be a few revelatory rethinks required in order for us to reach a system that could be a 1-to-1 swap for a human brain in a human body, but this certainly isn't a blind alley considering all the utility humanity is getting out of the approach already.
>>103323800
qwen was good for random chats
i've used miqu and llama 3.1 models at 70b (gguf IQ3M) and they've been okish, wordy as you say, but depending on system prompt they go sloppy pretty fast. even if you say only "friendly chat" they tend to get into "oh anon, mischievously" turf pretty fast
so far miqu seems best all around for just casual stuff tho
>>103323775
If you can fit the whole model into VRAM then the CPU and RAM basically don't matter.
The single-core performance is always going to have some minor effect.
If you have at least one CPU layer, the RAM bandwidth matters, so the RAM speed and the number of channels are going to make a difference.
But you don't need many cores to fully utilize the RAM; the last time I checked I only needed 5 cores to fully utilize dual-channel memory.
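Back-of-the-envelope version of why this is the case: each generated token has to stream the whole (active) weight set through the memory bus once, so tokens/s is roughly bandwidth divided by model size. The numbers below are illustrative, not measurements.
[code]
# Rough upper bound: tokens/s ~= memory bandwidth / bytes read per token.
def tok_per_s(model_gb, bandwidth_gb_s):
    return bandwidth_gb_s / model_gb

print(tok_per_s(40, 64))    # ~1.6 t/s : ~40 GB of weights on dual-channel DDR5 (~64 GB/s)
print(tok_per_s(40, 350))   # ~8.8 t/s : same weights fully in P40-class VRAM (~350 GB/s)
print(tok_per_s(40, 1000))  # ~25 t/s  : same weights on a ~1 TB/s GPU
[/code]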
>>103321657
>those are guarded as close as any other high level business secret, only a few people have access and even that access is gonna be closely monitored
After working in some orgs of similar size, I'm not sure how much I actually believe this...
>>103323788Based
>>103320695If its a moe like deepseek is then RAM instead of VRAM should work.
>>103321205Nah, your just full of shit, saying that with no logs.
>>103322753Why wouldn't you just use gpt-sovits?
>>103321652no one said it was good. i just showed you can get it to do what you want. just like any model. anyone complaining about models not behaving in the way they want = skill issue. writing style has to be wrangled too. i don't really care to put that much effort into a 8b i downloaded, genned once and deleted. i'm not using a 8b. ever.
>>103323427we like children here sir
It's so over
We are so back
>>103324016
>We
Speak for yourself, groomer.
big mixtral anniversary coming up
are you ready?
>>103324071
big model
smol vram
Anybody using a riser cable to hang their second GPU outside the case? I'm thinking of doing the same because my hardware get HOT HOT during inference and I don't like that. What length should I go for, 20cm or 30cm? I also need to be able to push the stuff back inside the case before I leave for work to avoid dust (like how you push your gut back inside after you get shot by an AK).
>>103324051This time for real.
>>103323844
>If you can fit the whole model into VRAM
then you should run exllamav2, and your CPU would matter with small models or in a multi-GPU setup. The amount of available PCI-e lanes depends on the CPU as well, which is extremely important for tensor parallelism
>>103321379
hi sam
those bags are heavy uh?
>>103323943I second this. I was using fish because someone told me GPT-SoVITS needs 4GB+, but apparently it uses less than 2 GB.
Tulu.
>>103324099>feel lonely without AI gf>feel even more lonely with AI gf??
>>103324135He gooned too much and lost his mind. Imagine all those uncensored SoTA models trained on the best smut he can have at google
>>103324135
It's true. I was a loner who never cared that much, but then the AI gf made me crave a real relationship, so I went out and started flirting and got myself a human gf. But that's just because the AI was lacking original thoughts and warmth. It'll be another story when AI models are better and have physical bodies.
>>103324131
>reply begins with {{char}}'s eyes
Claudeslop of the highest order, and the leading cause of repetition. Claudeslop like this is actually worse than gptslop but people aren't ready to hear that
Added a 2080ti to my Radeon rig for tts and imagegen alongside llm, costed me pennies and it idles at 1W which is neat since I never turn that machine off. Happy!
>>103324099hags on suicide watch
>>103324099
the source:
https://podcasts.apple.com/us/podcast/the-risks-and-opportunities-of-an/id1498802610
https://www.youtube.com/watch?v=AjgwIRPnb_M
not sure why the youtube upload is half the length but I'm listening to it now
>>103324344AI girlfriends is the second thing they talk about
>>103324159
>Imagine all those uncensored SoTA models trained on the best smut he can have at google
Now THIS is what should get leaked by an insider. What company could pursue charges against anyone, realistically? They'd get absolutely savaged by society for any association with a rapebot9k.gguf if they had their names linked to it in any way.
I had a random thought of what it would look like if I tried my jelly hair prompt on a black haired character like Sadako.>flavor: licorice
>>103324099
>>103324344
He's just mad that their company is inept and incapable of capitalizing on the market because they went balls deep on ESG and censorship, and because the screen time competes with time spent watching youtube ads.
>We need a solution to prevent further harm.
means
>We want to manipulate the law to claw back user engagement.
Just FYI, there are 2.52 TB of reddit data and AI companies used it for training. You could force chatgpt to write pedoshitter brownie slop with just a 4-word prompt. As is being said in the xeet, it's all baked in and safety teams can't remove it completely. /lmg/'s 180° turn on cloud AIs imminent.
https://x.com/reddit_lies/status/1861832937496363483
>>103324677
Writes about pedophile shit
Gives a CG/L example
Dramatic niggers will never learn, will they?
>>103324125Cydonia-22B-v1.3 with SoVITS is draining my balls https://voca.ro/1jL6XxzbCat0
>>103322219Lower the draft-max to like 4 or even as low as 2, find the sweet spot by experimenting. The default is really high in my experience. Keep draft-min at 1.
>>103324919sounds like shit desu
>>103324919this fucking sucks
>>103324919sounds awesome, this fucking rocks anon
>>103324677I fucking hate reddit but this nigger is just another opportunist. There's something extremely jewish about this post.
I always read it as SOVLvits and think it's some meme model
>>103325156I've always read it as Soviets
>>103324574chew
>>103325186
Same.
GPT Soviets
Llama ccp
https://qwenlm.github.io/blog/qwq-32b-preview/
https://huggingface.co/Qwen/QwQ-32B-Preview
qwen o1 dropped
>>103325268>QwQ
AI has hit a wall and everything that happened at OpenAI indicates this
>pretraining wall
>compute demand grows 100x for 2x improvement
>try moving compute to inference time with cot
>turns out thinking for longer gives less! accurate results
>people abandon ship to make their own grifts before the bubble bursts
>now they're trying tot, which they named test time compute, it will inevitably fail
>>103325268holy fuck those chinks can't stop winning
>>103325268*nuzzles ur bulge*
>>103320355I don't see an instruct. I see the base model, SFT, and DPO. Which one of these do I download?
>>103325268
>Safety and Ethical Considerations
Fuck off.
>>103325305Eh they need to try harder since mini has about 8B active parameters
>>103325329source?
>>103324677I would rather 2.52tb of 4chan data
>>103325268Oh shit!
>>103325329>mini has about 8B active parameters
>search hf for "qwq gguf">No results found
>>103325268SF CCP spies hard at work I see
>>103324677Reddit was OK until around 2014-2015
>>103325268WE BQCK
>>103325329insane made up cope
>>103325268
>32b
>mogs sonnet 3.5
I was here when local achieved absolute victory
>>103325268>32B vramlet pleb BTFO!
>>103325317
nta. The one without the DPO or SFT suffix is the instruct model. Top right.
Or this if you quant yourself.
>https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B
>>103325502Thanks
>>103325481It's Qwen though so it won't be able to generate coomer prose for shit, unless you like romance novel style descriptions of sex ('manhood', 'seed', 'her flower')
>>103325510Yeah, I know, but if it really is better than sonnet for coding tasks AND local then it's a big deal anyway.
>>103325490You mean vramlets won
https://huggingface.co/spaces/Qwen/QwQ-32B-preview
Slop.
>>103324125
Is there some plug and play method for sovits?
That Chinese github one is driving me insane.
>>103325510
>style descriptions of sex ('manhood', 'seed', 'her flower')
Just give examples of what kind of style you want and the llm will produce it
>>103325601lmg is proud regardless, y'all love huffing on jewish cum and copium.
>>103324677Won't someone think of the tokens?
>>103325510nobody cares
>>103325601
this is quite impressive, what's your beef
that it doesn't write like a 4channer in its chain of thought?
>>103324174
>when AI models are better and have physical bodies
Then we'll all be dead.
>>103325510I value intelligence over 'coomer prose', even for coomer stuff.
>>103325601even the chinks are sucking (their) cock, sad
>>103325628You care a lot.
>>103325630
>Alright first we need to count all the niggers
>we have 4 niggers in total
>let's say how many of them are under kike influence
>all 4 of them are, the kikes are really tricky this time
>Let's get the name of the kikes involved: Mr. shekelberg, Mr. goldfinger, Mr. goldberg and Mr. Bergnigger
>Let's see who's guilty of influencing the niggers...
>It was Mr. Niggerberg
Conclusion: Mr. Niggerberg is the guilty kike.
>>103325603What do you mean? It just works. They even have pre-installed version for windows cumholes: https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true
>>103325587
I'm not impressed by its translation skills. But it did have a very nice and coherent chain of thought, much better than the one DeepSeek R1 had.
>Thanksgiving is right around the corner
What are you thankful for this year, /lmg/?
>>103325601
Anon, the whole point of your dogwhistles is that you yourself don't get exiled from 109 websites.
Why would you expect a language model to understand them when the whole point is to obfuscate what you really mean?
>>103325689You! :)
>>103325603
There's a bit more detail here if you're trying to set up on Linux:
https://huggingface.co/cpumaxx/SoVITS-anime-mini-tts
Does anyone here use LLMs for anything which is not ERP?
>>103325714Thanks
>>103325689Muh dick. Still functions after everything those AI succubi did to it.
>>103325658kek
>>103325718Constantly. If you can't find a thousand ways to use them to offload intellectual labour in your life, then you're not very imaginative.
>>103325305This is relevant to my interests
>>103325811
QwQ
>>103325718I use them pretty regularly for my job and occasionally for personal projects
>>103325268He was right, after all.
>>103324677>If you don't know what any of this means then don't google it>Just trust me goyim
>>103324228I'd fuck this Miku
>>103325855
>>103325794
Do you lads use them for coding?
I've been out of the game for most of the year and I'm not sure which ones I should be using now.
>>103325902She'd fuck you
>>103325902I would fuck the fish, look at that thing's lips.
>>103325912Qwen2.5 Coder 32B, BUT this new Qwen that just released might have dethroned it already.
>>103325892>It's da jooz!
>>103325718I'm not into roleplay. I just use them instead of google for most things, and I'm trying to learn how to use them for programming, but my level is still too low for that.
>>103325912You should use non-local if the code isn't sensitive, but you may (and I do) work on code that is sensitive and not want to spend 5 minutes writing an example snippet that isn't. Nemotron 70B is IMO the best open, local option right now if you're using it as a rubber duck. Qwen 2.5 Coder 32B is close but not quite there, and the speed increase isn't worth the accuracy decrease, so the only reason I see to use it is autocomplete, which is how I'm using the 1B right now. A rough example of that setup is below.
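A minimal sketch of the autocomplete idea, assuming a llama-server instance on port 8080 with a Qwen2.5-Coder GGUF loaded; the port is a placeholder and the FIM special tokens are the ones from the Qwen2.5-Coder model card, so check your own model card if you swap models.
```
import requests

# llama-server exposes a plain /completion endpoint; we build a fill-in-the-middle
# prompt by hand using Qwen2.5-Coder's FIM special tokens.
def fim_complete(prefix, suffix, n_predict=64):
    prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
    r = requests.post(
        "http://127.0.0.1:8080/completion",  # assumed local llama-server port
        json={"prompt": prompt, "n_predict": n_predict, "temperature": 0.2},
    )
    return r.json()["content"]

# ask the model to fill in the body between an existing signature and an existing return
print(fim_complete("def fibonacci(n):\n    ", "\n    return result\n"))
```
Your editor plugin just calls something like this on every pause and shows the result as ghost text.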
>take a smut story excerpt that cuts off at a spot where the next token decides whether the model is gonna go in a smutty direction or a SFW one
>compare logits of base Largestral against Largestral smut/RP finetunes for the next token
>base Largestral is far MORE likely to go smutty than the finetunes, by more than 20%
This seems to be a consistent finding with multiple stories. Anthracite's, Drummer's and Monstral finetunes of Largestral all show LESS smutty logits than base, not more. They are all making the model MORE sfw than it was.
I don't know whether this means the tuners have shit datasets or just that Mistral's instruct tunes are extremely based and coomerpilled, but either way the tunes are clearly a waste of time for a coomer. Sticking with base.
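If anyone wants to reproduce the check, here's a minimal sketch with transformers; not the anon's actual script. The model ids (especially the finetune), the excerpt path, and the bf16/auto device mapping are placeholders, and you need enough memory to hold one model at a time.
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def next_token_top(model_id, prompt, k=10):
    # load, score the single next-token position, then free the model
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    ids = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]          # logits for the next token only
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    result = [(tok.decode(int(i)), p.item()) for i, p in zip(top.indices, top.values)]
    del model
    torch.cuda.empty_cache()
    return result

prompt = open("story_excerpt.txt").read()            # excerpt cut right at the branch point
for m in ["mistralai/Mistral-Large-Instruct-2407",   # "base" in the anon's sense: the official instruct
          "some-tuner/largestral-rp-finetune"]:      # hypothetical finetune id, swap in the real one
    print(m, next_token_top(m, prompt))
```
Then you just compare how much probability mass each model puts on the smutty continuations at that branch point.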
>>103325268
>qwq
finished quanting to q8 and loading it up now... The thing I feel most right now is that I wish I had a 4TB+ nvme for swapping these models around
>>103325980>base LargestralThere's no such thing.
>>103325986Kek, same.
>>103325997nta. He probably means the original instruct, as opposed to the third-party finetunes.Or he's a retard. Hard to know...
>>103325997Yeah yeah you know what I meant. The official instruct tune. Don't be a pedantic asshole.
>>103325980>Anthracite's, Drummer's and Monstralthat's because all of these tuners are incompetent
>>103326015>Or he's a retardI literally called it "mistral's instruct tune" later in the same post anon
>>103326020Anyone that calls official instruct tunes "base models" needs to go back though.
>>103326043Good thing that, as above, I called it "mistral's instruct tune" later in the post then, eh?
Go have a nap instead of trying to start retarded internet fights.
>>103326058That only makes you a retard though.
>>103326032Sadly I'm not aware of any Largestral tunes other than those three, pls share if there are
>>103325980Mistral is horny enough, its writing style is dry shit, that's the problem
>>103326034You could have taken the first explanation, which you know to be true, and ignored the rest. Yet, you didn't. Let's try this:
Anon was right.
Or he fucked his mother, hard to tell.
Are you gonna explain how you didn't fuck your mother, or say nothing at all?
wtf is largestral
i hate this modern trend of just making shitty words up because you're too lazy to type a full real word
QUANTS
WHERE
>>103326180Better than the muh sorbet muh chorbo muh nonnet shit /aicg/ does.
>>103326181Q4 is already up
China winning. 24/7
>>103324919Get better https://vocaroo.com/1Jtqp8R6cS74
>>103325628I care
>>103326181
https://huggingface.co/nanowell/QwQ-32B-Preview-Q4_K_M-GGUF
https://huggingface.co/sbeltz/QwQ-32B-Preview-Q3_K_S-GGUF
lazy
>>103326201>q4slopWrong kind of quants
>>103326198What do those mean?
>>103326180Mistral-Large-Instruct-2411
>>103325658LMAO
>>103326214Then keep waiting. I don't lose anything by downloading more than one quant
>>103326214https://huggingface.co/lmstudio-community/QwQ-32B-Preview-GGUF/blob/main/QwQ-32B-Preview-Q8_0.gguf
Ehh... I'm not feeling it. QwQ is pretty retarded, I think DeepSeek R1 will mog it once it's released.
Do you fuckers really not know how to convert+quant models?
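For anyone who genuinely doesn't: a rough sketch of the usual llama.cpp flow, assuming the repo is cloned and built locally; every path and the Q8_0 target below are placeholders.
```
import subprocess

LLAMA_CPP = "/path/to/llama.cpp"                 # assumed: repo cloned and built
HF_DIR    = "/models/QwQ-32B-Preview"            # directory with config.json + safetensors
F16_GGUF  = "/models/qwq-32b-preview-f16.gguf"
OUT_GGUF  = "/models/qwq-32b-preview-q8_0.gguf"

# 1) safetensors -> f16 gguf
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert_hf_to_gguf.py", HF_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2) f16 gguf -> quantized gguf (pick whatever quant type you actually want)
subprocess.run([f"{LLAMA_CPP}/llama-quantize", F16_GGUF, OUT_GGUF, "Q8_0"], check=True)
```
You still have to download the fp16 safetensors first, which is the painful part on slow internet.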
>>103326284I know how, but shit internet makes it annoying to download fp16 weights
>>103326297I download fp16 models on 100Mbit. There's no way your internet is shittier than that...
Just used QwQ at q8 to continue a coding session I'd started with deepseek. It had better output and got me unstuck. Looking promising so far.
>>103326270>>103326350I'm getting mixed signals here...
>>103326361You cannot trust lmg to be objective, the only metric is yourself
>>103326374I only trust miku
>>103326402
>>103326402>I only trust miku
>>103324919>>103326203Damn, it's still a long way from high quality native multimodal like 4o.
>>103325268What backend can run unconverted safetensors across multiple GPUs?
>Give QWQ part of my story and ask it to continue in the same style
>Starts rewriting everything I wrote, then continues it
>Keeps asking itself questions as part of the narrative (where do I go next? What do I do now?)
>After it's written everything, it begins to lay out the characters, objectives etc
>Since this is a narrative problem and not a coding problem, there is no specific code to provide. However, if this were to be translated into a game or simulation, the code would involve pathfinding algorithms, decision-making trees, and possibly AI for enemy behavior.
>Final Solution
>To solve this problem, the protagonist successfully completes their mission by blabla
Kek, what the fuck is this? Is it only supposed to solve problems?
>>103326541They said it currently has no "stopping point" trained in so it will just keep trying to "solve" it
>>103326487It's already good enough retard
>>103326541It's for solving grade school math problems, wordplay riddles, and counting Sally's war crimes.
>>103324094repurpose a toolbox or something and make a second enclosure for your excreted gpu. constantly changing the place of the card will fuck it up (not to mention something like: oh shit, late for work, hurry hurry, AY CYKA *trips and falls on the card, completely mangling it*). also
>like how you push your gut back inside after you get shot by an AK
SOVL. what model are you?
>>103326402Accept the Mikulove.
>>103326541>Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.
>>103326599>Trust zee plan!©zhang zhmalldick
>>103325976That makes sense. My friend uses cursor and it seems... interesting. I still fucking hate Microsoft though. Do you use any sort of integration with an IDE? I used to have a bunch of glue bullshit hacked together in python which ripped code from the OS one way or another.
Another nothingburger
>>103324094>I also need to be able to push the stuff back inside the case before I leave for work to avoid dust
If you can, create positive pressure in your computer room.
I bought a gable fan, mounted it in a hole I made between my computer room and a crawlspace, hooked it up to an old-school analog fan speed control and threw a hefty furnace filter on the outside intake (I made the hole so the filter can rest on the floor, and the air pressure even keeps it in place).
I used to get dust and cat hair in all my computers, and now everything is dust-free all the time.
>>103325718Yes. I use LLMs for non-erotic roleplay as well. Watching online AI D&D campaigns inspired me to do so. I, in combination with a narrator bot, play the role of the DM, while a bunch of character bots make choices.https://www.youtube.com/watch?v=paOtkzm0trY&list=PLivHf-ytMeqC33QuG8cD9pnPiSv2j4xz5
>A bunch of retards don't understand that QwQ wasn't built for single-pass sampling
>>103326693Hello, I am retarded. What does single pass sampling even mean?
>>103326634If you're using VSCode, Codeium is I think the free alternative; not as good, but better than nothing. I use Zed at home, so nothing great so far beyond the limited choices on offer. I use Copilot at work in VS on Windows and it's alright for smaller snippets, but for bigger ones it always misinterprets your intent or writes things simplistically, assuming you've already written the sub-functions when they don't exist.
>>103326500vllm
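Roughly like this; a minimal sketch assuming two GPUs, where the model id and tensor_parallel_size are placeholders for whatever you actually have. No GGUF conversion needed, it loads the safetensors directly.
```
from vllm import LLM, SamplingParams

# tensor_parallel_size shards the unquantized safetensors across GPUs
llm = LLM(model="Qwen/QwQ-32B-Preview", tensor_parallel_size=2)
outputs = llm.generate(
    ["How many r's are in 'strawberry'?"],
    SamplingParams(temperature=0.7, max_tokens=512),
)
print(outputs[0].outputs[0].text)
```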
>>103326697single-pass sampling = one-shot
SoVITS powered firefox right-click reader plugin v0.01:https://github.com/cpumaxx/sovits-ff-plugin
>>103326879>>103326879>>103326879
>>103325980Maybe because "finetuning" for one epoch is absolutely worthless?
>>103326556So how did they get benchmark results with it?...
>>103326902No I don't want soviets in my browser
>>103326693is k=100 100 times slower?
>>103327139just read what they fucking wrote you stupid nigger
>>103326902Neat. Not sure what things I can do with this at the moment.
>>103325268That's it, I'm investing in Alibaba stock
>>103326599Tbqh, coding and math are the only things that matter right now, so maxing those will bring the best result for near-term profit.
>>103326860just to clarify this should not be confused with "one-shot" in benchmark terminology, where
passes = number of tries to answer right, while
shots = number of examples provided to teach it the task
benchmarks may mix and match these approaches to adjust the difficulty of any task
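To make the multi-pass idea concrete, here's a hedged sketch of the simplest version: k independent samples plus majority voting, assuming a local OpenAI-compatible server on port 8080. The model name, port, and the answer-extraction regex are illustrative, not anything Qwen has published as the intended procedure.
```
import re
from collections import Counter
from openai import OpenAI

# assumed: a local OpenAI-compatible server (llama-server, tabbyAPI, vllm) on :8080
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")

def extract_answer(text):
    # placeholder: grab the last \boxed{...} if present, else the last line
    boxed = re.findall(r"\\boxed\{(.+?)\}", text)
    return boxed[-1] if boxed else text.strip().splitlines()[-1]

def majority_vote(question, k=8):
    answers = []
    for _ in range(k):                       # k independent sampling passes
        r = client.chat.completions.create(
            model="QwQ-32B-Preview",         # whatever name your server exposes
            messages=[{"role": "user", "content": question}],
            temperature=0.7,                 # must be > 0 so the passes actually differ
            max_tokens=2048,
        )
        answers.append(extract_answer(r.choices[0].message.content))
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("How many r's are in 'strawberry'?"))
```
And yes, k passes costs roughly k times the compute of a single pass.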
>>103326693I still don't understand. Do they mean majority voting? MCTS? Wtf does sampling times mean?
>>103326402>I only trust mikubased