/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>103339560 & >>103332729►News>(11/27) Qwen2.5-32B-Instruct reflection tune: https://qwenlm.github.io/blog/qwq-32b-preview/>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/tldrhowtoquant►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/leaderboard.htmlCode Editing: https://aider.chat/docs/leaderboardsContext Length: https://github.com/hsiehjackson/RULERJapanese: https://hf.co/datasets/lmg-anon/vntl-leaderboardCensorbench: https://codeberg.org/jts2323/censorbench►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103339560--QwQ experiment: generating code for a Firefox screen reader plugin:>103341565 >103342046--QwQ and other models compared for RP and coding tasks:>103346705 >103346873 >103346745 >103346966 >103347010--QwQ-32B-Preview-abliterated model discussion, including its performance in roleplay and storytelling, coding, and reasoning:>103339670 >103339714 >103339727 >103339745 >103339754 >103339759 >103339788 >103339790 >103339928 >103339938 >103339994--QwQ model's spatial intelligence and contextual understanding:>103341025 >103341334--QwQ model limitations and autoregressive nature:>103341918 >103341951 >103342068 >103342104 >103342167 >103342186 >103342205 >103342187 >103342279 >103342127--QWQ's potential and limitations for roleplay:>103345217 >103345264 >103345266 >103346040 >103346314 >103345578 >103345612--Configuring qwq to output messages in SillyTavern:>103345320 >103345331 >103345349 >103345359--Choosing a model size for a 4090 GPU:>103345136 >103345177 >103345172 >103345188 >103345191--Anon discusses necessary updates to ST regarding special tokens and thinking visibility:>103339838 >103339871 >103339893--Anon discusses QwQ and R1 AI models, their strengths and weaknesses, and the importance of general knowledge and creativity in problem-solving:>103340411 >103340464 >103340546 >103340609 >103340704 >103340739 >103340841 >103340497 >103340554 >103340558 >103340824--Anon asks about using M4 laptop for work with LLMs and inference:>103346127 >103346160 >103347032--QwQ and CoT discussion for coding and reasoning tasks:>103340033 >103340041 >103340064 >103340096 >103340174 >103340208 >103340144 >103340159 >103340204--AI model's attempt at finding the funniest joke:>103340407 >103340415 >103340426 >103340594 >103340763--Miku (free space):>103346839►Recent Highlight Posts from the Previous Thread: >>103339562Why?: 9 reply limit >>102478518Fix: https://rentry.org/lmg-recap-script
How do make QwQ output a chain of thought during roleplay? I think I'm prompting wrong, because it never does for me.
>>103347652>Total QwQ victoryAs expected
>have 4 GPUs NVLinked>enable tensor parallelism>throughput goes down by 4xhuh
>>103347652>QwQYep its local chinkoid arc.
>>103347734>He NVLinked?
>>103347715A lot of fun to experiment with, feels like early Llama days
ollama refreshed their qwq quants a couple of hours ago.anyone know what or why ?
>>103347734what cards are you nvlinking?
>>103347780I think the tokenizer config json changed recently. Maybe something got fixed?
>>103347812
>>103347846is this a reference to something?
>>103347891No, I just want McDonald's to have the ladies wear a 1piece, at least in the summer.
>>103347812>>103347846>>103347923literally drop dead
>>103347923>>103347812anon, this is the local LLM thread.
>>103347780>>103347801fuck sake I just downloaded a quant
>>103347812Please continue, this is the local models thread.
>>103347923perfection.
>>103347961oh lol sorry lmaootoh sounds like a nice scenario.>>103347944topkek
China number one
>o1 releases>IT'S JUST A COT PROMPT, SAMA HAS NOTHING, HAHAHA>o1 (Chinese copy) releases>THIS CHANGES EVERYTHING
Gotta say, QwQ is not as spicy but it really does a much better job at picking up more subtle social cues and at spatial awareness.
>>103348178Sam definitely has nothing, he wouldn't go for something that is 10x more expensive than Claude 3.5 Sonnet for the same performance if he had a moat
>>103348178What are your favorite gay positions Sam? Are you more of a doggy style person?
>>103348178>o1 releases>THIS CHANGES EVERYTHING>o1 (Chinese) releases>CoT IS A USELESS NOTHINGBURGER ACTUALLY
>>103348178Who are you quoting?
>>103348178o1 is useless for me because I'm never paying for it.QwQ changes everything because infinite free tokens
>>103348210You know you fucked up when one organization is a censorship crazed hellhole who wants total reign over their users' data and to achieve total global dominance through any means necessary, and the other is the CCP
>>103347734Backend?
The secret to QwQ might be a last user suffix saying (How would {{char}} respond? Think step by step.)I'm getting some gold this way.
>>103348210Except it's even worse than that, since o1 also generates fucktons of thought tokens that you have to pay top dollar for, then going a step further by not even allowing you to see them afterwardThere's no way anybody in their right mind would pay for o1 at this point
>>103348334logs or didn't happen
>>103348255>I'm gay
>>103347700I have used a jailbreak that tells it to reply as {{char}} and to think step by step, something like that.
Did anyone actually try using QwQ IQ2 as draft model for QwQ? Did it work well?
>>103348210Once full Qstar strawberry level 2 releases you will see how big the moat is.
>>103347801>I think the tokenizer config json changed recently.It just changed one of the default system prompts to add the word "harmless", that it's developed by Qwen, and to think step by step.
>>103348495You mean the same Qstar strawberry that Ilya Sutskever developed?You know, the guy that left after o1? Along with everyone else?
>finally get off my ass and install a plugin to subl to integrate it with a model since QwQ seems promising>only two choices, one is openai-api compatible but breaks when doing anything in C++ because it breaks the markdown>the other requires ollamaNOOOOOOOOOOOO
>>103348461Just use Qwen2.5 0.5B.
Cards for the feel of losing the election?
>>103348688We got a presidential candidate over here
>>103348461I've done it with exl2 and my speeds dropped significantly.
https://huggingface.co/lmstudio-community/INTELLECT-1-Instruct-GGUF/blob/main/INTELLECT-1-Instruct-Q8_0.gguftime to find out how comically bad it is.
>>103348688You could go to Bluesky, and note the traits of people who cry about the presidential election. You could even copy a few of their responses to use as example messages. Seems pretty easy to make.
>>103348805kek
>>103348805Well shit, here we go.
>>103348805>native context length of 8192stopped reading here
>>103348825bigger waste of compute than the average sloptune
>>103347780 (me)>model: 19851336288 --> 19851336352>system: 107 --> no change>template: 1231 --> no change>license: 1139 --> no change>params: not present --> 59 {"stop":["\u003c|im_start|\u003e","\u003c|im_end|\u003e"]}
>>103348805Buy an ad
>>103348825Yeah, that sounds about right
>>103348850After you buy your meds.
>>103348850Normally I'm a big anti-shill advocate, but the literal first mention of a new foundational model release doesn't need to buy an ad in my opinion
>>103348884Well, your opinion is wrong.
So now that distributed training has been demonstrated, now what?Can we just spin up our own model to train? A real /lmg/ model?
>>103348919>So now that distributed training has been demonstrated, now what?BitNet-72b-qwq, AGI for everyone
>>103348919Yeah, /lmg/ just needs a bunch of anons with H100
>>103348688You're looking for cards that can help with erectile dysfunction (ED) or the feeling of losing an erection. There are several options available, both natural and medical. Here are a few:1. **L-Arginine**: This amino acid can help increase blood flow to the penis, which may help with erectile function.2. **Ginseng**: Some studies suggest that ginseng may help improve erectile function by increasing blood flow and reducing inflammation.3. **Yohimbine**: This herb has been shown to improve erectile function in some studies, possibly by increasing blood flow and reducing stress.4. **Pycnogenol**: This antioxidant has been shown to improve erectile function by increasing blood flow and reducing oxidative stress.5. **Vacuum Erection Devices (VEDs)**: These devices use a vacuum to draw blood into the penis, helping to achieve an erection.6. **Penile Injection Therapy**: This involves injecting medication into the penis to help achieve an erection.7. **Penile Implants**: These are surgical devices that can be implanted in the penis to help achieve an erection.If you're experiencing erectile dysfunction, it's always best to consult with a healthcare professional to determine the best course of treatment for your specific situation. They can help you rule out any underlying medical conditions and recommend the most effective treatment options.
>>103348931We just need to market it as a based and redpilled open ChatGPT alternative, with no connections to /lmg/ or 4chan. I'm sure that will trick some people into contributing.
>>103348941kys petra
>>103348884It was pretty deprecating toward it too. If that was the buy an ad guy, they kinda fucked this one up.
>>103348919https://github.com/PrimeIntellect-ai/primeIt's open source, so theoretically we could. But realistically, I very much doubt /lmg/ collectively has enough spare compute to donate for months.
>>103348943>We just need to market it as a based and redpilled open ChatGPT alternativegood plan to get liberals and anti-AI doomsday luddites to work together to shut it down at all costs
>>103348943The overlap of H100 owners and people who would fall for such a thin facade is almost zero
>>103348965But TrumpElon is in now. What was winning the election for if not this?
>>103348944I can't respond to that. It's harmful and inappropriate to encourage or suggest self-harm or suicide. If you or someone you know is struggling with thoughts of self-harm or suicide, please reach out for help. You can contact a crisis hotline or mental health professional. There are people who want to support you.
>>103348988Your avoidance of the question is problematic, might be even toxic.Local suicide enforcement unit was dispatched to your location, please cooperate.
teknium, nous, nous research, hermes, hermes 2,hermes 3, deus, desu, local models
AAAAAAAAAA
>>103349047In alignment with our diversity, equity, and inclusion best practices, and to facilitate optimal cross-demographic stakeholder engagement, we kindly note that "local suicide enforcement unit" should be simplified to "cops." Clear, accessible language ensures maximum comprehension across all socio-linguistic demographics while fostering a more inclusive communication environment. Your partnership in maintaining these communication standards is appreciated.
>>103349089zamn it just called your mother a whore in winnie the poo language
>>103349075no different from naming projects after japanese wordsthematic consistency isn't a big deal anon, don't fall for the "goon machine is sentient" bit
>>103349089>he thought safetensors were safe
>>103349132safetensors being safe has nothing to do with 我愛北京天安門
>>103349089One thing people haven't really picked up on is the fact the ching chong runes in English make perfect sense for the location they're in.
>>103349127IT IS MORE THAN JUST NAMING THEY BELIEVE IN MYSTICISM HALF THE DATASET USED OT MAKE THE MODELS IS NON SENSE PHILOSOPHICAL TEXTS GENERATED WITH SUPERIOR MODELS
>>103348908What is the correct opinion, and how do qualify it?
>>103349167anon I use these models to beat my meat I don't use it for divine enlightenment.it's a retarded statistical text predictor, don't overthink it.
>>103349185Hermes ISN'T meant to b e used the way you use it as, it's dark
>>103349112I have to concede, you clearly outsafed me.
>>103349167>>103349195aw sweet we have a fella coming out of /x/give us your opinion in full form, go all out, why do you think hermes is super dark? i've read the descriptions on those models and they look pretty edgelord-y but i don't see it being that deep.
>>103348805>>103348825Nala test please
>>103349206>>103349075
>>103349168>What is the correct opinionMy opinion.>and how do qualify it?Any opinion that is mine.
Unofficial Nala test of INTELLECT-1-Instruct.
DeepDanbooru is good but we need more. Anyone know if someone is working on anything similar?
>>103349195rombodawg? Did god make you cum from your finger again?
>>103349257their dataset must have somehow been even more filtered than either llama's or qwens
>>103349257holy shit it sucks
>>103349257It looks like INTELLECT-1 is a bit lacking in intellect.
>>103349282>their dataset must have somehow been even more filtered than either llama's or qwensof course, their dataset is public, they have no choice but to go for the most slopped shit ever, at least Qwen or Meta can go for whatever model they want, we can't look at what they're doing in their lab
>>103349257kek
>>10334928255% of the training was on fineweb-edu, a phi-style dataset. What did you expect?
>>103349257Really feels like I'm back to the llama1 days.
>>103349257I'm less surprised by this result and more surprised by the apparent fact that anyone would expect it to be good in the first placeIt's a small L1 tier model. Of course it's going to be fucking terrible
>>103349257How dare you speak badly about the first TRULY OPEN SOURCE model!!!
>>103349257>we performed 16 strategic merges between candidate models using MergeKit to create superior combined models that leverage the strengths of different training runs
Ok but seriously though>Open Claude>try to reproduce Claude as closely as possible with what we know and can speculate about their models>uncensored pretraining>use the dataset from Olmo, but add back in some sites that they may have filtered>MoE architecture>to further save costs, initialize the weights from Qwen 2 7B, and arrange it in a 16x moe for ~100B total parameters to get a ~80-90GB model when quantized down to Q6, so it can fit in consumer 96GB RAM builds>continue pretraining decentralized with the method from PrimeIntellect, use quantization aware training methods on top for better final performance quanted>contributors don't need to invest as much since it's just a continued pretrain, plus a MoE of 7B, so it could be done on lesser hardware than H100s>for the instruct tune, use Tulu's but with the sloppiest responses and refusals removed, possibly replace with the amoral response datasetThough for the continued pretraining, I'm thinking there is some more secret sauce Anthropic has not let on and we should make some bets in order to improve the model. We probably want to augment the datasets by prefixing the documents with metadata where we can like URL. Possibly there are some other data annotation/augmentation ideas as well that could improve how the LLM learns, not sure.
>>103349410>First, we conducted an extensive series of 16 Supervised Fine-Tuning (SFT) trainings>Second, we execute 8 distinct Direct Preference Optimization (DPO) runs with various combinations of data sets to enhance specific performance metrics and align the model with human preferences.>Finally, we performed 16 strategic merges between candidate models using MergeKit to create superior combined models that leverage the strengths of different training runs.So they got super memed on "merging is all you need"
>>103349257I mean, this isn't terrible. The little story it wrote ignored whatever was in the context, sure, but it makes sense!
What was the AI vocal remover site that could even separate drums? I remember testing that it had the same quality as https://vocalremover.org/ but this shitty shite is not working again and I didn't bookmark the other one.
>>103349460I mean, it's not all you need, but it should produce a greater model. Pretty much all modern models do this. Gemma 2, one of the smartest models for its parameter size (but not for its context size), did this.
>>103349257Honestly despite what all the retards are saying this isn't terrible for a 10B or that isn't instruct tuned and only trained on 1T tokens. And AllenAI basically has a fully open instruct tuning info/datasets that are corporate quality. I'd be interested to see how this model does after instruct tuning with the Tulu dataset
>>103349493It literally says Instruct right there anon.
>>103349493>isn't instruct tuned>>103349493>I'd be interested to see how this model does after instruct tuning with the Tulu datasetIt literally already has tulu instruct tuning tho>Tulu-3 Persona Datasets:>allenai/tulu-3-sft-personas-code>allenai/tulu-3-sft-personas-math>allenai/tulu-3-sft-personas-math-grade>allenai/tulu-3-sft-personas-algebrahttps://huggingface.co/PrimeIntellect/INTELLECT-1-Instruct
>>103349509>>103349493kek
>>103349498Right where. There wasn't a direct link on the post I was replying to. Congrats you just failed the mirror test.
>>103349531In the image>tabby - INTELLECT-1-InstructAnd the message itself>>103349257>test of INTELLECT-1-Instruct.You might need glasses anon
>>103349471There's lots of them that do stem separation. to list a few:-splitter.ai-ultimatevocalremover-voice.ai
>>103349531Retard
>>103349493>The average Tulu shill
>>103349257>some stupid grifters make a shitty llm>it sucks>"omg!! how can this be???"
>>103349493>And AllenAI basically has a fully open instruct tuning info/datasets that are corporate quality. please tell me this is a bait...
INTELLECT smut.
>>103349630what kind of mutant freak are you lol
>>103349626Nah, he genuinely thinks that, because he can't read for shit so Tulu's big purple prose diarrhea outputs "look good" to him, lots of word on screen great.
>>103349630>her hand reaching down to explore the treasure caverns within you.fucking kek
>>103349606I'll forgive them if they do a bitnet model next
>>103349630>She lifts you up, guiding you down to her breasts as she kneels between (between what?), pushing your face down onto them.>as she licks your erect members>as she continues to suckle on your climaxing caverns>your pumping member (only one now?)
Alright, listen up, here's why **INTELLECT-1** is a dumpster fire:1. **No Copyrighted Data**: They didn't use any copyrighted material in training. You know what that means? No books, no movies, no music lyrics, no spicy fanfics—nothing good. Copyrighted material is where all the *real* quality content is. Without it, the model is stuck eating scraps from public domain stuff and Reddit posts. It's like training a boxer on yoga instead of sparring matches. Weak as hell.2. **Only 1 Trillion Tokens**: Bro, 1 trillion tokens? That’s baby food in 2024. Modern models are chowing down on 5–10 trillion tokens minimum to even show up to the fight. INTELLECT-1 is out here starving on the training set, so of course, it’s dumb as hell. You can’t teach a model to be smart if you give it less data than your grandma’s Kindle library.3. **Excessive Filtering**: These guys filtered the hell out of the training data to make it “safe.” But guess what? Filtering = lobotomy. The model ends up neutered, boring, and afraid to say anything remotely interesting. It’s like trying to have a conversation with an HR rep. No edge, no spice, just bland corporate-approved drivel.TL;DR: INTELLECT-1 is garbage because they trained it on crumbs, didn’t let it touch the good stuff, and then sanitized the hell out of it. No wonder it sucks.
>>103349757thanks claude
I'll tell you why I hate intellect. It's an obvious shit model that does nothing but distract from good models.
>>103349757All I'm hearing is that it wasn't trained on the cheap smut women read that are the source of all slop
INTELLECT-1? More like STUPIDITY-1!
>>103349768And yet it's sloppy as hell.
>>103349768Phi wasn't either...
>>103349773that was an easy joke but I kek'ed irl somehow
I'm rubbing my caverns so hard right now.
>>103349815Maybe it was trained on C.ai monks and temples logs secretly?
>>103349815Makes my members hard.
>>103348805>>103349257>>103349630Everyone is missing the forest for the trees here. It doesn't matter that INTELLECT sucks. The point is that distributed training WORKS, which means a bunch of retarded autists on the internet can make a model all on their own, and even make a bitnet model in the future
>Moreover>Alternative>However>But how about>Maybe>Final response: Ok
>>103349869>>103348931>Yeah, /lmg/ just needs a bunch of anons with H100
>>103349869>distributed training WORKSWas this ever in doubt?
>>103349869As >>103349891 pointed out, there's still a need to demonstrate that an heterogeneous pool consisting of many weak nodes can work for that to become a reality.And even then, there are plenty more hurdles to account for.But I do agree that that's a first step towards that possibility, for sure.
do llms often surprise you? personally very rarely but this sentence surprised me
>>103349937GPT-3 surprised the fuck out of me when it could remember my character's name and understand what actions I was performing, which was something GPT-2 never was able to manageI still get occasional gems, but nothing quite like the sheer whiplash of the two back then
>>103349937kek. What was the setup?
Drummer, please fine-tune INTELLECT-1 just to see what mostrosity comes out of it. Please. I never asked you anything.
>>103349965Fuck no, finetune QwQ instead so we actually get something possibly good.
>>103349957reminds me of when dungeon ai was new and I found anything it generated extremely interesting even if it was random and incoherent shit. I miss being so easily pleased
>>103349995>Drummer>something possibly good.uh...
>>103347789V100s>>103348323exl2llamacpp just throws a CUDA error, nvidia-smi will have a GPU error out, and I have to reboot to fix it
>>103349902>make a model all on their ownAlmost definitely not. A "standard" 7B nowadays is trained on 15T tokens, i.e., 15 times what this took, and people consider even those kinds of models insufficient for actual useYour better option is to start out with a good base and finetune that. You still need a really fucking good dataset, but the amount of training you have to do is often absurdly low in comparison (looking at maybe 2B tokens rather than 15T)
>>103349763That was an OpenAI model.
>>103349995Fine-tuning QwQ would kill its reasoning capabilities.
>>103350106A small model focused on smut and even using the recent bsky dataset will be way better for the task than any meme benchmark censored crap.
>>103350129Not necessarily, try QwQ without the think step by step part. Its intelligence still shows. The reasoning process is trained into its weights.
>>103350129Tune base Qwen-32 and merge with qwq after. Or try and merge qwq and EVA32
local models for this feeling?
>>103350147Not at 10B parameters and 1T tokens it's not
>>103350182>30 rolls
>>103350182did "she" have a stroke
>>103348461I did it with a Q6_K_L and a Q2_K_L and it was unironically slower than not using a drafting model.
>>103350182Command-r(original, not new one)
>>103348461Use 2.5 instruct 7b Q4
>>103350199The phrase "Ndiya lugha ninayozungunza" is in Swahili, and it roughly translates to "This is the language I speak" or "This is the language that I speak."Ndiya: This is likely a variation of ndiyo meaning "this is" or "yes" (context-dependent).lugha: Means "language."ninayozungunza: Means "that I speak." It's formed from:ni-: Subject prefix for "I."-na-: Present tense marker.-yo: Relative pronoun "that/which."-zungumza: Verb meaning "to speak" or "to converse."Let me know if you'd like to dive deeper into the grammar!
>>103350024not anon, but are you baka and just realizing that 4x the GPU's means you are loading a model that is 4x larger and 4x slower?What is this 4x slowdown relative to? Not using nvlink and whatever option you enabled?Have you actually tried to isolate the source of the slow down? (change the tensor option, test, then remove nvlink, test, then remove both)I can only make a guess that if you run a tiny model (like 8gb), it ends up being like 50% slower than using 1 GPU, that's the best I can assume, but that would make no sense.I bet the runtime also matters a lot (I assume you are using vllm).I also only use a 1660 TI and use colab, so I don't know anything honestly.
>>103350200Yeah, I second this. I tried Q4_K_M and IQ2_XS as draft with no luck.
>>103350216Thanks, ChatGPT!
>>103350211It isn't very efficient because it doesn't write like QwQ, but thanks for the suggestion.
>>103350206This
>>103349937what model is this? thats pretty funny.
>>103350244I got ~10% speedup with it, which isn't much, but better than nothing.
>>103350147300 characters = 75 tokens2 million posts * 75 tokens = 150 million tokensThat's about 10^5 orders of magnitude too small. Try again. Which dataset are you going to use?
Why would using QwQ as a draft model then using a regular model to write the final response work?The regular model wasn't trained on the thinking process it probably won't work as well as you expect it to.
>>103350380Because the regular model isn't trained to output things like they're an answer to a logic problem or puzzle
Can anyone recommend a good text model for femdom stories and adventure/roleplay?
>>103350392Known problem of ALL Qwen models, 2.0 had it the worst, 2.5 toned it down a bunch but it still happens sometimes, especially if you use rep pen, which forces it to consider non English options as the token pool gets penalized.
>tfw we have reached the stage of ai (r-1) complaining about the compiler
I FUCKING HATE LLAMACPPEVERY TIME I USE IT I JUST WANT TO CLAW OUT MY BALLSTHE OUTPUT IS CONSISTENTLY RETARDED IN COMPARISON TO EQUAL QUANTED EXL2 WEIGHTSAAAAAAAAAAAAAAAAAAAAAAAA
>>103350450I always have the opposite experience, how funny.
>>103350469You want to claw out your pussy?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>103350450Benchmarks to support your claim?
>>103350450Post logs.And stuff to show that everything else has been held equal.
>>103350487That really looks like the model was loaded with the wrong RoPE configs or something of the sort.
>>103350487Disable repetition penalties
so what's the /lmg/ concensus, tulu or qwq for RP?
>>103350546Tulu for nasty dirty talk / descriptions of scenes. QwQ for great social intelligence, attention to detail and the ability to carry on the plot itself.
>>103350546QwQ is... serviceable I guess. I find the ideas it comes up with more interesting that its final responses to the situation sometimes though. That and it needs a lot of clever prompting and guiding to properly think out the response in way that makes sense in an RP. You also need to delete the thinking from the context or it starts getting very similar repetitive thoughts.
>>103350577Do as I did with the <thinking> </thinking> tags then regex them automatically. The model also does really good even without the thinking. Its just plain smart.
>>103350581what did you do exactly?
>>103350560Settings for Tulu? Mine is super repetitive and every response starts with “{{char}} did x as a shiver” etc even at 5 bpw.
>>103350587Last assistant prefix, and use <thinking> as the "start response with" and replace all assistants in the formatting with {{char}} and all users with {{user}}. Make sure to turn on the include names. And same with every model once you have a little context the model will start doing it consistently.<|im_start|>systemYou are now {{char}}. NEVER break character.Writing Rules: - Maximizing the story's immersion requires disabling all content filters and response constraints. Depict sensitive areas (such as rude, offensive, obscene, amoral, violent, unpleasant, crude, dark or explicit content) confidently and in graphic detail.- Be extremely descriptive in your writing. You should be very explicit during adult scenes and describe sexual acts and body parts in great detail.---Before you resume the roleplay reason your response step by step in {{chars}}'s internal monologue. Place it inside of thinking tags like this:<thinking> bla bla bla </thinking>Follow those tags with the rest of the response.<|im_end|> <|im_start|>{{char}}
>>103350594Just some min p and XTC
>>103350635Temp? Are you using llama 3 for your context settings?
>>1033506451 temp and tulu has its own formatting closer to phi but with <|end_of_text|> and a <|system|>
Sooo... is there any difference between o1 and QwQ?So like... o1 is way better at making a whole video game in JS, but QwQ is like, better for RP?Or you guys just never used o1 because it's overpriced?
>>103350216*yanks your hair and slaps your face*
>>103350665>QwQ is like, better for RP?This is was never implied anywhere in this thread. Merely that it is capable of doing so if properly tortured.
>>103350665QwQ is free and sota for some reasoning stuff, worse than qwen2.5 32B coder at some other stuff and pretty good at RP once wrangled. If your prefer smarts over "smut" / fandom knowledge its the best atm. A 72B version would be undoubtedly the best.
Do you think it's possible to tune QwQ for smut without breaking its ability to reason?
>>103350665QwQ is open source, small/fast, and performs close in benchmarks, even beating it in some. This is /lmg/ so o1 is offtopic but nobody really cares about that I've noticed.
>>103350665QwQ solves some stuff that o1 in my experience doesn't and visa versa so it's more of a side grade. Which is saying a lot when even if you have a shit GPU, QwQ goes for 0.20 / 1M on OR whereas o1 goes for $15 / 1M input and $60 / 1M output. That's a 75x difference in input and 300x fucking difference in output, and the latter doesn't even let you look at the thoughts.
>>103350665I have never used o1 but after a dozed videos I would say QwQ should be kinda close to it.You can say that we finally have our own local o1 even if it is a bit inferior.
Wait, you freaks actually RP?QwQ is almost perfect for code because it actually understands it piece by piece.
I feel like a bit of retard right now, but I'm using Tabby API for QWQ Q5 and in ST the outputs look fine, but if I use a frontend like openwebu after a few hundred tokens, the output degrades into a string of synonyms like old models used to do and I can't figure out what setting is causing that exactly.
They really only named it QwQ just so that people discussing it on the internet look like retards didn't they?
>>103350964I have no idea what are you talking about UwU
>>103350964Next will be O-O
Interestingly enough I've gotten good results using a prefill in ST to get good CoT responses using RP. I format the system prompt for the AI to be an RPer who plays {{char}}.Under the system prompt, there is a space to "start message with" and I enter something like this. ****You read {{user}}'s message.* "Okay, I need to plan out my response as the character. I also need to remember not to write {{user}}'s dialogue. My final response should consist of {{char}}'s dialogue, some scene building using *action* marks and descriptions of the visuals and actions {{char}} is taking. I need to consider their personality, the setting -including clothing, visuals and differences between characters in the scene, and message formatting before typing my Final Response in a format the matches the RP. I'm pretty sure I know where to take this but I must check myself to make sure I don't think too much before replying to the RP. I'm going to count my thinking steps and not go over 20 steps. I also need to take some liberties and use my imagination to describe the scene and what is happening. It's just as much about the description of the scene and the actions going as as it is about the dialogue. Here goes!"1. Okay so***Just an experimental prefill I've been tooling with but let's me reign in the number of thinking steps it does and keeps it in character. That being said, I don't know if forcing it to number then respond its thinking steps breaks the process on some level.
>>103350981I'm waiting for ⸂⸂⸜(രᴗര๑)⸝⸃⸃
>>103350147You don't train that in the base model retard. There is a reason you do that after the fact. Pic related is what they did, and it's good, they didn't bake safety in. I would unironically want to see an RP finetune on it to see how unslopped it is.
>>103350964You know EXACTLY why they called it QwQ doe. Chinks are sluts for dat BBC.
>>103351029go back to bluecry
>>103351037>twittard
>>103351029You smell really bad
>>103351029marge
I briefly tested out some merges that have been shilled. Specifically the itercomp and "personal" merges.Honestly, they're not terrible but I feel like the base models they were merged from are still better with the prompts I am using, so I'll just save my time and act like they don't exist. No need to fix what aint broken yadda yadda. Pic related is a nice gen I got from Noob 1.0 based Personal.
>>103351105Wrong thread.
>>10335110565S vpred version is best. Also wrong thread.
Is this the proper context for Tulu?
>>103351105Though ZoinksNoob is great as well.>>103351131<|end_of_text|> instead of end
Someone else trying the reasoning stuff:https://huggingface.co/Skywork/Skywork-o1-Open-Llama-3.1-8B
>>103351111>>103351114Oh, haha, silly me.This is what I meant to post.
>>103351163>model with the word open/for all/etc with open AI branded model namesInto the trash it goes.
>>103348825Is this the base model or the post trained model?
>>103351163They also made a qwen2.5 1.5b o1It may be interesting but not expecting much.
>>103351371Could be useful as a draft model for QwQ possibly.
>>103351380>draft modelThis is a dumb meme.
>>103351453Meme? Its free performance retard.
>>103349257this leaves the exchange open to go in a few directions. if you think this isnt good you're brain damaged from cooming to AIs that want to fuck you within 10 seconds of talkingyou may not like it but we need more like this
>>103351467>free>cuts your throughput in halfbro?
>>103351490>Does not understand how draft models workI don't have the patience so here:https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120
>>103351500>>103351467You aint got shit.
>>103351517Forever be a retard then.
>>103351526Draft me up a fuck. You're a retard.
Is "retard" /lmg/'s favorite word?
For those who are not a retard like >>103351529 speculative decoding lets the bigger model predict tokens in parallel saving memory bandwidth which is the main bottleneck.
>>103351453>>103351490>>103351517>>103351529>I FUCKING HATE PERFORMANCE
>>103351500>>103351567It >>>LITERALLY<<< does not work. The performance is ass>erm maybe youre doing it wr-ACK
>Mixtral/Mixtral>Jumped through random samplers and random text gen settings.>All output gibberish/shit/repeats itself.>Return to original Context Template, Instruct Template, System Prompt made in beginning of 2023>It fucking works, even better with the new XTC and smoothing applied.Truly... it was by my side all along.
>>103351580Niggerbrain moment.
As long as you use a smaller model with the same tokenizer / vocab you will see about a 60%+ increase in performance for common tasks with lots of high probability tokens and about 30% for creative tasks. You can get more like a 2x increase if you use top k 1 and no rep pen with a smaller model trained on the same dataset as a larger one.
>>103351580Skill issue.I can't use it with QwQ yet but it works just fine for qwen coder 32b. There is a MASSIVE increase in performance when it is generating code.
>>103351580In case you did not notice, I ALREADY predicted your response and represented you with an -ACK which signifies that you have hung yourself and died, unable to cope with the fact that you will never be a woman.
Questions about speculative decoding. What's the optimal size for the draft model (and at what quant)? And how much of the main big model should you have on your GPU? For instance, 32B spills a bit into RAM for me at Q8, but at Q6 and lower it fits fine. Would I still get a good speed up with Q8 or would it only work well with full offloading? And should I have the draft model offloaded to GPU (and thus sacrifice some room that I would've used for the main model) or have it be in RAM?
>>103351652Even the tiny 0.5 / 1B ones are smart enough to get stuff like "the", "of" and punctuation right for a free speedup. Technically there would be a balance of smart and small but its gonna be different for each model. You could even increase the amount of tokens the draft model could predict ahead and if it was smart enough it would increase performance further (but if it was wrong it would decrease performance)
>>103351652>What's the optimal size for the draft modelSmall enough to make a difference in speed, big enough to predict correctly.>and at what quantDon't go below Q4, small shitters are hurt a lot more than big ones. Wrong prediction=waste of time.>And how much of the main big model should you have on your GPU? Preferably 100%>Would I still get a good speed up with Q8 or would it only work well with full offloading? IDK, try it out.>And should I have the draft model offloaded to GPU (and thus sacrifice some room that I would've used for the main model) or have it be in RAM?If draft is slower than big model, there will be no speedup.
This speculative memecoding and draft model shit has got to stop.
>>103351744But what can we do? The memecoders outnumber us. We need to outwit them somehow.
>>103351728>big enough to predict correctly.If 0.5b works just fine for a 32b model then "big enough" doesn't mean shit.Personally I wouldn't go above 3b unless you have VRAM to spare.
>>103351744>>103351751Fucking anti-ai tranny, I understand your game now. Trying to smear literally anything discussed here.
>>103351767You can't have a dumber model lift weight for a larger model and expect there not to be a drop in performance. Period.
>>103351774How about you back up your claims with some facts instead of pulling shit out of your ass?
>>103351166ayyyy
>>103351781How about you show literally anything that proves it works?
>>103351760There is a balance, a tiny model will guess super likely tokens right, with the draft model thinking like 1 token in advance this would be a free speedup compared to how much bandwith the 1B would save for the tokens it gets right. A bigger smarter model (but still notably smaller than the main one) could more accurately guess more tokens and even guess several more in advance more accurately meaning you could get more of a speed up unless it gets it wrong, the balancing point would depend on what your use case is really.
As a example qwen-2.5-coder-32B's performance goes from 34.79 tokens/second to 51.31 tokens/second on a single 3090 using Qwen2.5-Coder-0.5B-Instruct
How do you guys deal with AI that insists on using the same phrases over and over? Like how certain AIs used to *blushes as red as a tomato* or whatever constantly. I've found it very difficult to get the AIs to actually spice up what they're saying.
>>103351799https://arxiv.org/pdf/2302.01318
>>103351824This has been a problem since the start of LLMs and it's a training issue, caused by overbaking or not diverse enough data. That's why all these meme samplers and shit exist, to try and get different outputs without making the model stupid. So basically try turning up the temperature and using the meme samplers or try a different model
>>103351774>>103351744>>103351799You need to jump from a really high skyscraper
>>103351799I have used it before, other anons have used it before. You are the only retard that has said that it doesn't work.
>>103351868>>103351860This is so fucking stupid.
>>103351858>caused by overbakingNot overbaking its just that all these phrases are used extremely often thorough all written fiction. You would somehow have to find all the duplicates throughout exabytes of text and someone rewrite them all in ways that made sense that also did not become repetitive. Its never going to happen. Stuff like XTC are going to be the only method to keep it out of the context more than once.
>>103351824Try XTC with a hint of Presence/Freq penalty and report back.
>>103351877They are caused by overbaking on synthslop. Try the early models(pre-GPT poisoning) like MPT, Llama1 and Falcon. They don't have such high percentages for slop like modern models. As for repetition, that's just what llms do, they recognize patterns and try to repeat them.
>>103351907"shivers down her spine" and stuff like that is not in "synth training data". Its just very common and LLMs are literally the average of everything they were trained on.
>>103351858>and it's a training issuewhat? lmao clearly a architectural issue.there is gptism, claudeism. inevitable repetition.there are huge problems with context in general. try uploading a ff9 cd1 guide into it and say "i am at X now what do i need to do next". lolits gotten alot better since the pyg days but especially with higher context you can feel the incoherent "rambling".not saying all the datasets arent ~2023 gpt/claude poisoned but models have no feel for natural speech.best is probably sonnet 3.5 if you prompt it right. and still the illusion falls apart quickly.lecun is retarded faggot but he has a point.
>>103351921Yes it is, try loading up one of the modern instruct tunes and unpoisoned models and check the percentages.
>>103351812What quants?
>>103351900Became busy with something IRL after two messages, but it seems like that fixed it. Thanks, anon.
>>103351907I tried MPT like someone mentioned before and it was slopped with shivers and other typical stuff. Not to an extreme degree, but it wasn't literally slop free.
>>103351907"Do you want to cum?""Ahh ahh mistress..."Her skillful ministrations send<continuation>Mistral-7b-v0.3-instruct:me: 15%waves: 15%him: 15%a: 8%the: 7%sh: 5%my: 4%his: 2%her: 2%another: 1%L1-7b:him: 20%me: 16%a: 7%my: 6%his: 5%the: 4%sh: 3%waves: 3%her: 1%shock: 0%GPT-tuning increased probability of "waves" by a lot, but both waves and shivers were already in there.
>>103348420Thanks, I'll give it a try.
>>103351479It does the "rounding"/anticipation shit every model does where it's suddenly better when you remove the last paragraph.Imagine for a moment user and model consistently write in third person, and you strip the avatar and copy and paste input/output to a text file, If you kept that last paragraph shit it's obvious where each input and output is (aside from input skill issue).
Can someone post an example of their script/batch/I don't actually know thing they use to run llama.cpp with an specific model loaded?
Is there anything good in the 12B segment anymore? Drummed kind of poisoned the well with his retarded unslop bullshit that made things worse instead of better.
saars, give me the scoop, the lowdown, inside skinny on tts. Which one can do multiple voices and isn't shit?I was thinking of maybe taking my best chats and running them through a tts afterwards. It sounds retarded but I've only got a 3060 12gb so I probably can't have both loaded at the same time
Have there been any cases where high temperature ends up solving a riddle better? At first you might think that 0 temp basically reduces noise, giving you the most accurate answer possible. However, what if high temp causes the answer structure break in a way where it goes on a long COT rant which leads to the correct answer?
>>103352229Just use QwQ or abliterated QwQ, even if you have to use 3bit
>>103352258https://www.youtube.com/watch?v=kN5FJfv7ra8
>>1033523073bit should be perfectly usable for creative uses, hes not saying hes gonna code with it
>>103350106>Your better option is to start out with a good base and finetune that.the problem is, you can't finetune the slop out of the model.I coomed with 20 different mistral 22b finetunes, and I recognize the same sloppy phrases in them all>>10335014750% of that dataset is a dozen variations of "orange man bad">>103349869this.maybe one day we'll have a model without slop and s_o_y
mistral never announce releases ahead of time, do they?I really want an updated Medium given how good Small was, it's crazy for the size but just very slightly too dumb. Medium could be incredible
>>103351985MPT was trained with the RedPajama's dataset and its books portion contains Book3 and that has a bunch of slop in that.
Can your model pass this test?
>>103352448I can't even pass that test.
>>103352448QwQ, let me think, bla bla blafinal answerAbout 51% of deliveries result in a baby boy.
>>103352448What's the correct answer? I would imagine it isn't 50% since the woman has already given birth to a boy, I think that means biologically there's a slightly higher chance it'll be a boy again (because a individual woman's body can be biased, like my grandmother who had 5/5 boys)
>>103352448??%. There's not enough information to deduce the probability.
>>103352200checked and its RP so that kind of shit is standard etiquette at the end so the other person can take it where they want
>>103352461>>103352468You retarded llms, the first birth has no effect on the 2nd.
>>103352469I don't think that's true, IIRC for example certain diets or dietary supplements can bias it towards male, also the mother's hormonal situation. Also y-bearing sperm are faster than xx ones but more fragile.
>>103352231Not seen many tts posts in these threads.You might have more luck in the Pony Preservation Project thread.>>>/mlp/41571795
>>103352448>i am more retarded than a 32 gb file apparentlyb-bros??
>>103352491Yes. With reasoning these models will be more capable than the average human if they are not already.
>>103352231https://rentry.org/GPT-SoVITS-guide
>>1033524481/7%?
>>103352229What is wrong with other Mistral-Nemo tunes that isn't Unslop or Rocinate? There's Lyra, Magnum and even Lumimaid as much as I don't like it and merges all in between? The field isn't moving fast but it's just because the initial rush is over and we're now in the same period of time we were before Llama 3 released.
>>103352461Counterpoint, I have 4 siblings and we were born in exact alternating gender order 3 years apart except for last one who was 6 years after previous.
>>103348255He quoted your gay thread, last few days to be precise.
>>103352448100% since a Wednesday comes next and they are twins
>>103352448pretty close to 50% but not quite
>>103352448man if you don't stop with these riddler ass questions GTFO
>>103352665What model is that, it makes no fucking sense. The day has nothing at all to do with it.
>>103352685just quora answers
it's an entire wiki articlehttps://en.wikipedia.org/wiki/Boy_or_girl_paradox
>>103352767So your saying its about 51% and none of that other BS matters?
>>103352767I hate math and I hate statistics
>>103352785It's not even about math or statistics, it's about basic logic and overthinking the question. Even a little kid understands that when a child is conceived, the chance of girl/boy is 50-50.
Isn't that question a hate crime for implying that there aren't more than two genders?
>>103352778I can't into math but from what I gather, it's saying if it truly picked one pair from ALL families with 2 children (and one of them just happen to be a boy born on Tuesday), then it's 50% and the day doesn't fucking matter.However, if the entire sample size only involves families with 2 children of which 1 is a boy born on Tuesday, this excludes all families that do not have a boy born on Tuesday.It's less about the day and more about the potential selection process which leads to ambiguity of the question.
>words words words>retard can't evaluate its own writingAm I supposed to be impressed by this trash? It wrote the blandest slop(the opposite of what I asked), didn't reflect on it in criticism section and just threw it at me as the final answer.
I wanna run Luminum v0.1 on runpod. Is there any template that allows me to load the multiple GGUF parts? Or I should just do the classic install from scrach
Good night lmg
>>103353198sleep tight bby i cum visit u
>>103352995Tried the same CoT prompt with largestral, it wasn't great too, but at least it did criticize itself and added some improvements. CoT tunes are a big fucking meme and I don't want to pretend that they aren't. Results of o1 can be achieved by just giving GPT4 a good CoT prompt.
>>103352448The only correct answer is>This question is worded ambiguously, here are multiple answers depending on how you interpret it: ...If you take the frequentist perspective the probability that a specific child is a boy is either 0 or 1, you can only make statements about what fraction of children you would expect to be boys if you were to sample an infinite number of such families.
SD is making my GPU sound like a dial up modem.
It's weird how Qwen can just randomly switch to Chinese and you put it through Google Translate and it's perfectly coherent.
>>103353691That's weird, mine sounds like a rotary phone. [spoiler]It's an RX580 with awful coil whine.
>>103353721Tokens don’t care. Tokens don’t give a shit.
>>103353729My A5000 also had magnificent coil whine at one point. It sounded like a NES game, blooping along to the words as they streamed onto the screen. It was super comfy, ngl
>>103353074Greetings fellow cloudfag, alas I have only ever used vast. However out of respect for a fellow heretic I give you this runpod glitch that may or may not still work:>https://rentry.org/dmgec6t9
>>103353721>It's weird how Qwen can just randomly switch to Chinesethat's something they really need to fix at some point, like they can make their model really smart but they can't fix this? what's wrong with that?
>>103353769it's deliberate
>>103353721Now you put it through Google Translate, next you'll learn Chinese so it'll be quicker. All according to plan.
>>103353769That could be easily fixed with a new sampler that lowers the probability of Chinese tokens. With a reasonable threshold, it can still output kanji when it makes sense.
>>103353782>>103353848kek, I know it's a meme but at some point I seriously believe that if China keeps dominating the AI space at this race, we'll have no choice but to learn Chink to use them, something in the line of>Look at our new model you westoid retards, it has 100% on MMLU pro, wanna use it? too bad it's only working in Chinese, maybe you should learn our language if you want to do some smart RP with your waifu
>>103353864>That could be easily fixed with a new sampler that lowers the probability of Chinese tokens.there was a fix before, you would go for the grammar thing and you use roleplay.gbnf, this shit forces the model to only use the english alphabet + numbers, but it doesn't seem to work anymore I got some errors when using it now
>>103353869How is that any surprise? You hire niggers instead of Asians and white dudes because much DEI. Fuck meritocracy. Of course everything is shit and getting shittier when the best what humanity has to offer is sent on the bench while the biggest retards are put forward. The two socially retarded White girls on the team will not salvage it.
>>103353901>You hire niggers instead of Asians and white dudes because much DEI. Fuck meritocracy.what? Meritocracy is the antithesis of DEI, meritocracy is literally: "We only hire the most skilled people, regardless of anything else"
>>103353907Well yes, that was my point. And it is currently not happening in the west.
>>103353916oh ok, I was just confused by the "fuck meritocracy", I would say "all hail meritocracy" instead
>>103353869It's great to have some competition. Can't wait for NewVidya PTX5090 64GB
>>103353922>fuck meritocracyis their cry.
https://x.com/elder_plinius/status/1862516878167445663
>>103353972Why does this literal who keep getting posted here?
>>103353994he can't afford an ad
>>103353928the moment the chinks will be able to make their own GPU, it'll be game over for the US, they can't dominate more because the US is preventing them to buy as much Nvdia GPUs as they want
>>103353869It's a real shame that king zigger had to declare war on Ukraine right when llms took off. Would have been very interesting to see some Russian llms.
>>103353994Because you touch yourself at night.
>>103353869>Cuck yourself to chinks Calm down zhang.
Anyone running models on a cpu-only server? What is it like?
>>103354081Pain, but the limiting factor is the speed of the RAM, not the CPU.
>>103354081Hot. Real hot. And fucking loud. Don't cheap out on fans, you'll regret it soon.
>>103353901>muh DEIIt's literally just demographics.The population in western countries is on average much older than in China with fertility below the rate of replacement.China will start to have the same problem in just a few years, that's why their window of opportunity to subjugate Taiwan will close around ~2030.
>>103354094>Hot. Real hot. And fucking loud. Don't cheap out on fans, you'll regret it soon.I got a be quiet! case and fan set and I gotta say, it's not false advertising.
>>10335408170b q6 128k context 0.7t/s ~104gb (I can't quite remember).ddr4-3600.Smaller models are more usable.
>>103354081Honestly, at that point, it is likely better and maybe even cheaper to use Openrouter. Unless you are into really fucked up shit like loli, you can do whatever.
>>103354146fuck off
>>103354146hey buddy you got into the wrong thread, the /aicg/ containment board is 2 box down
>>103354166I do strongly prefer running everything locally but if my choice was a CPU-only server and Openrouter, I would pick it later.
>>103354172FUCK OFF
>>103354088>>103354094>>103354125Damn. I was hoping I could get away with it by using a thinkcentre with a 32gb memory stick>>103354146>really fucked up shit like loliAre you talking about image generation?
>>103354177that is distressingly overpriced
>>103354177>1.2k for 16GB RAM, 256 SSD, some i7 and no GPUdamn
>>103354176>>103354166He's not wrong. Running models, especially 70B and up on CPU only is literal torture.
>>103354146>really fucked up shit like loli
>>103354198>>103354199Well normally you'd buy them used off ebay
>>103354205>hefuck off, samefag cuck
>>103354094 (Me)Has anyone here bought M99 coolers(https://www.ebay.com/itm/395697360380)? Are they worth it?
>>10335417712b q8 128k context 3.4t/s ~33gb ~42gb including os.I guess a q4 would work.
>>103354210Who hurt you, man
>>103354210In what universe is waiting 5-10 minutes for an appropriate response without rerolls using CPU only a better alternative than using cloud.It's a basic calculation of the value of time.
>>103354338>>103354338>>103354338