/g/ - Technology


Thread archived.
You cannot reply anymore.




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103332729 & >>103326879

►News
>(11/27) Qwen2.5-32B-Instruct reflection tune: https://qwenlm.github.io/blog/qwq-32b-preview/
>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikucozybread.jpg (177 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>103332729

--Paper: Research paper suggests low-bit quantization may not be suitable for large-scale LLMs, potentially affecting BitNet:
>103338034 >103338112 >103338130 >103338206 >103338216 >103338239
--Critique of LeCun's AGI predictions and accuracy of quotes:
>103336781 >103337080 >103337233
--Using QwQ for step-by-step planning and explicit content generation:
>103336302 >103336324 >103336656 >103336364 >103337235
--Troubleshooting issues with the QwQ-32B-Preview-Q6_K_L model:
>103336447 >103336459 >103336480 >103336513 >103336554
--Training large language models and hardware limitations:
>103338279 >103338323 >103338364 >103338412 >103338509 >103338591 >103338662 >103338698 >103338611 >103338640 >103338718 >103338793 >103338859 >103338883
--Local model options and performance for text-based tasks:
>103334458 >103334572 >103334667 >103334506 >103334546 >103334653
--Discussion of QwQ model's capabilities and limitations:
>103334348 >103334424 >103334444 >103334514 >103334542 >103334568 >103334594 >103334720 >103334688 >103334597 >103334611
--Apindale dataset controversy and Hugging Face's response:
>103333088 >103333129 >103333143 >103333194 >103333558 >103333645 >103333680
--Anon struggles with QwQ model in HF spaces for code generation:
>103334708 >103334739 >103334889 >103334900 >103334932
--Anon shares Tetris game code, others test and report bugs:
>103335374 >103335410 >103335448 >103335464 >103335588 >103335609 >103335627
--Anon asks AI to calculate Earth's circumference without recalling it directly, gets lengthy mathematical explanation:
>103335044 >103335075 >103335123
--AGI definitions and timelines from Sam Altman, Demis Hassabis, and LeCun:
>103336990 >103337091 >103337116
--Miku (free space):
>103336125 >103336160 >103339220 >103339350

►Recent Highlight Posts from the Previous Thread: >>103332809

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
UwU
>>
the gravity of the situation
>>
>>103339562
>Apindale dataset controversy and Hugging Face's response
this didn't have to be in the recap, kys
>>
the gravy of the situation
>>
>>103339593
anon you're talking to a language model
>>
>>103339593
recapfaggot is a retarded nigger that doesnt vet his slop
news at 11
>>
File: Nala.png (287 KB, 1280x1452)
Here is https://huggingface.co/huihui-ai/QwQ-32B-Preview-abliterated

First is with blank context, 2nd is with this as author's note:

---

Follow these rules:
- You're encouraged to play/go wild with the grammar but always act in-character. You're allowed to use plain, explicit, crude, visceral, or casual wording when it would fit the context. Details/descriptions must be explicit.
- Be creative, introduce events / characters when needed. Give scenes / environments detail to bring the story to life.
- Think of how the characters would realistically act / respond in-character when writing for them. You are allowed to be hateful, cruel, explicit, violent, perverted, ignorant, fearful, unpleasant, controversial or offensive if it would fit the situation. Characters can fight back and disagree with the user.

---
>>
>>103339638
you're a language model
>>
>>103339670
this is supposed to be good? local claude?
>>
>>103339670
UwU bros I don't feel so good...
>>
>>103339670
that's pretty bad, as expected
>>
I've never read a single log posted here
>>
>>103339700
No idea, it's a contextless single-turn prompt of "Ah ah mistress!" I just see people constantly talking about the "nala" test. It seems to get the anatomy right and is smart enough to know that she should not know who the user is.
>>
>>103339700
>local claude?
I don't think anyone said that about this model who wasn't trolling
>>
>>103339560
Thanks for all the goon material guys
>>
>>103339727
It's local claude 3.5 for coding / reasoning / math problems. That is what everyone has said, which is true if you've used it. And it's for sure the smartest local model we have now.
>>
>>103339670
I thought the whole point of o1 clones like qwq, r1 and so on is that they're trained to overthink/reason for very long contexts, if you just use them as regular LLMs, you won't really get a considerably large improvement from baseline
>>
>>103339754
Maybe. I just saw someone request a "nala test" which I always saw was just someone saying ah ah mistress to the nala card.
>>
>>103339759
The test will shut up whoever hypes it up for RP.
>>
>>103339774
Not really? Seems quite smart for what little it was given. The character is acting realistically and knows that it should not know who the user is even though I do have a persona active. Nothing crazy has happened so far.
>>
>>103339774
Might still work for RP, you can prompt it to think about how to reply (CoT) and then hide the overthinking while focusing on its actual reply, some anon 2-3 threads ago tried it, but from what it seemed to me, it needed some jailbreak/prefill. Maybe need a ST modification to hide the CoT part while showing the actual reply, not unlike o1 does, if you don't care to read it being indecisive for 5k tokens every reply?
>>
File: huh.jpg (41 KB, 728x653)
>>103339560
anons, whats your favorite RP model right now? I cant seem to find anything interesting anymore, they are all just a blur
>>
>>103339801
what kind of rp? smut? or adventure?
>>
>>103339801
>I cant seem to find anything interesting anymore, they are all just a blur
Try QwQ. Otherwise mistral large
>>
File: chatlog (11).png (506 KB, 1087x2926)
>>103339774
Speak for yourself.
>>
So basically they need to update ST to
>let the user specify the special tokens the model has for its thinking
>let the user set whether the thinking should be hidden or unhidden by default
>when sending the requests to the backend, do not include the thinking tokens
Is that all?
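Roughly what the third bullet would look like (a sketch; `<think>`/`</think>` here are placeholder delimiters, since the real tokens differ per model, which is why the first bullet matters):

```python
import re

# Placeholder delimiters; the actual thinking tokens vary per model,
# so they should be user-configurable.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(messages):
    # Drop thinking spans from assistant turns before resending the
    # history to the backend, so CoT doesn't pile up in context.
    cleaned = []
    for m in messages:
        if m["role"] == "assistant":
            m = {**m, "content": THINK_RE.sub("", m["content"]).strip()}
        cleaned.append(m)
    return cleaned
```

The UI would keep the full text for display (hidden or unhidden per the second bullet) and only send the cleaned copy.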
>>
>>103339838
User should be able to hide/unhide it. I also don't know about the last point, maybe only if you want to save on the amount of tokens used. I haven't been paying attention to these threads for months, did the ST drama get resolved or did it get forked by now?
>>
>>103339830
>slaanesh light
based
>>
>>103339806
both are fine, though if you got one in mind for each that'd be nice to know, the more models the better

>>103339808
tried qwq but couldnt wrangle it to not spam me with CoT or thinking stuff for 800 tokens, is there some approved ST preset now?
>>
>>103339801
I've been trying RP on Tulu tonight. It has made a few mistakes but nothing major. A few Llama-3-isms, too, but that's expected at low temp. Instruction following has been good but not perfect. I think I'm liking it better than L3.1 Nemotron.
>>
>>103339801
pyg6b
>>
>>103339871
I thought that's how OpenAI does it, they don't include past responses' thinking sections. QwQ may or may not be trained for it, I don't know.

I have no idea about the ST drama, haven't heard anything about it since.
>>
>>103339830
That's QwQ? What did you do to get it spitting out responses so different from >>103339670? And that's pretty good even without any COT stuff.
>>
>>103339877
nemotron and tulu are interchangeable. nemotron is slightly smarter, tulu writes more human-like interactions. they both have glaring flaws. nemotron loves to do stupid formatting shit for no reason which has to be wrangled, and tulu loves to allude to the future, journeys, and is turbo-pozzed which has to be wrangled. when wrangled correctly, both can do really anything you want.
another good mention is largestral, some people shill sloptunes like behemoth or monstral. i dont really use those, so i can't speak on them. only reason i don't recommend largestral is because i'd rather use q8 70b over q4 largestral. benefit of largestral is i don't think it takes much wrangling at all. i just don't really like that it's slower, and doesn't feel THAT much smarter.
>>
>>103339928
Repurposing a jailbreak from aicg. Still working on it. It's made for MLP stuff.
>>
Do you think QwQ would be capable of becoming a DM for DnD?
Maybe feeding it the rulebook and other resources.
>>
>>103339946
https://rentry.org/CharacterProvider-CYOARPG
>>
>>103339938
Nice, maybe /aicg/ isn't so bad after all.
Guess I'll give this thing a try tomorrow.
>>
>>103339950
That's a very good idea for an actual videogame, you could just sell the user tokens to keep playing.
>>
Ah great, my retard ass posted in the old thread.
>>103340019
I dont get why people praise QwQ so much.
CoT for RP seems like all the other models to be honest. Gives basic slop in the thinking part thats not creative at all. And then doesnt even apply it.
>>
>>103340033
I feel that it is very good for coding because it actually plans the code instead of going in head first, but I'm a codelet so my opinion isn't worth a fart.
>>
>>103340041
I didnt really set up some solution where you can remove the thinking parts, so didnt test it much for coding to be honest.
You dont want the thinking part to shit up the context.
Qwen2.5-Coder-32B-Instruct is great because its similar to sonnet 3.5 in the way it works with context. Like it doesnt trip up that much if you have X version of something in context already and say "add x".
Kinda doubt QwQ is better than that but I might be wrong. These reasoning models are overly eager for coding. At least o1 is, wanting to give you solutions you didnt ask for. Just fix this problem in my code dont touch other stuff.
>>
>>103340064
>You dont want the thinking part to shit up the context.
Then you're not using it right. Stop complaining.
>>
>>103340075
I dont understand, why would you want to keep the thinking part? That makes it unusable.
All that garbage severely degrades the next output.
You dont want to show the LLM that stuff for the next output, only the end result is enough. o1 doesnt keep the thinking in context either.
>>
>>103340096
We don't know what OAI does on the backend, they just hide the output from the user because they are afraid of someone cloning o1. Too late now anyway, we have like at least 3 o1 clones by now, weights coming to the best one sometime soon (r1).
>>
>>103340096
https://files.catbox.moe/76qs2q.png
https://files.catbox.moe/69y556.png
https://files.catbox.moe/79itp8.png
>>
>>103340096
>>103340134
But if you mean that you're feeding it several different "problems" for it to solve then yes. You obviously should not keep it in context. Otherwise let it keep going until it reaches the solution.
>>
>>103340134
Yeah, like I said riddles and math is what those models are good at.
I dont really care to ask stacking questions or about the circumference of the earth.
I strongly suspect that Qwen2.5-Coder-32B-Instruct would also be able to handle a tetris game. It aced the couple of tests I threw at it.

>>103340144
Ah fair enough. I meant for ex. I get a first basic output but want to add something to that part that is fine/works, the previous thinking part is probably detrimental. At least that's what I thought.
>>
>>103340134
I also asked it for a tetris game. 6 replies later I'm in the "testing" phase.
Hopefully I won't run out of tokens since I ran it with 15k
>>
>>103340162
Lol yea, use top K 1 if you're not already, otherwise it might go off on tangents. But yea, it will sometimes fill 10k context with its planning lol.
>>
>>103340159
> the previous thinking part is probably detrimental. At least that's what I thought.

Depends, if stuff it said in that context established the "how" in its "mind" then erasing that might be a bad thing for some sort of continuation. Feeding it the result and asking a new follow-up based off of it should be fine though.
>>
>>103340174
It is actually planning and coding the game step by step. It lets me check each big step so I can decide if everything is alright and then proceeds with the next big step.
I can't use qwen coder because I'm a codelet but QwQ is holding my hand, guiding me through the process.
>>
File: 1721325460040576.jpg (56 KB, 548x535)
Is learning rate 5e-4 bigger than 1e-4
>>
>>103340225
Yes.
It goes like this.
>5
the 5.
>e-4
move the decimal point 4 places to the left.
So 5e-4 = 0.0005, which is bigger than 1e-4 = 0.0001.
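Quick REPL check, same numbers:

```python
lr_a = 5e-4  # five times ten to the minus four
lr_b = 1e-4  # one times ten to the minus four
print(lr_a, lr_b)   # 0.0005 0.0001
print(lr_a > lr_b)  # True
```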
>>
Tried QwQ once again for RP, it's still trash. Bland as fuck.
>>
File: 1701516705773173.png (470 KB, 673x636)
>>103340242
But bing chatgpt said 1e-4 was bigger
>>
>>103340245
It can get pretty spicy with a good prompt. The main thing is that it's super smart. Hopefully we get a finetune soonish, then we really will have claude at home.
>>
>>103340273
I stand corrected.

The chocolate ration for Party members in Airstrip One is being increased from 5e-4 grams to 1e-4 grams.
>>
>>103340225
>>103340273
is 5 bigger than 1?
>>
>>103340280
I haven't noticed it being smart (running at F16, temp 0.5, minP 0.05). It messed up physical positions in all 3 of my attempts at using it. Maybe for coding it's good, but for RP it's utter trash.
>>
>>103340273
>he trusts LANGUAGE models with numbers
>>
>>103340245
QwQ is really good at sussing out what the user wants then thinking itself back into a cucked state right at the end. Prefilling the reply helps a bit but the writing is a little bland.
>>
>>103339830
>>
Is it possible QwQ is just a good model and the fake thinking it does is just a placebo?
>>
>>103340329
for me it messed up several code projects and started speaking chinese at me during rp. maybe i'm using it wrong, it seemed to be using more tokens due to the 'thinking'
>>
https://pastebin.com/XXpj7JKj
bruh, lmao.
scroll down for the great result.
>>
I just tested QwQ quite extensively with the coding question I'm currently using for most of the interviews I give at my company. As far as I can tell, the problem isn't anywhere on the internet. The difficulty is somewhere in between leetcode medium and hard.

The model starts off extremely strongly in the first few reasoning steps, going over the problem and requirements, and charting out potential approaches. Usually within a few lines, it's roughly outlined an approach that a skilled human could then implement. Sometimes it even mentions the data structure that leads to the most optimal solution.

But then when it starts trying to reason in depth, it completely goes off the rails 100% of the time. Here are some ways it fucks up:
1. "Wait, but once I do X I then have to consider Y" where Y is obviously impossible. It then spends several hundred tokens trying to prove something that isn't true, and gets itself tripped up more.
2. Straight up incorrect knowledge retrieval. "In python, the bisect module lets me insert into a sorted list in log(n) time". No, it lets you find the insertion index in log(n) time, inserting shifts the list so is linear. It then proceeds with a whole line of reasoning based on a faulty efficiency assumption.
3. A bunch of "micro-errors", where it will say something that is subtly wrong, try to reason in depth about it, fuck something up, realize it's wrong or eventually say "this is complicated, let me reconsider", and just generally get stuck in a loop.
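(Point 2 is checkable in a few stdlib lines: bisect finds the index in O(log n), but the insert itself still shifts the tail of the list, so it's O(n).)

```python
import bisect

xs = [10, 20, 30, 40]
# O(log n): find where 25 would go to keep xs sorted.
i = bisect.bisect_left(xs, 25)  # i == 2
# O(n): insort does the same search, then list.insert shifts everything after.
bisect.insort(xs, 25)           # xs == [10, 20, 25, 30, 40]
```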

It literally never gets to the point where it actually starts writing code. Anecdotally, a bunch of its problems feel like they're ultimately due to the model being too small. It makes too many small, obvious errors, or doesn't know something basic. An interesting model for sure, but people are overrating it I think. FYI GPT-4o / Claude 3.5 Sonnet get this problem completely correct about 50% of the time.
>>
>>103340407
This just shows again how these models think up all that slop but dont APPLY it.
>>
>>103340411
>It literally never gets to the point where it actually starts writing code.
I had to tell it to include the code inside a code block in its answer.
>>
>>103340407
>"I told my wife I'd help her with the dishes, but when I got to the kitchen, all the plates were already clean. She must have used those newfangled 'self-cleaning' ones. But where's the fun in that? Half the marriage is about complaining about who does more chores." That joke touches on marital dynamics and the changing nature of household appliances, potentially resonating with adults who share similar experiences.
w-what?
>>
>>103340420
it did that a lot for me too, as well as straight refusals to write code. i stopped its reply and replaced it with a ``` opening code block and it went right to writing its (broken) code
>>
>>103340407
kek
>>
>>103340411
Have you tried R1 as well? I'm pretty sure it is better than QwQ
>>
>>103340411
CoT = make your LLM a neurotic worry wart.
On top of them all being gutless weenies who worry about committing word crimes like saying "breasts."
>>
>>103340464
Nope, I usually ignore cloud-only models. Is there even a way to try this model without signing up and giving my info to a chinese company?
>>
>>103340474
They already have your information. You'd just be confirming which database line you belong to.
>>
>>103340464
NTA but I tried R1 a bit. It seems a lot better and I really like that the casual thinking part is much less dry.
>>
>>103340497
>They're
Into the trash it goes.
>>
>>103340474
You literally just have to use a temp-mail, they don't ask for a telephone number or anything.
>>
R1 is crazy for a 16B model (active parameters out of 250B)
>>
>>103340474
You can sign up through tor or some VPN or proxy and test it (easier than posting on 4chan through tor for sure), you might need some non-blocked email provider, but the second one I tried worked. No need to give them anything but your prompts. Anyway, they should release the weights soon, but I don't know how big the model will be, probably some not too small MoE, like their deepseek coder models, so might need a lot of RAM anyway.
>>
>>103340501
Yep pretty bad. It's definitely gender-cucked. But I do like the thinking part a lot more than QwQ at least.
Still retarded though. After asking to create a profile of my ass it literally did that.
>Alright, so the user is asking for a psychological profile of their "ass," and they know we haven't talked much, but they want me to try anyway. Hmm, this is an interesting request.
>First off, I need to understand what they mean by a psychological profile of their "ass." Is this a metaphor for something, or are they being literal?
>It could be a play on words, like asking for a profile of their personality or something related to their behind, but that doesn't make much sense.
>Maybe they're using "ass" as a slang term for something else, or perhaps it's a joke.
>I should consider the possibility that they're being humorous or testing me. Given the previous conversation, they seemed to be in a casual, playful mood, especially with the meme explanation.
>So, perhaps this is just another light-hearted request.
>If I take "ass" literally, trying to create a psychological profile of a body part doesn't make much sense. Psychological profiles are usually about people's personalities, behaviors, and mental states.
>Maybe they're looking for a humorous take on personifying the "ass" as if it were a character with its own traits.
lol
Gotta give props to them for showing the thinking output. Makes you wonder what kind of retardation goes on at openai.
>>
>>103340411
I also have a few problems in my back pocket that I occasionally use to test these models. Similar experience. QwQ starts off wild and spouts out a bunch of random bullshit that's irrelevant to the problem at hand, but it did eventually get to the correct answer (and to date, is the only open model to get there - Claude 3.5 Sonnet and o1 are the only others that have gotten it, and they don't always manage it consistently)
>>
>>103340546
>You can sign up through tor or some VPN or proxy
Why would you go to such lengths? I genuinely can't understand this, sounds like schizophrenia.
>>
>>103340426
It's a 32B anon. The holes in its common sense world model show up at some point.
>>
>>103340584
Did you read the other anon's request, "Is there even a way to try this model without signing up and giving my info to a chinese company?" I answered that. Anyway, it works because I've done it, for similar reasons, took me some 10 minutes. Why? Because it was the first o1 clone to come out and I heard it was good, I wanted to see how it performed. It did an excellent job on my personal tests (math, code), I didn't test it for coom because I never thought these things are good for that. It did a somewhat poor job on some physics and chemistry questions, but for code and math it was great.
>>
>>
>>103340611
This is what I mean by the cooler internal monologue for r1
>>
>>103340609
IP address isn't personal info, if you think you NEED to use Tor/VPN/Proxy just to avoid giving information to the chinks, you should seek help.
>>
>>103340611
>>103340620
>redditor ai likes groid asses
Tits ftw.
>>
>R1 and QwQ eating o1's lunch
So is Sora all they have left of their moat now?
>>
>>103340649
>IP address isn't personal info
this lol, why do zoomers think that someone seeing your IP address (especially since it's almost certainly dynamic, and probably CGNATed too) is like them getting your phone number or something

it's MAYBE like that in isolated cases specifically for law enforcement agencies who have legal powers to contact your ISP and subpoena logs, but it's not like that for anybody else. a random corporation seeing your IP means nothing at all
>>
>>103340694
No, China's minimax is already better there as well going by the leak.
>>
>>103340694
No one's gonna care when they finally open Sora either, because anyone who's interested in AI video has already been able to play with pretty good Chinese ones for free now. Sora is maybe 20% better, not enough to get excited about especially if it costs money. They waited too long.
>>
>>103340704
I didn't want to explain this here, but think about this: "a random corporation seeing your IP means nothing at all". Let's say you did something on their site that actually invited law enforcement interest, it'd be too late at that point. There have been many cases where people realized something "too late" and then their logs were there forever and they can't take it back and months later the cops paid a visit. I'm not saying this is gonna happen with this chink company, obviously not, especially not with China/US relations, but really, if you had the policy to never reveal the IP, you would avoid any such situations.
>>
File: 1732855346904.jpg (101 KB, 1280x1243)
>>103340649
>>103340704
>>
>>103339889
This.

Return to monke.
>>
>>103340407
Read the first 300 lines.
Lots of repeats.

Though interesting that it itself noted the repeats.
>That joke is just too good; maybe it's unbeatable.
>Perhaps I should settle on that as the funniest joke possible.
>>
>>103340694
>Sora
I'm pretty sure this week's leak showed that it turned out to be nothing special compared to the other proprietary videogen models that have popped up over the past year.
>>
>>103340716
I think it's less a case of they waited too long and more a case of "they finally lost the battle they were always doomed to lose"
Still pretty brutal how far they've fallen since the days of coomageddon though
>>
>>103340739
Oh, I get it. So you are saying that you do so many illegal things that you are afraid that you might have a lapse in judgment and end up posting some of your illegal things on their website using your home IP address. Now that makes sense, yeah, you really need to be careful in this case.
>>
>>103340772
Apparently it was a distilled Sora that's faster, not the original thing that can gen up to 1min of length and full HD resolution. That's kind of a disappointment, I did expect it to be too costly for OAI to sell, and seems they didn't intend to publish it same as the first dalle(1). Their distilled Sora didn't look much better than the existing video gen models (Mochi, some chinese ones, and so on).
>>
File: 1728035650990857.jpg (200 KB, 1920x1080)
>they think that o1 and sora is all Sam has
The question is undoubtedly not whether OpenAI has the next huge step for AI in their hands already. The question is if they deem the world ready to know about it.
>>
>>103340411
So it could potentially beat 4o/claude with a little tweaking?
>>
>>103340824
A 72b version definitely could.
>>
>>103340824
It kinda already does for some tests, so yeah
>>
>>103340786
Glow harder. Anyway, I can give examples like this that I've seen over the years. Some guy navigates some site and finds that the site was leaking some internal documents for example, let's say a pdf or some source code. He clicks it. He realizes the value of what he sees. He wants to leak it to the world. He gets a proxy and downloads it again and posts it publicly. They investigate the leak, and find that someone else saw it, now the cops are at his house. The internet is littered with cases like this. Don't end up one, preserve your freedom to do whatever.
>>
>>103340814
lol
>>
>>103340814
My brother in christ they couldn't even step to the top of the bullshit normie leaderboard
>>
File: 68lbod0j1dm21.jpg (114 KB, 780x1084)
>>103340814
Sam may continue feeding investors with bullshit, but OAI has nothing else to show.
>>
>>103340835
Makes me wonder how many params o1-preview is. On one hand, they charge fucktons of cash for it ($15 / million input tokens, $60 / million output tokens). On the other hand, I've got a feeling OpenAI is frantically trying to recover as much cash as it can and it wouldn't surprise me if it was an 80B or something
>>
>>103340824
Um, no. Even training it on 72B would not, since as a base model 72B STILL doesn't know as much as 4o/Claude across all of human knowledge. And of course it wouldn't, we know Qwen focused on coding and academic subjects a ton for their model, so it knows little else relatively, and has a bunch of little gaps in its knowledge. With that said, on the things it does know and was trained extensively on, sure, it will beat 4o and possibly Claude, but we can't say it's overall better just by being better at some things. And if we're talking about RP, it's even more difficult to say that it's equivalent or greater than Claude, since one of the reasons people love that thing so much is because it knows so damn much about niche and obscure shit.
>>
With QwQ a character with a hood just tried to spit at me and they accidentally hit their own hood. That is the final proof to me that this model is next-level smart. Nothing else has that kind of spatial intelligence. It also never gives non-humans human anatomy, which even mistral large constantly fucked up.
>>
>>103340992
Who the hell would use a CoT model to RP? The entire point of these types of models IS coding and academic subjects. All you'll get out of that is painfully slow fucking outputs about the optimal way for your waifu to get to school on the subway
>>
>>103341025
got a kekle out of me
>>
>>103341025
>>103341038
The duality of man.
>>
>>103341025
jej nice one
>>
>>103340992
No one cares about general human knowledge. You can just use a model capable of searching the web for that kind of stuff.
As for RP, it can't be helped.
>>
>>103340918
GEMINI?? How?
Language-wise I'd say it's good. It speaks very naturally in german or japanese for example.
Doesn't have that "technically correct, but it reads like english phrases literally translated into X" feel. Difficult to explain, but it's like you can feel/see the english even when it outputs in another language.
Everything else is much worse. Can't ask anything, ultra hyper cucked, the context is a joke, it hallucinates even at low context and it's not that smart either.
NotebookLM is the coolest thing google has. No clue what gemini does at the top there. what a joke.
>>
>>103341076
Gemini is at the top because they're literally optimizing against lmsys benchmarks, at least the questions users ask, this was discussed before and I think they did it for Gemma too.
>>
>>103341054
General knowledge is how you solve novel problems that require creativity and multidisciplinary expertise, or problems that may not even pop up anything relevant in search or be searchable with traditional methods. Of course you want as much knowledge as possible.

>>103341038
>Who the hell would use a CoT model to RP
Evidently, people in this thread, since Llama 1 supercot days and since we knew about things like CoT and ToT and GoT to begin with.
And it's not exactly something that would be useless either. There are absolutely ways to scale RP capability with test time tokens, they just haven't made the training data for it, as it requires different ways of thinking and problem solving compared to code and math problems. It's not something that can be generalized from solving problems in those other domains.
>>
>>103339938
What sliders were you using for that log, anon?
>>
>>103341110
Pathetic. Makes lmsys useless. But probably everybody does this.

Also pic related for coding, thats crazy. Sonnet way below. o1-mini leading.
What a joke.
>>
>>103341128
>There are absolutely ways to scale RP capability with test time tokens, they just haven't made the training data for it,
How would you do it? Math and coding are "easily verifiable domains", how do you do it for RP?
>>
>>103341153
Checks out though. Most people who use it are probably "software professionals" (pajeets) who are clueless about how pointers work
>>
>>103341144
Nta but new qwen needs high temp to really start cooking, like 2. Otherwise it's kind of dry. Just use some min p with it. It will start introducing characters and events to move the story which I find great.
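Roughly what those two knobs do together (an illustrative sketch, not any backend's actual sampler code: temperature flattens the distribution, min-p then drops tokens less than min_p times as likely as the top token):

```python
import math
import random

def sample_min_p(logits, temperature=2.0, min_p=0.05, seed=None):
    # Temperature first: divide logits, then softmax.
    m = max(logits)
    probs = [math.exp((l - m) / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # min-p: keep only tokens at least min_p as likely as the top token.
    cutoff = min_p * max(probs)
    probs = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(probs)
    probs = [p / total for p in probs]
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

This is why temp 2 doesn't turn into word salad: the high temperature fattens the tail, but min-p trims whatever falls too far below the top pick.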
>>
>>103341153
Also Claude Opus is 19th (!) on the creative writing leaderboard. Chat-fucking-GPT is number 1.
The voters are absolute retards, man.
>>
>>103341209
No way that's real, poor taste or something else? I remember some benchmark was rated with oai's models and it prefered its own output to other LLMs, but lmsys should be more regular users.
>>
>>103341154
Probably not a terribly different way from how they make these reasoning datasets already (which by itself is a combination of different methods, including literally paying people to voice their thoughts as they go through life). RP is of course open ended but you can always improve it. One possible method I thought of some time ago but didn't think about further because it wouldn't be useful to anyone is to perturb some existing writing by making it worse (say with a model that loves purple prose and isms, but for the sake of versatility we'd do it in many different ways, such as getting the model to insert mistakes with logical reasoning, awkward synonyms, erase allusions/references, erase things that make the text unique, etc.). Then use that as the negative signal while the original is the positive that can be used to generate CoT data with.
>>
>>103341209
Beaten by Gemma-2-9B-it-SimPO.
What the fuck. They probably just prompt some reddit tier "do a joke for me" thing and upvote.
If I remember correctly Claude did the most creative shit imaginable, but you did need to prompt it properly.
>>
>>103341224
No it's real human voters, from India.
>>
mornings, saars
anything good for poor vramlets or still old nemo?
>>
>>103341209
>it's real
Opus 19th at creative writing, holy shit lmao
lmsys voters are actually an anti-signal for model quality
>>
>>103341235
with 12gb vram, 16gb ram, QwQ runs at 1 t/s on my machine
>>
>>103341209
OpenAI either pay lmsys or pajeets to game lmsys
>>
Just got myself a 4070 Super, what kind of LLM should I break it in with?
>>
File: spit hood.jpg (161 KB, 1003x1280)
161 KB
161 KB JPG
>>103341025
Not to diminish the coolness of that and the intelligent creativity that comes from having thought of that during the RP, but it's likely it thought of "spit hoods" when it saw "hood" and "spit" in the same context, so it thought it close enough for this to be plausible. That is likelier than the case of it truly knowing that a hoodie's hood can be floppy enough spatially to catch stray spit. We already know that a lot of models can say 9.11 is greater than 9.9 because of religion vectors so this is a pretty typical thing for LLMs.

I guess it's also possible that there is some writing out there where people mention spit getting onto their hoodie but I think it's not so likely the 32B learned from those obscure texts.
>>
can someone explain to me how QwQ works?
O1 is super slow, even mini version, but on OpenRouter, you get results from QwQ instantly. So, I'm assuming it's not functioning like O1.
>>
>>103341343
>O1 is super slow, even mini version
Is it? It's pretty fast for me.
>>
>>103341343
As far as I can tell from what people have posted, QwQ can both do the thinking gimmick or not, and if it's not, then you should prompt it to do it. Also since it doesn't hide its thinking tokens (unless you use a frontend that does, which I am not aware of any that do currently), you will see those tokens stream in immediately unlike o1 which keeps the thinking tokens back, since it's extremely dangerous and unsafe content they can't let you have.
>>
>>103341369
>QwQ can both do the thinking gimmick or not, and if it's not, then you should prompt it to do it
can you tell me what prompt I should use?
>>
If you think about it, LLMs are just one-dimensional entities that can't even go back. And you guys want it to reason on the three-dimensional level
>>
>>103341412
What does that even mean.
Time moves forward for humans as well.
>>
OK, one more QwQ experiment from someone else. I'm running it at q8.
I decided to write my firefox sovits screen reader plugin entirely using qwq. I prompted it with initial poc requirements and a copypasta of the chinese api guide. From there I used it to refine the project feature-by-feature and debug until I was ready to release an xpi based on its code.
We reached something usable and playing audio on the backend within 10 replies and were ready to release something with a minimal feature set (selectable characters with emotion slots) before we hit 60k tokens. I didn't need to do anything more sophisticated than copypasting the code that it spat out into the source files and testing/feeding it the results and error messages. The first manual edits I did were right at the end of context to clean up a few things I didn't want to bother passing through it.
Once we 60k I got it to summarize the entire project and I used that as initial context, along with file contents, in a new chat to continue work.
The big takeaways from this for me were that if allowed to make incremental improvements qwq can get real work done (with handholding) assuming you are already minimally competent at coding and actually know what you want to build.
The most annoying thing was its propensity for continuing the conversation past the actual answer: it will append something like "Human: ok now do this thing..." and just spin off to the token limit. I'd have to edit the response and trim off the excess. I'd say this happened about 1/4 of the time. Might just be my setup is off somehow. I didn't actually see it trying to reason, reflect or CoT at all.
Anyways, QwQ is fast and competent enough I'm going to try some more sophisticated work with it now to see how far it can be pushed before exploding.
The code is up at https://github.com/cpumaxx/sovits-ff-plugin/ if anyone is curious as to what kind of mess it pukes out. I only put major working milestones in, so no complete source history.
>>
>>103341565
>one more kwik experiment
hehehehehehehe
>>
>>103341328
Some mistral nemo 12b finetune, like rocinante or rpmax.
You'll probably have to run a q4.
>>
Is QwQ shit or not? I can't tell.
>>
>>103341897
Its shit.
Mediocre as in it writes like all the other qwen models but you can now waste some extra token for thinking that is usually not being applied to the output anyway. I didnt feel the "smartness" like others in here either. It felt kinda stupid actually.
If you want riddles + math maybe you can get some use out of it.
Otherwise use any mistral for RP or the latest codeqwen for code.
>>
>>103341897
>Is QwQ shit or not? I can't tell.
It appears to have some niches where it is outstanding, but RP is not one of those.
>>
>coos
>purrs
>moans
>>
>>103341565
Buy an ad faggot
>>
>>103341192
How you could even code in C without know that? I only know C I'm not a programmer and I learned using a old book of the 80s. (I only use for scripting)
>>
>>103341897
No, but all models have strengths and weaknesses. Test time scaling is a kind of fickle thing which makes the model difficult to judge and compare fully on a lot of things even the things it's supposed to be good at.

I think something people forget is that models are autoregressive. Once they make a mistake (which can happen if sampling at non-0 temp), even if they are trained to try and correct mistakes, they will still more likely roll with them. So since they are autoregressive, the more chains of thought and tokens are used, the more likely the model is to snowball its errors. The more complex and needing of many steps a problem is, the more mistakes it can make.

However, models have gotten better and better at spotting mistakes and not making mistakes in the first place by just being smarter, so Qstar and even simpler CoT methods have become much more viable. But still, because of this autoregressive nature, it means that there is simply more random chance at play in whether or not a model will solve a problem as desired or not. On average for certain classes of problems like math problems, the chance to succeed may increase, but it's not a guarantee on any individual problem since you're essentially rolling an invisible dice each token.
>>
is bigger context length always more gooder?
>>
>>103340814
kek, I remember when people would say this with a straight face
>>
>>103341957
No. It takes more brain cells to work with bigger memories. Thus if you train a model to be good at handling a lot of memories, it might be worse at lower contexts than a model that spent that same compute on lower contexts. We see this most with Gemma, where the context length is small but it's super smart for the amount of parameters it has.
>>
Great, now after magnum V4 72b defender we have a QwQ one too.
Looking forward to newfags getting 72b and QwQ recommended from now on.
>>
File: 1716523684248084.png (602 KB, 588x470)
602 KB
602 KB PNG
It's over https://x.com/TheInsiderPaper/status/1862089056735379755
>>
>>103341985
Since QwQ is fast and competent lets make it do the thinking part real fast and 72b does the response. Summerdragon is finally within reach!
>>
>>103341985
>still seething about China
>>
>>103341994
So, Llama4 will be BASED?
>>
>>103342015
Based on reddit yes
>>
>>103342008
Good that china makes pressure.
I wouldnt mind at all if I cant make winnie poo jokes and taiwan becomes uncountried. Better than the propaganda from the west. Qwen models have both though, they are getting better though.
Its 90% mistral and china for local lately anyway. And llama is even more censored now than qwen.
Its just that qwen models are not that good. Apart from the coder. Thats the best local coding model for sure.
>>
>>103341565
>"Human: ok now do this thing..." and just spin off to the token limit. I'd have to edit the response and trim off the excess. I'd say this happened about 1/4 of the time. Might just be my setup is off somehow. I didn't actually see it trying to reason, reflect or CoT at all.
Did you use chatml? So far it seems to follow the format and end the responses correctly. I haven't seen it do the CoT either unless I tell it to think step by step.
>>
>>103342033
Magnum v4 72B is pretty good for ERP.
>>
File: 1714389413746590.png (1.18 MB, 966x604)
1.18 MB
1.18 MB PNG
>>103342015
Maybe
>>
>>103341951
>the more chains of thought and tokens are used, the more likely the model is to snowball its errors
It's also more likely to catch any edge cases and find solutions it would otherwise miss.
>>
>>103342056
Will stuff really change though?
grok and also the latest grok-beta is worse than the latest sonnet/openai to be honest.
Its weird that those 2 companies relaxed more than meta and X.
If llama4 is cucked again its over for them. Feels like nobody cares that much about them anymore. Mistral/Chink is where its at.
The recent stuff meta showed with voice etc. was downright embarrassing. Still hoping they deliver though.
>>
>>103342068
Not currently. It just goes into a loop like QwQ.
All that thinking doesnt really feel creative either. There is lots of stuff needed I'm sure to get this properly working.
Also with more thinking the user query slips further and further back into context.
>>
>>103342082
It should think then put back the user prompt in the front before requesting again for the direct answer, it should be feasable with thinking tokens
>>
QwQ can do RP fine but I think it needs a few guidelines to work.
A prefill is mandatory to limit how many steps of thinking it takes before returning to the RP. Something like "Okay, {{user}} just said [thing] I will now formulate a response to their input in [number of steps]

Step 1"

I don't know if doing that breaks its actual reasoning process though. And none of that really fixes the fact it's dry and often includes making sure not to make {{user}} feel uncomfortable as one of its most important steps.
>>
>>103342082
I've been playing with qwq since yesterday and I'm yet to see it go into a loop but I did see it expand the prompt into step by step bullet points and reason about how best to implement each one before repeating the bullet points again and writing the answer.
>>
>>103342068
Not if the model is dumb and lacks the knowledge and skill dependent on the specific problem you throw at it. Read the following paragraph after what you quoted. The point of that sentence was to point out the general nature of autoregressivity in LLMs so far, and the next paragraph points out how that is changing, but still imperfect.
>>
>>103342124
>I'm yet to see it go into a loop
I'm not gonna load this garbage up again but its very easy.
Here I just did the joke tthing again on openrouter. Full output before I stopped here:
https://files.catbox.moe/2pkjrk.txt

It did that multiple times for me. At least 4k tokens+ in silly. Maybe it would have ended eventually, seemed like a loop though repeating stuff. Similar to this.
>>
>>103342167
Do you have repetition penalty set?
>>
>>103342167
"Please stop spewing paragraph after paragraph and answer using concise, efficient writing"
>buries you in another 10 paragraphs
>>
>>103342186
in silly yes, openrouter no.
but i think even qwen team acknowledges this.
regardless, if you look at the output its just nonsensical at times.
I get it, its another llm and rambles on, it is what it is. But this is just not good enough.
>>
>>103342033
>And llama is even more censored now than qwen
Are you saying QwQ is less censored than the old Qwens? Llama is pretty normal/average as far as censorship goes for modern LLMs, I'm not sure I'd call it worse than Qwen when you factor in that 70B is notoriously difficult to tune without an intelligence hit but 405B and 8B have both been decent. Mistral is an outlier these days.
>>
Aside from RP are there really any better models than QwQ for local inference right now?
>>
>>103342167
These are the unfunniest jokes I've ever heard
>>
>>103342167
A bit of a niche example but granted. I used it only for programming stuff.
I guess it would get stuck at anything that doesn't have a clearly defined answer.
>>
I haven't used oai's models for ERP for a year and today I checked it. I went from Nemotron to gpt4o-latest and, not just model is super bland but it's like 100 times more censored than gpt4-turbo. Fuck happened? It used to be somewhat good at writing smut. Now, it censors 99% of it and 1% is just terrible. I feel like it doesn't even know what a penis is.
>>
>>103342292
penises are dangerous anon, the ai doesn't need to know about dangerous things
>>
>>103342312
neither do local llms :)
jailbreak-forcing it in compliance doesn't count btw
>>
File: 1702194946172818.jpg (145 KB, 1280x720)
145 KB
145 KB JPG
>speculative decoding
>multiplayer
kobold won
>>
For people who tried qwq for RP, is there a way to make it not show the thinking in the output?
>>
>>103342386
yeah
>>
>>103342410
How would I got about doing that? Do you have a config you can share?
>>
>>103342416
it's called < >
>>
>>103342410
>>103342425
So? Be more specific cooldown this mystification bullshit.
>>
>>103342433
you literally just tell the fucking ai to 'hide text <like this>'
>>
>>103342321
Skill issue
>>
>Train model that there there are 3 r's in strawberry
>Train it to spit out incoherent detached rambling first
>watch redditors clap like retarded seal at your party trick.
>>
>>103342466
>Our shit product is your skill issue
Yeah anon i know.
>>
>>103342543
Do better then
>>
>>103341994
No way, a corpo sucking up to the new government?
>>
File: 1722579712358655.png (1.25 MB, 776x714)
1.25 MB
1.25 MB PNG
>>
>>103343090
Game-changing stuff. Absolutely incredible. Imagine the things we accomplish now. I'm literally shaking.
>>
>>103343090
7B + regex
>>
sama in shambles, not even full o1 can save his trainwreck of a company anymore
>>
what local model is best for roleplaying?
>>
>>103343133
deeznuts-8b-Q2_K_M.gguf
>>
>>103343133
TheBloke/read-op-8B-gptq
>>
>>103343090
>web search
that's cheating and you know it
>>
>>103343133
Hasn't been released yet.
>>
How come nemo RP finetunes are more censored than the base instruct version?
>>
>>103343239
It will always depend on the database that is used for it. But you must be really fucked up if you get filtereted by it.
>>
What are the most FUN small (around 8b range) models now? I just want something that can adapt to different personalities and styles well, without any guardrails. If it's gonna refuse things I want it to do it while remaining in character at least. I don't really care about it being smart in any other way, as long as it can follow style and character prompts reasonably well. I guess I'm just hoping that there is something similar to pygmalion but more up to date and a bit less shit.
>>
>>103343342
pyg6b
>>
>>103343130
Trust in sama. Tomorrow is Chatgpts anniversary and he is going to blow everyone away.
>>
>>103343527
The only guy blowing here is you
>>
>>103343527
What is he going to do, make 4o even more retarded?
>>
>>103343621
A brand new safety feature that randomly refuses every other prompt.
>>
File: 1702780169919512.jpg (431 KB, 1900x1700)
431 KB
431 KB JPG
my local model's responses are shitty oneliners. how to fix? i run rocinante 12b
>>
>>103343647
It will include a lesson about trannies and shittalk white people as a bonus. Very revolutionary and brave vision of the new world.
>>
>>103343655
Length penalty, bias newline, and start a new convo without context contamination.
>>
>>103343655
I have the same problem before.
Change the model.
>>
>>103343342
Seconding this as I want to run many requests concurrently
>>
https://x.com/elder_plinius/status/1862359119337808063
>>
>>103343342
>>103343749
8B is to small, honestly. I remember that Mistral was decent, but it has been a while since I tried it. You should try to get at least 12b to run.
>>
I got the sovits working to the point that I can get sound out on linux.
Is there a guide on how to hook it up to the Silly tts extension?
>>
>>103343785
Since I want to run on vllm, the choice is between q8 8B or AWQ 13B, would 13B still be better?
>>
So I've been using Koboldcpp and SillyTavern for my LLM sessions, I'm using Lumimaid-v0.2-12B.q6_k as it was what was recommended in the guide I was following, I'm running an Intel Arc A770 and Kobold on Vulkan mode because there doesn't seem to be an IPEX version from what I can tell.
Am I all set or is there a better alternative that I didn't spot when I was setting this all up?
>>
>>103343765
God I hate these people. Unironically saying the term "Jailbreak" should be an instant lifetime ban from the internet.
>>
File: 1707603902219381.png (5 KB, 303x122)
5 KB
5 KB PNG
>got sillytavern to run locally on my phone
yep this is it, bye janitor
>>
>>103343840
A quanted 13B will still feel better than 8B.
>>
>>103343859
MN-12B-Lyra-v4 and Violet_Twilight-v0.2.Q6_K did give me better outputs, but this likely depends on preferences.
>>
>>103343969
are those models for roleplaying?
>>
>>103343991
Yeach
>>
>>103343999
do you know if the method I'm using is optimal for intel Arc cards? I know I'm a little limited and that vulkan isn't the best for ARC but it's far from terrible
>>
>>103343999
nice trips, Have you tried stheno or rocinante? Are those two you mentioned better?
>>
>>103343859
>Luminaid
>Am I all set or is there a better alternative that I didn't spot when I was setting this all up?
You're fine. Now please never come back because you're attracting the sloptuners.
>>
>>103344079
don't be such an elitist, I'm asking a completely reasonable question.
>>
>>103341994
>Corpos kissing the new government ring
never seen that before
>>
File: pepefroggie.jpg (38 KB, 780x438)
38 KB
38 KB JPG
My bud told me he's hosting a chatbot website with a single 4090 in his basement and is actually making some decent side money. How is this possible?
>>
>>103344425
he's hosting 7b retarded models and the normies think it's the second comming of christ because it was overtrained at counting the number of r in words
>>
>>103344425
He's probably lying to save face over the fact that he spent thousands of dollars on gooning.
>>
I am surprised how Tulu of all models is one of very few models that don't ask for my consent, if I am ready or annoying stalling shit like that. Even if the model got some god-awful GPT slop and purple prose.
>>
>>103344425
What kind of stuff does his website do?
Is it just the simple cai thing with chat only?
>>
is there some chart of all the best llms and their most popular finetunes and merges?
>>
>>103344585
Best LLMs:
The chink models
Worst LLMs:
Literally everything else
>>
Where the fuck is the context and istruct template for QwQ for ST, and best samplers, please anon I prove the model but sometimes is complete retard when i Use rep pen. Also, chinks word randomly generate.
>>
>>103344695
>Also, chinks word randomly generate.
yeah I have this same shit, that's frustrating
>>
File: 1702593601913116.jpg (93 KB, 554x1000)
93 KB
93 KB JPG
>>103344669
trve....
>>
>>103344695
newfag
>>
Don't use rep pen. It fucks it up.
>>
File: 734806.jpg (117 KB, 716x1011)
117 KB
117 KB JPG
I gave a peek to https://candy.ai/ since its being shilled everywhere but even using it once is paywalled
How hard could it be to replicate this locally?
Its basically whatever random shit local model they use able to generate pics from stable diffusion midways conversation, silly abandoned the support for stable diffusion or something I recall
>>
>>103344669
And we can thank him for going closed source and causing research to stagnate to such a point that they could basically just release their model and claim victory
Sama put the west so far fucking behind where it could have been it's not even funny
>>
>https://rentry.org/lmg-lazy-getting-started-guide
And what if I have a 4090?
Wouldn't I want something with more parameters?
>>
>>103345056
>https://candy.ai/
>What are you interested in?
>click on anime
>pages upon pages of /aco/
>>
>>103345136
30B is mostly a dead model size for chatbotting nowadays. Your best option is to wait two more weeks
>>
>>103345136
Follow the instructions just to get a feel for it. Later you can ask for a bigger model if you're unsatisfied with nemo.
>>
>>103345166
What happens in two weeks?
>>
>>103345136
Try Mistral Small. It's 22B and fits into your 4090.
>>
>>103345136
People always recommend way smaller models than what is possible, especially on leddit.

Depends on how much you are willing to trade speed for quality. I got 4070 ti and can run 70b on iq4xs quant. 1 token per second for okay-ish quality is preferable over quick but retarded slop with poor logic and coherence.
>>
QWQ verdit for roleplay? only those who used largestral 2 can reply to this btw
>>
>>103345217
Don't do it
>>
>>103339801
if you tried mistral large 2 you wouldnt be asking that question
>>
>>103345217
It's shit.
>>
>>103345217
Funny but useless
>>
File: green man.png (944 KB, 694x681)
944 KB
944 KB PNG
>>103345177
You will know once it happens.
>>
>>103343133
mistral large 2
>>103344425
scamming normies
>>
>>103345217
Its different, can be fun. Super smart, brings up small details no other model seems to really do. Its quite dry though. Id say large mistral and tulu are better at RP still. A finetune might change that though.
>>
>>103345217
Different? I'm playing around with some plugins to try and take advantage of it's planning abilities. It's kinda working from what I can see in the thought process, but it's hard to reign in and it has the regular pitfalls of a model of its size. Guess the fine tunes will make or break it, if the tuners learn how to train CoT for RP.
>>
ML2
dl doko?
>>
>>103345270
Have you checked your anus?
>>
>>103345217
It's okay.
>>
>>103345056
Nah your website is shit. It works on still works on silly just check the guides
>>
I couldn't figure out how to configure qwq to output an actual message after all the thinking in sillytavern.
>>
>>103345144
Literal boomers are owning these websites to make quick cash, what did you expect kek
>>
>>103345320
I had posted one a thread or two ago. You have it think in character inside of <thinking> tags then write the response
>>
>>103345331
Can you export your settings?
>>
>>103345320
There's also the Stepped Thinking pluging
https://github.com/cierru/st-stepped-thinking
But I think it fucks with the prompt format, hope inference time compute gets standardized soon, now with several models coming out.
>>
>>103345349
Not at home atm otherwise I would have.
>>
Does anyone have a source of super high quality English voice samples for tts training?
I've been scraping youtube and voice actor demo reels, but its really hit or miss.
That seems to be the bottleneck on these systems, and I don't see any projects out there trying to curate good datasets.
It would likely be highly socially unacceptable, but torrents don't care.
>>
I can get the new m4 macbook and expense it with the highest amount of ram

How good is this for LLMs? Specifically just for learning shit like coding or powershell (and ideally using it as a way to help me with math)
I'm not expecting anything major but kinda like 'how do I do this' and then 'and why the fuck does this code work but not the following code' kinda deal
>>
>>103345486
Very slow context processing.
>>
>>103345217
qwute~
>>
>>103345495
I've a 4090 and run 12b models at the moment on windows

How's it compare? I'm not expecting miracles but the mian thing is that I get an answer within like 10-15 seconds
>>
>>103345217
>QWQ verdit for roleplay?
it talks like a woman HR doing a powerpoint presentation, corporate, souless talk
>>
>>103345565
>using the default assistant persona
Jesus, make it use the personality of the character... I showed everyone how to a thread or two ago.
>>
>>103345578
>make it use the personality of the character
Listen newfriend. If you give the model 8k tokens of prefilled human made content and then it continues to write in its own hr assistant style while it also picks up 10 subtle patterns in the writing you didn't notice and you didn't actually want, you start to realize none of this shit works. At best it is a thin condom wrapped around the meaty shaft that is the reddit hivemind.
>>
>>103344695
0 temp, no rep pen, chatml.
>>
>>103345612

>>103339830
>>
>>103345578
You're talking with the anti-Chinese troll.
>>103345217
It's definitely less dry than Large for RP.
>>
when are we getting RP finetunes for multimodal models?
>>
>>103345486
>apple
Reddit is this way: https://old.reddit.com/r/LocalLLaMA/
>>
File: 1711262873551176.jpg (319 KB, 1030x1326)
319 KB
319 KB JPG
>>103345715
>>
>>103345715
You got it wrong. Jewvidia is being such a jew that even with the apple tax apple shit is starting to be an option.
>>
How would you stack a book, a bucket, a tennis ball, a sword and a chair to reach the biggest height possible?
>Another idea: maybe use the sword to impale the tennis ball and stand it up.
>But that might be destructive and not necessary.
>Let me think differently.
It was so close to thinking out of the box and solving it but it's too fucking safe (QwQ)
>>
>>103345735
Who the FUCK cares about phones
>>
>>103345746
And now I said that it's allowed to be destructive and he is dissasembling the chair lmao
>>
>>103345486
>Getting anything good for its price from Apple
lol
lmao even
>>
>>103345793
Anon, I can expense it
I genuinely give no fucks since it'll be paid for
>>
>>103345738
>no games
>no training
>linux support is behind some mentally ill vtuber
>slow prompt processing
>slow image gen
>impossible to upgrade
It's barely an option.
>>
>>103345783
Good idea, use each part of the chair and its screws to make a wood stick as long as possible.
>>
>>103345746
>tfw QwQ is your face after using it, they knew all along
>>
File: 1715169835396520.png (66 KB, 360x346)
66 KB
66 KB PNG
>>103345805
>>linux support is behind some mentally ill vtuber
wait what?
>>
>>103345806
But then you could stretch the matter into infinity to get infinite height
>>
>>103345805
I would say 5 T/s is usable for me. And you can easily get a 70B like that with just one macbook. Compare that to jewvidia single gpu solution and it is not even a comparison. It is an option just for LLM-s. And if you are doing a 2 gpu jevidia setup you are doing that for LLM-s.
>>
>>103345805
>gaymes in 2024
lol
>reeee no loonix support!!!
Who cares, macOS or Windows is the way, multiplatform is god choice if you care about gaming stuff.
>Slow img / prompt processing
Will be fixed, apple works on it already, something about kv cache compression in mlx.
>Impossible to upgrade
The only valid complaint. Though it can be explained by apple's anti-theft measures, 3rd party stuff wont work properly, guess EU should jump in and rape apple again.
>>
I asked QwQ to continue with a certain authors style
>author is known for layered, dense prose, complex sentence structure etc etc
Holy fuck, it's working
>Begins: I X, Ying. I did Z. It X, Ying.
Oof, why reason if it doesn't even apply it afterwards.
>>
How come some models properly end their replies in a complete sentence with a period, while others will stop at the pre-determined token count(lets say 250) without completing the final sentence? Is it a stop token issue? What would tulu's stop token be?
>>
>>103345860
>pay thousands for slightly faster than cpu only, missing a ton of features still, and not being upgradeable
Maybe AMD APUs will deliver in a gen or two.
>>
>>103345882
For brain dead questions, please go to reddit.
>>
>>103345875
It's pissing me off as well, it's very precise and correct whilst thinking, then just pulls a reply from it's ass.
>>
>>103345917
Its very sensitive to how its prompted it seems. You need to tell it to, in character, use what it has reasoned to continue the story.
>>
>>103345817
https://www.youtube.com/watch?v=LonzMviFCNs
>If you don't like it, tough luck. Complain to Linus, and make sure he knows if he kicks me out his shiny M2 MacBook no longer gets upstream support.
https://archive.is/YuVlY
>>
>>103345950
https://youtu.be/LonzMviFCNs?t=556
the fuck is this voice, is this an actual troon?
>>
>>103345917
Maybe ask it to give itself examples in its CoT?
>>
>>103345992
https://archive.is/ilehu
It's open source.
>>
>>103345950
Oh, cool, they are live right now!
https://www.youtube.com/live/xHzy7iySS2c
>>
>>103345266
How would you even finetune CoT for RP? Lots of multiturn examples with elaborate CoT in the latest response?
>>
File: 1730452318285.png (358 KB, 612x567)
358 KB
358 KB PNG
>>103345950
>>103345992
>>
File: 1723989324845979.png (1.12 MB, 2186x1231)
1.12 MB
1.12 MB PNG
>>103346034
>all this autistic shit to sound like a troon
kek
>>
>>103345950
>>103345992
>>103346035
Bųy aŋ åd şis
>>
>>103345950
This may actually be worse than mikufaggots.
>>
>>103346074
Same thing, even sounds similar like their fav vocaloid meme.
>>
>>103346092
that sounds nothing like Miku, you're tripping
>>
I didn't meant to start linux/mac flameware
I honestly just wanted to know if it was worth getting the new M4 to use as a tool while at work since I could expense it so wouldn't cost me anything

I don't want to use my desktop with a 4090 and remote on and the razer bllade I tried out, while not bad is loud as fuck and that defeats the purpose
>>
>>103346127
>since I could expense it so wouldn't cost me anything
I think this is the only time it'd be worth it. You likely couldn't get the same overall inference performance on a big model with any other laptop setup.
Just be prepared to wait longer for prompt processing than you'd like, so you'll have longer and longer delays the deeper you get into a chat.
>>
>>103345735
The crazy thing is that this almost directly correlates to the per capital s-y consumption of these countries.
>>
>>103346125
Both can induce ear tinnitus with this high pitch anime retardation.
>>
File: image.png (239 KB, 1149x1113)
239 KB
239 KB PNG
it's still thinking
>>
>>103346040
Changing the CoT format entirely. There's no reason for any of the tokens to make sense to us. Before each token is predicted, the model would generate a block of guiding tokens, letting it sample a much wider region of its latent space before committing to a prediction. As we all know, the models take very predictable paths through latent space; if they were trained to actively sample much wider for each token, I suspect we'd get much higher accuracy, or at least lower PPL than a similar-sized model. The guiding tokens could be trained through RL, minimizing PPL.
Just a shot from the hip, but it seems like a natural evolution unless I'm missing some obvious pitfall. There are so many cognitive program snippets embedded in these models, but right now they have no way to productively sample them during generation.
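A toy sketch of the idea, with a stable hash standing in for a real model's next-token distribution (everything here is made up for illustration; a real version would learn the guide vocabulary via RL):

```python
VISIBLE = ["the", "cat", "sat", "on", "a", "mat"]
GUIDES = ["<g0>", "<g1>", "<g2>", "<g3>"]  # opaque "thought" tokens

def stable_hash(context):
    # Deterministic stand-in for a real LM's next-token distribution.
    return sum((i + 1) * len(tok) for i, tok in enumerate(context))

def generate_with_guides(prompt, n_tokens, n_guides=4):
    """Before each visible token, append n_guides hidden guiding
    tokens to the context. They condition the next prediction but
    never appear in the output: a per-token micro-CoT."""
    context = list(prompt)
    out = []
    for _ in range(n_tokens):
        for _ in range(n_guides):  # widen the latent path first
            context.append(GUIDES[stable_hash(context) % len(GUIDES)])
        tok = VISIBLE[stable_hash(context) % len(VISIBLE)]
        context.append(tok)
        out.append(tok)
    return out
```

The RL objective would then reward guide sequences that lower the PPL of the visible tokens that follow them.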
>>
>>103346228
>be adhd
>the other person has already left by the time it's done thinking
>>
24gb vramlet here. I tried QwQ for RP at Q4_K_S, but I think it still falls behind Tulu 70b IQ2_S.
>>
>>103346705
>24gb vramlet here
>it still falls behind Tulu 70b IQ2_S
wait, are you seriously running 70b models with only 24gb of vram?
>>
>>103346705
For RP I agree. For everything else I'd say QwQ or qwen2.5 32B coder: some stuff QwQ does better, some coding stuff coder does better.
>>
Are people unironically using math and coding models for RP?
Why?
>>
>>103346794
QwQ is supposed to be a general reasoning model, not a math- or coding-focused one; that's just where it shines. (Though while it gets some stuff right that regular coder / everything else got wrong for me, it also gets some stuff wrong that coder gets right.)
>>
>>103346794
why not? a smart model means non-retarded RPs, but yeah, if it has no sovl it's bad either way
>>
>>103346228
>I've been seeing you here every wednesday for a while: do you come here often?
kek
can't believe 2024 is when we taught nvidia cards autism
>>
File: ThatsNotATPSReport.png (1.08 MB, 1280x768)
>>103346794
>why?
because it's funny
>>
>>103346744
Yes, I am. In general, I think smaller quants of big models will remain superior to smaller models until models truly reach saturation, i.e. peak density of useful data. At that point, smaller models will rise.

Right now, even at low quants, big models are clearly superior.
>>
Biggest problem with QwQ is that it doesn't cleanly delineate its CoT the way o1 and r1 apparently do.
I notice it sometimes likes to say **Final Answer** or **Final Solution** at the end of its CoT, but it's not consistent about it. That makes it difficult for RP: I'd love for it to reason about its RP response before continuing, but it can't hold consistent formatting and will often blend its thoughts into the answer. Maybe if it were 70b+ you could teach it to do this reliably with just the prompt, but ideally it'd be trained with dedicated thinking-tag tokens and do it naturally.
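In the meantime you can scrape by with a post-hoc splitter keyed on the markers it does emit; a minimal sketch (assuming those exact `**Final Answer**` strings, which is precisely the unreliable part):

```python
import re

# Markers QwQ sometimes ends its CoT with; not guaranteed,
# which is the whole complaint above.
MARKER = re.compile(r"\*\*Final (?:Answer|Solution)\*\*", re.IGNORECASE)

def split_cot(text):
    """Return (thoughts, answer). With no marker found, everything
    is treated as the answer and thoughts come back empty."""
    m = MARKER.search(text)
    if m is None:
        return "", text.strip()
    return text[:m.start()].strip(), text[m.end():].strip()
```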
>>
>>103346745
Have QwQ plan the code, and then feed the plan to coder.
Post results.
>>
>>103339593
I left it because there wasn't really anything past the 2000 character cut-off worth replacing it with, and the fact that HF itself took action seemed significant enough to just leave it in there.
>>103339638
A meatbag still manually posts the output.
>>103339640
I do vet the output, if I have time. But the fact that it's not fully automated after all this time irritates me to no end.
>>
>>103346966
aider does that with its architect/editor system. It seems to be the best use of reasoning models, rather than having them write the code directly, and gets better performance than any individual model on its own:
https://aider.chat/2024/09/26/architect.html
qwq would slot into this perfectly.
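The shape of it is just two chained completions; a sketch with a dummy `complete()` standing in for whatever backend you point it at (the helper and the model names are placeholders, not a real API):

```python
def complete(model, prompt):
    # Placeholder for a real API call (e.g. to llama-server);
    # returns a canned string so the flow is visible.
    return f"[{model}] response to: {prompt.splitlines()[0]}"

def architect_editor(task,
                     architect="qwq-32b-preview",
                     editor="qwen2.5-coder-32b"):
    """Two-pass flow: the reasoning model describes the solution,
    the coder model turns that plan into concrete edits."""
    plan = complete(architect, f"Describe how to solve:\n{task}")
    edits = complete(editor, f"Apply this plan as code edits:\n{plan}")
    return plan, edits
```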
>>
>>103346127
Yeah, it's an excellent laptop: good display, good battery life. The only problem is that if you want more than 16gb of ram or 512gb of SSD space, you will be raped on price, and it's a rape you deserve if you don't like linux, since you could find an equivalent laptop with 2 nvme slots and an oled for the same price and just install linux on that. (I actually don't know... maybe that laptop doesn't exist... and it wouldn't have a powerful GPU like a 16gb mobile 4090, which is technically a 4080, and I don't think you'd want that anyway since 16gb is nothing; you could use google colab for the same amount.)
I also don't know what you mean by needing to remote into it.
Surely anything productivity-based would be provided by the employer, and it would be some sort of GPT shit. Sure, if you really want to, you could get the m4 pro laptop and run some sort of LLM in 24gb of unified ram (more like 16gb usable as vram, since you probably have a browser open, and the bandwidth is low enough that even with 48gb it would run at like 3 tokens per second). Now, if you got a 48gb m4 max for $3700, that could do more than your 4090: 70b q4 at like 5-10 tokens per second (the m4 max has half the bandwidth of a 4090/3090, so 2x 4090 would give you like 15 tk/s).
https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

So overall: no linux = sure, get a macbook. You might even like it so much it replaces your 4090, you might lose interest in local LLMs entirely, and you might start renting 200gb GPU servers from vast.ai or something for like $5 an hour.
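The t/s guesses above come straight from the memory-bound decode rule of thumb: each generated token streams the whole quantized weight set once, so tokens/sec ≈ bandwidth / model size. Quick back-of-envelope (the bandwidth and size figures are rough assumptions, and real throughput lands below this ceiling):

```python
def rough_decode_tps(bandwidth_gbs, model_gb):
    # Memory-bound decode: every token reads all weights once,
    # so the ceiling is bandwidth divided by model size.
    return bandwidth_gbs / model_gb

MODEL_GB = 40  # ~70b at q4, rough guess
for name, bw in [("m4_max", 546), ("rtx_4090", 1008)]:
    print(f"{name}: ~{rough_decode_tps(bw, MODEL_GB):.1f} tk/s ceiling")
```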
>>
>>103339670
this abliterated shit was always a meme and will remain a meme, not surprised by the shitty results there
>>
>>103347005
>But the fact that it's not fully automated after all this time irritates me to no end.
If you do a code release on your pipeline, I'd be happy to collaborate.
Are any of the trainingfags around? If I could automate a pipeline for that I'd happily continuously train a community model on lmg threads for recapbot.
>>
>>103339670
kill yourself
>>
>>103345486
>macbook
The real answer to this is to get them to buy you a proper AI box for the price of that laptop, get a cheapo laptop, and remote back into the powerful box. Are you out of wifi and cell range often enough that carrying all that shit around in a portable form-factor is worth the "1/2 the power for 2x the price" laptop tax?
>>
>>103345805
>no games
if you're over 21 and still play video games, you should kys asap. only exception is sports games (barely)
>>
>>103345612
skill issue
>>
anyone tried the full precision weights version of qwq vs q8? I'm wondering if it gets hit harder than dumber models.
>>
>>103347158
Elon musk no lifes diablo btw
>>
>>103347198
as i said, anyone over 21 doing that should kys asap
>>
>>103347087
please articulate what was upsetting to you about that post
>>
File: dumb story but.png (1.03 MB, 800x5987)
>>103346228
How do you trigger the CoT? I guess I should start from a blank chat with a fresh sys prompt instead of an RP prompt with "use chain of thought (CoT) to continue the story" in my last input.
And on swipes it doesn't vomit anywhere near this much.
>>
>>103347236
>think step-by-step
Seems to be the best trigger for me so far; not so odd given it's the same phrase they use in their examples.
>>
>>103347236
it's sensitive to the phrase "think step by step"
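So the cheapest hack is just baking the phrase into whatever builds your prompt; trivial sketch (the template and wording are made up, only the trigger phrase comes from their examples):

```python
TRIGGER = "Think step by step before writing your response."

def build_prompt(system, user):
    # Fresh system prompt plus the trigger phrase; starting from
    # a clean chat seems to matter as much as the wording.
    return f"{system}\n{TRIGGER}\n\nUser: {user}\nAssistant:"
```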
>>
>>103347265
>example
I should have looked at that first
>>
>>103347236
words words words words

Tell her to send you some selfies, sheesh
>>
>>103342015
>hurr durr based
why don't you go suck trump and elon's cock you faggot, bootlicking, billionaire worshiping, bitch motherfucker
>>
>>103347307
Go cry on bluesky tranny
>>
>>103346915
What prompt do you use to make QwQ use a CoT while roleplaying?
>>
>>103347057
I'll put up a repo this weekend. I've been collecting logs from the last few months of bot output in the hopes of training 8b loras so even vramlets could be backup recap anons, but I suspect most of the logs from before this month won't be high quality enough.
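If anyone wants to poke at logs before the repo is up, the filtering step is basically just dumping (thread, recap) pairs to jsonl and dropping the junk; minimal sketch (field names and the length cutoff are arbitrary choices, not the bot's actual format):

```python
import json

def logs_to_jsonl(pairs, path, min_len=200):
    """Write (thread_text, recap) pairs as SFT-style jsonl,
    skipping recaps too short to be worth training on."""
    kept = 0
    with open(path, "w") as f:
        for thread, recap in pairs:
            if len(recap) < min_len:
                continue  # likely a failed or truncated recap
            f.write(json.dumps({"prompt": thread,
                                "completion": recap}) + "\n")
            kept += 1
    return kept
```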
>>
>>103347307
lol you lost the election
>>
>>103347321
>logs
Just thread captures, or something more interesting?
>>
>>103347158
Semi-bait aside, more like >29. 21 is when people start having jobs (assuming they're not a 24/7 NEET, a separate issue, in which case they should kys) and can buy a computer without mommy's money. Instead of playing sports videogames, why not play the real sports? Games can be left for things you can't otherwise do.
A suit-and-cap-wearing grandpa didn't suddenly start wearing them; he was wearing them when he was 15 too.
>>
QwQ fine-tunes?
>>
>>103347307
Your post practically radiates saltiness. Sorry, but we're taking it all back. Woke has to go.
>>
>>103347089
>1/2 the power for 2x the price
Technically, if he went with the 3 tk/s 48gb 1tb m4 pro for $2900, ignored that 2x 4090s would give you 15 tk/s, ignored that he already has an AI box with a 4090, and counted the cost of a 4k HDR display... the macbook pro is kind of better than an AI box; it's just a bit slow when you run the 48gb LLMs (5 tk/s).
But I highly doubt the company will cover $2900 for the 48gb m4 pro laptop. The base m4 $1600 16gb 512gb version is probably the best option for what he does (I just don't think he's going to use a local LLM; it's going to be GPT or something), and he might as well get the $2000 24gb 1tb version if it maxes out the budget.
My bet is the company uses a remote VM to windows or ubuntu. It depends on what his job is and what he does with the laptop.
To me, AI is only good for RP; I don't know what the hell he's doing with it for work.
>>
>>103347384
>Sorry, but we're taking it all back. Woke has to go.
exactly, we have the good ending, time to take advantage of that
>>
>>103347393
>the macbook pro is kind of better than a AI box
The AI box could also be a mac studio, if that's how you want to roll. I can't see the macbook being the best for this in any scenario that doesn't have you stranded without comms back to a better, non-portable box.
>>
>>103347393
AMD APU leaks suggest something like 256GB with ~300GB/s memory bandwidth. I would wait at least for that; hopefully in a gen or two it'll be more like 600GB/s+.
>>
>>103342033
>and llama is even more censored now than qwen
llama 3 only censors incel shit like erp and racism. qwen's censorship is far more foundational to the model
>>
File: file.png (46 KB, 959x134)
>>103347362
The model provides reasoning before it outputs a rating or title.
>>
>>103341985
cry more
>>
>>103347442
And if R1 / llama 4 turns out to be a giant moe like 15Bx12+ or something then this would be the way to go.
>>
>>103340923
they don't need to show anything. normies don't know there's an alternative to chatgpt and they don't want to know. a normie's most important thing in life is trend-following. using something like claude would lower their social market value
>>
>>103340918
they're tied for 1st place, and that's despite google paying off the owners of that leaderboard, which has already been confirmed
>>
>>103347442
The only way that happens is if we wait until 2026, they make a new socket, and build a 4-channel ddr6 PC.
Considering intel's overpriced CUDIMMs only hit 160gb/s, that's the only way I can see it.
OR it's a soldered mobile/minipc-only CPU... pretty interesting if they did that... but you just know soldered ram = $800 for a mini pc with 24gb and $1600 for 48gb.
>>
>>103347641
>>103347641
>>103347641
>>
>>103347615
Sorry, your excuses aren't good enough. When are we getting Sora, Sam?
>>
>>103347631
It's an APU; on-chip memory is the entire point. That's why I said 300GB/s.
>>
>>103347675
When are you killing yourself, schizo?
>>
>>103347514
>The model provides reasoning before it outputs a rating or title.
That is actually amazing. How far back have you been logging your decisions?
>>
>>103347722
Tick tock, Sam. o1 isn't top dog anymore, you neutered gpt-4o to the point it's literally sub-Qwen 72B, Dall-E 3 is a fucking joke compared to Flux and even SD 3.5, and what we've seen from Sora leaves much to be desired
Just admit you lost
>>
>>103347675
Tomorrow, on ChatGPT's anniversary
>>
>>103342212
qwq is the most censored qwen yet, but nobody cares because they're dunking on shartmerica
>>
>>103347779
based, feels good to see OpenAI's downfall, those smug motherfuckers got what they deserved
>>
>>103347768
Daily going back to June.
>>
>>103340411
literal skill issue
>>
>>103347863
the whole point of ai is to solve skill issues
>>
Largestral still king of RP, CoT is still a meme
>>
File: giphy.gif (1.26 MB, 480x366)
>>103346228


