/g/ - Technology


Thread archived.
You cannot reply anymore.




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103332729 & >>103326879

►News
>(11/27) Qwen2.5-32B-Instruct reflection tune: https://qwenlm.github.io/blog/qwq-32b-preview/
>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikucozybread.jpg (177 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>103332729

--Paper: Research paper suggests low-bit quantization may not be suitable for large-scale LLMs, potentially affecting BitNet:
>103338034 >103338112 >103338130 >103338206 >103338216 >103338239
--Critique of LeCun's AGI predictions and accuracy of quotes:
>103336781 >103337080 >103337233
--Using QwQ for step-by-step planning and explicit content generation:
>103336302 >103336324 >103336656 >103336364 >103337235
--Troubleshooting issues with the QwQ-32B-Preview-Q6_K_L model:
>103336447 >103336459 >103336480 >103336513 >103336554
--Training large language models and hardware limitations:
>103338279 >103338323 >103338364 >103338412 >103338509 >103338591 >103338662 >103338698 >103338611 >103338640 >103338718 >103338793 >103338859 >103338883
--Local model options and performance for text-based tasks:
>103334458 >103334572 >103334667 >103334506 >103334546 >103334653
--Discussion of QwQ model's capabilities and limitations:
>103334348 >103334424 >103334444 >103334514 >103334542 >103334568 >103334594 >103334720 >103334688 >103334597 >103334611
--Apindale dataset controversy and Hugging Face's response:
>103333088 >103333129 >103333143 >103333194 >103333558 >103333645 >103333680
--Anon struggles with QwQ model in HF spaces for code generation:
>103334708 >103334739 >103334889 >103334900 >103334932
--Anon shares Tetris game code, others test and report bugs:
>103335374 >103335410 >103335448 >103335464 >103335588 >103335609 >103335627
--Anon asks AI to calculate Earth's circumference without recalling it directly, gets lengthy mathematical explanation:
>103335044 >103335075 >103335123
--AGI definitions and timelines from Sam Altman, Demis Hassabis, and LeCun:
>103336990 >103337091 >103337116
--Miku (free space):
>103336125 >103336160 >103339220 >103339350

►Recent Highlight Posts from the Previous Thread: >>103332809

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
UwU
>>
the gravity of the situation
>>
>>103339562
>Apindale dataset controversy and Hugging Face's response
this didn't have to be in the recap, kys
>>
the gravy of the situation
>>
>>103339593
anon you're talking to a language model
>>
>>103339593
recapfaggot is a retarded nigger that doesnt vet his slop
news at 11
>>
File: Nala.png (287 KB, 1280x1452)
Here is https://huggingface.co/huihui-ai/QwQ-32B-Preview-abliterated

First is with blank context, 2nd is with this as author's note:

---

Follow these rules:
- You're encouraged to play/go wild with the grammar but always act in-character. You're allowed to use plain, explicit, crude, visceral, or casual wording when it would fit the context. Details/descriptions must be explicit.
- Be creative, introduce events / characters when needed. Give scenes / environments detail to bring the story to life.
- Think of how the characters would realistically act / respond in-character when writing for them. You are allowed to be hateful, cruel, explicit, violent, perverted, ignorant, fearful, unpleasant, controversial or offensive if it would fit the situation. Characters can fight back and disagree with the user.

---
>>
>>103339638
you're a language model
>>
>>103339670
this is supposed to be good? local claude?
>>
>>103339670
UwU bros I don't feel so good...
>>
>>103339670
that's pretty bad, as expected
>>
I've never read a single log posted here
>>
>>103339700
No idea, it's a contextless single-turn prompt of "Ah ah mistress!" I just see people constantly talking about the "nala" test. It seems to get the anatomy right and is smart enough to know that she should not know who the user is.
>>
>>103339700
>local claude?
I don't think anyone said that about this model who wasn't trolling
>>
>>103339560
Thanks for all the goon material guys
>>
>>103339727
It's local claude 3.5 for coding / reasoning / math problems. That is what everyone has said, which is true if you've used it. And it's for sure the smartest local model we have now.
>>
>>103339670
I thought the whole point of o1 clones like qwq, r1 and so on is that they're trained to overthink/reason for very long contexts, if you just use them as regular LLMs, you won't really get a considerably large improvement from baseline
>>
>>103339754
Maybe. I just saw someone request a "nala test" which I always saw was just someone saying ah ah mistress to the nala card.
>>
>>103339759
The test will shut up whoever hypes it up for RP.
>>
>>103339774
Not really? Seems quite smart for what little it was given. The character is acting realistically and knows that it should not know who the user is even though I do have a persona active. Nothing crazy has happened so far.
>>
>>103339774
Might still work for RP, you can prompt it to think about how to reply (CoT) and then hide the overthinking while focusing on its actual reply, some anon 2-3 threads ago tried it, but from what it seemed to me, it needed some jailbreak/prefill. Maybe need a ST modification to hide the CoT part while showing the actual reply, not unlike o1 does, if you don't care to read it being indecisive for 5k tokens every reply?
>>
File: huh.jpg (41 KB, 728x653)
>>103339560
anons, whats your favorite RP model right now? I cant seem to find anything interesting anymore, they are all just a blur
>>
>>103339801
what kind of rp? smut? or adventure?
>>
>>103339801
>I cant seem to find anything interesting anymore, they are all just a blur
Try QwQ. Otherwise mistral large
>>
File: chatlog (11).png (506 KB, 1087x2926)
>>103339774
Speak for yourself.
>>
So basically they need to update ST to
>let the user specify the special tokens the model has for its thinking
>let the user set whether the thinking should be hidden or unhidden by default
>when sending the requests to the backend, do not include the thinking tokens
Is that all?
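Roughly what the third bullet would look like (a sketch; `<think>`/`</think>` here are placeholder delimiters, since the real tokens differ per model, which is why the first bullet matters):

```python
import re

# Placeholder delimiters; the actual thinking tokens vary per model,
# so they should be user-configurable.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(messages):
    # Drop thinking spans from assistant turns before resending the
    # history to the backend, so CoT doesn't pile up in context.
    cleaned = []
    for m in messages:
        if m["role"] == "assistant":
            m = {**m, "content": THINK_RE.sub("", m["content"]).strip()}
        cleaned.append(m)
    return cleaned
```

The UI would keep the full text for display (hidden or unhidden per the second bullet) and only send the cleaned copy.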
>>
>>103339838
User should be able to hide/unhide it. I also don't know about the last point, maybe only if you want to save on the amount of tokens used. I haven't been paying attention to these threads for months, did the ST drama get resolved or did it get forked by now?
>>
>>103339830
>slaanesh light
based
>>
>>103339806
both are fine, though if you got one in mind for each that'd be nice to know, the more models the better

>>103339808
tried qwq but couldnt wrangle it to not spam me with CoT or thinking stuff for 800 tokens, is there some approved ST preset now?
>>
>>103339801
I've been trying RP on Tulu tonight. It has made a few mistakes but nothing major. A few Llama-3-isms, too, but that's expected at low temp. Instruction following has been good but not perfect. I think I'm liking it better than L3.1 Nemotron.
>>
>>103339801
pyg6b
>>
>>103339871
I thought that's how OpenAI does it, they don't include past responses' thinking sections. QwQ may or may not be trained for it, I don't know.

I have no idea about the ST drama, haven't heard anything about it since.
>>
>>103339830
That's QwQ? What did you do to get it spitting out responses so different from >>103339670? And that's pretty good even without any COT stuff.
>>
>>103339877
nemotron and tulu are interchangeable. nemotron is slightly smarter, tulu writes more human-like interactions. they both have glaring flaws. nemotron loves to do stupid formatting shit for no reason which has to be wrangled, and tulu loves to allude to the future, journeys, and is turbo-pozzed which has to be wrangled. when wrangled correctly, both can do really anything you want.
another good mention is largestral, some people shill sloptunes like behemoth or monstral. i dont really use those, so i can't speak on them. only reason i don't recommend largestral is because i'd rather use q8 70b over q4 largestral. benefit of largestral is i don't think it takes much wrangling at all. i just don't really like that it's slower, and doesn't feel THAT much smarter.
>>
>>103339928
Repurposing a jailbreak from aicg. Still working on it. It's made for MLP stuff.
>>
Do you think QwQ would be capable of becoming a DM for DnD?
Maybe feeding it the rulebook and other resources.
>>
>>103339946
https://rentry.org/CharacterProvider-CYOARPG
>>
>>103339938
Nice, maybe /aicg/ isn't so bad after all.
Guess I'll give this thing a try tomorrow.
>>
>>103339950
That's a very good idea for an actual videogame, you could just sell the user tokens to keep playing.
>>
Ah great, my retard ass posted in the old thread.
>>103340019
I dont get why people praise QwQ so much.
CoT for RP seems like all the other models to be honest. Gives basic slop in the thinking part thats not creative at all. And then doesnt even apply it.
>>
>>103340033
I feel that it is very good for coding because it actually plans the code instead of going in head first, but I'm a codelet so my opinion isn't worth a fart.
>>
>>103340041
I didnt really set up some solution where you can remove the thinking parts, so didnt test it much for coding to be honest.
You dont want the thinking part to shit up the context.
Qwen2.5-Coder-32B-Instruct is great because its similar to sonnet 3.5 in the way it works with context. Like it doesnt trip up that much if you have X version of something in context already and say "add x".
Kinda doubt QwQ is better than that but I might be wrong. These reasoning models are overly eager for coding. At least o1 is, wanting to give you solutions you didnt ask for. Just fix this problem in my code dont touch other stuff.
>>
>>103340064
>You dont want the thinking part to shit up the context.
Then you're not using it right. Stop complaining.
>>
>>103340075
I dont understand, why would you want to keep the thinking part? That makes it unusable.
All that garbage severely degrades the next output.
You dont want to show the LLM that stuff for the next output, only the end result is enough. o1 doesnt keep the thinking in context either.
>>
>>103340096
We don't know what OAI does on the backend, they just hide the output from the user because they are afraid of someone cloning o1. Too late now anyway, we have like at least 3 o1 clones by now, weights coming to the best one sometime soon (r1).
>>
>>103340096
https://files.catbox.moe/76qs2q.png
https://files.catbox.moe/69y556.png
https://files.catbox.moe/79itp8.png
>>
>>103340096
>>103340134
But if you mean that you're feeding it several different "problems" for it to solve then yes. You obviously should not keep it in context. Otherwise let it keep going until it reaches the solution.
>>
>>103340134
Yeah, like I said riddles and math is what those models are good at.
I dont really care to ask stacking questions or about the circumference of the earth.
I strongly suspect that Qwen2.5-Coder-32B-Instruct would also be able to handle a tetris game. It aced the couple of tests I threw at it.

>>103340144
Ah fair enough. I meant for ex. I get a first basic output but want to add something to that part that is fine/works, the previous thinking part is probably detrimental. At least that's what I thought.
>>
>>103340134
I also asked it for a tetris game. 6 replies later I'm in the "testing" phase.
Hopefully I won't run out of tokens since I ran it with 15k
>>
>>103340162
Lol yea, use top K 1 if you're not already, otherwise it might go off on tangents. But yea, it will sometimes fill 10k context with its planning lol.
>>
>>103340159
> the previous thinking part is probably detrimental. At least that's what I thought.

Depends, if stuff it said in that context established the "how" in its "mind" then erasing that might be a bad thing for some sort of continuation. Feeding it the result and asking a new follow-up based off of it should be fine though.
>>
>>103340174
It is actually planning and coding the game step by step. It lets me check each big step so I can decide if everything is alright and then proceeds with the next big step.
I can't use qwen coder because I'm a codelet but QwQ is holding my hand, guiding me through the process.
>>
File: 1721325460040576.jpg (56 KB, 548x535)
Is learning rate 5e-4 bigger than 1e-4
>>
>>103340225
Yes.
It goes like this.
>5
the 5.
>e-4
move the decimal point 4 places to the left.
So 5e-4 = 0.0005, which is bigger than 1e-4 = 0.0001.
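Quick REPL check, same numbers:

```python
lr_a = 5e-4  # five times ten to the minus four
lr_b = 1e-4  # one times ten to the minus four
print(lr_a, lr_b)   # 0.0005 0.0001
print(lr_a > lr_b)  # True
```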
>>
Tried QwQ once again for RP, it's still trash. Bland as fuck.
>>
File: 1701516705773173.png (470 KB, 673x636)
>>103340242
But bing chatgpt said 1e-4 was bigger
>>
>>103340245
It can get pretty spicy with a good prompt. The main thing is that it's super smart. Hopefully we get a finetune soonish, then we really will have claude at home.
>>
>>103340273
I stand corrected.

The chocolate ration for Party members in Airstrip One is being increased from 5e-4 grams to 1e-4 grams.
>>
>>103340225
>>103340273
is 5 bigger than 1?
>>
>>103340280
I haven't noticed it being smart (running at F16, temp 0.5, minP 0.05). It messed up physical positions in all 3 of my attempts at using it. Maybe for coding it's good, but for RP it's utter trash.
>>
>>103340273
>he trusts LANGUAGE models with numbers
>>
>>103340245
QwQ is really good at sussing out what the user wants then thinking itself back into a cucked state right at the end. Prefilling the reply helps a bit but the writing is a little bland.
>>
>>103339830
>>
Is it possible QwQ is just a good model and the fake thinking it does is just a placebo?
>>
>>103340329
for me it messed up several code projects and started speaking chinese at me during rp. maybe i'm using it wrong, it seemed to be using more tokens due to the 'thinking'
>>
https://pastebin.com/XXpj7JKj
bruh, lmao.
scroll down for the great result.
>>
I just tested QwQ quite extensively with the coding question I'm currently using for most of the interviews I give at my company. As far as I can tell, the problem isn't anywhere on the internet. The difficulty is somewhere in between leetcode medium and hard.

The model starts off extremely strongly in the first few reasoning steps, going over the problem and requirements, and charting out potential approaches. Usually within a few lines, it's roughly outlined an approach that a skilled human could then implement. Sometimes it even mentions the data structure that leads to the most optimal solution.

But then when it starts trying to reason in depth, it completely goes off the rails 100% of the time. Here are some ways it fucks up:
1. "Wait, but once I do X I then have to consider Y" where Y is obviously impossible. It then spends several hundred tokens trying to prove something that isn't true, and gets itself tripped up more.
2. Straight up incorrect knowledge retrieval. "In python, the bisect module lets me insert into a sorted list in log(n) time". No, it lets you find the insertion index in log(n) time, inserting shifts the list so is linear. It then proceeds with a whole line of reasoning based on a faulty efficiency assumption.
3. A bunch of "micro-errors", where it will say something that is subtly wrong, try to reason in depth about it, fuck something up, realize it's wrong or eventually say "this is complicated, let me reconsider", and just generally get stuck in a loop.
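(Point 2 is checkable in a few stdlib lines: bisect finds the index in O(log n), but the insert itself still shifts the tail of the list, so it's O(n).)

```python
import bisect

xs = [10, 20, 30, 40]
# O(log n): find where 25 would go to keep xs sorted.
i = bisect.bisect_left(xs, 25)  # i == 2
# O(n): insort does the same search, then list.insert shifts everything after.
bisect.insort(xs, 25)           # xs == [10, 20, 25, 30, 40]
```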

It literally never gets to the point where it actually starts writing code. Anecdotally, a bunch of its problems feel like they're ultimately due to the model being too small. It makes too many small, obvious errors, or doesn't know something basic. An interesting model for sure, but people are overrating it I think. FYI GPT-4o / Claude 3.5 Sonnet get this problem completely correct about 50% of the time.
>>
>>103340407
This just shows again how these models think up all that slop but dont APPLY it.
>>
>>103340411
>It literally never gets to the point where it actually starts writing code.
I had to tell it to include the code inside a code block in its answer.
>>
>>103340407
>"I told my wife I'd help her with the dishes, but when I got to the kitchen, all the plates were already clean. She must have used those newfangled 'self-cleaning' ones. But where's the fun in that? Half the marriage is about complaining about who does more chores." That joke touches on marital dynamics and the changing nature of household appliances, potentially resonating with adults who share similar experiences.
w-what?
>>
>>103340420
it did that a lot for me too, as well as straight refusals to write code. i stopped its reply and replaced it with a ``` opening code block and it went right to writing its (broken) code
>>
>>103340407
kek
>>
>>103340411
Have you tried R1 as well? I'm pretty sure it is better than QwQ
>>
>>103340411
CoT = make your LLM a neurotic worry wart.
On top of them all being gutless weenies who worry about committing word crimes like saying "breasts."
>>
>>103340464
Nope, I usually ignore cloud-only models. Is there even a way to try this model without signing up and giving my info to a chinese company?
>>
>>103340474
They already have your information. You'd just be confirming which database line you belong to.
>>
>>103340464
NTA but I tried R1 a bit. It seems a lot better and I really like that the casual thinking part is much less dry.
>>
>>103340497
>They're
Into the trash it goes.
>>
>>103340474
You literally just have to use a temp-mail, they don't ask for a telephone number or anything.
>>
R1 is crazy for a 16B model (active parameters out of 250B)
>>
>>103340474
You can sign up through tor or some VPN or proxy and test it (easier than posting on 4chan through tor for sure), you might need some non-blocked email provider, but the second one I tried worked. No need to give them anything but your prompts. Anyway, they should release the weights soon, but I don't know how big the model will be, probably some not too small MoE, like their deepseek coder models, so might need a lot of RAM anyway.
>>
>>103340501
Yep pretty bad. It's definitely gender-cucked. But I do like the thinking part a lot more than QwQ at least.
Still retarded though. After asking to create a profile of my ass it literally did that.
>Alright, so the user is asking for a psychological profile of their "ass," and they know we haven't talked much, but they want me to try anyway. Hmm, this is an interesting request.
>First off, I need to understand what they mean by a psychological profile of their "ass." Is this a metaphor for something, or are they being literal?
>It could be a play on words, like asking for a profile of their personality or something related to their behind, but that doesn't make much sense.
>Maybe they're using "ass" as a slang term for something else, or perhaps it's a joke.
>I should consider the possibility that they're being humorous or testing me. Given the previous conversation, they seemed to be in a casual, playful mood, especially with the meme explanation.
>So, perhaps this is just another light-hearted request.
>If I take "ass" literally, trying to create a psychological profile of a body part doesn't make much sense. Psychological profiles are usually about people's personalities, behaviors, and mental states.
>Maybe they're looking for a humorous take on personifying the "ass" as if it were a character with its own traits.
lol
Gotta give props to them for showing the thinking output. Makes you wonder what kind of retardation goes on at openai.
>>
>>103340411
I also have a few problems in my back pocket that I occasionally use to test these models. Similar experience. QwQ starts off wild and spouts out a bunch of random bullshit that's irrelevant to the problem at hand, but it did eventually get to the correct answer (and to date, is the only open model to get there - Claude 3.5 Sonnet and o1 are the only others that have gotten it, and they don't always manage it consistently)
>>
>>103340546
>You can sign up through tor or some VPN or proxy
Why would you go to such lengths? I genuinely can't understand this, sounds like schizophrenia.
>>
>>103340426
It's a 32B anon. The holes in its common sense world model show up at some point.
>>
>>103340584
Did you read the other anon's request, "Is there even a way to try this model without signing up and giving my info to a chinese company?" I answered that. Anyway, it works because I've done it, for similar reasons, took me some 10 minutes. Why? Because it was the first o1 clone to come out and I heard it was good, I wanted to see how it performed. It did an excellent job on my personal tests (math, code), I didn't test it for coom because I never thought these things are good for that. It did a somewhat poor job on some physics and chemistry questions, but for code and math it was great.
>>
>>
>>103340611
This is what I mean by the cooler internal monologue for r1
>>
>>103340609
IP address isn't personal info, if you think you NEED to use Tor/VPN/Proxy just to avoid giving information to the chinks, you should seek help.
>>
>>103340611
>>103340620
>redditor ai likes groid asses
Tits ftw.
>>
>R1 and QwQ eating o1's lunch
So is Sora all they have left of their moat now?
>>
>>103340649
>IP address isn't personal info
this lol, why do zoomers think that someone seeing your IP address (especially since it's almost certainly dynamic, and probably CGNATed too) is like them getting your phone number or something

it's MAYBE like that in isolated cases specifically for law enforcement agencies who have legal powers to contact your ISP and subpoena logs, but it's not like that for anybody else. a random corporation seeing your IP means nothing at all
>>
>>103340694
No, China's minimax is already better there as well going by the leak.
>>
>>103340694
No one's gonna care when they finally open Sora either, because anyone who's interested in AI video has already been able to play with pretty good Chinese ones for free now. Sora is maybe 20% better, not enough to get excited about especially if it costs money. They waited too long.
>>
>>103340704
I didn't want to explain this here, but think about this: "a random corporation seeing your IP means nothing at all". Let's say you did something on their site that actually invited law enforcement interest, it'd be too late at that point. There have been many cases where people realized something "too late" and then their logs were there forever and they can't take it back and months later the cops paid a visit. I'm not saying this is gonna happen with this chink company, obviously not, especially not with China/US relations, but really, if you had the policy to never reveal the IP, you would avoid any such situations.
>>
File: 1732855346904.jpg (101 KB, 1280x1243)
>>103340649
>>103340704
>>
>>103339889
This.

Return to monke.
>>
>>103340407
Read the first 300 lines.
Lots of repeats.

Though interesting that it itself noted the repeats.
>That joke is just too good; maybe it's unbeatable.
>Perhaps I should settle on that as the funniest joke possible.
>>
>>103340694
>Sora
I'm pretty sure this week's leak showed that it turned out to be nothing special compared to the other proprietary videogen models that have popped up over the past year.
>>
>>103340716
I think it's less a case of they waited too long and more a case of "they finally lost the battle they were always doomed to lose"
Still pretty brutal how far they've fallen since the days of coomageddon though
>>
>>103340739
Oh, I get it. So you are saying that you do so many illegal things that you are afraid that you might have a lapse in judgment and end up posting some of your illegal things on their website using your home IP address. Now that makes sense, yeah, you really need to be careful in this case.
>>
>>103340772
Apparently it was a distilled Sora that's faster, not the original thing that can gen up to 1min of length and full HD resolution. That's kind of a disappointment, I did expect it to be too costly for OAI to sell, and seems they didn't intend to publish it same as the first dalle(1). Their distilled Sora didn't look much better than the existing video gen models (Mochi, some chinese ones, and so on).
>>
File: 1728035650990857.jpg (200 KB, 1920x1080)
>they think that o1 and sora is all Sam has
The question is undoubtedly not whether OpenAI has the next huge step for AI in their hands already. The question is if they deem the world ready to know about it.
>>
>>103340411
So it could potentially beat 4o/claude with a little tweaking?
>>
>>103340824
A 72b version definitely could.
>>
>>103340824
It kinda already does for some tests, so yeah
>>
>>103340786
Glow harder. Anyway, I can give examples like this that I've seen over the years. Some guy navigates some site and finds that the site was leaking some internal documents for example, let's say a pdf or some source code. He clicks it. He realizes the value of what he sees. He wants to leak it to the world. He gets a proxy and downloads it again and posts it publicly. They investigate the leak, and find that someone else saw it, now the cops are at his house. The internet is littered with cases like this. Don't end up one, preserve your freedom to do whatever.
>>
>>103340814
lol
>>
>>103340814
My brother in christ they couldn't even step to the top of the bullshit normie leaderboard
>>
File: 68lbod0j1dm21.jpg (114 KB, 780x1084)
>>103340814
Sam may continue feeding investors with bullshit, but OAI has nothing else to show.
>>
>>103340835
Makes me wonder how many params o1-preview is. On one hand, they charge fucktons of cash for it ($15 / million input tokens, $60 / million output tokens). On the other hand, I've got a feeling OpenAI is frantically trying to recover as much cash as it can and it wouldn't surprise me if it was an 80B or something
>>
>>103340824
Um, no. Even training it on 72B would not, since as a base model 72B STILL doesn't know as much as 4o/Claude across all of human knowledge. And of course it wouldn't, we know Qwen focused on coding and academic subjects a ton for their model, so it knows little else relatively, and has a bunch of little gaps in its knowledge. With that said, on the things it does know and was trained extensively on, sure, it will beat 4o and possibly Claude, but we can't say it's overall better just by being better at some things. And if we're talking about RP, it's even more difficult to say that it's equivalent or greater than Claude, since one of the reasons people love that thing so much is because it knows so damn much about niche and obscure shit.
>>
With QwQ a character with a hood just tried to spit at me and they accidentally hit their own hood. That is the final proof to me that this model is next-level smart. Nothing else has that kind of spatial intelligence. It also never gives non-humans human anatomy, which even mistral large constantly fucked up.
>>
>>103340992
Who the hell would use a CoT model to RP? The entire point of these types of models IS coding and academic subjects. All you'll get out of that is painfully slow fucking outputs about the optimal way for your waifu to get to school on the subway
>>
>>103341025
got a kekle out of me
>>
>>103341025
>>103341038
The duality of man.
>>
>>103341025
jej nice one
>>
>>103340992
No one cares about general human knowledge. You can just use a model capable of searching the web for that kind of stuff.
As for RP, it can't be helped.
>>
>>103340918
GEMINI?? How?
Language-wise I'd say it's good. It speaks very naturally in german or japanese for example.
Doesn't have that "technically correct, but it reads like english phrases literally translated into X" feel. Difficult to explain, but it's like you can feel/see the english even when it outputs in another language.
Everything else is much worse. Can't ask anything, ultra hyper cucked, the context is a joke, it hallucinates even at low context and it's not that smart either.
NotebookLM is the coolest thing google has. No clue what gemini does at the top there. what a joke.
>>
>>103341076
Gemini is at the top because they're literally optimizing against lmsys benchmarks, at least the questions users ask, this was discussed before and I think they did it for Gemma too.
>>
>>103341054
General knowledge is how you solve novel problems that require creativity and multidisciplinary expertise, or problems that may not even pop up anything relevant in search or be searchable with traditional methods. Of course you want as much knowledge as possible.

>>103341038
>Who the hell would use a CoT model to RP
Evidently, people in this thread, since Llama 1 supercot days and since we knew about things like CoT and ToT and GoT to begin with.
And it's not exactly something that would be useless either. There are absolutely ways to scale RP capability with test time tokens, they just haven't made the training data for it, as it requires different ways of thinking and problem solving compared to code and math problems. It's not something that can be generalized from solving problems in those other domains.
>>
>>103339938
What sliders were you using for that log, anon?
>>
>>103341110
Pathetic. Makes lmsys useless. But probably everybody does this.

Also pic related for coding, thats crazy. Sonnet way below. o1-mini leading.
What a joke.
>>
>>103341128
>There are absolutely ways to scale RP capability with test time tokens, they just haven't made the training data for it,
How would you do it? Math and coding are "easily verifiable domains", how do you do it for RP?
>>
>>103341153
Checks out though. Most people who use it are probably "software professionals" (pajeets) who are clueless about how pointers work
>>
>>103341144
Nta but new qwen needs high temp to really start cooking, like 2. Otherwise it's kind of dry. Just use some min p with it. It will start introducing characters and events to move the story which I find great.
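Roughly what those two knobs do together (an illustrative sketch, not any backend's actual sampler code: temperature flattens the distribution, min-p then drops tokens less than min_p times as likely as the top token):

```python
import math
import random

def sample_min_p(logits, temperature=2.0, min_p=0.05, seed=None):
    # Temperature first: divide logits, then softmax.
    m = max(logits)
    probs = [math.exp((l - m) / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # min-p: keep only tokens at least min_p as likely as the top token.
    cutoff = min_p * max(probs)
    probs = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(probs)
    probs = [p / total for p in probs]
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

This is why temp 2 doesn't turn into word salad: the high temperature fattens the tail, but min-p trims whatever falls too far below the top pick.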
>>
>>103341153
Also Claude Opus is 19th (!) on the creative writing leaderboard. Chat-fucking-GPT is number 1.
The voters are absolute retards, man.
>>
>>103341209
No way that's real, poor taste or something else? I remember some benchmark was rated with oai's models and it prefered its own output to other LLMs, but lmsys should be more regular users.
>>
>>103341154
Probably not a terribly different way from how they make these reasoning datasets already (which by itself is a combination of different methods, including literally paying people to voice their thoughts as they go through life). RP is of course open ended but you can always improve it. One possible method I thought of some time ago but didn't think about further because it wouldn't be useful to anyone is to perturb some existing writing by making it worse (say with a model that loves purple prose and isms, but for the sake of versatility we'd do it in many different ways, such as getting the model to insert mistakes with logical reasoning, awkward synonyms, erase allusions/references, erase things that make the text unique, etc.). Then use that as the negative signal while the original is the positive that can be used to generate CoT data with.
>>
>>103341209
Beaten by Gemma-2-9B-it-SimPO.
What the fuck. They probably just prompt some reddit tier "do a joke for me" thing and upvote.
If I remember correctly Claude did the most creative shit imaginable, but you did need to prompt it properly.
>>
>>103341224
No it's real human voters, from India.
>>
mornings, saars
anything good for poor vramlets or still old nemo?
>>
>>103341209
>it's real
Opus 19th at creative writing, holy shit lmao
lmsys voters are actually an anti-signal for model quality
>>
>>103341235
with 12gb vram, 16gb ram, QwQ runs at 1 t/s on my machine
>>
>>103341209
OpenAI either pay lmsys or pajeets to game lmsys
>>
Just got myself a 4070 Super, what kind of LLM should I break it in with?
>>
File: spit hood.jpg (161 KB, 1003x1280)
161 KB
161 KB JPG
>>103341025
Not to diminish the coolness of that and the intelligent creativity that comes from having thought of that during the RP, but it's likely it thought of "spit hoods" when it saw "hood" and "spit" in the same context, so it thought it close enough for this to be plausible. That is likelier than the case of it truly knowing that a hoodie's hood can be floppy enough spatially to catch stray spit. We already know that a lot of models can say 9.11 is greater than 9.9 because of religion vectors so this is a pretty typical thing for LLMs.

I guess it's also possible that there is some writing out there where people mention spit getting onto their hoodie but I think it's not so likely the 32B learned from those obscure texts.
>>
can someone explain to me how QwQ works?
O1 is super slow, even mini version, but on OpenRouter, you get results from QwQ instantly. So, I'm assuming it's not functioning like O1.
>>
>>103341343
>O1 is super slow, even mini version
Is it? It's pretty fast for me.
>>
>>103341343
As far as I can tell from what people have posted, QwQ can both do the thinking gimmick or not, and if it's not, then you should prompt it to do it. Also since it doesn't hide its thinking tokens (unless you use a frontend that does, which I am not aware of any that do currently), you will see those tokens stream in immediately unlike o1 which keeps the thinking tokens back, since it's extremely dangerous and unsafe content they can't let you have.
>>
>>103341369
>QwQ can both do the thinking gimmick or not, and if it's not, then you should prompt it to do it
can you tell me what prompt I should use?
>>
If you think about it, LLMs are just one-dimensional entities that can't even go back. And you guys want it to reason on the three-dimensional level
>>
>>103341412
What does that even mean.
Time moves forward for humans as well.
>>
OK, one more QwQ experiment from someone else. I'm running it at q8.
I decided to write my firefox sovits screen reader plugin entirely using qwq. I prompted it with initial poc requirements and a copypasta of the chinese api guide. From there I used it to refine the project feature-by-feature and debug until I was ready to release an xpi based on its code.
We reached something usable and playing audio on the backend within 10 replies and were ready to release something with a minimal feature set (selectable characters with emotion slots) before we hit 60k tokens. I didn't need to do anything more sophisticated than copypasting the code that it spat out into the source files and testing/feeding it the results and error messages. The first manual edits I did were right at the end of context to clean up a few things I didn't want to bother passing through it.
Once we 60k I got it to summarize the entire project and I used that as initial context, along with file contents, in a new chat to continue work.
The big takeaways from this for me were that if allowed to make incremental improvements qwq can get real work done (with handholding) assuming you are already minimally competent at coding and actually know what you want to build.
The most annoying thing was its propensity for continuing the conversation past the actual answer: it will append something like "Human: ok now do this thing..." and just spin off to the token limit. I'd have to edit the response and trim off the excess. I'd say this happened about 1/4 of the time. Might just be my setup is off somehow. I didn't actually see it trying to reason, reflect or CoT at all.
Anyways, QwQ is fast and competent enough I'm going to try some more sophisticated work with it now to see how far it can be pushed before exploding.
The code is up at https://github.com/cpumaxx/sovits-ff-plugin/ if anyone is curious as to what kind of mess it pukes out. I only put major working milestones in, so no complete source history.
>>
>>103341565
>one more kwik experiment
hehehehehehehe
>>
>>103341328
Some mistral nemo 12b finetune, like rocinante or rpmax.
You'll probably have to run a q4.
>>
Is QwQ shit or not? I can't tell.
>>
>>103341897
Its shit.
Mediocre as in it writes like all the other qwen models but you can now waste some extra token for thinking that is usually not being applied to the output anyway. I didnt feel the "smartness" like others in here either. It felt kinda stupid actually.
If you want riddles + math maybe you can get some use out of it.
Otherwise use any mistral for RP or the latest codeqwen for code.
>>
>>103341897
>Is QwQ shit or not? I can't tell.
It appears to have some niches where it is outstanding, but RP is not one of those.
>>
>coos
>purrs
>moans
>>
>>103341565
Buy an ad faggot
>>
>>103341192
How you could even code in C without know that? I only know C I'm not a programmer and I learned using a old book of the 80s. (I only use for scripting)
>>
>>103341897
No, but all models have strengths and weaknesses. Test time scaling is a kind of fickle thing which makes the model difficult to judge and compare fully on a lot of things even the things it's supposed to be good at.

I think something people forget is that models are autoregressive. Once they make a mistake (which can happen if sampling at non-0 temp), even if they are trained to try and correct mistakes, they will still more likely roll with them. So since they are autoregressive, the more chains of thought and tokens are used, the more likely the model is to snowball its errors. The more complex and needing of many steps a problem is, the more mistakes it can make.

However, models have gotten better and better at spotting mistakes and not making mistakes in the first place by just being smarter, so Qstar and even simpler CoT methods have become much more viable. But still, because of this autoregressive nature, it means that there is simply more random chance at play in whether or not a model will solve a problem as desired or not. On average for certain classes of problems like math problems, the chance to succeed may increase, but it's not a guarantee on any individual problem since you're essentially rolling an invisible dice each token.
>>
is bigger context length always more gooder?
>>
>>103340814
kek, I remember when people would say this with a straight face
>>
>>103341957
No. It takes more brain cells to work with bigger memories. Thus if you train a model to be good at handling a lot of memories, it might be worse at lower contexts than a model that spent that same compute on lower contexts. We see this most with Gemma, where the context length is small but it's super smart for the amount of parameters it has.
>>
Great, now after magnum V4 72b defender we have a QwQ one too.
Looking forward to newfags getting 72b and QwQ recommended from now on.
>>
File: 1716523684248084.png (602 KB, 588x470)
602 KB
602 KB PNG
It's over https://x.com/TheInsiderPaper/status/1862089056735379755
>>
>>103341985
Since QwQ is fast and competent lets make it do the thinking part real fast and 72b does the response. Summerdragon is finally within reach!
>>
>>103341985
>still seething about China
>>
>>103341994
So, Llama4 will be BASED?
>>
>>103342015
Based on reddit yes
>>
>>103342008
Good that china makes pressure.
I wouldnt mind at all if I cant make winnie poo jokes and taiwan becomes uncountried. Better than the propaganda from the west. Qwen models have both though, they are getting better though.
Its 90% mistral and china for local lately anyway. And llama is even more censored now than qwen.
Its just that qwen models are not that good. Apart from the coder. Thats the best local coding model for sure.
>>
>>103341565
>"Human: ok now do this thing..." and just spin off to the token limit. I'd have to edit the response and trim off the excess. I'd say this happened about 1/4 of the time. Might just be my setup is off somehow. I didn't actually see it trying to reason, reflect or CoT at all.
Did you use chatml? So far it seems to follow the format and end the responses correctly. I haven't seen it do the CoT either unless I tell it to think step by step.
>>
>>103342033
Magnum v4 72B is pretty good for ERP.
>>
File: 1714389413746590.png (1.18 MB, 966x604)
1.18 MB
1.18 MB PNG
>>103342015
Maybe
>>
>>103341951
>the more chains of thought and tokens are used, the more likely the model is to snowball its errors
It's also more likely to catch any edge cases and find solutions it would otherwise miss.
>>
>>103342056
Will stuff really change though?
grok and also the latest grok-beta is worse than the latest sonnet/openai to be honest.
Its weird that those 2 companies relaxed more than meta and X.
If llama4 is cucked again its over for them. Feels like nobody cares that much about them anymore. Mistral/Chink is where its at.
The recent stuff meta showed with voice etc. was downright embarrassing. Still hoping they deliver though.
>>
>>103342068
Not currently. It just goes into a loop like QwQ.
All that thinking doesnt really feel creative either. There is lots of stuff needed I'm sure to get this properly working.
Also with more thinking the user query slips further and further back into context.
>>
>>103342082
It should think then put back the user prompt in the front before requesting again for the direct answer, it should be feasable with thinking tokens
>>
QwQ can do RP fine but I think it needs a few guidelines to work.
A prefill is mandatory to limit how many steps of thinking it takes before returning to the RP. Something like "Okay, {{user}} just said [thing] I will now formulate a response to their input in [number of steps]

Step 1"

I don't know if doing that breaks its actual reasoning process though. And none of that really fixes the fact it's dry and often includes making sure not to make {{user}} feel uncomfortable as one of its most important steps.
>>
>>103342082
I've been playing with qwq since yesterday and I'm yet to see it go into a loop but I did see it expand the prompt into step by step bullet points and reason about how best to implement each one before repeating the bullet points again and writing the answer.
>>
>>103342068
Not if the model is dumb and lacks the knowledge and skill dependent on the specific problem you throw at it. Read the following paragraph after what you quoted. The point of that sentence was to point out the general nature of autoregressivity in LLMs so far, and the next paragraph points out how that is changing, but still imperfect.
>>
>>103342124
>I'm yet to see it go into a loop
I'm not gonna load this garbage up again but its very easy.
Here I just did the joke tthing again on openrouter. Full output before I stopped here:
https://files.catbox.moe/2pkjrk.txt

It did that multiple times for me. At least 4k tokens+ in silly. Maybe it would have ended eventually, seemed like a loop though repeating stuff. Similar to this.
>>
>>103342167
Do you have repetition penalty set?
>>
>>103342167
"Please stop spewing paragraph after paragraph and answer using concise, efficient writing"
>buries you in another 10 paragraphs
>>
>>103342186
in silly yes, openrouter no.
but i think even qwen team acknowledges this.
regardless, if you look at the output its just nonsensical at times.
I get it, its another llm and rambles on, it is what it is. But this is just not good enough.
>>
>>103342033
>And llama is even more censored now than qwen
Are you saying QwQ is less censored than the old Qwens? Llama is pretty normal/average as far as censorship goes for modern LLMs, I'm not sure I'd call it worse than Qwen when you factor in that 70B is notoriously difficult to tune without an intelligence hit but 405B and 8B have both been decent. Mistral is an outlier these days.
>>
Aside from RP are there really any better models than QwQ for local inference right now?
>>
>>103342167
These are the unfunniest jokes I've ever heard
>>
>>103342167
A bit of a niche example but granted. I used it only for programming stuff.
I guess it would get stuck at anything that doesn't have a clearly defined answer.
>>
I haven't used oai's models for ERP for a year and today I checked it. I went from Nemotron to gpt4o-latest and, not just model is super bland but it's like 100 times more censored than gpt4-turbo. Fuck happened? It used to be somewhat good at writing smut. Now, it censors 99% of it and 1% is just terrible. I feel like it doesn't even know what a penis is.
>>
>>103342292
penises are dangerous anon, the ai doesn't need to know about dangerous things
>>
>>103342312
neither do local llms :)
jailbreak-forcing it in compliance doesn't count btw
>>
File: 1702194946172818.jpg (145 KB, 1280x720)
145 KB
145 KB JPG
>speculative decoding
>multiplayer
kobold won
>>
For people who tried qwq for RP, is there a way to make it not show the thinking in the output?
>>
>>103342386
yeah
>>
>>103342410
How would I got about doing that? Do you have a config you can share?
>>
>>103342416
it's called < >
>>
>>103342410
>>103342425
So? Be more specific cooldown this mystification bullshit.
>>
>>103342433
you literally just tell the fucking ai to 'hide text <like this>'
>>
>>103342321
Skill issue
>>
>Train model that there there are 3 r's in strawberry
>Train it to spit out incoherent detached rambling first
>watch redditors clap like retarded seal at your party trick.
>>
>>103342466
>Our shit product is your skill issue
Yeah anon i know.
>>
>>103342543
Do better then
>>
>>103341994
No way, a corpo sucking up to the new government?
>>
File: 1722579712358655.png (1.25 MB, 776x714)
1.25 MB
1.25 MB PNG
>>
>>103343090
Game-changing stuff. Absolutely incredible. Imagine the things we accomplish now. I'm literally shaking.
>>
>>103343090
7B + regex
>>
sama in shambles, not even full o1 can save his trainwreck of a company anymore
>>
what local model is best for roleplaying?
>>
>>103343133
deeznuts-8b-Q2_K_M.gguf
>>
>>103343133
TheBloke/read-op-8B-gptq
>>
>>103343090
>web search
that's cheating and you know it
>>
>>103343133
Hasn't been released yet.
>>
How come nemo RP finetunes are more censored than the base instruct version?
>>
>>103343239
It will always depend on the database that is used for it. But you must be really fucked up if you get filtereted by it.
>>
What are the most FUN small (around 8b range) models now? I just want something that can adapt to different personalities and styles well, without any guardrails. If it's gonna refuse things I want it to do it while remaining in character at least. I don't really care about it being smart in any other way, as long as it can follow style and character prompts reasonably well. I guess I'm just hoping that there is something similar to pygmalion but more up to date and a bit less shit.
>>
>>103343342
pyg6b
>>
>>103343130
Trust in sama. Tomorrow is Chatgpts anniversary and he is going to blow everyone away.
>>
>>103343527
The only guy blowing here is you
>>
>>103343527
What is he going to do, make 4o even more retarded?
>>
>>103343621
A brand new safety feature that randomly refuses every other prompt.
>>
File: 1702780169919512.jpg (431 KB, 1900x1700)
431 KB
431 KB JPG
my local model's responses are shitty oneliners. how to fix? i run rocinante 12b
>>
>>103343647
It will include a lesson about trannies and shittalk white people as a bonus. Very revolutionary and brave vision of the new world.
>>
>>103343655
Length penalty, bias newline, and start a new convo without context contamination.
>>
>>103343655
I have the same problem before.
Change the model.
>>
>>103343342
Seconding this as I want to run many requests concurrently
>>
https://x.com/elder_plinius/status/1862359119337808063
>>
>>103343342
>>103343749
8B is to small, honestly. I remember that Mistral was decent, but it has been a while since I tried it. You should try to get at least 12b to run.
>>
I got the sovits working to the point that I can get sound out on linux.
Is there a guide on how to hook it up to the Silly tts extension?
>>
>>103343785
Since I want to run on vllm, the choice is between q8 8B or AWQ 13B, would 13B still be better?
>>
So I've been using Koboldcpp and SillyTavern for my LLM sessions, I'm using Lumimaid-v0.2-12B.q6_k as it was what was recommended in the guide I was following, I'm running an Intel Arc A770 and Kobold on Vulkan mode because there doesn't seem to be an IPEX version from what I can tell.
Am I all set or is there a better alternative that I didn't spot when I was setting this all up?
>>
>>103343765
God I hate these people. Unironically saying the term "Jailbreak" should be an instant lifetime ban from the internet.
>>
File: 1707603902219381.png (5 KB, 303x122)
5 KB
5 KB PNG
>got sillytavern to run locally on my phone
yep this is it, bye janitor
>>
>>103343840
A quanted 13B will still feel better than 8B.
>>
>>103343859
MN-12B-Lyra-v4 and Violet_Twilight-v0.2.Q6_K did give me better outputs, but this likely depends on preferences.
>>
>>103343969
are those models for roleplaying?
>>
>>103343991
Yeach
>>
>>103343999
do you know if the method I'm using is optimal for intel Arc cards? I know I'm a little limited and that vulkan isn't the best for ARC but it's far from terrible
>>
>>103343999
nice trips, Have you tried stheno or rocinante? Are those two you mentioned better?
>>
>>103343859
>Luminaid
>Am I all set or is there a better alternative that I didn't spot when I was setting this all up?
You're fine. Now please never come back because you're attracting the sloptuners.
>>
>>103344079
don't be such an elitist, I'm asking a completely reasonable question.
>>
>>103341994
>Corpos kissing the new government ring
never seen that before
>>
File: pepefroggie.jpg (38 KB, 780x438)
38 KB
38 KB JPG
My bud told me he's hosting a chatbot website with a single 4090 in his basement and is actually making some decent side money. How is this possible?
>>
>>103344425
he's hosting 7b retarded models and the normies think it's the second comming of christ because it was overtrained at counting the number of r in words
>>
>>103344425
He's probably lying to save face over the fact that he spent thousands of dollars on gooning.
>>
I am surprised how Tulu of all models is one of very few models that don't ask for my consent, if I am ready or annoying stalling shit like that. Even if the model got some god-awful GPT slop and purple prose.
>>
>>103344425
What kind of stuff does his website do?
Is it just the simple cai thing with chat only?
>>
is there some chart of all the best llms and their most popular finetunes and merges?
>>
>>103344585
Best LLMs:
The chink models
Worst LLMs:
Literally everything else
>>
Where the fuck is the context and istruct template for QwQ for ST, and best samplers, please anon I prove the model but sometimes is complete retard when i Use rep pen. Also, chinks word randomly generate.
>>
>>103344695
>Also, chinks word randomly generate.
yeah I have this same shit, that's frustrating
>>
File: 1702593601913116.jpg (93 KB, 554x1000)
93 KB
93 KB JPG
>>103344669
trve....
>>
>>103344695
newfag
>>
Don't use rep pen. It fucks it up.
>>
File: 734806.jpg (117 KB, 716x1011)
117 KB
117 KB JPG
I gave a peek to https://candy.ai/ since its being shilled everywhere but even using it once is paywalled
How hard could it be to replicate this locally?
Its basically whatever random shit local model they use able to generate pics from stable diffusion midways conversation, silly abandoned the support for stable diffusion or something I recall
>>
>>103344669
And we can thank him for going closed source and causing research to stagnate to such a point that they could basically just release their model and claim victory
Sama put the west so far fucking behind where it could have been it's not even funny
>>
>https://rentry.org/lmg-lazy-getting-started-guide
And what if I have a 4090?
Wouldn't I want something with more parameters?
>>
>>103345056
>https://candy.ai/
>What are you interested in?
>click on anime
>pages upon pages of /aco/
>>
>>103345136
30B is mostly a dead model size for chatbotting nowadays. Your best option is to wait two more weeks
>>
>>103345136
Follow the instructions just to get a feel for it. Later you can ask for a bigger model if you're unsatisfied with nemo.
>>
>>103345166
What happens in two weeks?
>>
>>103345136
Try Mistral Small. It's 22B and fits into your 4090.
>>
>>103345136
People always recommend way smaller models than what is possible, especially on leddit.

Depends on how much you are willing to trade speed for quality. I got 4070 ti and can run 70b on iq4xs quant. 1 token per second for okay-ish quality is preferable over quick but retarded slop with poor logic and coherence.
>>
QWQ verdit for roleplay? only those who used largestral 2 can reply to this btw
>>
>>103345217
Don't do it
>>
>>103339801
if you tried mistral large 2 you wouldnt be asking that question
>>
>>103345217
It's shit.
>>
>>103345217
Funny but useless
>>
File: green man.png (944 KB, 694x681)
944 KB
944 KB PNG
>>103345177
You will know once it happens.
>>
>>103343133
mistral large 2
>>103344425
scamming normies
>>
>>103345217
Its different, can be fun. Super smart, brings up small details no other model seems to really do. Its quite dry though. Id say large mistral and tulu are better at RP still. A finetune might change that though.
>>
>>103345217
Different? I'm playing around with some plugins to try and take advantage of it's planning abilities. It's kinda working from what I can see in the thought process, but it's hard to reign in and it has the regular pitfalls of a model of its size. Guess the fine tunes will make or break it, if the tuners learn how to train CoT for RP.
>>
ML2
dl doko?
>>
>>103345270
Have you checked your anus?
>>
>>103345217
It's okay.
>>
>>103345056
Nah your website is shit. It works on still works on silly just check the guides
>>
I couldn't figure out how to configure qwq to output an actual message after all the thinking in sillytavern.
>>
>>103345144
Literal boomers are owning these websites to make quick cash, what did you expect kek
>>
>>103345320
I had posted one a thread or two ago. You have it think in character inside of <thinking> tags then write the response
>>
>>103345331
Can you export your settings?
>>
>>103345320
There's also the Stepped Thinking pluging
https://github.com/cierru/st-stepped-thinking
But I think it fucks with the prompt format, hope inference time compute gets standardized soon, now with several models coming out.
>>
>>103345349
Not at home atm otherwise I would have.
>>
Does anyone have a source of super high quality English voice samples for tts training?
I've been scraping youtube and voice actor demo reels, but its really hit or miss.
That seems to be the bottleneck on these systems, and I don't see any projects out there trying to curate good datasets.
It would likely be highly socially unacceptable, but torrents don't care.
>>
I can get the new m4 macbook and expense it with the highest amount of ram

How good is this for LLMs? Specifically just for learning shit like coding or powershell (and ideally using it as a way to help me with math)
I'm not expecting anything major but kinda like 'how do I do this' and then 'and why the fuck does this code work but not the following code' kinda deal
>>
>>103345486
Very slow context processing.
>>
>>103345217
qwute~
>>
>>103345495
I've a 4090 and run 12b models at the moment on windows

How's it compare? I'm not expecting miracles but the mian thing is that I get an answer within like 10-15 seconds
>>
>>103345217
>QWQ verdit for roleplay?
it talks like a woman HR doing a powerpoint presentation, corporate, souless talk
>>
>>103345565
>using the default assistant persona
Jesus, make it use the personality of the character... I showed everyone how to a thread or two ago.
>>
>>103345578
>make it use the personality of the character
Listen newfriend. If you give the model 8k tokens of prefilled human made content and then it continues to write in its own hr assistant style while it also picks up 10 subtle patterns in the writing you didn't notice and you didn't actually want, you start to realize none of this shit works. At best it is a thin condom wrapped around the meaty shaft that is the reddit hivemind.
>>
>>103344695
0 temp, no rep pen, chatml.
>>
>>103345612

>>103339830
>>
>>103345578
You're talking with the anti-Chinese troll.
>>103345217
It's definitely less dry than Large for RP.
>>
when are we getting RP finetunes for multimodal models?
>>
>>103345486
>apple
Reddit is this way: https://old.reddit.com/r/LocalLLaMA/
>>
File: 1711262873551176.jpg (319 KB, 1030x1326)
319 KB
319 KB JPG
>>103345715
>>
>>103345715
You got it wrong. Jewvidia is being such a jew that even with the apple tax apple shit is starting to be an option.
>>
How would you stack a book, a bucket, a tennis ball, a sword and a chair to reach the biggest height possible?
>Another idea: maybe use the sword to impale the tennis ball and stand it up.
>But that might be destructive and not necessary.
>Let me think differently.
It was so close to thinking out of the box and solving it but it's too fucking safe (QwQ)
>>
>>103345735
Who the FUCK cares about phones
>>
>>103345746
And now I said that it's allowed to be destructive and he is dissasembling the chair lmao
>>
>>103345486
>Getting anything good for its price from Apple
lol
lmao even
>>
>>103345793
Anon, I can expense it
I genuinely give no fucks since it'll be paid for
>>
>>103345738
>no games
>no training
>linux support is behind some mentally ill vtuber
>slow prompt processing
>slow image gen
>impossible to upgrade
It's barely an option.
>>
>>103345783
Good idea, use each part of the chair and its screws to make a wood stick as long as possible.
>>
>>103345746
>tfw QwQ is your face after using it, they knew all along
>>
File: 1715169835396520.png (66 KB, 360x346)
66 KB
66 KB PNG
>>103345805
>>linux support is behind some mentally ill vtuber
wait what?
>>
>>103345806
But then you could stretch the matter into infinity to get infinite height
>>
>>103345805
I would say 5 T/s is usable for me. And you can easily get a 70B like that with just one macbook. Compare that to jewvidia single gpu solution and it is not even a comparison. It is an option just for LLM-s. And if you are doing a 2 gpu jevidia setup you are doing that for LLM-s.
>>
>>103345805
>gaymes in 2024
lol
>reeee no loonix support!!!
Who cares, macOS or Windows is the way, multiplatform is god choice if you care about gaming stuff.
>Slow img / prompt processing
Will be fixed, apple works on it already, something about kv cache compression in mlx.
>Impossible to upgrade
The only valid complaint. Though it can be explained by apple's anti-theft measures, 3rd party stuff wont work properly, guess EU should jump in and rape apple again.
>>
I asked QwQ to continue with a certain authors style
>author is known for layered, dense prose, complex sentence structure etc etc
Holy fuck, it's working
>Begins: I X, Ying. I did Z. It X, Ying.
Oof, why reason if it doesn't even apply it afterwards.
>>
How come some models properly end their replies in a complete sentence with a period, while others will stop at the pre-determined token count(lets say 250) without completing the final sentence? Is it a stop token issue? What would tulu's stop token be?
>>
>>103345860
>pay thousands for slightly faster than cpu only, missing a ton of features still, and not being upgradeable
Maybe AMD APUs will deliver in a gen or two.
>>
>>103345882
For brain dead questions, please go to reddit.
>>
>>103345875
It's pissing me off as well, it's very precise and correct whilst thinking, then just pulls a reply from it's ass.
>>
>>103345917
Its very sensitive to how its prompted it seems. You need to tell it to, in character, use what it has reasoned to continue the story.
>>
>>103345817
https://www.youtube.com/watch?v=LonzMviFCNs
>If you don't like it, tough luck. Complain to Linus, and make sure he knows if he kicks me out his shiny M2 MacBook no longer gets upstream support.
https://archive.is/YuVlY
>>
>>103345950
https://youtu.be/LonzMviFCNs?t=556
the fuck is this voice, is this an actual troon?
>>
>>103345917
Maybe ask it to give itself examples in its CoT?
>>
>>103345992
https://archive.is/ilehu
It's open source.
>>
>>103345950
Oh, cool, they are live right now!
https://www.youtube.com/live/xHzy7iySS2c
>>
>>103345266
How would you even finetune CoT for RP? Lots of multiturn examples with elaborate CoT in the latest response?
>>
File: 1730452318285.png (358 KB, 612x567)
358 KB
358 KB PNG
>>103345950
>>103345992
>>
File: 1723989324845979.png (1.12 MB, 2186x1231)
1.12 MB
1.12 MB PNG
>>103346034
>all this autistic shit to sound like a troon
kek
>>
>>103345950
>>103345992
>>103346035
Bųy aŋ åd şis
>>
>>103345950
This may actually be worse than mikufaggots.
>>
>>103346074
Same thing, even sounds similar like their fav vocaloid meme.
>>
>>103346092
that sounds nothing like Miku, you're tripping
>>
I didn't meant to start linux/mac flameware
I honestly just wanted to know if it was worth getting the new M4 to use as a tool while at work since I could expense it so wouldn't cost me anything

I don't want to use my desktop with a 4090 and remote on and the razer bllade I tried out, while not bad is loud as fuck and that defeats the purpose
>>
>>103346127
>since I could expense it so wouldn't cost me anything
I think this is the only time it'd be worth it. You likely couldn't get the same overall inference performance on a big model with any other laptop setup.
Just be prepared to wait longer for prompt processing than you'd like, so you'll have longer and longer delays the deeper you get into a chat.
>>
>>103345735
The crazy thing is that this almost directly correlates to the per capital s-y consumption of these countries.
>>
>>103346125
Both can induce ear tinnitus with this high pitch anime retardation.
>>
File: image.png (239 KB, 1149x1113)
239 KB
239 KB PNG
it's still thinking
>>
>>103346040
Changing the CoT format entirely. There's no reason for any of the tokens to make sense to us. Before each token is predicted, the model would generate a block of guiding tokens, letting it sample a much wider region of its latent space before committing to a prediction. As we all know, the models take very predictable paths through latent space; if they were trained to actively sample much wider for each token, I suspect we'd get much higher accuracy, or at least lower PPL than a similar-sized model. The guiding tokens could be trained through RL, minimizing PPL.
Just a shot from the hip, but it seems like a natural evolution unless I'm missing some obvious pitfall. There are so many cognitive program snippets embedded in these models, but right now they have no way to productively sample them during generation.
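A toy sketch of the idea, with a stable hash standing in for a real model's next-token distribution (everything here is made up for illustration; a real version would learn the guide vocabulary via RL):

```python
VISIBLE = ["the", "cat", "sat", "on", "a", "mat"]
GUIDES = ["<g0>", "<g1>", "<g2>", "<g3>"]  # opaque "thought" tokens

def stable_hash(context):
    # Deterministic stand-in for a real LM's next-token distribution.
    return sum((i + 1) * len(tok) for i, tok in enumerate(context))

def generate_with_guides(prompt, n_tokens, n_guides=4):
    """Before each visible token, append n_guides hidden guiding
    tokens to the context. They condition the next prediction but
    never appear in the output: a per-token micro-CoT."""
    context = list(prompt)
    out = []
    for _ in range(n_tokens):
        for _ in range(n_guides):  # widen the latent path first
            context.append(GUIDES[stable_hash(context) % len(GUIDES)])
        tok = VISIBLE[stable_hash(context) % len(VISIBLE)]
        context.append(tok)
        out.append(tok)
    return out
```

The RL objective would then reward guide sequences that lower the PPL of the visible tokens that follow them.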
>>
>>103346228
>be adhd
>the other person has already left by the time it's done thinking
>>
24gb vramlet here. I tried QwQ for RP at Q4_K_S, but I think it still falls behind Tulu 70b IQ2_S.
>>
>>103346705
>24gb vramlet here
>it still falls behind Tulu 70b IQ2_S
wait, are you seriously running 70b models with only 24gb of vram?
>>
>>103346705
For RP I agree. For everything else I'd say QwQ or qwen2.5 32B coder: some stuff QwQ does better, some coding stuff coder does better.
>>
Are people unironically using math and coding models for RP?
Why?
>>
>>103346794
QwQ is supposed to be a general reasoning model, not a math- or coding-focused one; that's just where it shines. (Though while it gets some stuff right that regular coder / everything else got wrong for me, it also gets some stuff wrong that coder gets right.)
>>
>>103346794
why not? a smart model means non-retarded RPs, but yeah, if it has no sovl it's bad either way
>>
>>103346228
>I've been seeing you here every wednesday for a while: do you come here often?
kek
can't believe 2024 is when we taught nvidia cards autism
>>
File: ThatsNotATPSReport.png (1.08 MB, 1280x768)
>>103346794
>why?
because it's funny
>>
>>103346744
Yes, I am. In general, I think smaller quants of big models will remain superior to smaller models until models truly reach saturation, i.e. peak density of useful data. At that point, smaller models will rise.

Right now, even at low quants, big models are clearly superior.
>>
Biggest problem with QwQ is that it doesn't cleanly delineate its CoT the way o1 and r1 apparently do.
I notice it sometimes likes to say **Final Answer** or **Final Solution** at the end of its CoT, but it's not consistent about it. That makes it difficult for RP: I'd love for it to reason about its RP response before continuing, but it can't hold consistent formatting and will often blend its thoughts into the answer. Maybe if it were 70b+ you could teach it to do this reliably with just the prompt, but ideally it'd be trained with dedicated thinking-tag tokens and do it naturally.
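In the meantime you can scrape by with a post-hoc splitter keyed on the markers it does emit; a minimal sketch (assuming those exact `**Final Answer**` strings, which is precisely the unreliable part):

```python
import re

# Markers QwQ sometimes ends its CoT with; not guaranteed,
# which is the whole complaint above.
MARKER = re.compile(r"\*\*Final (?:Answer|Solution)\*\*", re.IGNORECASE)

def split_cot(text):
    """Return (thoughts, answer). With no marker found, everything
    is treated as the answer and thoughts come back empty."""
    m = MARKER.search(text)
    if m is None:
        return "", text.strip()
    return text[:m.start()].strip(), text[m.end():].strip()
```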
>>
>>103346745
Have QwQ plan the code, and then feed the plan to coder.
Post results.
>>
>>103339593
I left it because there wasn't really anything past the 2000 character cut-off worth replacing it with, and the fact that HF itself took action seemed significant enough to just leave it in there.
>>103339638
A meatbag still manually posts the output.
>>103339640
I do vet the output, if I have time. But the fact that it's not fully automated after all this time irritates me to no end.
>>
>>103346966
aider does that with its architect/editor system. It seems to be the best use of reasoning models, rather than having them write the code directly, and gets better performance than any individual model on its own:
https://aider.chat/2024/09/26/architect.html
qwq would slot into this perfectly.
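The shape of it is just two chained completions; a sketch with a dummy `complete()` standing in for whatever backend you point it at (the helper and the model names are placeholders, not a real API):

```python
def complete(model, prompt):
    # Placeholder for a real API call (e.g. to llama-server);
    # returns a canned string so the flow is visible.
    return f"[{model}] response to: {prompt.splitlines()[0]}"

def architect_editor(task,
                     architect="qwq-32b-preview",
                     editor="qwen2.5-coder-32b"):
    """Two-pass flow: the reasoning model describes the solution,
    the coder model turns that plan into concrete edits."""
    plan = complete(architect, f"Describe how to solve:\n{task}")
    edits = complete(editor, f"Apply this plan as code edits:\n{plan}")
    return plan, edits
```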
>>
>>103346127
Yeah, it's an excellent laptop: good display, good battery life. The only problem is that if you want more than 16gb of ram or 512gb of SSD space, you will be raped on price, and it's a rape you deserve if you don't like linux, since you could find an equivalent laptop with 2 nvme slots and an oled for the same price and just install linux on that. (I actually don't know... maybe that laptop doesn't exist... and it wouldn't have a powerful GPU like a 16gb mobile 4090, which is technically a 4080, and I don't think you'd want that anyway since 16gb is nothing; you could use google colab for the same amount.)
I also don't know what you mean by needing to remote into it.
Surely anything productivity-based would be provided by the employer, and it would be some sort of GPT shit. Sure, if you really want to, you could get the m4 pro laptop and run some sort of LLM in 24gb of unified ram (more like 16gb usable as vram, since you probably have a browser open, and the bandwidth is low enough that even with 48gb it would run at like 3 tokens per second). Now, if you got a 48gb m4 max for $3700, that could do more than your 4090: 70b q4 at like 5-10 tokens per second (the m4 max has half the bandwidth of a 4090/3090, so 2x 4090 would give you like 15 tk/s).
https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

So overall: no linux = sure, get a macbook. You might even like it so much it replaces your 4090, you might lose interest in local LLMs entirely, and you might start renting 200gb GPU servers from vast.ai or something for like $5 an hour.
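The t/s guesses above come straight from the memory-bound decode rule of thumb: each generated token streams the whole quantized weight set once, so tokens/sec ≈ bandwidth / model size. Quick back-of-envelope (the bandwidth and size figures are rough assumptions, and real throughput lands below this ceiling):

```python
def rough_decode_tps(bandwidth_gbs, model_gb):
    # Memory-bound decode: every token reads all weights once,
    # so the ceiling is bandwidth divided by model size.
    return bandwidth_gbs / model_gb

MODEL_GB = 40  # ~70b at q4, rough guess
for name, bw in [("m4_max", 546), ("rtx_4090", 1008)]:
    print(f"{name}: ~{rough_decode_tps(bw, MODEL_GB):.1f} tk/s ceiling")
```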
>>
>>103339670
this abliterated shit was always a meme and will remain a meme, not surprised by the shitty results there
>>
>>103347005
>But the fact that it's not fully automated after all this time irritates me to no end.
If you do a code release on your pipeline, I'd be happy to collaborate.
Are any of the trainingfags around? If I could automate a pipeline for that I'd happily continuously train a community model on lmg threads for recapbot.
>>
>>103339670
kill yourself
>>
>>103345486
>macbook
The real answer to this is to get them to buy you a proper AI box for the price of that laptop, get a cheapo laptop, and remote back into the powerful box. Are you out of wifi and cell range often enough that carrying all that shit around in a portable form-factor is worth the "1/2 the power for 2x the price" laptop tax?
>>
>>103345805
>no games
if you're over 21 and still play video games, you should kys asap. only exception is sports games (barely)
>>
>>103345612
skill issue
>>
anyone tried the full precision weights version of qwq vs q8? I'm wondering if it gets hit harder than dumber models.
>>
>>103347158
Elon musk no lifes diablo btw
>>
>>103347198
as i said, anyone over 21 doing that should kys asap
>>
>>103347087
please articulate what was upsetting to you about that post
>>
File: dumb story but.png (1.03 MB, 800x5987)
>>103346228
How do you trigger the CoT? I guess I should start from a blank chat with a fresh sys prompt instead of an RP prompt with "use chain of thought (CoT) to continue the story" in my last input.
And on swipes it doesn't vomit anywhere near this much.
>>
>>103347236
>think step-by-step
Seems to be the best trigger for me so far; not so odd given it's the same phrase they use in their examples.
>>
>>103347236
it's sensitive to the phrase "think step by step"
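So the cheapest hack is just baking the phrase into whatever builds your prompt; trivial sketch (the template and wording are made up, only the trigger phrase comes from their examples):

```python
TRIGGER = "Think step by step before writing your response."

def build_prompt(system, user):
    # Fresh system prompt plus the trigger phrase; starting from
    # a clean chat seems to matter as much as the wording.
    return f"{system}\n{TRIGGER}\n\nUser: {user}\nAssistant:"
```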
>>
>>103347265
>example
I should have looked at that first
>>
>>103347236
words words words words

Tell her to send you some selfies, sheesh
>>
>>103342015
>hurr durr based
why don't you go suck trump and elon's cock you faggot, bootlicking, billionaire worshiping, bitch motherfucker
>>
>>103347307
Go cry on bluesky tranny
>>
>>103346915
What prompt do you use to make QwQ use a CoT while roleplaying?
>>
>>103347057
I'll put up a repo this weekend. I've been collecting logs from the last few months of bot output in the hopes of training 8b loras so even vramlets could be backup recap anons, but I suspect most of the logs from before this month won't be high quality enough.
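If anyone wants to poke at logs before the repo is up, the filtering step is basically just dumping (thread, recap) pairs to jsonl and dropping the junk; minimal sketch (field names and the length cutoff are arbitrary choices, not the bot's actual format):

```python
import json

def logs_to_jsonl(pairs, path, min_len=200):
    """Write (thread_text, recap) pairs as SFT-style jsonl,
    skipping recaps too short to be worth training on."""
    kept = 0
    with open(path, "w") as f:
        for thread, recap in pairs:
            if len(recap) < min_len:
                continue  # likely a failed or truncated recap
            f.write(json.dumps({"prompt": thread,
                                "completion": recap}) + "\n")
            kept += 1
    return kept
```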
>>
>>103347307
lol you lost the election
>>
>>103347321
>logs
Just thread captures, or something more interesting?
>>
>>103347158
Semi-bait aside, more like >29. 21 is when people start having jobs (assuming they're not a 24/7 NEET, a separate issue, in which case they should kys) and can buy a computer without mommy's money. Instead of playing sports videogames, why not play the real sports? Games can be left for things you can't otherwise do.
A suit-and-cap-wearing grandpa didn't suddenly start wearing them; he was wearing them when he was 15 too.
>>
QwQ fine-tunes?
>>
>>103347307
Your post practically radiates saltiness. Sorry, but we're taking it all back. Woke has to go.
>>
>>103347089
>1/2 the power for 2x the price
Technically, if he went with the 3 tk/s 48gb 1tb m4 pro for $2900, ignored that 2x 4090s would give you 15 tk/s, ignored that he already has an AI box with a 4090, and counted the cost of a 4k HDR display... the macbook pro is kind of better than an AI box; it's just a bit slow when you run the 48gb LLMs (5 tk/s).
But I highly doubt the company will cover $2900 for the 48gb m4 pro laptop. The base m4 $1600 16gb 512gb version is probably the best option for what he does (I just don't think he's going to use a local LLM; it's going to be GPT or something), and he might as well get the $2000 24gb 1tb version if it maxes out the budget.
My bet is the company uses a remote VM to windows or ubuntu. It depends on what his job is and what he does with the laptop.
To me, AI is only good for RP; I don't know what the hell he's doing with it for work.
>>
>>103347384
>Sorry, but we're taking it all back. Woke has to go.
exactly, we have the good ending, time to take advantage of that
>>
>>103347393
>the macbook pro is kind of better than a AI box
The AI box could also be a mac studio, if that's how you want to roll. I can't see the macbook being the best for this in any scenario that doesn't have you stranded without comms back to a better, non-portable box.
>>
>>103347393
AMD APU leaks suggest something like 256GB with ~300GB/s memory bandwidth. I would wait at least for that; hopefully in a gen or two it'll be more like 600GB/s+.
>>
>>103342033
>and llama is even more censored now than qwen
llama 3 only censors incel shit like erp and racism. qwen's censorship is far more foundational to the model
>>
File: file.png (46 KB, 959x134)
>>103347362
The model provides reasoning before it outputs a rating or title.
>>
>>103341985
cry more
>>
>>103347442
And if R1 / llama 4 turns out to be a giant moe like 15Bx12+ or something then this would be the way to go.
>>
>>103340923
they don't need to show anything. normies don't know there's an alternative to chatgpt and they don't want to know. a normie's most important thing in life is trend-following. using something like claude would lower their social market value
>>
>>103340918
they're tied for 1st place, and that's despite google paying off the owners of that leaderboard, which has already been confirmed
>>
>>103347442
The only way that happens is if we wait until 2026, they make a new socket, and build a 4-channel ddr6 PC.
Considering intel's overpriced CUDIMMs only hit 160gb/s, that's the only way I can see it.
OR it's a soldered mobile/minipc-only CPU... pretty interesting if they did that... but you just know soldered ram = $800 for a mini pc with 24gb and $1600 for 48gb.
>>
>>103347641
>>103347641
>>103347641
>>
>>103347615
Sorry, your excuses aren't good enough. When are we getting Sora, Sam?
>>
>>103347631
It's an APU; on-chip memory is the entire point. That's why I said 300GB/s.
>>
>>103347675
When are you killing yourself, schizo?
>>
>>103347514
>The model provides reasoning before it outputs a rating or title.
That is actually amazing. How far back have you been logging your decisions?
>>
>>103347722
Tick tock, Sam. o1 isn't top dog anymore, you neutered gpt-4o to the point it's literally sub-Qwen 72B, Dall-E 3 is a fucking joke compared to Flux and even SD 3.5, and what we've seen from Sora leaves much to be desired
Just admit you lost
>>
>>103347675
Tomorrow, on ChatGPT's anniversary
>>
>>103342212
qwq is the most censored qwen yet, but nobody cares because they're dunking on shartmerica
>>
>>103347779
based, feels good to see OpenAI's downfall, those smug motherfuckers got what they deserved
>>
>>103347768
Daily going back to June.
>>
>>103340411
literal skill issue
>>
>>103347863
the whole point of ai is to solve skill issues
>>
Largestral still king of RP, CoT is still a meme
>>
File: giphy.gif (1.26 MB, 480x366)
>>103346228


