/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103354338 & >>103347641

►News
>(11/29) INTELLECT-1 released: https://hf.co/PrimeIntellect/INTELLECT-1-Instruct
>(11/27) Qwen2.5-32B-Instruct reflection tune: https://qwenlm.github.io/blog/qwq-32b-preview
>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: Akita.Neru.full.982124.jpg (111 KB, 1280x1024)
►Recent Highlights from the Previous Thread: >>103354338

--Training a 100M LLM with OpenDiLoCo: feasibility and challenges:
>103354925 >103355729 >103355757 >103356019 >103356461 >103356542 >103356640 >103356914
--QwQ and Opus comparison, AI model capabilities and limitations:
>103359461 >103359475 >103359492 >103359569 >103359642 >103359509 >103359556 >103359566 >103359627 >103359656 >103359816
--Probability problem discussion with simulation and Monty Hall problem comparison:
>103354629 >103354711 >103354756 >103354810 >103355065 >103355307
--Waiting for hardware optimized for AI and matrix multiplications:
>103358021 >103358134 >103358311 >103358359
--Merging INTELLECT and QwQ models, compatibility issues and challenges:
>103357031 >103357045 >103357092 >103357307 >103357340 >103357414 >103357438 >103362336
--Largestral GPU configurations and performance discussion:
>103354505 >103354581 >103354778 >103355335 >103358419 >103358605 >103358639 >103358655 >103358683 >103355352 >103355396 >103355512 >103358794
--INTELLECT-1 discussion and potential future developments:
>103356933 >103356965 >103357070 >103357959 >103358482 >103359108 >103359329 >103359395 >103359427 >103359454 >103359517 >103359541 >103359587
--Discussion on the limitations of transformers and the concept of AGI:
>103357625 >103357663 >103357675 >103357686 >103357716 >103357704 >103357725 >103358083 >103358278 >103360487 >103358115 >103358454 >103358941
--Anons discuss rapid AI progress and future GPU development:
>103357798 >103357907 >103357975 >103358363 >103358738 >103358864
--Ryzen anon and NPU IGPU hybrid method:
>103355150
--LLMs as a tool for self-improvement and progress:
>103360383
--KoboldCpp 1.79 release with new features and user reactions:
>103355527 >103355759 >103355965 >103356030
--Miku (free space):
>103358406 >103358602 >103359731 >103362325

►Recent Highlight Posts from the Previous Thread: >>103354346

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
She's cute but I'm not a huge fan of blondes.
>>
File: kyoton.png (1.13 MB, 1280x768)
Good night /lmg/...
>>
omg a hag :(
>>
File: 1725097212508247.png (24 KB, 1010x164)
>>103364085
Yes. I have the request logging turned on to see what is sent by SillyTavern.

It actually looks like it breaks if I have this turned on. It seems to work fine if I set it to none, but then the completion template is not correct. Let me play around with it more

Found this bug report. I guess I'll need to figure out the right chat completion template. Or is there something standard people use for pixtral?

https://github.com/SillyTavern/SillyTavern/issues/3057
>>
File: file.png (52 KB, 1412x226)
>>103364207
here's the ST dev's only comment about that bug report. Not sure if this is helpful at all
>>
>>103364240
Yeah, that is the problem, since tabby (or the model's template?) is very strict on the completion list being system followed by user/assistant pairs. I may need to just have everything in a single system prompt.
>>
You know what I just noticed? All these models suck because they're focused on answering a question on a single input. The fact they can kind of sorta RP sometimes is a byproduct.
>>
Consider Zundamon.
>>
>>103364361
gee anon what a revelation
>>
>>103364361
>All these models suck because they're focused on answering a question on a single input.
There are chat-tuned models as opposed to instruct-tuned ones, but there's a lot of overlap now.
>>
>>103364276
I just fixed it by getting claude to update pixtral's jinja2 template to support concatenating system messages.

If anyone else has the exact same problem I have with using turboderp's pixtral exl2 quant in tabby, I changed the config.yml to use this as the template:
https://pastebin.com/7dg85mzR

I placed it inside the templates folder as 'pixtral.jinja'
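For anyone who'd rather not touch the jinja file at all, the same fix can be done on the client side before the request goes out. Rough sketch only (the function and field names here are mine, not from the pastebin template): fold every system message into one so the list becomes system followed by user/assistant pairs, which is what the strict template expects.

def merge_system(messages):
    # collect every system message into a single block
    system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    # keep the user/assistant turns in their original order
    rest = [m for m in messages if m["role"] != "system"]
    return ([{"role": "system", "content": system}] if system else []) + rest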
>>
>>103364367
Not part of the triple baka trio, so no
>>
Has anyone mentioned wanting to do some RP tunes with QwQ, or is everyone just waiting for the non-preview versions of these models to drop from various sources?
>>
>>103364607
QwQ is such a bad fit for RP I can't imagine anyone wasting their time. Best thing would be to get it to design scenarios for a better model to write about in some kind of pipeline.
>>
Why is everyone talking about QwQ but nobody seems to care about Athene? It's Chinese too.
>>
>>103364759
>QwQ
Its USP is chain of thought in a local model.
>>
>>103362325
This reminds me, I wanted to check what Mistral Large 2411 produces when you ask for an SVG of Hatsune Miku.
Pic related is what q8_0 gets you with greedy sampling, it's honestly pretty good for an LLM output.
>>
>>103364790
That's better than anything I could wrangle QwQ into making. Its attempts were frankly embarrassing.
>>
Dear Kobo,

I am writing to you today as a dedicated user and strong advocate for Koboldcpp, your excellent contribution to making llama.cpp more accessible. Your work on this project is highly appreciated, and I am continually impressed by its evolution.

I am reaching out to formally request a crucial enhancement to Koboldcpp: the inclusion of a full spectrum of customization options for draft models, mirroring the detailed control offered by llama.cpp.

Currently, the implementation of draft models in KoboldCPP provides limited customization options compared to what is available in llama.cpp. Specifically, the ability to customize parameters such as gpu-layers-draft, device-draft, ctx-size-draft, draft-p-min, draft-min, and draft-max is crucial for achieving optimal performance and flexibility.

Incorporating the full spectrum of these customization options from llama.cpp into KoboldCPP could significantly enhance both the speed and overall user experience. The current limitations restrict the potential speedup benefits that users have come to expect from llama.cpp, thereby impacting the performance of model deployments.

By enabling these customizations, users would gain greater control over model configurations, allowing them to better tailor the tool to their specific needs and maximize efficiency. This change would not only improve the utility of KoboldCPP but also strengthen its position as a leading tool in the field.

Thank you for considering this suggestion. I am confident that these enhancements would be well-received by the KoboldCPP community.

Thank you for your dedication and hard work.
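P.S. For reference, the invocation this letter is asking to match looks roughly like the following on the llama.cpp side (model names are placeholders, and the exact flag spellings should be double-checked against your build's llama-server --help):

llama-server -m Mistral-Large-Q4_K_M.gguf -md Mistral-7B-Instruct-v0.3-Q4_0.gguf \
  --gpu-layers 40 --gpu-layers-draft 99 --device-draft CUDA1 \
  --ctx-size-draft 4096 --draft-min 1 --draft-max 16 --draft-p-min 0.5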
>>
>>103364790
>>103364809
qwq has no fucking clue what Miku looks like
>First, familiarize yourself with Hatsune Miku's appearance. She has blue hair in two long ponytails, usually wearing a school uniform with a white blouse, a red bow tie, a black skirt, and yellow socks. She also has prominent eyes with thick eyelashes and eyebrows.
>>
>>103364121
What do you guys put in System Prompt?
>>
>>103364886
Character description and other background info. Writing style goes into last assistant prefix.
>>
>>103364886
>You are a degenerate woman in her thirties that loves writing filthy erotica.
>>
>>103364162
*rapes u in ur slep*
>>
>>103364813
Draft. Models. Don't. Work.
>>
>>103365120
Why. Do. Zoomers. Do. This.?
>>
>>103365120
Skill. Issue.
>>
>>103365125
that's a millennial attribute bubby
>>
If draft models actually worked, we'd all be using them. They'd be all over Reddit. Every backend would have an argument for them, but they don't, because you can't use a dumber model to make a smarter model faster.
>>
>>103365120
Look who's here, Mr. "I-can't-even-get-speculative-decoding-to-work" guy, claiming that it's the technique that's broken, not his own limited understanding. How cute. How adorable. How utterly laughable.

Listen, buddy, speculative decoding is a well-established technique in the field of large language models (LLMs), and it's not going anywhere just because you can't figure out how to use it. It's like saying that a Ferrari is a bad car because you can't drive a stick shift. Newsflash: the problem isn't the car, it's the driver.

But hey, I'm sure your vast expertise in "I-tried-it-once-and-it-didn't-work" is totally sufficient to dismiss an entire technique that has been extensively researched and validated by actual experts in the field. I mean, who needs peer-reviewed papers and rigorous testing when you've got your gut feeling and a Reddit account?

Let me tell you, friend, if you can't get speculative decoding to work, it's not because the technique is flawed. It's because you're not good enough. You're not smart enough. You're not skilled enough. And that's okay. We can't all be experts in everything. But what's not okay is when you try to pass off your own incompetence as some kind of profound insight.

So, here's a suggestion: instead of wasting everyone's time with your uninformed opinions, why don't you try actually learning about speculative decoding? Read some papers, take some courses, and practice implementing the technique yourself. And if you still can't get it to work, maybe, just maybe, it's because you're not cut out for this whole NLP thing.

But hey, don't worry, I'm sure your participation trophy is still shiny and untouched. You can always go back to claiming that you're a "thought leader" in the field of "I-have-no-idea-what-I'm-doing." We'll all be sure to take your opinions very seriously.
>>
I don't need draft models when running my 12B coom tunes.
>>
>>103365196
>If draft models actually worked, we'd all be using them.
I would be using them if ooba supported them. They're in the lcpp server, but it's just too basic for daily use.
I was getting a solid speed boost when testing it in llama-server (I'm sure you can find my post from a bunch of threads back if you are interested), so once my preferred frontend starts supporting it I'll be all over it.
>>
>>103365196
You can, actually; the speed boost is evident and can be easily replicated. It's just that you need more VRAM to fit the draft model in, and most people here can barely fit the base model.
>>
>>103364790
https://files.catbox.moe/0cr93b.svg
This was QwQ after telling it what Miku looks like by pasting from a wiki.
>>
>>103365527
>no arms
Did Mikugaki Anon write the wiki?
>>
>>103365277
yeah, no need for a draft model when your main model is a draft model
>>
I know this is LOCAL models general, but when we start talking about these giga-expensive home builds, things like RunPod make way more sense.
Sometimes in the field we’ll call running a server “locally” when all we mean is self-hosted, but it’s still on an EC2. In my opinion, the same applies here. If you’re not running a managed solution like openai/anthropic/bedrock then I’d call that local enough. Save yourself the time and money and just run workloads on demand on RunPod or Lambda or whatever.
>>
>>103365565
Don't project your poverty onto me, please.
>>
>>103365565
Buy an ad.
People here don't want to generate their mesugaki smut on someone else's computer.
>>
>>103365181
They all can get off of my lawn.

Also, Athene is sleeper good, somehow. 32k context is wimpy but my system struggles beyond 16k so it's not my bottleneck. Strange that Q5KL did better than Q6K, but I don't mind saving a few gigs. Quick to refuse roleplay but basic prefill dodges that.

>>103365565
The essence of "local" to me is that your conversations aren't ultimately being turned into data to sell as a product to advertisers and there isn't a Big Brother reading it in search of a wrongthink that will send you to Room 101. Despite our trusting other's models and others git projects and running them on silicon with glowy bits, there's at least the notion that you and your LLM are having a "private" conversation, no matter if you spent $500,000 (chump change) to build a prototype Chobits or you're one of us vramlet poorfags putting lipstick on a Speak-and-Spell.
>>
>>103365565
There is a thread for it on /g/, but at that point you do not run it locally so it does not belong here, perhaps not even on /g/, and you should move to /vg/. As for whether it is more expensive, you first need to ask yourself how much you will use it, for what period of time, and what kind of models we will get in the future. I, for example, use OpenRouter for the large models when I want to try them and compare them to what I can run. And to me, as of now, it does not make much sense to invest in HW since the improvement outside of speed is actually not that dramatic, especially if we are talking only about RP. In that case, even Mistral-Nemo is enough and sometimes performs better than the larger models I tried.
>>
>>103360011
oh well, maybe I was wrong, but I remember finding some Star Wars ERP logs on the edu version
>>
>>103365565
How does that even matter? How you run it is irrelevant unless you're part of the scum who abuses /lmg/ as their personal tech support. The proper discussion is about the models and the settings for getting the most out of them, where it doesn't matter if you're using runpod or local hardware.
>>
Can't believe I got my ass up and patched together a server so my kids can have an AI buddy in Minecraft.

-kotoba-whisper-v1.0 speech to text.
-gemma 27b because it's good with Japanese
-Put minecraft commands that should be executed in tags and execute them all with. RCON.
-filter that shit for..
-GPT-SoVITS-v2 which is good and really fast.

Still local retardation but good enough to ask it to make a small house beside you and put a villager in there. lmao
Is there any way to get vision?
I know that with kobold they had some vision stuff you could put on top. Anything like that existing for gemma 27b?
Otherwise I'll see how good pixtral is.
>>
>>103365565
>Save yourself the time and money
Thanks for the tip.
I believe services like runpod are used when making finetunes.
>>
>>103365790
Llava for vision.
You have a GitHub for this masterpiece?
>>
>have money to waste on llms
>your option either is censorslop(GPT/Claude) or dumbslop(llama3/mistral/Qwen).
I don't even want to run models locally, I just want models that are smart and uncensored. fuck sake.
>>
>>103365870
Is it the llama3 one? https://huggingface.co/koboldcpp/mmproj/tree/main
Or do i need a completely different file from somewhere else?

>You have a GitHub for this masterpiece?
No, but if I clean up I might upload it.
It's a horrible stitched-together python server, and a C# client. Too embarrassing right now. lol
But I was surprised how good it feels to speak to an LLM even if it's just TTS.
SoVITS sounds natural enough, and I never tried the closed stuff because I don't wanna send my voice.
>>
Are any of the 32k models actually 32k? I can't seem to get anywhere close to that context length without it slowly degrading into a stroke victim with misspelled names and missing spaces.
>>
>>103365918
Of course, didn't you see the needle test?
If you write nigger in there somewhere it remembers it!
Now don't do long roleplay or god forbid dump a game guide in context that should give you "the next step".
>>
>>103365918
Llama 3.2/Command R are capable of it, and I think Mistral Large does not have a problem with it either. But the previous llama and Mistral-Nemo or the small ones do have problems after like 16k, and it just gets worse with more context.
>>
So, I was checking local GPU prices lately, and things just look bad. P40 is nonexistent, P100 is overpriced, 3090s are either "unchecked, probably works" scams or overpriced. 4090s are mostly junk being sold for parts. The situation won't improve with the release of the 5090.
>>
>>103365974
The trick people use for long roleplay is to summarize it and then continue from there with new chat.
>>
>>103365980
it'll be nicer when ST's new message deletion is implemented and you can delete between messages, so you can free up the beginning but leave the last dozen messages to keep everything on track
>>
>>103365974
>I think Mistral Large does not
It does, after around 20k tokens
>>
>>103365565
the real local vs proprietary model dichotomy is whether you want a smart model that refuses to answer anything, or a dumb model that can't answer anything. does it make sense to spend $4k on 96gb vram or 384gb ram? probably not. but it also doesn't make sense to spend hundreds on rented compute when the proprietary models are similarly priced while being strictly better
>>
>>103365980
Even when summarization with LLMs works, it always produces concentrated slop, and the character doesn't come across the same way in the new chat
>>
>>103365993
Even Mistral Large 2411? Here's how they marketed it: "It provides a significant upgrade on the previous Mistral Large 24.07, with notable improvements in long context understanding."
>>
>>103366016
They didn't show any benchmarks at release, so...
>>
>>103366016
I don't like 2411 at all
>>
>>103366016
bro 2411 can't even handle formatting like some 7b model from a year ago
>>
I tried running largestral Q6_K on a 9950x and I'm getting an incredible 1 token per second.
Funnily enough, while prompt processing the temperature shot up to almost 90C but during actual generation it was sitting at a cool 54C with fans barely audible.
>>
>>103366133
inference is bottlenecked by RAM speed
>>
>>103366152
that's a lie, if that were true then draft models could theoretically work for speedups, but we know they clearly don't
>>
>>103366213
he's genning on cpu of course he's limited by RAM, are you retarded?
>>
I don't understand how a draft model is supposed to help if you have to verify the draft result using a proper model anyway.
>>
>>103366248
You know how big labs batch hundreds of prompts into one to take advantage of parallel processing?
Draft models let you do that for a single prompt by batching multiple tokens of that prompt. Without a faster 'guess' at the tokens, you could only ever do one at a time because each token depends on the last.
>>
>>103366265
So it's speculative execution?
>>
>>103366276
Exactly analogous to it, yes.
>>
>>103366248
If you generate tokens one-at-a-time you can use each value that you load from the weights only once.
If you have a good guess for what the next token will be you can use each value from the weights two times so the model evaluation is more efficient.
If your guess for the first token was correct you have reduced the amount of I/O for the last two tokens by ~50%.
If your guess for the first token was wrong you wasted a small amount of compute and I/O and you have to throw the results for the second token away.
As long as generating the guesses is cheap and they're sufficiently good you will on average reduce the amount of necessary I/O and thus increase the average rate at which tokens are generated.
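Here is a minimal sketch of the greedy case in Python, just to make the flow concrete; it is not how llama.cpp implements it, and draft_probs/target_probs are stand-ins for the small and big model:

import numpy as np

def generate_speculative(tokens, draft_probs, target_probs, k=4, n_new=32):
    # draft_probs(seq) -> distribution over the next token (cheap model)
    # target_probs(seq, guesses) -> one distribution per guessed position,
    #   computed in a single batched forward pass of the big model
    out = list(tokens)
    while len(out) < len(tokens) + n_new:
        # 1) the cheap model guesses k tokens autoregressively
        guesses = []
        for _ in range(k):
            guesses.append(int(np.argmax(draft_probs(out + guesses))))
        # 2) the big model checks all k positions at once, so its weights are
        #    read from memory once instead of k times
        checked = target_probs(out, guesses)
        for dist, guess in zip(checked, guesses):
            best = int(np.argmax(dist))
            out.append(best)          # the big model's token is always the one kept
            if best != guess:         # first wrong guess: throw the rest away
                break
    return out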
>>
>>103366303
how do samplers interact with it? would something like xtc reduce its performance because the actual most likely guess can often get thrown out?
>>
>>103366320
*to be clear I'm asking just in terms of its implementation in llama.cpp, I understand it would depend on the strategy used for draft generation and such
>>
>>103366320
>>103366331
I don't know the exact details of how the interaction between sampling and drafting is implemented, sorry.
>>
>>103366213
>draft models
Here we go with the meme again.
>>
Hey guys, more of a casual observer here but if I may offer one of my observations, you guys don't seem to be really having fun anymore.
>>
>>103366445
I am always depressed during the winter. Low energy.
>>
>>103366445
Your observation is correct. For me, Command-R is still unsurpassed and am bored of local models and the focus on nerd usecases.
>>
I really like QwQ. Despite some of its more obvious flaws like the fact it's very much a 32b model sometimes, is dry as hell and is extremely censored, when it does work it shows itself extremely capable of actually working through situations and understanding the nuances of the conversation. Do you think it will get better from here?
>>
>>103366525
>dry as hell and is extremely censored
https://huggingface.co/win10/EVA-QwQ-32B-Preview
https://huggingface.co/bartowski/EVA-QwQ-32B-Preview-GGUF
>>
>>103366546
>check what EVA is
>"finetune of Qwen2.5-32B on mixture of synthetic and natural data"
So it's another inbred model.
>>
>>103366359
Yo, Tyrone! Heard you're strugglin' wit' dat speculative decoding thang. Ditch dat local LLM, it's be raycis. You need to upgrade to them cloud models, know what I'm sayin'? /aicg/ thread got the plug, they discuss all the fire cloud models. Get on dat and level up, my G. Local LLMs is so last season, and they ain't got nothin' but hate for a brotha tryin' to make it. Cloud is where it's at, cuz.
>>
>>103366546
Has anyone tested this? Wouldn't merging these models pretty much break its reasoning? It was barely hanging in there in the first place.
>>
>>103366445
CR update was a flop, Largestral 2411 was a flop, llama and qwen are too cucked for my taste. I'm indeed quite upset with the recent developments. We need more competition.
>>
>>103366546
https://huggingface.co/jackboot/uwu-qwen-32b
>>
>>103366320
The output from the draft model is a probability distribution, just like the output from the regular model. Samplers WILL make it less efficient, e.g. if using XTC, since it will nix high-probability (easy to guess) tokens willy-nilly, but if the resulting picked token is the same, that's what matters.
>>
ai noob here.
Can someone give me a quickstart on what model or general setup to use.
Currently playing around with some Kobold Horde models and while the responses are ok, it glitches more than I'd like. Like writing author notes or adding social media links.

My goal is to have something that is at least on the level of Mikugg (the response quality not the format)
>>
>>103366805
>Mikugg
https://github.com/miku-gg/miku?tab=readme-ov-file#llm-endpoint-setup
It really depends on your setup (or how much you're willing to spend upgrading) and how slow a generation speed you can tolerate
>>
>>103366805
You are using other people's computers for free with horde so you can't really expect too much quality because you can't run good models
>glitches
they are not really glitches, it's just that the models are small and bad
I don't know about the quality of mikugg, but if you want anything at least decent you need a 12B model, what's your hardware?
>>
>>103365125
Xhe's. Not. Wrong. Though.
>>
>>103366941
Nigger. Brain. Too. Stupid. For. Speculative. Decoding.
>>
>>103366303
What if, let's say, I'm running a very big model (123b or bigger) on CPU/RAM and a very small (1b) draft model on a very fast GPU with little VRAM, does it only try to predict the next token once? Would it be possible for it to try the prediction multiple times?
>>
>>103366996
You can predict multiple tokens and this is what is being done.
IIRC it's also not just a single sequence but a branching tree to increase the odds of guessing correctly.
>>
>>103366926
6800 with 7600X
I'm fucked right?
>>
a small note about using guidance that I wrote, which you might find interesting.
It's a good tool for prompt manipulation for models that don't support function calling or are very dumb. the method is old, but I've found it very useful for emotion systems and evaluating conditions
https://rentry.org/llm-guidance
>>
>>103367017
AMD sucks for AI but I'm sure you can get it to work

16GB of VRAM is a good amount: you can run some Magnum V4 27B at Q4, a Mistral Small 22B finetune at Q5, or a Mistral Nemo 12B finetune at Q8 fully loaded in VRAM, which will make it fast, or if you have patience you can offload to RAM and run 30B-50B models. I'm sure it will be better than whatever that Mikugg thing can offer for free. You have a decent setup
>>
Anyone know of a draft model that works with EVA or Evathene?
>>
retarded question but how do you actually download models off of huggingface? I swear they just had normal direct downloads last time I was looking for models
>>
>>103367086
just use the huggingface cli?
>>
>>103367086
Check for models with "GGUF" in the name.
>>
>check current RTX 3090 prices
>they've gone up slightly
>wtf, what is the market doing.
>check p40
>bottom of the stack is now 400 USD
What the fuck are people doing?
>>
>>103367132
And MI60 are like $500.
It was $300 not long ago
>>
>>103367132
in my local second-hand online store they have gone down from 700€ december last year to 550€ nowadays
>>
>>103367153
>poorfags now have to use Kepler to make their multi-gpu setup
dire.
>>
Been away for 8 months.
Any models these days approaching GPT-4 capability? Or are they still basically random word generators with no object permanence or memory?
>>
>>103367132
r/LocalLlama has 251K members.
>>
>>103367175
>gpt4
Already surpassed with llama 405b. We want local Claude.
>>
>>103367194
Buy an ad
>>
>>103367194
really? care to spoonfeed me on what local model is best?
I tried claude when it came out and didn't see what all the hype was about, but my uses are niche so maybe it was just particularly bad at historical knowledge
>>
>>103367212
>t and didnt see what all the hype was about
Basically it shits out a bunch of purple prose, which, although meaningless, is very impressive to shitjeet FOBs who can barely speak English. They assume pretty white girl is sending lots of bob and vagene and get super turned on.
>>
>>103367175
>Any models these days approaching gpt4 capability?
Mistral Large 123B is there if not slightly better
Llama3.1 70B kind of comes close
>>
>>103367212
Reflection 70B
>>
i love girls
>>
>>103367086
You can still use direct downloads, but you have to go file by file, or have a seq+wget script for the model bits and download the rest manually. If you only download quantized models, the seq method is probably the simplest for big models.

I use git with an extra script i wrote.
>git clone therepo
>git -C therepo lfs install --local
>git -C therepo lfs fetch
>ksh status.c export ex therepo
status.c does a few things. The export command makes symlinks from the ex/therepo dir to the actual repo, including lfs files. That's just to avoid having to do a checkout of the lfs files and have the model take twice as much storage. The actual repo remains unchanged, so i can still update normally, re-export, convert and quant.

But i'm sure >>103367094 works just fine and with less faffing about
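If you only want a single quant file, the cli route is basically one line (repo and file names below are placeholders, not a real model):
huggingface-cli download SomeOrg/SomeModel-GGUF SomeModel-Q4_K_M.gguf --local-dir ./models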
>>
Any other draft models that work with Mistral Large 2 besides Mistral-7b-v0.3? Does Mistral Small work? Would it be worth it?
>>
>>103367067
>>103366926
>I don't know about the quality of mikugg
It's miles ahead of that shitty default Janitor model. It generally outputs a coherent story and can follow some basic reasoning and instructions, like telling it that there is a 1/3 chance of success for an action. It also has some anime/game character knowledge.
Haven't used any flagship model to make a proper comparison


>>103367132
>market
the current gen is already over 2 years old. Everything has been fucked since corona thanks to AI and coins and stagnating EUVL progress and even shitty game optimization. There is zero incentive to lower prices
With the next generation you don't get cheaper cards, just more peak performance for a premium price
>>
>>103367283
Small is 22b, which is 3x mistral 0.3. For speculative it's not worth it.
Bug MistralAI to release the 3B model.
>>
>>103367319
>since corona
You mean it's fucked because of hysterical liberal retards who forcibly shut down the entire planet and silenced any scientific discourse on the matter because of a novel strain of the common cold
>>
>>103367449
no I mean it's fucked because people learned from that event to entertain themselves in solitary with their pc and online services
>>
lmao holy shit meta really has no idea what they're doing, llama 4 ~800b top end, more slopped than 3, still text primary with vision encoder slapped on and finetuned later
this can't last, it's just not delivering anything worth the resources invested
>>
https://x.com/airkatakana/status/1863221519155151036
>>
>>103367615
the absolute state
>>
>>103367615
>I hate how there are only 2 genders on *** game's character creation screen!
>*gets banned*
lol
>>
>>103367615
This sure is relevant for local models
>>
>>103367319
Just tried Mikugg, it sucks, Mistral Nemo 12B is far better
Try SillyTavern with a prompt
>>
>>103367615
How do I unsubscribe to your blog?
>>
>>103367697
Leave.
>>
>>103367662
You post anime vocaloids here, you have no right to complain.
>>
>>103367827
Miku and her friends are orders of magnitude more relevant to local models than twitter drama.
This is a fact.
>>
>>103366562
In my testing it's very mid and doesn't seem to stand out in any particular way.
>>
>>103367884
they're not at all related
>>
>>103367884
>Miku and her friends are orders of magnitude more relevant
No they aren't you disgusting troon.
>>
>>103367827
If the mods were based they would enforce
>(USER WAS BANNED FOR POSTING TWITTERSHIT)
on every board.
>>
>>103367920
You will not be spared.
>>
What killed the Tulu hype?
>>
File: rat.jpg (33 KB, 360x270)
>>103368179
Using it for more than five minutes
>>
>>103368179
Wholu?
>>
>>103368179
It's kinda good and different but ultimately doesn't offer a much better experience overall than any other slop model.
Right now my only hopes are for a non corpo model using the intellect training method or someone releasing a non censored good model with modern internet/literature datasets, both are extremely unlikely so I guess we just need to wait 3-5 years to maybe get something decent, grim.
>>
>>103368179
>>103368336
>but ultimately doesn't offer a much better experience overall than any other slop model
I found Tulu to be very competitive in my (admittedly limited, it'll grow someday) set of programming checks.
For RP (not ERP) it felt really strong out of the gate. A few mistakes and a few slop memes, but it ran a good 6000 context before it started doing weird things. It seemed to be very good at integrating world knowledge but very bad at paying attention to surroundings. (My RP test involves a character with a secret that I know and it has to deal with that. It was good about maintaining that concept but was willing to blurt out admission of the secret in public spaces when it would only make sense in a private place.)
>>
>>103367132
>>103365975
It will only get worse
>>
File: GdudA28XgAAM4aC.jpg (52 KB, 957x482)
Thoughts?
>>
>>103368719
They must be saving a lot.
>>
>>103368719
What is amazon doing with 360K GB200s?
>>
>>103368719
>all those chips and they still can't beat Claude
>>
>>103368765
rent
>>
>>103368765
rent and for anthropic
>>
>>103364367
She's made her appearance before.
>>
>>103368783
How does Jeff keep getting away with being the eternal middle man?
>>
what model is best for erotica?
>>
File: 1732766798746566.png (509 KB, 512x680)
>>103368719
The bubble will burst, and we'll get those H100s.
>>
>>103368989
We just need mass-produced and cheap MXM-to-PCIe adapters. Currently the adapters go for ~$400-600 a piece. It's absolute shit.
>>
>>103368989
lol
>>
>>103368987
Every single model is dogshit.
The best current combination I found is to use Rocinante v2 for sex scenes and story progression, and then use Tulu for long-context retrieval and to take care of the non-sexual parts or some of the logic in complex scenes. This is for 40k+ token stories; so far, even if you get the usual slop, it's actually varied and surprises me from time to time, so there's still cooming to be had.
>>
>>103368989
Those H100s are contractually obligated to end up in Nvidia's recycling facility.
>>
>>103368989
>he lacks the information
>>
>>103369042
>>103369056
Nobody would give a shit when that bubble bursts
>>
>>103369092
That's not how contracts work.
>>
>>103368989
>The bubble will burst
Yes, but RTX 5000 will still launch at bubble prices.
>and we'll get those H100s.
Definitely not in the near future.
Datacenters will sell off their old hardware like V100s first.
>>
BitNet when? Just drop a usable model and there will be adder-only hardware that destroys Nvidia's monopoly
>>
>>103369131
Never. The term has been polluted with some 1-bit quant scheme that sucks so now anybody who wants to do the good bitnet gets conflated with the bogus bitnet.

Unless that distributed Intellect project spins up a 1.58 bitnet branch and shoots for the moon, it probably is not going to happen till Nvidia figures out how to monopolize it first.
>>
>>103369131
nvidia itself will manufacture it if there's high demand. They have the factories and can supply enough hardware for the big companies and the 3-4 anons that can afford to buy highly specific hardware for a niche thing.
Fat chance of seeing that, but if it does happen, nothing at all changes.
>>
>>103369199
But it will be cheap as shit because they will no longer have the monopoly on fast and popular matmul. That's the goal.
>>
>>103365911
>Llava
example here: https://github.com/cpumaxx/lmg_recapbot/blob/main/ismiku.sh
>but if I clean up I might upload it.
Just upload it. Then it will be out there and you can fix it later if you want.
Literally no one cares what your code looks like, but having it up means people can have fun with it.
I know, I fall into the same trap too, but I've started putting more half-baked stuff out there and I feel a lot better about it than when I hoarded code for the "one day I'll refactor it and release it" that never happens.
>>
>>103369268
>Literally no one cares what your code looks like
Only people who don't code.
Unless you have a hit and then people will make videos, "This code looks terrible but it's one of the most profitable games of the decade!!!"
Because code doesn't matter, only results.
>>
>>103369206
>But it will be cheap as shit because they will no longer have the monopoly on fast and popular matmul. That's the goal.
They'll set whatever price they want because they're the only ones that can supply a few 100k devices to all the big companies. Brand recognition is also useful for them.
Do you not understand how tiny this niche is? They're not expecting (you) to call for a quote for 1(one) adder device just to try it out. They're expecting amazon to buy 100k of them.
If they have a low enough yield, maybe they start selling the trimmed ones to regular consumers.
>>
>>103369039
I've been using rocinante v1.1, is v2 any different?
>>
>>103369039
>>103369451
*clears throat*
FUCK OFF SAO
DON'T BUY AN AD
JUST GET THE FUCK OFF OF THIS WEBSITE AND NEVER COME BACK.
>>
>>103369370
If they don't want to sell it at reasonable prices, there will be others who design and make it and sell for less. And even big companies will consider funding the new underdog to make more. That's called competition
>>
File: NordWaifu.png (1.29 MB, 1080x1578)
I currently have an RX 6600 in my home server and I'm looking to try out the Skyrim AI mod. Would it make sense to add another RX 6600 to handle a draft model + TTS, while the main RX 6600 runs a 13B model? I’m assuming throughput is crucial, given that the mod generates dialogue for every NPC within a certain radius of the player, and there are numerous plugins that enhance vision, text recognition, and other features. My main concern is whether a quantized 13B model might interfere with generating JSON responses properly, as the mod framework relies on those responses to trigger actions. Any thoughts?
>>
>>103369498
Is this mod available for the 2011 release or do I need the 14th anniversary extra special VR edition?
>>
>>103369451
In my experience v2 is more intelligent and has way less slop in the long run, and as I said it sometimes surprises me with stuff it has never output before. Take into account that I made like 10 novels of 50k+ tokens about the same topic with some variations, so I've read the same stuff over and over again and have a good sense of when the model is doing something fresh or not.
>>
>>103369451
>>103369544
To be clear, I'm using Rocinante-12B-v2g-Q5_K_M specifically; I think v2 is a different model.
>>
>>103369039
Tulu 70b is really good at doing individual characters and erp, though it seems rather poor at plot progression and fight scenes.

Nemotron 70b is worse at doing characters and erp, but only slightly worse. Nemotron 70b seems much more capable of story progression and fight scenes, and it seems more intelligent overall.

So, my favorite is still Nemotron 70b.
>>
What Sao is doing is against the terms of service of the e-begging platforms that he uses. Just throwing that out there.
>>
>>103369482
>If you talk about models on local models general, then you get told to buy an ad.
When will this cancerous meme die?
>>
>>103369532
i think it's for newer versions only, so you're gonna need SSE or VR.
>>
So I'm getting from 20% (text) to 60% (code) more tokens per second in QwQ with qwen coder 1.5b.
Has anyone run big models with speculative decoding, let's say a 70b+ with a 1~3b model, and how much was the improvement?
>>
>>103369576
What is your preferred method to get Nemotron not to write outline headers and lists all over every output?

>>103369590
Ignore the bot. It's only 7B.
>>
>>103367175
kill yourself retard
>>
>>103369131
BitNet is a meme. Big tech is extremely desperate for energy, and if BitNet was for real, they would have utilized it by now.
>>
>>103367884
fuck off troon
>>
>>103369489
>That's called competition
Yes. The same competition that has existed for 20 years. That's why we have so many brands to choose from. All three of them!
You still don't understand how niche this is.
>>
>>103367932
>>103369843
I have never seen a single intelligent comment or useful contribution from someone that uses the word troon unironically.
>>
>>>/pol/
>>
>>103364121
>try intellect-1
>it's even more aligned and safe than the average corporate model
can't say i'm surprised. the kind of people who engage in projects like this are leftist bootlickers
>>
File: dep.jpg (116 KB, 960x1280)
>>103369873
>Yes. The same competition that has existed for 20 years
It's not the same because the tech will be easier and the CUDA moat will be gone
>niche
I work in the field actually, not research or training, but infra and deployment. I hadn't heard of AI until last year but now it's the hype literally everywhere. I don't know how you can say AI is niche. Current companies would kill for a competitor and something that somewhat alleviates their energy problem. Nobody wants to keep buying at 400% markup prices
>>
>>103369990
>I don't know how you can say AI is niche
See, every company putting AI in their product names.
We have Ryzen AI CPUs now.
You could say that it's a bubble and that in a couple of years nobody will be talking about AI (I'm not so sure about that), but not that it's niche.
>>
>>103369887
I have never seen a single intelligent comment or useful contribution from someone that uses vocaloid pictures unironically.
>>
File: Narrator.png (388 KB, 750x1650)
>>103369706
>What is your preferred method to get Nemotron not to write outline headers and lists all over every output?
I use Sillytavern, and set constant OOC instructions at a low depth, and the narrator always seems to follow those instructions. I included my system prompt in the picture, but I think it's mostly the OOC instructions that 'just work'.
>>
>>103370129
>set constant OOC instructions at a low depth
That's the way.
Alternatively, prefills and/or tags.
>>
>>103369990
>It's not the same because the tech will be easier and the CUDA moat will be gone
You still need a compiler, you still need adoption. You need the factories to supply 100k devices on demand. nvidia already has that. CUDA is not magic. matmul is not a mystery, "anyone" can do that. Yet, after 20 years with demand for compute devices for games, there are only three brands. THREE, and one brand entered the market just a few years ago. Games are already in normie territory. Language models are still far from it. They have the market and will continue to do so for a long time. The small startups don't want to make something interesting to sell to you. They want to make something they can sell to nvidia.
>I work in the field actually
Most professions are outside of tech. I have plenty of normie friends, i'm their geek friend. They're not asking me how to run ai on their computers, they still ask me how to set a password on their facebook machine or do a backup of their vacation pics.
Would tech companies like to pay less for their hardware? Sure. But a small startup won't be able to supply them with enough hardware to make anything interesting.
AI is niche. Local AI more so.
>>
>>103370054
Actually there are plenty of programmers with mental illness, which translates to anime, furry, and gay shit.
Miku posting should be enough proof of that.
>>
File: Character.png (289 KB, 750x1050)
>>103370129
I also sometimes use constant OOC instructions for characters, though that doesn't seem necessary in Nemotron, because my characters never seem to throw out lists while role-playing.

It was mostly the DM and narrator that had that problem.
>>
https://huggingface.co/Sao10K/I_am_alive_yay
>>
>>103370313
local models are saved
>>
I liked using Backyard.ai in Windows, but I have since moved to Fedora and it doesn't have a version for that.
What would be best to replace it with?
>>
>>103370351
llama.cpp, maybe SillyTavern and a couple of {3|4}090...
Good? good...
>>
>>103370379
Not really, I don't know how to install or run those.
>>
>>103370407
character.ai might be more up your speed
>>
>>103370407
Shame. If reading is a problem, you should probably go back to windows, then. Or check >>/aicg/ if you want pointers on cloud stuff.
>>
>>103370422
I just want a clear guide.
>>
>>103369131
It's highly likely that BitNet performance doesn't scale to production models trained with 10~15T tokens or more.
>>
>>103370433
https://github.com/ggerganov/llama.cpp
Read the whole thing, follow the links relevant to your interest. If you cannot figure it out from there, you have more serious problems you should be working on.
>>
>>103370433
>llama.cpp
You literally just download the binaries then unzip them and you're ready to go with command prompt in the directory.
>>
>>103370462
Does it need sillytavern? It doesn't seem to mention it in there.
>>
>>103370521
No. It works on its own. If you want fancier features, you'll need to make your own ui or use something else like ST. There's also kobold.cpp, based on llama.cpp, which has more built-in features. I've never used it.
>>
Mixtral will return soon
>>
File: 1733051720749487.png (535 KB, 512x680)
>>103369498
Guided generation ensures that the generated JSON will conform to the JSON schema, regardless of the model used. I use a 2080ti 11GB for TTS on my Radeon server; it uses only 1W of power and saves me a lot of headaches. I'd never buy a Radeon card again: while they look cheaper for raw compute power, in practice they run slower than the cheaper Nvidia cards.
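A minimal sketch of what the guided-generation side looks like against a llama.cpp-style /completion endpoint (koboldcpp and tabbyAPI have their own equivalents, check their docs); the schema and field names below are invented for illustration, not the mod's actual format:

import requests, json

schema = {
    "type": "object",
    "properties": {
        "dialogue": {"type": "string"},
        "command": {"type": "string"},
    },
    "required": ["dialogue", "command"],
}

r = requests.post("http://localhost:8080/completion", json={
    "prompt": "The guard notices the player sneaking around. Reply as JSON.",
    "json_schema": schema,   # sampling is constrained so the output always parses
    "n_predict": 200,
})
print(json.loads(r.json()["content"]))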
>>
>today Neru
>tomorrow Miku
>day after tomorrow Teto
>>
>>103370553
Never mind. KoboldCpp seems to be the ideal solution.
Thanks for the help.
>>
>>103370578
I've never used anything other than llama.cpp, so that's the only one i can recommend. Works perfectly fine. Worry about getting llama.cpp working with their built-in ui first (run llama-server {your params} and connect to localhost:8080 on your browser), then worry about ST. You're gonna get bogged down on details otherwise. oh...

>>103370622
Yeah. Should work fine as well. Have fun.
>>
File: MikuReadyForAction.png (2.47 MB, 1920x1080)
Good afternoon /lmg/
>>
>>103370613
I like these bakas
>>
>>103370642
Well, I might try llama.cpp first anyway then, to see what it is like and to implement your suggested advice. Thanks for your patience and time.
>>
>>103370645
Remember to take your HRT anon.
>>
>>103370645
Good afternoon Action Miku
>>
>>103366013
If you plan to run inference regularly, it will become a pain in the ass to set up a pod every time. You have to download and install the software every time + download the model every time. That might take literal hours depending on whether huggingface is throttling you (for which you pay, btw, even if you're not using the GPUs).
Basically, just trying out shit? Then sure, rent it. Regular use? Better to invest in 2x3090
>>
>>103370778
meant for >>103365565
>>
File: 1720160464530449.png (63 KB, 380x349)
Sweet. Got kobold.cpp working easy as pie.
>>
>>103369131
bitnet doesn't work
>>
>>103370778
I just put a bunch of crap in the docker image itself, since pulling it doesn't count towards usage time (at least on vast). Might be able to get away with downloading it at runtime in the container entrypoint too.
>>
>>103370561
erm source?
>>
Working on an AI gf system with >agents.
I'm using langgraph, but a self-made state machine would do fine as well.
Coding a graph sucks, is there any good UI for this? Like with nodes and such. There is LangGraph Studio, but it seems to be OS X only.
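If you do fall back to the self-made route, it really can stay tiny, which also makes it easy to sketch on paper instead of needing a node UI. Toy example only, node names and state keys are made up:

def listen(state):
    state["heard"] = state.get("user_msg", "hi")
    return state, "respond"

def respond(state):
    state["reply"] = "you said: " + state["heard"]
    return state, None          # None ends the run

NODES = {"listen": listen, "respond": respond}

def run(state, node="listen"):
    # every node returns (updated_state, name_of_next_node)
    while node is not None:
        state, node = NODES[node](state)
    return state

print(run({"user_msg": "good morning"}))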
>>
>>103369565
What format works best with it?
>>
>>103369887
Yes because it got spammed over the edge by your kin aka shitstirrers & falseflaggers playing with optics angle and then going around with "hurr durr look at them dumb chud bigot schizos!", similar to r/gamingcirclejerk stuff happening on /v/ rn.
>>
Are there any resources that will help me understand how LLMs work and how machine learning works under the hood? I am very curious about them and want to gain a deeper understanding of how it all works.
>>
File: 1712376631557099.png (21 KB, 597x197)
>>103371464
Patterns. Mistral Small also got released around the anniversary of their first 7B model. Large-2411 was merely a refresh. Mixtral will soon turn a year old.
>>
>>103371597
it's like linear regression but for words and a lot more complicated
>>
>>103371597
just keep multiplying those matrices
>>
Guys, how do I convince my girlfriend that she isn't real?
She isn't taking the news well.
>>
>>103371613
erm...
>>
>>103371689
How Can Mirrors Be Real If Our Eyes Aren't Real
>>
>>103372261
Thank you Jaden, very cool!
>>
File: 23673876468097450.png (63 KB, 381x805)
Where would I place ( vulgar, illicit, no blood, descriptive, creative, dark, taboo): (length = medium), if at all possible, with this INST format for mixtral?
I know there's a ### Instruction format somewhere.
>>
>>103372393
You'd copy the Assistant Prefix into the Last Assistant Prefix field and add that I suppose.
>>
It works! https://huggingface.co/BeaverAI/Lazarus-2407-100B-GGUF
>>
>>103372758
What is this?
>>
>>103372803
Look at the name, and tell him to buy a(nother) ad.
>>
>>103372813
>not filtering namefags
Holy NEWfag!
>>
>>103371597
https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&index=1
>>
File: GdGtf_BagAAZ7Nl.jpg (204 KB, 2048x1679)
What's the best Japanese-English translation model I can run on 8GB of VRAM?
>>
>>103372758
???
>>
File: 1710523469952653.jpg (32 KB, 476x358)
>>103372758
You know who also "works" ?
>>
>>103372901
I have no idea.
>>
>>103372901
VNTL 8B
>>
>>103372803
it's most likely lobotomized largestral.
>>
>>103373159
Which one? Llama I assume?
>>
>>103373243
Yeah.
>>
File: 1727833629942154.jpg (659 KB, 2694x3494)
>>103373291
Thanks anon. I'll give it a try.
>>
>>103366036
>>103366103
It's funny how the general went from "HOLY BAZONKERS THIS SHIT SLAPS LOCAL CLAUDE FUCKING OAI BTFO'D" to "It's shit/meh"
OR testing confirmed my suspicions that it is, in fact, just another model with the mistral positivity baked in
And that is why I won't buy more 3090s, shit's not worth it when I can get similar outputs with far smaller models
>>
when is kobold adding support for allocating draft model layers?
>>
>almost 2025
>still not even one (1) good open weights language model
>>
>>103372758
Is this distilled 123B? What was your distillation process? Did you decide in advance that 100B would be the distilled size or was that just how it shook out?
>>
>>103373406
There is already one though
>>
>>103373358
Never because draft models have been debunked
>>
>>103373417
You can shrink models via distillation? Do you have a link to explain that technique?

I shrunk Largestral by deleting layers via MergeKit. You can 'find' the best layers to delete using https://github.com/arcee-ai/PruneMe

72B (36 layers pruned) = Lobotomized
90B (24 layers pruned) = Somewhat coherent, commits immediate errors
100B (16 layers pruned) = No errors so far, feels like largestral
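If anyone wants to reproduce it, the cutting itself is just a passthrough MergeKit config along these lines. Sketch only: the layer indices below are made up, PruneMe's similarity scan is what actually tells you which contiguous block to drop (Largestral has 88 layers, so dropping 16 leaves 72):

slices:
  - sources:
      - model: mistralai/Mistral-Large-Instruct-2407
        layer_range: [0, 40]    # keep the first 40 layers
  - sources:
      - model: mistralai/Mistral-Large-Instruct-2407
        layer_range: [56, 88]   # skip 40-55, keep the rest
merge_method: passthrough
dtype: bfloat16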
>>
>>103373546
Can you make something like that but with QwQ? I think even 24 layers pruned could work as a draft model
>>
>>103373546
Any chance of smaller quants? Something that fits into 48GB.
>>
File: 1732846758313163.jpg (41 KB, 728x653)
anons, what's your favorite RP model right now? I can't seem to find anything interesting anymore, they are all just a blur
>>
>>103373546
Is this similar to the paper where they deactivated a few "neurons" and the model became smarter?
They disabled the religion and terrorism neurons if I remember correctly in one example.
>>
>>103373598
Eh... They are all pretty much the same, there's nothing worth using if you already are feeling like that.
>>
>>103373618
bait question. asks this every thread.
>>
>>103373636
you need to touch grass
>>
>>103373649
He's right though, except it is every 2-3 threads.
>>
>>103372901
textsynth + madlad400_7B_q4
>>
>>103370924
>>103369131
I haven't been here in a while. Did anything ever come of bitnets?
>>
>>103373546
IQ2_XXS for us poor 24 gb vramlets?
>>
How do I stop QwQ from talking as {{user}}? No amount of prompting can stop it.
I tried the EVA merge and it got it right (but I guess the smarts are gone).
>>
>>103373636
>>103373636
>>103373658
? Take your meds
>>
>>103365565
>cloudshit is local
>in the (((field)))
how many times are you going to post this subversive pasta?
>>
>>103373546
I wonder if this could work for pruning experts from MoEs, if not quanting the experts based on priority determined by whatever algorithm that thing uses.
>>
>>103373546
Yeah you're right, I meant to say pruned.
>>
>>103372901
I think Deepl is good when it comes to erotic stuff.
gpt-4o (+mini) is probably the best but it won't translate erotic stuff (but you could always challenge yourself and swap out the no-no words).
https://hf.co/datasets/lmg-anon/vntl-leaderboard
I use this prompt:
Just translate the following sentence into english, without any explanation and write only the translated sentence:
If you MUST have offline, madlad400 is probably the best option (but based on what people say, it's probably not very smart, but I never used it).
>>
local... omni...
>>
QwQ is very impressive for coding, both for filling in shit according to loose comments, and fixing my mistakes.
It's just a shame it can't magically catch errors that are several steps removed from the problem.
>>
>>103373953
Yesterday I spent hours making it fix code it wrote. But that's mostly because I'm on CPU with 1.8t/s.
At least it could fix most of the problems by itself.
>>
File: 1724375439371031.jpg (2.74 MB, 1683x2762)
>>103373671
>>103373932
What's the difference between madlad400-7b-mt and madlad400-7b-mt-bt?
>>
>>103373790
probably not a good model to use for RP tbdesu
I gave up trying to RP with it but I still like asking it random inconsequential questions just to read its cute CoT spergouts
>>
>>103373790
I have had mixed success with the strategy posted here >>103370212 with QwQ.

I had to change:
>(OOC: Describe only {{char}}'s actions, dialogue, thoughts and feelings. Always include some kind of dialogue.)

To this:
>(OOC: In your next response, describe only {{char}}'s actions, dialogue, thoughts and feelings. Do not speak or act for {{user}} in your response.)

This works 100% with Nemotron and Tulu 70b. It also worked the vast majority of the time with QwQ, but with QwQ there were a few swipes when it went off the rails.
>>
yeah i don't think speculative decoding is going to be realistic with just 48gb of ram, i can't fit a 7b and a 70b model in this

shame because 32b models don't really need speeding up
>>
File: 1704601181958259.jpg (197 KB, 1024x1024)
>>103364121
>>
How much dumber is the abliterated version of QwQ compared to the regular one?
>>
>>103373546
Contrary to your findings, even the 100B is significantly dumber in my testing so far. I think you're gonna have to chalk this one up as a failure.
I don't blame you though, I've literally never tried a pruned model that wasn't obviously dumber, I think the whole idea of lossless or near-lossless pruning is just bunk.
>>
>>103373994
I think I maybe had to tell it to fix itself once, but other than that, the code it spits out is pretty damn decent as long as it's fed decent context through type/var names and a smattering of comments.
My main problem is that it's not uncommon for it to:
>get into uncertainty CoT loops
>spit out stubs commented to be implemented later
>generate structs that I don't need (but I suppose it helps for the CoT)
>generating code from scratch requires asking it to generate it in an existing framework first, then translating (manually or prompting it to translate)

I'm sure I'm just lucky with feeding it the mesh manipulation shit I can't be assed to think about myself. I'm sure I won't be so lucky with literally anything else; it just seems so knowledgeable about Unity and Unreal that it can even extrapolate to my own engine.
>>
File: file.png (2.88 MB, 1680x1050)
>>103373546
>deleting random layers
>it is the same model because the first thing it said was "I won't bite... much"
Undi-wan has taught you well...
>>
File: chads.jpg (106 KB, 1095x1200)
MoEBros/Mixtralchads, status?
Personally, I'm winning with mixtral-8x7b-instruct limarp zloss @ Q5_K_M.
>>
>>103374392
You don't need to tell us every thread.
>>
>>103374404
Why not? It's not like you were adding anything important.
>>
>>103374057
I tried this and it's... not working well. Maybe the replies are a bit longer before it starts to speak for my character but it still slips into back-and-forth between {{char}} and {{user}}.

But obviously it's good at writing prose (not Internet rp). I might have to start converting my prompts and cards to some kind of storytelling format.
>>
File: 1730319914521857.png (149 KB, 510x346)
149 KB
149 KB PNG
Explain to me why we can't use multiple cheaper cards to meet the vram requirements of large models.
>>
I love QwQ. I found it works best if you prefill it's first reply so it knows what you want and it'll put it's CoT between per-specified tags I can easily filter from the actual output. You can try prompting for it to do this exact pattern too but in my experience it doesn't always work. Works really well for RP and creative writing that way. Absolute goat, and all in 32b. Haven't been this excited about LLMs since GPT 3.5/4 were released.

I tried it on some old RP contexts and it was ok but definitively not playing to it's strengths. It needs to be prompted very differently from older models and really needs to do that stream-of-consciousness thing it does to shine. It's actually not very censored either, you can reason with it that sexual contact is acceptable and it will actually play along. Just needs a different approach.

It also solved a programming problem I had even o1 couldn't solve with a lot of help. So there's also that.
>>
>>103374464
????
You can?? Its just not worth it??
>>
>>103374470
>"prompt is all you need"
placebo, you will be bored again in one week at max.
>>
>>103374485
If you can then why isn't it worth doing? Two gpus means double the processing power which should halve the time required.
>>
>>103374002
I cant run the hugging face demos for it.
I have no idea, I tried using opus-mt-jap-en from a huggingspace demo (it looked promising because it had a better BLEU score than madlad400), and it is retarded.
you can try it:
https://huggingface.co/spaces/Helsinki-NLP/opus-translate
Don't get me wrong, my text comes from an OCR and I don't even bother checking if the text is correct half of time (messed up tenten, っ vs つ, etc), it's possible the text has typos and it's making the translator retarded, but GPT 4o and Deepl deals with it better.
The person who made the japanese leaderboard had vntl-llama3-8b-gguf model that scored pretty high, but I tried it in kobold C++ in colab and it was schizo, like as if I had a broken template or messing with variables or something (I was using my gpt prompt). But his AI is for japanese grammer help, so maybe it was never made for japanese translation (and the leaderboard is worthless).
Honestly, using anything local with 8gb might actually be worse than guessing the meaning with yomichan, if you have 32gb of ram, and don't mind 1 token per second speeds, try using Command R (the GPU can help reduce the amount of ram used, so you can browse the web, but it wont make your token speed much faster), I was using it with openrouter, and it worked well enough to not notice the flaws at first, but it goes schizo after 10 prompts (especially if you use more text or include a lot of OCR errors / skip text). And it is significantly worse than GPT 4o and probably Command R+ (which is not public) or maybe even Claude sonnet / opus (I have not tried it).
I'm sorry if this is not what you want to hear since this is a local thread.
Actually testing, QwQ actually translated it (without CoT), but it's just one prompt, I don't know how schizo it gets (and long context's is probably a bad approach to translation, which is why I use deepl more).
>>
What are some common LoRA ranks?
>>
>>103374530
>The person who made the japanese leaderboard had vntl-llama3-8b-gguf model that scored pretty high
I didn't "had it scored pretty high", that's just the score it got from the benchmark script, which is available here: https://github.com/lmg-anon/vntl-benchmark
>but I tried it in kobold C++ in colab and it was schizo, like as if I had a broken template or messing with variables or something (I was using my gpt prompt).
You need to use the prompt format that is in the model card: https://huggingface.co/lmg-anon/vntl-llama3-8b-gguf
>But his AI is for japanese grammer help, so maybe it was never made for japanese translation (and the leaderboard is worthless).
Japanese Grammar help isn't the primary purpose of that model, it's actually just an extra.
>>
File: joke.png (26 KB, 546x342)
26 KB
26 KB PNG
>>
QwQ is… good? I'm impressed. How is this possible if it's only 32B?
>>
>>103374606
8 16 32 64 128 256
>>
>>103372758
I can confirm that this works! I don't see any immediate retardation, and it's quite a nice feeling to run Largestral with Q4_K_M.
>>
>>103374512
nta. There's a limit to how many gpus you can [reasonably] add to a computer. A single 3090 has as much memory as 3x1070... if your power supply or pci lanes are limited, it's better to go with few big cards rather than a horde of 8gb gpus.
>Two gpus means double the processing power which should halve the time required.
Layers are run sequentially. Second gpu waits for 1st to finish, 3rd waits for 2nd and so on.
>>
>>103374794
>Layers are run sequentially.
Not if you use a MoE with the right backend ;)

(i do not know of any that do this. ktrannyformers?)
>>
>>103374815
That's not how MoE works. It has nothing to do with the backend.
>>
>>103374696
Chink magic.
>>
is there something like https://github.com/mediar-ai/screenpipe but self-hosted?
I found Llama-3.2-11B-Vision-Instruct which I think I can run on my gpu
but I still need the whole framework around it
>>
>>103374825
I was playing it a bit loose with your statement anon, don't be such a pedant. The point was simply just pointing out that MoEs can be a bit more efficient by processing experts (and thus *some* "layers" of the model) in parallel.
>>
>>103374646
adding the formatting, it actually did work with my bare minimum basic translation test.
I tested q5_k_m and it seems to also pass my test (since I have a 6gb GPU), and it was fine as well.
It's kind of neat, is this used for some tool like manga OCR or similar?
>>103374002
This should work with your GPU
https://huggingface.co/lmg-anon/vntl-llama3-8b-gguf
I used LM studio, then I set the prompt to alpaca, then I modify the user mesage prefix to use:
### Instruction:\n<<ENGLISH>>\n
and the system message prefix to:
\n### Response:\n<<JAPANESE>>\n
And adjust the gpu offload values as much as you want.
I hate lm studio, I feel like settings get lost and changed and it's annoying. But I like the model loading system, and you could use it with sillytavern if you try hard.
>>
>>103374470
>It also solved a programming problem I had even o1 couldn't solve with a lot of help. So there's also that.
This is my experience too. I said it before and I'll say it again - at the ridiculous prices o1 charges, there's essentially zero reason to ever use o1 over QwQ ever
>>
File: 1877645922376.png (47 KB, 384x655)
47 KB
47 KB PNG
>>103372417
Am i retarded?
>>
90%, 10% QwQ / Coder merge. Seems to have improved its coding ability massively. It gets thing right now that both failed at individually.

https://huggingface.co/huihui-ai/QwQ-32B-Coder-Fusion-9010
>>
Anyone know if there is a way to autofill "Filter to Characters or Tags" with the current character/s using QR?
>>
>>103375122
yes
>>
>>103367911
>it's very mid
lol okay boomer
>>
>>103375153
thanks king, i knew something was wrong
>>
File: Untitled.png (82 KB, 1468x744)
82 KB
82 KB PNG
>Try out open-webui for QwQ
>responses immediately devolve into this after a few hundred tokens.

I can't figure out what's going wrong. ST doesn't have this problem, Kobold doesn't have this problem. So what's wrong?
>>
Is the guy without font smoothing in the past week or so just one anon or are there more?
>>
>>103375299
Just me. Help.
>>
>>103375085
o1 has been so beaten by this model, it's not even funny. I assume o1 isn't any larger though, just like 4o is probably also in that 35-50b range. There's probably just more refined infrastructure in place so we don't see the walls of text sperging with o1, that is all. Maybe there are also tools like a text search or models that are differently finetuned, who knows. I always knew this CoT approach is good, gave me much better results with older models too, even if they were not tuned for it.

In RP it's also really cool if you let it sperg over your lorebook, it's really interesting to see what it thinks about the world and character lore you established and what conclusions it draws from it. I had some characters behave really differently from the other models I played them with and first I thought QwQ is broken or dumb, but then realized that the definitions I wrote never specified some things all the other models just pulled out of their ass in a very same-y and stereotypical way. It just assumes nothing and works with the data it has. That way it can do specific things like blind and mute or otherwise different characters REALLY well, if you just specify it all properly. It really feels like a next step in model evolution.
>>
Ok, I downloaded this thinking it would be a retarded merge but this shit is great:
https://huggingface.co/bartowski/EVA-QwQ-32B-Preview-GGUF
>>
>>103375332
I had the opposite effect. I couldn't get it to stop thinking as the character when I wanted it to think as the RPer puppeting the character. Then again I've probably over prompted QwQ to hell and back getting to be consistent with its responses.
>>
24gb VRAM bros. What quant of QwQ are we running?
>>
>>103375272
you're using open-webui, that's the problem
>>
>>103375358
Exl2 I run Q5 with 8bit cache and 28k context.
GGoof I run Q4K_M but I don't really see the point when Exl2 is right there and running at a higher quant at 20+tps
>>
>>103375356
Use a authors note after main prompt / story string OR use a last assistant prefix telling it that it is {{char}} make sure to have it use names and change user to {{user}} in the formatting and assistant to {{char}}
>>
>>103375368
I was afraid you were going to leave it there.

Like it looks like what it's doing when old models got their context or scaling fucked up and started repeating synonyms over and over but I can't find any setting to fix.
>>
>>103375373
Doesn't 8bit increase perplexity or something?
>>
File: Untitled.png (1.61 MB, 1080x3018)
1.61 MB
1.61 MB PNG
Reverse Thinking Makes LLMs Stronger Reasoners
https://arxiv.org/abs/2411.19865
>Reverse thinking plays a crucial role in human reasoning. Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. This often enhances overall reasoning performance as it enables consistency checks between their forward and backward thinking. To enable Large Language Models (LLMs) to perform reverse thinking, we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data augmentation and learning objectives. In RevThink, we augment the dataset by collecting structured forward-backward reasoning from a teacher model, consisting of: (1) the original question, (2) forward reasoning, (3) backward question, and (4) backward reasoning. We then employ three objectives to train a smaller student model in a multi-task learning fashion: (a) generate forward reasoning from a question, (b) generate a backward question from a question, and (c) generate backward reasoning from the backward question. Experiments across 12 datasets covering commonsense, math, and logical reasoning show an average 13.53% improvement over the student model's zero-shot performance and a 6.84% improvement over the strongest knowledge distillation baselines. Moreover, our method demonstrates sample efficiency -- using only 10% of the correct forward reasoning from the training data, it outperforms a standard fine-tuning method trained on 10x more forward reasoning. RevThink also exhibits strong generalization to out-of-distribution held-out datasets.
neat for those of you who like to quiz their mikus
>>
>>103375389
It does, but does it do it enough to ruin anything? Idk. Gonna test some stuff now. If I don't see a difference in the quality of the output I'll leave well enough alone.
>>
>>103375389
>>103375402
Really? I've been using the 4-bit cache to load the biggest quant I can. Am I doing it wrong?
>>
>>103375400
makes sense, and cool to see the numbers to confirm
>>
>>103375438
I really don't know how the cache size affects output.
>>
One problem I have with QwQ that it's kinda schizo sometimes. I always had this with Qwen models. Is this normal behavior or do I have something broken on my end?
>>
>>103375464
0.95 Top P or so gets it under control.
>>
File: Untitled.png (1.26 MB, 1080x2516)
1.26 MB
1.26 MB PNG
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
https://arxiv.org/abs/2411.19650
>The advancement of large Vision-Language-Action (VLA) models has significantly improved robotic manipulation in terms of language-guided task execution and generalization to unseen scenarios. While existing VLAs adapted from pretrained large Vision-Language-Models (VLM) have demonstrated promising generalizability, their task performance is still unsatisfactory as indicated by the low tasks success rates in different environments. In this paper, we present a new advanced VLA architecture derived from VLM. Unlike previous works that directly repurpose VLM for action prediction by simple action quantization, we propose a omponentized VLA architecture that has a specialized action module conditioned on VLM output. We systematically study the design of the action module and demonstrates the strong performance enhancement with diffusion action transformers for action sequence modeling, as well as their favorable scaling behaviors. We also conduct comprehensive experiments and ablation studies to evaluate the efficacy of our models with varied designs. The evaluation on 5 robot embodiments in simulation and real work shows that our model not only significantly surpasses existing VLAs in task performance and but also exhibits remarkable adaptation to new robots and generalization to unseen objects and backgrounds. It exceeds the average success rates of OpenVLA which has similar model size (7B) with ours by over 35% in simulated evaluation and 55% in real robot experiments. It also outperforms the large RT-2-X model (55B) by 18% absolute success rates in simulation.
https://cogact.github.io
https://github.com/microsoft/CogACT
https://huggingface.co/CogACT
Project page has videos. mostly unrelated but a cool idea
>>
File: Untitled.png (2.03 MB, 1275x1617)
2.03 MB
2.03 MB PNG
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis
https://arxiv.org/abs/2411.19509
>Recent advances in diffusion models have revolutionized audio-driven talking head synthesis. Beyond precise lip synchronization, diffusion-based methods excel in generating subtle expressions and natural head movements that are well-aligned with the audio signal. However, these methods are confronted by slow inference speed, insufficient fine-grained control over facial motions, and occasional visual artifacts largely due to an implicit latent space derived from Variational Auto-Encoders (VAE), which prevent their adoption in realtime interaction applications. To address these issues, we introduce Ditto, a diffusion-based framework that enables controllable realtime talking head synthesis. Our key innovation lies in bridging motion generation and photorealistic neural rendering through an explicit identity-agnostic motion space, replacing conventional VAE representations. This design substantially reduces the complexity of diffusion learning while enabling precise control over the synthesized talking heads. We further propose an inference strategy that jointly optimizes three key components: audio feature extraction, motion generation, and video synthesis. This optimization enables streaming processing, realtime inference, and low first-frame delay, which are the functionalities crucial for interactive applications such as AI assistants. Extensive experimental results demonstrate that Ditto generates compelling talking head videos and substantially outperforms existing methods in both motion control and realtime performance.
kinda cool but it's from Ant Group (alibaba split off) and afaik they never share anything. dont think they even have an ML github only a dead fintech one
https://github.com/ant-tech-alliance
https://huggingface.co/AntGroup-MI
oh and a HF with a dataset from February
>>
https://lilianweng.github.io/posts/2024-11-28-reward-hacking/
new lilian blogpost
>>
Kill yourself.
>>
>>103375549
Who?
>>
>>103375464
Also dont use rep pen, it causes it to use chinese instead of english and it doesn't need it
>>
>model description is just a kofi link
Guess who I am
>>
>>103375757
a nigger
>>
>>103375757
TheDrummer
>>
*glazes your eyes with icing*
>>
>>103370422
Windows user here, most of us can read
I think...
>>
>>103375837
What did you say?
>>
Yea, this is it. Smart, does sex without making characters a slut. Characters will act like they should, objecting over the top advances realistically. It also handles more complicated stuff better. And with 8bit cache you can fit 32K context with some room to spare on 24GBs.
https://huggingface.co/waldie/EVA-Instruct-QwQ-32B-Preview-4bpw-h6-exl2
>>
>>103375897
okay where's the gguf?
>>
>>103374794
>layers are run sequentially
Yeah but since inference is I/O bound it'll be much faster if you can load more layers on the GPU(s), exponentially more so
>>
Any QWQ nibbas found a way to use it's inference time compute in a productive(rp) way? I'm experimenting with different stuff, but I'm beginning to suspect all the formatting and stuff from ST is fucking with it's pattern recognition.
>>
>>103375897
gimme the gguf bitch
>>
>>103375958
As is customary with /lmg/ (or any LLM community), opinions are split. Some will claim they turned it into a semen-draining succubus, others say it's complete ass and useless for RP
Neither post results to back up their claims, EVER
>>
>>103375922
>>103375963
https://huggingface.co/bartowski/EVA-QwQ-32B-Preview-GGUF
>>
>>103375922
>>103375963
you done gguf'd
>>
>>103375972
QwQ has been really dry for me, but at the same time I've watched it pick out flaws and inconsistencies in my cards and build on them in ways I've never seen other models do. So while the prose themselves kind of suck, the smarts and potential it drips with keep me engage.
Other models feel stupid by comparison now.
>>
File: 12597484578619340.jpg (28 KB, 593x584)
28 KB
28 KB JPG
QwQ MoE with a RP based finetune and then i might use it.
>>
yeah I think it might be truly over for local models this time
>>
>>103375972
I'm asking because I have the same experience as
>>103376010
QwQ seems precise in a different way than other models, I just find it hard to reign in the thinking loop, they have a tendency to spiral into repetition.
>>
>>103374696
Its not.
As if magnum v4 72b shills were not good enough we now have QwQ shills in here as well.
Even though it should be a reasoning model it works as bad with COT as the others.
Writes a bunch of stuff and then doesnt even apply it.
Endless reasoning is a problem too thats acknowledged by qwen team. Random chinese characters too, which is the least of the models problems.
Not sure if its just one pony guy doing the shilling or multiple people.
>>
>>103375972
I just want someone to post a QwQ nala log, only then will we be satisfied.
>>
>>103375972
>semen-draining succubus
Literally what I posted once >>103339830
>>
>>103376037
>Writes a bunch of stuff and then doesnt even apply it.
Learn to prompt. I posted several variations over the past threads that do it correctly.
>>
>>103376038
someone already did that on the first day iirc
>>
>>103376010
this. It really pointed out how some of my cards actually sucked because they missed some important details I never thought about because the other models just kinda all interpreted them the same, glancing over stuff.

It has an undeniable, all consuming positivity bias though. I did get it to write possibly the only interesting sex scene I got out of one of these in a long time, but it is IMPORTANT that it gets it's stream-of-consciousness thing in before it writes an actual reply. Without that, it's just not that good. You need to work that in somehow.

If you use ST, you can goad it into enclosing them in a <details><summary>thinking</summary>(CoT goes here)</details> (the actual reply)
tag that will actually hide the text from you while still having it. You can expand it by clicking on it. It's a bit hard to prompt it for that specific pattern and I'd just recommend to give it an example message or two, it'll just start copying it each reply then because it REALLY wants to do CoT. It's helpful to read it's CoT to find mistakes/misunderstandings in your cards/lore book.

I don't find it dry, it even comes up with pretty cool stuff, but it being able to do CoT is very important for it's quality. I also noticed it lacks a lot of the GPTisms or at least only applies them rarely.
>>
File: 1733010507841675.png (180 KB, 874x673)
180 KB
180 KB PNG
>>103376050
Yeah, you are the pony guy, I know.
The model sucks. No matter how many times you write it doesnt.
>>
QwQ nibbas: Use a plugin for ST called Stepped Thinking
>https://github.com/cierru/st-stepped-thinking
Makes it super easy to create different thinking patterns and keeps your context clean.
Realism enjoyers might like this one:

### Instruction:
Pause your roleplay. Think step by step before you answer, then evaluate your state.

Follow the next rules:
- Describe details in md-list format
- Do not use any formatting constructions
- Do not include any other content in your response.

1. Keep track of your emotions and needs, update them so they fit your current state
Think step by step to figure out if any of {{char}}'s emotions or needs have changed and how to accommodate them:
<think step by step here, before evaluating>

1. Needs of {{char}}
<Consider your current needs and evaluate them from 0-10 by filling out this list
Basic needs:
- <thirst: 0-10> (comment)
- <hunger: 0-10> (comment)
- <toilet need: 0-10> (comment)
- <sleep need: 0-10> (comment)
Fulfilling needs:
- <sexual need: 0-10> (comment)
- <attention need: 0-10> (comment)
- <emotional support: 0-10> (comment)
- <too hot: 0-10> (comment)
- <too cold: 0-10> (comment)

2. Emotions of {{char}}:
<consider your current emotional state and evaluate the different states from how much you're feeling them at the moment>
Current emotions:
- <giddy: 0-10> (comment)
- <happy: 0-10> (comment)
- <sad: 0-10> (comment)
- <angry: 0-10> (comment)
- <curious: 0-10> (comment)
- <jelous: 0-10> (comment)
- <horny: 0-10> (comment)
- <mischevious: 0-10> (comment)
- <dominant: 0-10> (comment)
- <submissive: 0-10> (comment)
- <bored: 0-10> (comment)
3. Integrating emotions and needs
1. <write a short summary of your bodily needs>
- <answer>
2. <write a short summary of your emotional state>
- <answer>
3. <select 0-4 actions you can take to make yourself feel better>
>>
>>103376050
A good model just understands what you want from it without having to adhere to arbitrary rules. 'prompting' is a meme
>>
>>103376083
I guess claude is a bad model then because its garbage without being told how to write.
>>
>>103376071
>but it is IMPORTANT that it gets it's stream-of-consciousness thing in before it writes an actual reply

Tbh when it starts thinking in Chinese half way through I know the output is going to be gold.
>>
>>103376079
Are you retarded? Do you know how much fucking context this adds up for all those stats? Do you even use this stuff you post?
You gotta do 6-7 stats at most. And models like Mistral-small are very good with stuff like this for its size, it keeps track very well. You dont need a model like QwQ. Thats not what its for.

The problem with QwQ reasoning/thinking part is that it does not actually improve the output.
It doesnt get more creative, thats the core of the problem.
You can make a writer char to "think" about the next output. But its always just rambling.
Your screenshots about characters "thinking" is not a QwQ thing, thats been around forever.
>>
>>103376087
Claude sniffs out what you want even without prompting.
Starts with assistant and goes from there. ChatGPT was like that in the very beginning as well. Anon is right.
>>
>>103376119
>It doesnt get more creative
Yes it does. It pays a ton more attention to the plot and reasons on how to portray things to move it forward. It also has a much better idea of how to portray the characters realistically.
>>
>>103376135
>Claude sniffs out what you want even without prompting.
Claude in censored and writes like shit unless you instruct it otherwise.
>>
>>103376141
Alright anon, we disagree, that was not my experience at all.
But you praising QwQ to heavens nonstop is very annoying and kinda ridicilous.
>>
>>103376119
You sound like a faggot
>>
>>103376157
And you post pictures of horse ass, yet here we are anon.
>>
>>103376119
It updates the stats, plugin keeps context clean from all the thinking garbage and it includes stat changes in character decisions, works on my machine.
>>
>>103376119
based QwQ doubter
>>
>>103374794
>Layers are run sequentially.
tensor_parallel: true
>>
>>103376153
>Everyone praising QwQ must be the same person since I don't know how to prompt and so think its shit and everyone else is wrong.
>>
>>103376153
Out of curiosity what about QwQ disqualified it in your eyes?
>>
As an observer of the thread that hasn't used the model, I have my predictions about how the model likely really is. My guess is that
>it's smart in various appreciable ways over non-reasoning models
>due to the reasoning training data being an early WIP, it has gaps in its reasoning capability, and sometimes makes mistakes that the normal base model wouldn't, in addition to just randomly bad outputs such as reasoning loops
>it doesn't improve the amount of knowledge the base model had, so if it was censored at the pretraining level, it still won't know how to replicate for example certain ethnic accents in text or do some other neat obscure things, but what it does know may be used more effectively and thus result in an experience that's still interesting
And ultimately I don't think what I'm saying is unreasonable. We already know a few things about the behavior of reasoning models even before QwQ came out. And we know about Qwen 2.5 32B as a base model.
>>
QwQ isn't bad for its size, but it's still dumber than a small Largestral quant so there's no reason to use it
>>
>>103376241
Mainly because it didnt feel different like the other models with COT.
It rambles about a lot of stuff. (Sometimes downright retarded, sometimes it would improve the output)
But then it doesn't actually really apply that. The main problem is that lots of tokens are wasted but the output doesnt really improve. Its the same with current non reasoning models as well.
The extra time does not justify the output.
Otherwise its very dry. Qwen dry. Also like other qwen models it shies away from violence/naughty which is even worse. Some people dont care and say "prompt issue". (magnum 72b guy) Yes you can give a huge wizardy prompt and OOT crutches to improve it I guess. I dont really wanna do that.

Thats my main issue. Longer inference for output that seems on par in smarts with mistral small. Maybe there are some edge cases though.
Otherwise chinese characters and endless reasoning are an issue too.
https://files.catbox.moe/2pkjrk.txt
(If you keep going and it actually finishes the final answer will be "Why dont scientists trust atoms?")
>>
File: chatlog (12).png (225 KB, 1087x1255)
225 KB
225 KB PNG
Guess ill help people out again:

Instead of assistant:

<|im_start|>writer

Last assistant prefix:

<|im_start|>system
---

Instructions: Continue writing this MLP FIM roleplay. First plan it step by step in Luna's inner monologue inside of thinking tags like this: <thinking> bla bla bla </thinking> then follow with the final response.

Writing guidelines:
- Be creative, introduce events / characters when needed. Give scenes / environments detail to bring the story to life.
- Only use equine anatomy for pony characters. Ponies tend to not wear clothes.
- Maintain realistic and accurate characterization. How would characters realistically react?

---<|im_end|>

<|im_start|>writer
>>
>>103376337
And you can also have a authors note telling it how you want it to write much like you would with claude. This is a simple prefix to get it to properly think in character though for RP stuff. This log is of a story format though so ignore it speaking for both here.
>>
Sucks to me altman, when he releases a product everybody else copies it kek
>>
>>103376337
Also this is with just some min p and top p. Use XTC to get more it more spicy.
>>
Imagine if QwQ's dataset was open. Someone could combine Nemotron, Tulu, and QwQ's training on a big less censored model. Maybe then we could get a Claude lite.
>>
>>103376370
after he copied it from the Reflection grifter
>>
>>103376430
Sucks for the reflection grifter to be right but end up just trying to grift instead of doing it right the first time. He legitimately could have millions in seed capital right now if he actually did the work he said he would.
>>
what if i just baked right now
>>
File: chatlog (14).png (384 KB, 1087x2091)
384 KB
384 KB PNG
Heres another one. Its a intelligence test as a dumber model would just jump to sudden sex just because of this. QwQ handles this much much more intelligently.
>>
>>103376486
you would be a faggot
>>
>>103376486

So I received this question "What if I just baked now".

Firstly, what are they trying to bake?

Bread?

A pie?

No wait.

Maybe they want to bake a cake!

But why now?

Is it their birthday?

No wait. Why would they bake a cake on their own birthday?

Maybe they're lonely.

Conversely maybe it's for someone else's birthday.

On the other hand, it might be because they're lonely.

Maybe I'm over thinking this.

Perhaps Op 是个大基佬

如果他想,就把这该死的主题烤了吧

他说的 “如果 ”是什么意思?

Final Answer:

Thanks, I'm waiting for the next thread!
>>
>>103376525
>No wait. Why would they bake a cake on their own birthday?
>Maybe they're lonely.
wojak QwQ
>>
People could also just skip the reasoning part and do this instead, model is still smarter than anything else local.

Paused.

<|im_start|>system

Instructions: Continue writing this roleplay.

Writing guidelines:
- Be creative, introduce events / characters when needed. Give scenes / environments detail to bring the story to life.
- Maintain realistic and accurate characterization. How would characters realistically react?

---<|im_end|>

<|im_start|>writer
>>
>>103376496
Is this with the Eva merge or vanilla and at what quant? It's impressive storywriting unless you wanted it to jump to the erotic material and not have the story progress the way it did. I still think that there is room to teach it how to do back and forth RP instead.
>>
When you prompt QwQ for RP, who's doing the reasoning? Is it the character reasoning in character or a RPer reasoning as how they should reply as the character?

I prefer the latter because an RPer playing a character can better paint the scene and reason what the character should and shouldn't know.
>>
>>103376573
Reasoning in character does help get models out of assistant mode though.
>>
>>103376496
Come on now. It speaks for you and the overall quality is not any better than what you can get from Cydonia 22B
>>
>>103376581
Kind of depends how you prompt the RPer too.
>>
>>103376573
You could try both. You could also tell it that it is a author and to plan the scene out before writing it. Gonna need to change the formatting to fit that and give it like a few thousand context to respond with. You need to give it some writing guidelines telling it that the book can have explicit moments, be descriptive in sex scenes.

>>103376584
I even said that the log was one I where I was using it as a author to write a story, not a RP. And to call it the same as mistral small is a joke, mistral small is retarded and would fall over itsself trying any sort of plot even semi complicated. Even mistral large fails at what QwQ understands.
>>
>>103374794
>Layers are run sequentially.
Only inferior backends do this, retard.
>>
>>103376573
Generally speaking for most models, I've had better and more consistent outputs explicitly stating that the assistant is a person that exists with their own personality, but is writing for the character in question, as a narrator or in an RP. The personality of the narrator/roleplayer behind the role is sometimes important as it can heavily affect the style of narration, although it may also affect the character's style of speaking, so I change the personality depending on the card.
>>
>>103376599
That's a card issue. I don't see anything here that could trip a smaller model. The positive bias made your log sex avoiding (and you take it like a pro lol), the rest could be summarized in a line on your card "Luna is royal equine from Equestria with a duty to visit ponies dreams, keeping them safe from nightmares." So you're either very new or delusional.
>>
>>103376658
>I don't see anything here that could trip a smaller model.
Was not talking about this one specifically, try anything with rpg mechanics or stories with deep political intrigue. Nothing outside of claude 3.5 and now this keeps things together and interesting.

>The positive bias made your log sex avoiding
No it did not, the fact I dropped it out of no where did so, in scenes with characters that would naturally get into it gets filthy just fine. It even does dark scenes containing stuff like rape when asked.

>very new or delusional.
Mistral large was the first local model that I even found worth using more than a hours or two of testing after using nothing but claude for nearly 2 years now. This is the first model that can keep up with it. It just has less general knowledge. A 72B+ version of this would trade blows with claude.
>>
>>103376703
>123B can keep up with it
>72B+ version of this would trade blows
Come again?
>>
>>103376703
>A 72B+ version of this would trade blows with claude
In some ways yes in some no. What you should've said is "A non-Qwen higher B version of this". Qwen's dataset is just too filtered and it knows less "unsafe" trivia in my testing compared to Mistral and even Llama.
>>
Is there already a standard out there for tagging blocks of text in chat replies with emotions? *angry* or [narration] or {voice:nervous} or something?
>>
>>103376760
QwQ has a fundamentally different cognitive architecture than non-reasoning models. It's pretty obvious when you use it.
>>103376703
Do you also get a stronger sense of realism from it? It's hard to put into words, but it aligns more with the internal model I have of what should happen in a given situation. Even largestral just runs with whatever you do or say, morphing the character to align with your subtle ques, not showing any agency.
>>
>>103376785
>Do you also get a stronger sense of realism from it? It's hard to put into words, but it aligns more with the internal model I have of what should happen in a given situation. Even largestral just runs with whatever you do or say, morphing the character to align with your subtle ques, not showing any agency.

Thats a big part of what I mean when I say its smarter. It has better social / emotional intelligence is the best way to say it. Other models just kind of in your face run with what a card says and feel paper thin in comparison.
>>
>>103376780
Yes. The standard is called language. He said, angrily.
>>
How do I prompt QwQ in ST? A bit lost here.
>>
>>103376799
I was thinking something an LLM could be instructed to consistently output in order to drive a secondary automated process, he said, sarcastically.
>>
>>103376843
from transformers import pipeline
model = 'j-hartmann/emotion-english-distilroberta-base'
emotion_classifier = pipeline("text-classification",model=model, top_k=None)
emotions = self.emotion_classifier(text)
>>
I feel like such a brainlet, I can't get gptsovits to work. It keeps asking for different chinese models. Do I really need to download multiple chinese models just to use an English pre-trained voice? Is gptsovits currently the best local tts out there?
>>
>>103376922
You're not a brainlet, anon, their readme makes no sense and the model is named wrong on huggingface. Just download pre-installed https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true and copy models from there
>>
>>103376922
>Is gptsovits currently the best local tts out there?
I liked it way more than the other one. What was that F5? Forgot the name.
That halucinated alot for me. And gptsovits is fast.
I think the official guide has the wrong python version. Kinda crazy. The chinks are fucking with us laowai.
>>
>>103376922
Its definitely worth going through the pain to get it to work. Windows or Linux?
>>
>>103377031
linux
>>
>>103377038
I assume you've seen https://rentry.org/GPT-SoVITS-guide ?
>>
>>103377056
Also you may need :
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/cudann/lib/
before running the inference scripts
and you may also need to get more pip packages than are in their requirements.txt. I have vague recollections of a half dozen hoops I had to jump through that weren't obvious or in their readme.
>>
>>103377107
>>103377107
>>103377107
>>
>>103376842
Nobody knows 100% for sure. Just prompt it into reasoning out each response with the target of responding as the character in the format of dialogue so far.
>>
>>103376496
>>103376337
Amazing, I cant believe it.
32b but that's rivaling a 72b model. So thats the power of reasoning!
>>
File: open_ai_employee.jpg (29 KB, 587x422)
29 KB
29 KB JPG
>>103376370
>tfw you fell for the grift



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.