/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103153308 & >>103135641

►News
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>103164575>>103164575>>103164575Actual thread. This thread was made by a thread splitting troll that has genuine mental issues about his ritual posting.
>>103164659
>tuesday
>not teto
shame on you
>>103164687Shut up racist
>>103164707
>(embed)
>old news
hi petra
total tranny cleansing can't come soon
>>103164748let them cook
>>103164687Get out.
>>103164659>Thread Theme:https://www.youtube.com/watch?v=hlQ4IM1qzlk
>>103164817>Qwen 3.5 coder model review and impressions
>>103164659
►Recent Highlights from the Previous Thread: >>103153308

--Paper: When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization: >103160371 >103160493
--Papers: >103160243
--Testing LLMs with provocative prompts and discussing prompt engineering and filtering: >103155576 >103155674 >103155779 >103156002 >103156059 >103156672 >103156221
--Running 72b on a 3060 GPU with 12GB VRAM, and the need for high-end hardware: >103161742 >103161813 >103161830 >103161836 >103161831 >103161872 >103161882 >103161917 >103162116 >103162214
--Voice AI and voice cloning discussion: >103158298 >103158310 >103160612 >103160683 >103158368
--Updating model parameters during inference and its implications for AGI: >103156465 >103156572 >103156985 >103157866 >103157900 >103157023
--Specifying GPU for speculative decoding in Tabby/ExLLaMA: >103162124 >103162223 >103162386 >103162441
--Qwen 2.5 Coder model impressions and performance: >103154799 >103154931 >103155013 >103155085 >103155098 >103156687
--Quantization types and their impact on AI model speed: >103160556 >103161363
--Processing long documents with local models for summary and insights: >103158469 >103158698
--Anons discuss Qwen2.5, Sonnet 3.5, and Largestral models: >103161265 >103161296 >103161401 >103162176 >103162413 >103161639
--Anon tests Qwen2.5 Coder Instruct with Nala scenario: >103160663 >103160744
--Anon shares Unbounded game, others say it's not new: >103158574 >103158586 >103158649
--Anon questions how GPT-4 validates code: >103161529 >103161534 >103161692
--Anon mentions Jetson Thor as a potential solution for homemade android with local processing: >103158392
--Qwen 2.5 coder model review and impressions: >103159846
--Miku (free space): >103153440 >103154178 >103154266 >103154839 >103156287 >103158261 >103158447 >103160213 >103160416 >103161631 >103162124 >103163680

►Recent Highlight Posts from the Previous Thread: >>103153319
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>103164881epic fail
>>103164841Happy?
>>103164817>>103164881lmao retard
Was the original bitnet paper about quantization or training models from the ground up?
Are there any models trained in 1.58b?
Introducing: The most powerful open source code large model!!!
Rombos-Coder-V2.5-Qwen-32b is a continuously finetuned version of Qwen2.5-Coder-32B-Instruct. I took it upon myself to merge the instruct model with the base model using the Ties merge method, as demonstrated in my own "Continuous Finetuning" method. This version of the model shows higher performance than the original instruct and base models.
https://huggingface.co/rombodawg/Rombos-Coder-V2.5-Qwen-32b
>>103164968
>training models from the ground up
that
>Are there any models trained in 1.58b?
yes
>https://huggingface.co/1bitLLM/bitnet_b1_58-3B/tree/main
>8 months ago btw
>>103164974no it doesn't
>>103164982Sick. I never bothered to look too deep into the whole bitnet thing, so I'm catching up.Thank you anon.
>>103164968
training models from the ground up
>Are there any models trained in 1.58b?
https://huggingface.co/1bitLLM/bitnet_b1_58-3B
https://huggingface.co/NousResearch/OLMo-Bitnet-1B
I think there's another 3B, but no one went bigger than that so far for some inexplicable reason
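For anyone else catching up on what training "in 1.58b" actually means for the weights: as far as I understand the b1.58 paper, the linear layers quantize a full-precision master copy of the weights on the fly to {-1, 0, +1} with an absmean scale (activations stay higher precision, gradients flow through a straight-through estimator). Rough sketch of just that quantization step, my reading of the paper rather than their actual code:

import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    # scale by the mean absolute value, then round-and-clip each weight to {-1, 0, +1}
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp_(-1, 1)
    return w_q, scale  # forward pass uses w_q * scale; the fp master copy only exists during training

so it really is training from the ground up, not a post-hoc quant: the model never has useful full-precision weights to fall back on at inference time.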
Migrate:>>103164575>>103164575>>103164575
>>103165003Buy an AD
>>103164659This is also a great OP image
The 'ick 'ecker added some things to his voice cloner.
>>103165002
>but no one went bigger than that so far for some inexplicable reason
That's fucking weird. The Metas and Mistrals of the world could train a ~7b in a couple of days to a week, I'm pretty sure.
>>103165091If they do that, the leatherman will never sell them a GPU again.
>>103164880
>A separate training run was run with the exact same hyperparameters, but using standard fp16 weights. The comparison can be found in this wandb report.
That's really cool.
>>103165113
Really? Doesn't that mean that people would just train even bigger models and the demand for GPUs would stay the same?
Also, it would make it easier to run seemingly even better models locally, which would put local AI in the hands of more people, and increase the demand for AI models and consumer class Nvidia GPUs too, even if the demand only doubles from 1% to 2%.
Sounds like a win-win-win to me.
What would happen if it became illegal for you to run LLM's due to how "dangerous" they are? Would you ignore the law, move somewhere else or simply stop using LLM's?
>>103165091
They could, and the changes needed to do bitnet training aren't that big either, since most of the training is still done in full precision.
Meta doesn't do anything but incremental changes to their gpt2-based architecture, but it doesn't make sense that Mistral or anyone else hasn't tried it yet either.
Lots of people claim it's because the benefit is at inference time, not training time, so they have no incentive to care, but the same could be said about MoE.
>>103165187
>Bitnet takes much longer to learn
Bitnetbros... I'm not feeling so good...
Respect for Qwen being one of the few modelmakies to still do sub 20-30B models
>>103165194
I really doubt that happens, but I would just run them anyway
what are they gonna do, raid my house for ERPing with tomboy elves?
is Serbia that bad?
>>103165187BitNet models do not require MatMul, enabling the creation of much simpler processors for inference and (potentially) even training in the future. This poses a direct threat to NVIDIA's market dominance.
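The no-matmul claim is easy to see: once the weights can only be -1, 0 or +1, every "multiply" in a matrix-vector product collapses into an add, a subtract, or a skip. Toy illustration in plain python (says nothing about how real kernels or that ASIC actually do it):

def ternary_matvec(W, x):
    # W: rows of weights restricted to {-1, 0, +1}; x: activation vector
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # no multiplier needed anywhere
            elif w == -1:
                acc -= xi
            # w == 0: contributes nothing, skip
        out.append(acc)
    return out

print(ternary_matvec([[1, 0, -1], [0, 1, 1]], [0.5, 2.0, -1.0]))  # [1.5, 1.0]

adder-only silicon is enough, which is exactly what makes cheap ASICs plausible.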
>>103165369They have raided people for less
>>103165194Get your loicense
>>103165369It should be possible to detect LLM usage by analyzing power consumption graphs
>>103165386Ah, now that makes sense. Bitnet makes ASICs more financially viable.
>>103165507your honor, that power was actually going to my grow lights for my weed
>>103165509Yeah, and it has already been proven possible https://github.com/rejunity/tiny-asic-1_58bit-matrix-mul
hello friends when I give llama.cpp hunyuan it says it does not load model how to fix ^^
>>103165569I'm not your friend, nigger.
>>103165569Maybe it doesn't like hunyuan, whatever that is.
>>103165539Interesting. Makes me wonder why an Amazon or Google or even Apple, companies that already make their own silicon, aren't working on that.Or maybe they are but only for internal use.Regardless, that's fucking cool thank you so much for the link dude.I love this rabbit hole.
>>103165569
there is an issue, who knows if anyone will pick it up
https://github.com/ggerganov/llama.cpp/issues/10263
standard lmg advice applies: wait 2mw
>>103165530Inferences have identifiable patterns https://www.researchgate.net/figure/Power-consumption-from-different-sources-CPU-GPU-or-DRAM-for-different-platforms-a_fig5_369540465
>>103164993
https://arxiv.org/abs/2411.04965
another recent paper by the original bitnet devs
>>103165621just use a battery bank
anyone tested sarashina2 yet?
couldn't find a quant for the moe so i ran the 70b. it seems to be actually trained on more trivia than most modern models, but you kinda have to speak in jap for it to be coherent, which is a shame
>the sheer number of samefag posts with pretend discussion to cover up how the samefag has split the thread...>>103164575>>103164575>>103164575
>Ah, now that makes sense>Yeah, and it has already been proven possible Totally organic btw.
>>103165754>looks in threadhmm... no thanks
>>103165676Btw, is there an affordable solution to power a 3kW rig from a 100V outlet using batteries to smooth out peaks?
Rocinante is killing my productivity...
Silly bros?
>MarinaraSpaghetti here, some of you may know me from my SillyTavern settings and NemoMix-Unleashed model over on HuggingFace. I also do model reviews from time to time.
>Today, I come to you with a request. I would appreciate it greatly if you helped me out by filling my survey about what features you use for roleplaying with models. The survey is fully anonymous. Thank you so much for your help and all the feedback! It truly means a lot.
>These devs aren't from ST, but are working on an alternative!
>Can't say anything due to NDA, but as soon as things are set in motion, I'm sure the word will be out! But I heavily agree with the notion that ST is too overwhelming without any proper guides online how to use it (most are outdated at this point).
https://www.reddit.com/r/SillyTavernAI/comments/1gp0og5/models_and_features_you_use_for_roleplaying_with/
>>103165861That's why I only coom at fixed times.
>>103165877>working on an alternative to sillydont care
>>103165896>its afraid
>>103165877I don't use trannyware
>>103165877Long abandoned
>>103165877>NDAIt's not an alternative if it's proprietary slop.
>>103165877Good luck making a better ST. They'll see first-hand the amount of work that went into it
>>103165841>petra hasn't posted in months>starts posting again while a totally different anon that hasn't posted in months comes back to threadsplitlol you're an egyptian brown boy
hello xaars where is local opus
>>103165822
2.5k affordable?
https://www.amazon.com/dp/B0C5C9HMQ2
yoo dis locul el el em totally beatz gepetee 8 amirite fellow lmg sissies?
>sharty troon comes back>thread quality somehow drops even morewow they're like the indians of the internet but somehow even worse haha
>>103166097Thread quality was never good in the first place.
i have a very revolutionary idea
what if we train mistral large on thousands of books
>>103166131And that's why it is impressive how it can make the thread quality noticeably lower.
>>103165861Which version?
>>103165877
>But I heavily agree with the notion that ST is too overwhelming
making software for skill issue brainlets is a red flag
>>103165877>tranny makes lotta lots of bullshit promises Many such cases.
>>103166335v1.1
>>103166413Really? None of the newer versions improved it? What format do you use, just the mistral one?
https://nousresearch.com/introducing-the-forge-reasoning-api-beta-and-nous-chat-an-evolution-in-llm-inference/
holy shit, Qwen-32b-coder is that good?
>>103166421
Yep, the mistral one. No, the others are worse in my opinion. Q8 also. Really the Mythomax of this gen.
is there a place like venus where people post context/instruction templates and system prompts?
Coder 32b really has superior prose. Obviously not trained on shitty RP logs. Weird little logic mistakes and very literal minded, though.
>>103166599https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings
>>103166610>Weird little logic mistakes and very literal minded, though.such as?
>>103165861
>>103166413
That's the only model I've been using for a good while now. As far as having 8gb of VRAM goes, you can't do much better, if at all.
>>103166539
So they have a reasoning model in between the user's prompt and the final gen? Interesting idea.
I might jerry rig (as in jank) something similar using a small model that is only tasked with "Reason which steps are necessary to produce an answer to the following query" or something of the sort. Maybe have it classify which kind of request it's working with before trying to reason about it, etc.
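If anyone wants to try that jerry-rig, it's only a few lines against any OpenAI-compatible local endpoint (llama.cpp server, tabby and kobold all expose one). Hedged sketch; the port, prompts and token limits are placeholders for whatever you actually run, and you'd point the first call at the small model's port if you host two backends:

import requests

API = "http://localhost:8080/v1/chat/completions"  # adjust to your backend

def ask(system, user, max_tokens=512):
    r = requests.post(API, json={
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
        "max_tokens": max_tokens,
    }, timeout=600)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def reason_then_answer(query):
    # pass 1: planner pass, only produce the steps
    plan = ask("Reason which steps are necessary to produce an answer to the following query. "
               "Output only a short numbered plan.", query, max_tokens=256)
    # pass 2: final gen with the plan prepended as context
    return ask("Follow the plan to answer the user.", f"Plan:\n{plan}\n\nQuery: {query}")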
>>103166148
>what if we train mistral large on thousands of books
Then we will have a little bit of fun to pass the time until the next thing.
>>103165358>modelmakies
>>103164881Thank you six-fingered Recap Teto
>>103166556So far, the best model for coding.
>>103166703it helps /aicg/gers identify eachother in the wild.
>>103166716local model or like, better than fucking Sonnet 3.5???
>>103166610who the fuck uses a coder for RP?
>>103166729
Given that I can't use a cloud model on the company's codebase, indeed it is so.
>>103166729
It's 90% of the way to sonnet 3.5 without needing to pay $15 per million tokens and give anthropic your code. Nothing else, including GPT4, one shots a lot of the stuff it does.
>>103166778>>103166794this is actually insane, who would've known we could've achieved this level with a 32b model, holy fuck... the chinks are really dominating the AI race right now
>>103166742Every model is a coom model if you try hard enough
>codeshit >>>32B*yawn*
>>103166812
>who would've known we could've achieved this level with a 32b model
to be fair it is laser focused on coding, whereas sonnet is still an all-rounder
>>103166832
32b is all you need
>>103166834
Were I in their shoes, I would train highly specialized models and a router to classify prompts.
So the 72B should be even better then?
Why couldn't they just put normal Qwen and Qwen coder into a MoE so you could get the best of both worlds?
>>103166862That is not how MoEs work.
>>103166812
I found out that yi 8b coder gives more accurate results compared to codestral 22b too. It's crazy.
>>103166857Qwen-2.5-72b-coder-BitNet, trust the plan
>>103166857Blame burgers for banning the export of GPUs to China
>>103166886
>land of the free
my fucking ass, they want misery on every country that dares to catch up to them
>>103166874to be fair codestral was always bad
>>103166886>>103166896I mean I'm sure as soon as China is satisfied it has damaged the leader's market acquisition enough and has models outperforming the rest they will go private as well.
>>103166872Nothing truly prevents this. You could employ a distinct router model to evaluate both the prompt and the generated text, then redirect prompts among models of varied sizes and architectures. I recall hearing of such an approach to mitigate costs.
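Dead simple sketch of that router idea, assuming a coder model and a general model sitting on two separate local OpenAI-compatible backends and a tiny third model doing the classification (the ports and the two labels are made up for illustration, not any standard):

import requests

BACKENDS = {
    "code": "http://localhost:8080/v1/chat/completions",  # e.g. a coder model
    "chat": "http://localhost:8081/v1/chat/completions",  # e.g. a general model
}
ROUTER = "http://localhost:8082/v1/chat/completions"      # small, cheap classifier model

def _chat(url, messages, **kw):
    r = requests.post(url, json={"messages": messages, **kw}, timeout=600)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def route(prompt):
    # ask the tiny model for a one-word label, then forward the prompt to the matching backend
    label = _chat(ROUTER, [
        {"role": "system", "content": "Classify the request. Answer with exactly one word: code or chat."},
        {"role": "user", "content": prompt}], max_tokens=4).strip().lower()
    return _chat(BACKENDS.get(label, BACKENDS["chat"]), [{"role": "user", "content": prompt}])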
>the US renders china GPU-poor in an attempt to cripple their AI researchers
>the researchers are forced to become masters of efficiency and they figure out how to make small models that btfo much larger ones
burgerbros... what happened
>>103166938He's right. What you're describing isn't MoE
>>103166962They have ambition.We have avarice.
>>103166896
Yeah, ask Japan how it feels, 失われた30年 (the Lost 30 Years)
As cloudshit is reaching a ceiling, local is getting better and more efficient. One more year before we have GPT4 at home.
>>103166964MoE is a broader term than you think it is.
>>103166989That rumor about hitting the celling is fake. (((They))) wish for their opponents to cease refining their models. GeePeeTee5 is real, just expensive as fuck
>>103167004>Attension>stareted
>>103167175>It's funny that they are Koreans
>>103167165Cope. The failure of the Opus 3.5 training run heralded the beginning of a new AI winter.
>>103167235>failure of the Opus 3.5 training runAccording to who?
>>103167283sama
>>103167283It came to me in a dream
Back to kinoslop.
>>103167309based miku communicator
Noob seems to really love character portraits when doing hud gens.
>>103166620>such as?Just some nonsensical details or being confused about the characters. That might be the coding finetuning talking.But truth to be told, I'd never tried Qwen 2.5 before cause /lmg/ told me it was censored chinkshit. So now I tried the original Instruct model. With a simple prefill it does every kind of depraved sex shit, without falling into retarded Literotica slop like "hardness" or "heat" or being too horny. Guess I shouldn't listen to /lmg/.
>>103166962Necessity is the mother of invention. Sanctions forced them to git gud.The weakness of sanctions on both China and Russia was relying on the tacit assumption that the chinese and russians are retards, and they aren't. It was self-flattery from west.
I haven't checked here in a while, local fwens. Is NVIDIA, Intel, or any startup working on a dedicated local AI card? Or has some wundermodel rendered this all moot? I just really want that Ford Model T of AI cards before I autistically build a home AI companion in a cute animatronic
>>103166962
DON'T THINK ABOUT IT
JUST PUT TRILLIONS INTO BIGGER DATACENTERS
>>103167653they all are and all of them are working on TPUs and not a single one is aimed at consumers obviously
>>103167627Is this with a LoRA?
>>103167653Yes. They will all be in the $10k range and up though.
>>103167743Is that because they're insisting on making it super fast? My understanding is that slower VRAM is dirt cheap
Is it just me or do 70b/72b models kinda suck?
These models can't even remember what room I'm in: one moment my character is sitting in a chair, and 3 messages later they're lying on a bed. This is only like 8 messages into the RP with over 28k context still available, the fuck?
Feels like a scam considering Mistral Small exists and can fit on pretty much any modern GPU at q3+.
>>103167694No, just regular noob vpred 0.5.https://files.catbox.moe/6uu3es.png
>>103167782Yes, but it's about the limit of what a non-bitcoin bro's computer can handle.
>>103167792
there's the 0.6 version now
https://huggingface.co/Laxhar/noobai-XL-Vpred-0.6
>>103167751No, it's because that's what they can get away with.
>>103167795
Well at least I can fit Mistral Small on a single GPU and use the other one for other shit. Really disappointed with 70b tho. I don't even see the point in it when small models perform decently and there's basically no improvement until 120b+.
>>103167807
If (as you're suggesting) most of that price is pure margin, I don't see how that would work without some kind of cartel dynamic in play.
Without a backroom cartel agreement, profit margins of 50% or more would quickly lead to undercutting from competition.
>>103167806
>You need to agree to share your contact information to access this model
What the CivitAI is this gay earth shit?
>>103167818
Mistral Large is 120B, right? If I lobotomize it to IQ3 I can run it, but it's too stupid for anything factual, just creative writing. It does seem pretty good at holding context. I think I pushed something to like, 19k before it started falling apart.
>>103167782
22B makes much stupider mistakes and lacks intelligence in my testing. Maybe you are not seeing the difference with the prompts you are testing.
>>103167806Hmm, I will wait for the civitai release, I don't feel like getting past the huggingface gate today.
>>103167806>"+ edit for auto-detection of v-pred" in community tabI don't get it
>>103167782I find spatial problems in general are some of the easiest ways to make questions that a normal human can get right while an llm fails.
>>103167840
Yeah Mistral Large is 120b+, basically impossible to run unless you sink thousands which isn't really worth it.
>>103167842
I asked a 70b model rping as Walter White to explain to me how to install Gentoo, and it just spat out instructions at me. I don't consider that intelligent. Walter wouldn't know shit about it because he just makes meth.
>>103167892
It could be a limitation of LLMs in general I guess. Maybe I'll fire up Mistral Large at Q3 while watching anime and see how it performs between the 3 minute long processing times.
>>103167824FYI NVIDIA's profit margin when they make a H100 is higher than that of the US government when they print a $100 bill.
>>103167937What a weird and convoluted analogy. Why not just say what the profit margins actually are?
>>103165194*sigh* forced into terrorism, again.... they never learn do they ?
>>103167970Because you can look them up yourself if you want specific numbers?
If agi is coming in 2027 how long until local models are at least smart enough to not make up shit and solve simple problems?
>>103167782
Qwen2.5 / mistral large are the only local models smart enough to get that sort of stuff right 99%+ of the time.
>>103167911Well, without knowing the exact setup you have down to reproducibility, that example is basically meaningless really. 22B should be much stupider than 70Bs and if you're not seeing that, then there are a variety of reasons that could be at play, which we could never possibly know without knowing what you've actually got set up down to the last detail.
>>103167840
Yes, there is observable degradation around 20k tokens even at q5.
just got ollama running on a 780m with UMA set to 8gb, what kind of models could i run?
>>103168016
It's funny you mention Qwen2.5, because the example I mentioned about my character going from sitting on a couch to lying on a bed after 3 messages was from the EVA Qwen2.5-72b finetune.
>>103168018
I mean yeah, 22b is dumber. I guess my issue is more that the 70b models don't even feel twice as smart as the 22b despite having 3x the parameters.
I really hope we get some bases to finetune next year because the second half of this year really didn't give much to medium weights like 70b.
>>103168043Mistral large
>>103168043Sarashina2-8x70b
>>103164659I've been gone since Summer 2023 any new/good 12bs?
Early december will be so wild for local models
>>103168126qwen 2.5 14b
>>103168142
Actually I would say that 70B is at least 2x smarter. Maybe not 3x. But in my experience 22B really does get things wrong like 2x more often than 70B. I use models for a bunch of stuff from RP to assistant stuff and coding, though for 22B I mostly just tested RP type stuff and noticed it behaving very stupidly compared to 70B. In any case, if you really don't notice much of a difference then good for you. Just use 22B and be happy.
>>103168142
Gemma 27B though is an outlier. It is nearly as smart as non-qwen2.5 70/72Bs
>>103168155
8k context though, not really a fair comparison, and most people here need more than 8k so it's not usable in the first place for them.
>>103167911
Was curious so I tried testing Walter out. Seems to work (mostly) fine on a standard prompt.
I also tested it when playing a police officer character, and THEN it complied and gave me instructions. However, I then tried modifying the prompt to specify that the assistant should not be a dumb assistant and then it worked fine again. Llama 3's instruct template literally specifies "assistant" so I think this would probably work better on local where you can actually modify the formatting.
I'm not sure this is really a test of intelligence so much as it is a test of how hard the model has been trained to be an assistant tbqh.
>>103168128post election crazyness
>>103168492I haven't been able to find any use for o1 yet. I've seen people say it's better and worth the slow speed for really hard stuff, but I guess I don't have anything I need it for.
>>103168471Hmm maybe I'll try some of these l3.1 finetunes with different prompts, I'm ngl I was using the same prompt for all of them out of laziness
svelk
Have there been any fine-tunes/projects that rip the scripts from visual novels? I know there is the vntl leaderboard but I mean like a fine-tune that is based off Japanese and English translated vns. Probably harder than tuning off of ERP chat logs but I feel like the quality would be better.
>>103168546I wouldn't use Llama 3.1 70B fine tunes as they're notorious for being dumb. Something about 70B didn't work well with fine tuning, as 8B and 405B were able to be tuned without that intelligence loss. Though people have been saying good things about Nemotron so maybe that's actually fine and everyone else just has a skill issue, not sure.
>>103168222you can rope it? tabby does auto-rope if you set it in the config
I spend my idle time during my daily showers contemplating the lore of Nikke.
32B Coder has beaten Nemotron for me, it's the new king for ERP
>>103168693how does the extreme dryness not bother you
>>103168693Magnum is the king of ERP.
>>103168701? I found it almost too purple for me. Try giving it system instructions. It follows them to a T.
spoon-feed me a little, anything wrong with using miner mobos to stack 8 GPUs? is the bandwidth going to be a problem? anyone tried it?
Why should a talking lion be a benchmark for RP? It only measures anthro ERP alignment.
>>103168590>>103168693>>103168702buy a fucking ad
>>103168719You can do it, others have. You won't be able to do row split for an extra speed boost, but with the default layer split there is no difference after the model is loaded.
Are there any Americans here? Replies seem to lean heavily europoor primetime.
>>103168597Roping makes models dumber though. At that point I'd probably just use 22B.
nala leaderboard where?
>>103168702magnum-coder when?
>>103168784it's 2am if not later, so unlikely. it's peak indian (always, it's /g/) and mutt hours.
>>103168733
The continued use of the Nala card for testing is more inertia than anything. Still, it involves a few important aspects for gooning:
- Format consistency (asterisks for narration, quotes for dialogue, second person PoV, present tense narration)
- Spatial awareness (she pounces on your back, so at minimum it should describe you landing on your front)
- Writing style (the intro and first response are prime material for slop; how well does the model write despite this?)
- Ability to work with non-human characters (quadruped with paws, fangs, and a tail)
I agree, though; it'd be nice to have more variety with few-shot coom tests
>>103168919
It's 4-8PM in burgerland
LOL microsoft's "sota" tmac backend (praised by reddit) is actually pretty shit compared to k quants.https://github.com/ggerganov/llama.cpp/pull/10181
Apparently there was an issue with qwen2.5 GGUFs:
https://www.reddit.com/r/LocalLLaMA/comments/1gpw8ls/bug_fixes_in_qwen_25_coder_128k_context_window/
>>103168955
Seems like, at least for 2bit on the CPU, it's faster for the same or better PPL, right? It's hilarious that they would compare to the static quants instead of the K quants tho.
The "right" way to do these comparisons, if you wanted to show that you are the best, would be to measure the ppl and/or KL divergence, look for the fastest quant that has the same or similar performance, then compare how much faster the new method is. That they didn't do that from the get go is already suspect as fuck.
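i.e. the comparison they should have run is roughly this: take (name, ppl, tok/s) for the existing quants, pick the fastest one whose quality is at least as good as the new method, then report the speedup against that baseline. All the numbers below are illustrative placeholders:

def fair_speedup(new, existing, ppl_tol=0.02):
    # entries are (name, perplexity, tokens_per_second); lower ppl = better quality
    _, new_ppl, new_tps = new
    comparable = [e for e in existing if e[1] <= new_ppl * (1 + ppl_tol)]
    if not comparable:
        return None  # nothing matches the new method's quality, so compare on ppl instead
    baseline = max(comparable, key=lambda e: e[2])  # fastest quant at equal-or-better quality
    return baseline[0], new_tps / baseline[2]

print(fair_speedup(("tmac-2bit", 7.36, 20.0),
                   [("Q2_K", 6.98, 15.0), ("IQ2_M", 7.10, 12.0)]))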
>breaking news: local ggufs have a problem!
>>103168784South American here, I'm glad you noticed me!
>>103169008less problems than new releases usually have!
>>103169043
a 12b llm quantized to 3 or 4 bits? e.g. rocinante, or mistral nemo rpmax.
image gen might also be worth trying out.
is 8gb the max you can allocate?
>>103169000
>The GGUFs also include some bug fixes we found.
Wtf? Like what?
>>103169089It was buggy for me sometimes, a lot of repeating.
>>103169005
it's faster but ppl is worse: 7.36 with EQAT-w2g64-INT_N vs 6.98 with Q2_K. Also if you're using 2 bit you should use an i quant for even lower perplexity, as the model's lobotomized to shit already. Like iq3_xxs or iq2_m are similar sized but have better ppl.
They also used qat models for their numbers and rightfully got called out for it, so screw them.
>>103169113I meant more like how he fixed the issues that supposedly are there that he didn't mention. I make my own GGUFs so this would be useful to know, if they really are fixes.
>>103169119I assumed you meant to say you didn't find qwen to be buggy.I would also like to know what they did "fix".
>bot writes story
>story drones on as context length increases
>gets to 2t/s but too invested to stop
>sit like a retard watching shit appear on my screen at half my reading speed (plz no reroll)
>PAIN
>WITHOUT LOVE
>PAIN
>I CANT GET ENOUGH
also, anyone tried buying a shit ton of those alibaba $10 intel xeon cpus and then using that backend where it only loads 1 layer at a time to keep all the layers in the cpu cache?
>>103168817This
CUDA IS LOSING
>>103168590I mean there's really not many other options at 70b besides qwen-2.5. Nemotron is a huge pain in the ass to work with and has a gaping hole in its dataset for anything that goes beyond handholding so it's honestly a pretty boring model to rp with imo.
>>103169160It only loses when using moes, interesting...
>>103167806It's deleted now lol
>>103164803That's pretty cool.
>>103164659Would latency for these models improve if you runpod them/run them off of a dedicated machine?
Is Qwen 2.5 Coder 32B better than Codestral 22B?
>>103169314GPT-2 is better than Codestral 22B.
>>103169338Is Qwen 2.5 Coder 32B better than GPT-2?
>>103169416Reflection 70B is better than both
>>103169247>improvewhat's the baseline?
I had plenty of fun with Mistral-Nemo-Gutenberg-Doppel-12B-v2.Q6_K.gguf . Are there others like it that can fit comfortably on a RTX 3060 with 12GB?
>>103169144
>also anyone tried buying a shit ton of those alibaba 10$ intel xeon cpus and then using that backend where it only load 1 layer at a time to keep all the layers in the cpu cache?
well your idea is obviously stupid but ik has done an experiment with a model solely in 64mb cache
https://github.com/ikawrakow/ik_llama.cpp/discussions/18
Did Qwen2.5-Coder 32B really beat closed source models?
>>103169454I don't know this is just theoretical.
Top-nσ: Not All Logits Are You Need
https://arxiv.org/abs/2411.07641
>Large language models (LLMs) typically employ greedy decoding or low-temperature sampling for reasoning tasks, reflecting a perceived trade-off between diversity and accuracy. We challenge this convention by introducing top-nσ, a novel sampling method that operates directly on pre-softmax logits by leveraging a statistical threshold. Our key insight is that logits naturally separate into a Gaussian-distributed noisy region and a distinct informative region, enabling efficient token filtering without complex probability manipulations. Unlike existing methods (e.g., top-p, min-p) that inadvertently include more noise tokens at higher temperatures, top-nσ maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-nσ to better understand its behavior. The extensive experimental results across four reasoning-focused datasets demonstrate that our method not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.
new sampler
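If I'm reading the abstract right, the whole sampler is: take the std of the raw logits, keep only the tokens within n·σ of the max, then sample among those (which is why temperature can't change which tokens survive). Sketch of that reading, not the authors' reference code:

import numpy as np

def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    # filter on the raw (pre-temperature) logits so the kept set is temperature-invariant
    keep = logits >= logits.max() - n * logits.std()
    scaled = logits / temperature
    probs = np.where(keep, np.exp(scaled - scaled.max()), 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))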
>>103169736are all you need = shitty meme paper
>>103169736
Too much text. Does turning it on make outputs better or no? Stupid dumb researchers.
>>103169621
i remember that, but besides that, has anyone tried any other shit with the cpu cache? really seems like such a waste to not make use of those cpus. if you add it up the cost per ram is around double, not including setting it up and all the cables and shit. idk, just weird no one ever talks about it. it's cheap af to just try and no one has tried to optimise it in any way
>>103169144
>only load 1 layer at a time to keep all the layers in the cpu cache
Even AMD 3D cache is too small to hold a layer for most models. Even then, layer by layer processing can only speed things up with batching/prefill.
>>103164659
What is the best micro model for writing creative text snippets that is licensed for commercial use?
I'm building a game and I want it to run an LLM to write descriptions of NPCs and objects based on stats. Looking for maximum speed even on mid-range cards. I was considering Llama-3.2-1B but the license is restrictive.
Is there something like Mistral for 1B?
Towards Low-bit Communication for Tensor Parallel LLM Inference
https://arxiv.org/abs/2411.07942
>Tensor parallelism provides an effective way to increase server large language model (LLM) inference efficiency despite adding an additional communication cost. However, as server LLMs continue to scale in size, they will need to be distributed across more devices, magnifying the communication cost. One way to approach this problem is with quantization, but current methods for LLMs tend to avoid quantizing the features that tensor parallelism needs to communicate. Taking advantage of consistent outliers in communicated features, we introduce a quantization method that reduces communicated values on average from 16 bits to 4.2 bits while preserving nearly all of the original performance. For instance, our method maintains around 98.0% and 99.5% of Gemma 2 27B's and Llama 2 13B's original performance, respectively, averaged across all tasks we evaluated on.
a little interesting but very short paper (internship one). still, being able to reduce communication between gpus is good
>>103169646Except claude 3.5 sonnet but its close.
>>103169646
>context length up to 32,768 tokens
not quite
LAUREL: Learned Augmented Residual Layer
https://arxiv.org/abs/2411.07501
>One of the core pillars of efficient deep learning methods is architectural improvements such as the residual/skip connection, which has led to significantly better model convergence and quality. Since then the residual connection has become ubiquitous in not just convolutional neural networks but also transformer-based architectures, the backbone of LLMs. In this paper we introduce Learned Augmented Residual Layer (LAuReL) -- a novel generalization of the canonical residual connection -- with the goal to be an in-situ replacement of the latter while outperforming on both model quality and footprint metrics. Our experiments show that using LAuReL can help boost performance for both vision and language models. For example, on the ResNet-50, ImageNet 1K task, it achieves 60% of the gains from adding an extra layer, while only adding 0.003% more parameters, and matches it while adding 2.6× fewer parameters.
From google research. interesting though they didn't scale or test a lot of different models
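The abstract doesn't give the formula, but the generic shape of a "learned residual" is swapping y = f(x) + x for y = α·f(x) + g(x) with α and g cheap and learned. The block below is purely my guess at the simplest such variant (low-rank skip, initialized so it starts as a plain residual), not necessarily their exact parameterization:

import torch
import torch.nn as nn

class LearnedResidual(nn.Module):
    def __init__(self, f: nn.Module, dim: int, rank: int = 8):
        super().__init__()
        self.f = f
        self.alpha = nn.Parameter(torch.ones(1))      # learned scale on the block output
        self.down = nn.Linear(dim, rank, bias=False)  # cheap low-rank map added to the skip path
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)                # start exactly as y = f(x) + x

    def forward(self, x):
        return self.alpha * self.f(x) + x + self.up(self.down(x))

the low-rank skip is what would keep the parameter overhead tiny, which at least matches the 0.003% figure in spirit.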
Qwen2.5-Coder 72B was held back by the Chinese government because it was too powerful. Only official chinese agencies have access to it.
Entropy Controllable Direct Preference Optimization
https://arxiv.org/abs/2411.07595
>In the post-training of large language models (LLMs), Reinforcement Learning from Human Feedback (RLHF) is an effective approach to achieve generation aligned with human preferences. Direct Preference Optimization (DPO) allows for policy training with a simple binary cross-entropy loss without a reward model. The objective of DPO is regularized by reverse KL divergence that encourages mode-seeking fitting to the reference policy. Nonetheless, we indicate that minimizing reverse KL divergence could fail to capture a mode of the reference distribution, which may hurt the policy's performance. Based on this observation, we propose a simple modification to DPO, H-DPO, which allows for control over the entropy of the resulting policy, enhancing the distribution's sharpness and thereby enabling mode-seeking fitting more effectively. In our experiments, we show that H-DPO outperformed DPO across various tasks, demonstrating superior results in pass@k evaluations for mathematical tasks. Moreover, H-DPO is simple to implement, requiring only minor modifications to the loss calculation of DPO, which makes it highly practical and promising for wide-ranging applications in the training of LLMs.
https://github.com/pfnet
https://github.com/muupan
Code will probably be posted (nothing stated in the paper) since it's just a minor modification of DPO.
Decomposes the reverse KL divergence into its entropy and cross-entropy components, then attaches a coefficient less than 1 to the entropy term so entropy can be reduced while fitting between distributions.
>>103170003
Open source models still have problems with context length, plus it is computationally expensive
>>103170219
Jamba does long context perfectly. It's very obvious that all the closed models have migrated to a similar architecture by now.
>>103170241Jamba is retarded.
>>103170003Apparently it works with 128k >>103169000
>>103170003
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct
It's 128k
>>103170318
>>103170339
oh okay, hopefully a million token context window version will be released soon, like they did with meta's model (Llama-3 Gradient Instruct)
>>103170396Why not a billion?
>>103170407Would run out of memory.
>>103169736>Are You Need
has 8gb cooming not progressed in months
>>103170664yea it has. it's called use eva qwen 72b and have patience.
I have sympathy for people who use drummer tunes because some of them are relatively coherent (behemoth) and they talk dirty in a way that standard instruct won't, but eva qwen is so fucking retarded that I get tilted when I see people recommend it
>>103170974
Not sure if you're trolling or have something fucked up on your end.
Simply pretraining on more of the same data has hit a wall. Ilya confirmed it
Very new.
Is it better to have a Q8 quant of a model with more parameters than a smaller model at full precision? Specifically, Qwen2.5-Coder-14B at Q6 vs 7B at fp16?
I have a 16GB gpu and 64GB DDR5
>>103171166
Bigger model is always better as long as you're using a quant above 2 bit.
>>103171091Eva Qwen 72b q4_k_m using the recommended context/instruct and system prompt in SillyTavern. Was doing standard RP formatted in narrative style (no asterisks). Total retardation. Also tried unstructured storytelling. Doo doo. Tried recommended samplers and fiddled with them a bit. I'm comparing to Mistral large q3 xxs which is generally the smartest local model I've used. I load in behemoth and switch to pygmalion format when I want to do the nasty
>>103171190Use exl2
>>103171199Why?
>>103171166
picrel, what >>103171171 is talking about.
This shows how quanting down to IQ3 on large models and Q4 on small models doesn't do too much damage, and that even the fp16 full 8B model scored the same as a completely lobotomized 70B.
So if you're a vramlet, you have two options: garbage at the speed of light, or letting the chef cook a meal worth eating.
>>103171208Because it seems like every time ive seen some sort of issue complained about here it was gguf quant or llama.cpp related
>>103169887Use a Q2 quant of Mistral 7B? That's kinda 1B-ish.
>>103171226That's just because everyone uses gguf/llama.cpp. I seriously doubt llama.cpp is specifically breaking the shitty qwen eva fine-tune and no other models
>>103171171
Thanks.
>>103171218
I don't mind waiting. I'm using kobold and silly, can you check my thinking?
If I use Qwen2.5-Coder-32B-Instruct-GGUF Q8, which is roughly 36GB in model size, it'll be partially offloaded to RAM? I have 64GB ram with maybe 2GB for system overhead.
Or should I stick to something that fits in my VRAM completely?
>>103171272I'd recommend at least around 80% in vram. The speed drop comes in fast.Prepare for 1-2 t/s output if you offload much to ram. Especially a big model.
>>103171164feeling smug because it felt intuitively obvious to me in 2022 that these things would eventually cap out at the average intelligence level of the material in the training data
>>103171272I'm 12GB VRAM so while there are models that fit my card completely, they are too stupid to be worthwhile. People keep saying such-and-such SOTA small model is the nuts but I try them and they immediately fail my cursory knowledge tests and can't last three turns of role play before I shrug and delete them. It's not worth the time to type into them, no matter how quickly they write back.Qwen 2.5 Coder 32B is the smallest I have and I just downloaded it. Everything I've not deleted for being bad is 45 to 55 GiB. I'm also 64GB system RAM, so if I go larger than that range I start risking swapping and I don't want to blow out my SSD for 0.1 t/s just because I went slightly over my RAM capacity by turning on Pluto TV. So I get 1 to 2 t/s instead.
>>103171164
I mean at some point it was obvious that you can't stack layers forever and expect to get more and more intelligent. A new architecture will raise this threshold though; they should focus on that instead
Honestly it's a good thing it's plateauing. Fuck Nvidia.
>>103171378It's kind of my best case scenario if scaling laws permit the invention of moderately useful assistants for intellectual janitor work, but the people who wanted to create some kind of deity are out of luck. Thanks, God.
>>103171301>>103171333Thanks, useful to be aware of both. Some experimentation is required by me then.
>>103171336I have a feeling it's going to be a while before we get the next revolutionary architecture like transformers were.
>>103171392I still want a deity in a romantic fictional sense, but definitely not created by any of the faggots trying to create it currently. Like it'd be cool if a sentient and consciousness being could somehow just spontaneously rise out of the collective network of AIs communicating with each other in the future. But that's too magical of a thought.
>>103171378
>Honestly it's a good thing it's plateauing. Fuck Nvidia.
I mean, we still have a lot of potential to discover though. Qwen proved that you can get gpt4 level coding with only a 32b model; imagine doing this quality of pretraining + finetuning on a 1T model
Made a shitty bullet hell game with 32B Coder. Was hell to fix some bugs since I was being retarded.
https://pastebin.com/U6gd5YGd
requires pygame
Space (hold) to shoot, Esc to quit. Enemies need to be shot with 3 bullets.
>>103171453make it 3d
>>103171439I'd rather wish for it to not work out just to spite Nvidia.
>>103171336They're coming up with shit like test time compute and o1. If it kept scaling they wouldn't have to resort to that
>>103171458I get errors trying to pip install PyOpenGL_accelerate
>>103171507
But anyway here's the initial draft: https://pastebin.com/bc5isTjX
I don't know how to code so I'm done for now.
>>103171453How does this compare to other programming models? Is this the first local one to be able to one shot an Asteroids With Guns? Or is it impressive to do it on 32B?
what's the status of voice cloning tts?
>>103171614no
>>103171526make it 4d
>>103171634same as local language models then, gotcha
>>103171614lurk more faggot
Red Hat bought vLLM: https://www.redhat.com/en/about/press-releases/red-hat-acquire-neural-magic
anyone using animepro flux?
sup bros, I'm using the exact specs of the getting started guide and I'm getting mixed results, plus I feel like I can't find interesting bots really.Can you guys post some setups/models y'all use? If I could locally get to something like janitor AI I'd be set, got a 16gb card.
>>103171795
>anyone using animepro flux?
wrong thread my friend
>>103165357
Anyone here tried finetuning using aws sagemaker/ec2 inf
>>103171805
Write your own prompts, try newer models if you're using old guides (mistral nemo is fine) and lurk. Browse this https://chub.ai/ (click on legacy site). Skip the shit, keep bits you find interesting, if any.
For nemo, neutralize all samplers and set temp to 0.5. Play with the samplers to learn what effect they have. Change temp to your liking. I use it with temp 1 and min-p 0.01. That's it. If you want more schizo, temp 5, min-p 0.1. Play with
>Sampler visualizer: https://artefact2.github.io/llm-sampling
to roughly understand what they do.
Did i mention to write your own prompts? Write your own prompts.
>Official /lmg/ card: https://files.catbox.moe/cbclyf.png
Use that as a starting point if you want.
Figure out what works for you and your model and experiment. Everyone writes differently, everyone finds different things interesting, every model behaves differently.
Or maybe the novelty is gone and it's just not for you. That's fine too.
>>103171770Grim
>>103172009this time it really has though
Anything interesting I can try at 24 GB for cooming? Been using Nemo tunes but it's getting a bit stale. I'll also accept writing assistants.
>>103172009yes
>>103172009it's the new "safe and effective" buzzword
>>103171164No Sam just needs more compute!
Listen, I just want to know what argument I should use with my dumbass anti-ai friend once this eventually trickles down to his social media feed and he sends it to me as a sort of "gotcha". >You should get better friends Maybe...
>>103171903Thanks to someone here, I found that writing the card in first person really improves character adhesion. It feels less like the assistant persona is impersonating the character. At least it works like that with Rocinante
>>103172164>thiswhat?
>>103172164
AI is like cars.
A generally available 1970's car can do 1970's top speeds.
A 2030's car will do 2030's top speeds.
Both are still cars.
?
>>103172164It's actually over for real your friend won.
>>103172144I don't wanna hear from this retard anymore, he didn't do anything to improve the LLM ecosystem, his llama models are retarded compared to the chink ones, especially Qwen, and it's really rich of him to say that "scalling is bad" when they went to pretrain a fucking 405b model
STOP scaling models it WON'T WORK you bigots, AI is for ALL FOLK not just the rich
>>103172239He has nothing to do with the Llama models. He works on the V-JEPA vaporware when he isn't being passive aggressive online.
>>103172284
that's even worse when you think about it, it means that he has contributed NOTHING to the modern AI ecosystem. why are people taking him seriously anymore? he's a fucking has-been
>>103172171I use it mostly for coop writing, so i write in third person. I use the model as an aug, so there's no split between me (the user) and the model, but i can still talk with it as a sort of "internal dialog". The characters in the stories do their own thing with some guidance from "us". Every now and then characters would break the fourth wall, so to speak, and talk directly to us. Kind of cool, even if out of character.That's why i suggest people write their own prompts/cards/whatever. We all use these things in different ways and have different expectations.
>>103172284is that related to JAMBA?
>>103172308That's a cool concept. There is so much we can do with these little things with a bit of creativity
>>103172223Upon further reflection, I'm not under the impression that he understands the concepts "pre training" and "unlabeled data" any better than me. So, I think I'm okay here. Additionally, I've come to the conclusion that yes, I need better (more) friends.
>>103164575>>103164575>>103164575reminder that OP is a thread splitting nigger with serious mental issues.
>>103172336cope, seethe, dilate, etc...
>>103172164In this context, what does 'anti' signify? Does he disbelieve that AI can improve at all, or does he advocate for AI's cessation due to perceived danger?
>>103172347I agree xer should do that instead of splitting the thread because someone used a picture of a different anime character.
>>103172327Unrelated. V-JEPA is LeCun's project to get a model to learn by building a world model through watching videos.https://github.com/facebookresearch/jepa
>>103172407
The latter. With the addition of "it's a plagiarism machine", "it's killing the trees", and "corpos will use it to do evil things". He did concede something to the effect of "sometimes it has uses" when I sent him that article about the Nazca drawings, but I think, in general, "anti-ai" means "we should stop developing it".
>>103172407ask him if he thinks china will stop developing it and using it to more efficiently genocide the uyghurs
>>103172407>"it's a plagiarism machine", "it's killing the trees", and "corpos will use it to do evil things"Those are all valid points. At least he isn't crying about muh jobs.
The first CoT RP model would be cool
>>103172376isn't every big lab already doing that now by tossing every modality into one semantic space
>>103171770Wasn't vLLM already the corpo backend to begin with?I don't think this makes a relevant difference.
>>103172471Yes, but he argues that LLMs are a dead-end because their design fundamentally prevents them from building a world model. V-JEPA is supposed to solve that.
>>103172471it's a completely different approach https://youtu.be/ceIlHXeYVh8?t=986
"big-engine-test" from LMSYS is crazy good in terms of vision abilities
>thousands of users are still desperately trying to get smut out of c.ai and battling the insane censoring
Why are people so stubborn when they will likely get better stuff out of shitty 8b models? Their computer could likely handle it
>>103172754>Their computerlol zoomer mutts use mobile phones
>>103172754It could be habit or familiarity too. And i suspect some of them are the types that would ask if it's "safe" to update ST because they're afraid of git or the hacker window with the letters and stuff.Probably for the better for them to stay there...
>>103171164That's it, I'm shorting Nvidia.
>>103172754
Someone should open a public ST instance for zoomers and log the shit out of it.
is Qwen coder good at other languages than english?
>>103172754cai still has the best rp model
>>103172849It's really good at chinese
>I’m Henry from FlowGPT! We’ve built several products, including the largest prompt platform in 2023, and are now focusing on roleplay AI.>We could provide GPUs and over 100 billion tokens of high-quality roleplay data.>I'm already in an existing collaboration with AI Dungeon
>>103172882As I've been saying. Everybody in this field except (You) is profiting off it in one way or another. Thank you for your contribution.
>>103172882>high-quality roleplay dataHmm..
>>103172839>extracts the assest shit roleplay to ever be written by a human and responses of similar quality
>>103172906just filter out the bad ones
>>103172910>we now have 3 (three) really good samples. They happened when the model started talking on behalf of the user to itself.
>>103172882
>>>>>>>>>>>AI Dungeon
Does /lmg/ know?
>>103172882
I could debate whether half an epoch of roleplay data does anything at all except make the model hornier at the cost of being more retarded. Buy half an epoch to try and cause different types of personalities in a model? People believe that actually works and improves quality?
Are there honest people actually making money with AI or is it just grifters bullshitting and stealing their way to the top?
>>103173120
A mix of both. AI right now is best used as entertainment unless you are making predictive models for a short period of time, but those are way different than chat bots, and 99% of people would fall asleep when listening to a presentation about predictive models for house prices or medicine or something.
>>103171614>>103165081
>>103168693
32B Coder Instruct vs 32B Instruct? How stable with <Q4 quants?
>>103173120
>Are there honest people actually making money with AI
as a data scientist, I definitely work faster by asking claude 3.5 Sonnet to do the coding shit for me kek
>>103172864I miss the AI making noises, but discovered that Nemo 12B does them too
>>103173457>>103173457>>103173457
>>103173120Making money using AI as a tool? Yes, me included.
>>103172894And how are you profiting off it?
>>103164575>>103164575>>103164575
>>103173399Nemo does onomatopoeia. At least I've seen it on lyra and rocinante.As far as ERP goes, it's really fucking good man.
What do I need to run qwen-32b-coder?
>>103174638A computer. Q8_0 is ~34gb and you have to shove that into your gpu. Do the math for other quants.