/g/ - Technology

File: date with miku - good end.png (1.47 MB, 1024x1512)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106551921 & >>106539477

►News
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106551921

--Optimizing code generation workflows on V100 GPUs with MoE models:
>106555312 >106555465 >106555506 >106555522 >106555524 >106555586 >106555717 >106555770 >106555782 >106555852
--Best local text gen models and VRAM optimization discussion:
>106556580 >106556863 >106556934 >106557638 >106557036 >106557046 >106557069 >106557098 >106557239 >106557514 >106557190
--AI surpasses mathematicians in complex analysis challenge:
>106558352 >106558367 >106558387 >106558476 >106558500 >106558527 >106558711
--Baidu's ERNIE-4.5-21B-A3B-Thinking model release and performance evaluation:
>106554153 >106554580 >106555008 >106555170 >106555207
--Silero VAD v6 evaluation and comparison with Nvidia's MarbleNet:
>106557953 >106558064
--LocalAI vs OpenWebUI: backend model management vs frontend interface:
>106555093 >106555341 >106555529 >106558434
--Running 30B-A3B models on 12GB VRAM via expert offloading and quantization:
>106558134 >106558186 >106558210 >106558227 >106558238 >106558251 >106558293 >106558317 >106558341
--GPU layer differences in small vs large models due to parameter grouping and optimization:
>106553923 >106554094 >106554256 >106554362 >106554384 >106554458 >106556050 >106556200
--LongCat's strengths and MoE limitations in llama.cpp compatibility:
>106552000 >106552095 >106552267 >106554325 >106554412
--Achieving deterministic LLM inference through caching logic adjustments:
>106555106 >106555150 >106555169
--llama.cpp development updates and flash attention implementation considerations:
>106553388 >106553417 >106553890 >106555026 >106555040 >106555059 >106555061 >106555068
--Qwen3 Next release:
>106557806 >106557845 >106557853 >106557858 >106557903
--Miku (free space):
>106555337 >106554679 >106555530 >106555574 >106557190 >106558219 >106559139 >106559166 >106559181

►Recent Highlight Posts from the Previous Thread: >>106551925

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1671200815321009.png (1.27 MB, 883x1073)
>>106559374
I made the highlight reel again back to back boys
>>
File: 1739311776947199.png (1.61 MB, 1024x1512)
>>106559371
>>
>>106559395
https://www.youtube.com/watch?v=VcWAQ5a1NdI
>>
>>106559404
You could still help them out. I'm pretty sure vllm supports it now.
>>
My uni gives me access to Copilot Chat (GPT 5) and this thing is dumb as fuck, even with search. I think you people have lied to me about the big models being hugely smarter (narratively) than some 32b model.
>>
>>106559420
Yes, I could be doing a lot of things but everything has an opportunity cost.
>>
>llama.cpp still hasn't added support for qwen-next
https://github.com/ggml-org/llama.cpp/issues/15940
>vllm already merged in support last night
https://github.com/vllm-project/vllm/pull/24526
llama devs are hacks
>>
Qwen3 Next geejuff status?
>>
>>106559516
vllm devs only needed to bump their pytorch version or something.
>>
>>106559516
Maybe rewriting the entire ML stack in C++ wasn't such a good idea.
>>
>>106559555
should've used pure C, they probably don't need any of the OOP features anyway.
>>
>>106559598
PyTorch is written in C++ contrary to its name. Nobody is using C for good reasons.
>>
>>106559627
The performance critical parts are, but it's not like you can use PyTorch directly from C++.
>>
qwen 3 80b consensus?
>>
>>106559680
of course not. pytorch is literally a wrapper for libtorch which is in C++. you would use libtorch if you wanted to use C++. there's a lot more support around pytorch tho as it's far more accessible to people.
>>
>>106559516
>>106559555
maybe one has a gorillion dollars since it's used by llm companies and the other is a hobby project for consumers
>>
>>106559696
It's shit because there are no goofs
>>
File: 1676316942545183.png (6 KB, 208x242)
>>106559714
>he doesn't know how to run safetensors
>>
File: 1748050363563976.png (2.2 MB, 1328x1328)
>>106559696
>>
>>106559733
literally me
>>
File: MHHHHMMMMMM.png (70 KB, 252x218)
>>106559506
*CLAP EMOJI* CUDA *CLAP EMOJI* DEV *CLAP EMOJI* WE *CLAP EMOJI* ARE *CLAP EMOJI* ASKING
>>
I never understood how some of you have the hardware and talent to render AI videos and images that are realistic and good and yet you don't make full length porn videos
>>
>>106559780
Video models break down quickly past 5 seconds.
>>
>>106559780
Porn sucks, text is better. The mind is the most powerful sex organ. Unironically. t. man
>>
File: 1741181107861956.jpg (41 KB, 734x734)
>>106559780
>knowing how to read instructions = talent
>>
>>106559780
all the slop I posted in this thread took around ~8s to gen (praise be nunchaku devs)
>>
>>106559792
IIRC standard for brainrot tiktok videos is to have a cut every 3 seconds.
>>
>>106559824
Now try getting the model to maintain consistency across hundreds of 3 second clips.
>>
What options are people running to get speedups on MoE models? There was a way to offload only certain tensors to RAM in order to get a significant speedup. Is it ik_llama.cpp only?
>>
>>106559863
Who said consistency was a requirement?
>>
>>106559871
"overridetensors": "([2-8]+).ffn_.*_exps.=CPU"

That's what I use on kobold to run 30B A3B Q4_K_M on 8 GB VRAM / 24 GB RAM, the parameter is probably the same on llama.cpp (no fork needed)
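For mainline llama.cpp the equivalent flag should be --override-tensor / -ot, which also takes regex=buffer-type pairs, so something along these lines (model filename is just a placeholder, adjust the layer range to whatever fits your VRAM):
llama-server -m your-30B-A3B-Q4_K_M.gguf -ngl 99 -ot "([2-8]+).ffn_.*_exps.=CPU"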
>>
File: 1747328841336034.mp4 (2.54 MB, 1424x890)
>>106559780
about that.....
This one is really good
not full creation, but it's one of the better tools released yet
https://ebsynth.com/
>>
>>106559871
--cpu-moe is all you need
--n-cpu-moe 999 if you want to be fancy
>>
>>106559871
>>106558251
>--n-cpu-moe 37 --gpu-layers 99
Normal llama.cpp.
Obviously, adjust --n-cpu-moe as needed.
>>
File: joelHaver-thumbnail.jpg (29 KB, 296x440)
>>106559926
huh, I thought picrelated guy was tracing frames by hand
>>
>>106559943
What was that -ot thing I saw some anons use? It had a bunch of numbers after it.
>>
>>106559824
>>106559780
the issue is it will suck so why bother making it. The ass won't jiggle right, the blowjob won't have audio that's good, any gimmick you add to take advantage of ai will break the lora. And real porn will just look better. Probably better off deepfaking porn already made with enhancements

I have had success using vibevoice to clone a pornstar's voice and then have her talk for several minutes using infinite talker. An LLM wrote the script so I wouldn't know what it would say and I got my own personal vid from her, and it was uh... kinda good.
>>
>>106559962
With -ot you can target the specific tensors inside the model's layers using regex. --n-cpu-moe simply obfuscates all that much like -ngl does for whole layers.
One thing to keep in mind when using -ot like in >>106559925 is to not move the shared experts (if they exist) from VRAM, since those are always used.
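Concretely (assuming I remember llama.cpp's tensor naming right): routed experts live in tensors like blk.N.ffn_(up|down|gate)_exps.weight while shared experts use ffn_*_shexp names, so a pattern like -ot "ffn_.*_exps=CPU" catches only the routed experts and leaves the shared ones (plus attention and norms) on the GPU. Treat the exact names as an assumption and check what the loader prints at startup if unsure.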
>>
>>106559962
-ot was the only way to do the same thing before the --cpu-moe arguments were introduced.
>>
>>106559975
nah bro, you just have to search for it
https://litter.catbox.moe/110x2tu7sbg6hixe.gif
>>
>>106559979
>>106559984
Ah cool, thanks for the explanation. So --n-cpu-moe moves only the non-shared experts to the CPU? And --cpu-moe keeps *all* non-shared experts on the CPU?
>>
fuk me sideways, i wanted to try to use qwen3-next with vLLM and it seems it doesn't work with pipeline parallelism
>>
>>106560000
I'm only aware of --n-cpu-moe.
Maybe --cpu-moe is the same thing for koboldcpp, I don't know.
As far as I know, --n-cpu-moe also keeps the normal experts in CPU RAM.
You can run llama-server with the -h option to get more details.
>>
File: file.png (128 KB, 975x551)
llamabros...
>>
>>106560060
Note that "primary hardware" is always GPUs. That's because to anyone serious, "cpumaxxing" is as sad and absurd as "ssdmaxxing" is to us.
>>
https://allenai.org/blog/olmo2-32b
How did they manage to do it in just 32B?
>>
>>106559803
it really is in this day and age
western kids have been dragged down to the level of their 80IQ peers for two generations now
>>
>Qwen3-Next is trained on a uniformly sampled subset (15T tokens) of Qwen3’s 36T-token pretraining corpus. It uses less than 80% of the GPU hours needed by Qwen3-30B-A3B, and only 9.3% of the compute cost of Qwen3-32B — while achieving better performance. This shows outstanding training efficiency and value.
And it beats Qwen 3 32B + handles long context better than the 235B moe
pretty impressive stuff
>>
>>106560211
That's great. Would be greater if they expedited a Qwen3 Next Coder.
>>
>>106560211
It's native 256K context I think, without extending.
>>
>>106559998
Illya?
>>
>>106560248
yeah but the RULER benchmark is better on the Q-Next than the Q3 235B
>>
>>106557716
i don't work in an office
>>
>>106560211
Isn't Qwen3-Next 70B? Why are they comparing to Qwen3 32B and not other 70B models?
>>
>>106560274
It's 80B A3B.
>>
>>106560274
Supersparse MoE, 80B A3B
>>
>>106560291
>>106560283
Okay, so how does it compare to models that are around 80B?
>>
>>106560274
they compare it to every other Qwen3.
>>106560294
they did not bother comparing it to non-Qwen3 models.
>>
>>106560283
sqrt(80*3) means it's a copetitor to 16b models
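(For reference, that's the usual geometric-mean rule of thumb for MoE "effective size": sqrt(total × active) = sqrt(80 × 3) = sqrt(240) ≈ 15.5, i.e. roughly a 16B dense equivalent. A heuristic, nothing official.)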
>>
File: file.png (366 KB, 1006x1542)
>>106560274
Even Gemini is praising it.
>>
Qwen3-Max is such a disappointment that I have absolutely zero hope for 3.5
Alibaba truly is the meta of China
>>
>>106560294
It's faster :)
>>
>>106560320
Kinda funny how Max got completely overshadowed by Qwen3-Next.
>>
3bit? is that not bitnet?
>>
>>106560314
Gemini will praise anything
>>
File: 1615679846051.jpg (53 KB, 660x574)
LLMs seem like a competition between America, Europe, and China. Why can't Russia, Japan, or Korea compete despite being tech giants?
>>
>>106560327
Max got overshadowed by the fact that it's completely pointless so everyone forgot about it two hours after it became available.
>>
>>106560346
>Europe
They're competing? It looks like only one European state is just barely trying.
>>
File: 1749693672355757.jpg (47 KB, 738x415)
>>106560314
>an 80B model requires ~160GB of VRAM. A 3-bit version could potentially run in under 40GB of VRAM, making it feasible to run on a single high-end GPU like an NVIDA RTX 4090
This is Gemini? The peak of LLMs right now? With web access?
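(Quick arithmetic: 80B params at 3 bits is about 80e9 × 3 / 8 ≈ 30 GB for the weights alone, before KV cache and overhead, and a 4090 tops out at 24 GB, so the "single RTX 4090" claim doesn't even add up by its own math.)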
>>
>>106560354
Mistral was great
>>
>>106560361
>was
Yeah
>>
File: file.png (99 KB, 983x1103)
The proper name is Qwen3-MoE-A3B thank you.
>>
>>106560404
It still is.
>>
File: 1750684572736732.png (4 KB, 852x31)
>>106560409
The sqrt(total * active) formula has been officially confirmed
>>
>>106559998
ghostbusters ectoplasm ghostly appearing sperm
>>
>>106560346
>Europe
Lol lmao even
>>
>>106560346
>Russia
For the last 35 years the #1 rule of doing business in Russia was "don't do business in Russia". CS stuff was the easiest to move abroad.
It's not like there's nothing at all, IIRC Yandex was pretty competitive in the self-driving scene for a moment, and every street dog sells its own proprietary voice assistant now, but for local I only found https://huggingface.co/yandex/YandexGPT-5-Lite-8B-instruct so far. (It's whatever)
>Japan
Fell behind in programming way back when we started using real operating systems and high-level languages, and never recovered to this day. I blame the language barrier.
>Korea
Probably too busy printing money with all their gachas instead.
>>
File: file.png (316 KB, 1777x2417)
>>106560417
>>
>Why aren't you using vllm bro?
>What do you mean you don't have H100 cluster? It can still work with A100 cluster bro.
>Wait, you got just an RTX 3090? Uhm, I've never heard about such GPUs, must be Chinese knockoff or something. Get legit hardware bro.
>You got no money? Just ask for grants bro! Or get investors. You are part of the network, right?
>>
>>106560442
>You got no money?
Have you tried getting a job recently? scamming into a grant or investor is unironically easier at this point.
>>
File: 30474 - SoyBooru.png (118 KB, 337x390)
Are you enjoying the next best thing? (Qwen Next) (Subtle request for feedback)
>>
>>106560442
vllm can run on an intel arc gpu. you've got no excuse bro. Also it can do cpu as well and MoEs and even got gguf support not long ago
>>
File: 1726683352470372.png (476 KB, 1179x441)
>>106560356
>>
>>106559958
I wonder what this dude thinks about AI. There's not a lot of difference between what he does and what video models can do.
>>
>>106560460
Where can I buy B60 for MSRP(1200BURGERCOINS)?
>>
>>106560414
Mistral's 2025 output has been one okayish 24B model and nothing else of note
Meanwhile their business model is increasingly Cohere-ifying and there's good reason to believe they fucked up training Large 3
Maybe the cash injection from ASML will help some but acting like they're still internationally competitive is a joke
>>
>>106560474
B60 will be announced next month at SEMICON West. $500 for 24GB model
>>
>>106560460
How does pure CPU performance of vllm compare to ik_llama.cpp?
>>
>>106560489
I would suggest just standing up the vllm cpu docker image and running benchmarks yourself. You probably won't find much public info for cpu benchmarks between the two.
>>
>>106560483
>AMD CPU
>Intel GPU
>NVIDIA RAM
If only I had infinite money...
>>
>>106560523
Where exe? I DONT GIVE A FUCK ABOUT THE FUCKING DOCKER! i just want to download this stupid fucking application and use it
WHY IS THERE DOCKER??? MAKE A FUCKING .EXE FILE AND GIVE IT TO ME. these dumbfucks think that everyone is a developer and understands code. well i am not and i don't understand it. I only know to download and install applications. SO WHY THE FUCK IS THERE DOCKER? make an EXE file and give it to me. STUPID FUCKING SMELLY NERDS
>>
File: 1593076675554.jpg (57 KB, 477x477)
>>106560551
>>
>2025
>vibevoice is fully forgotten
>>
>>106560566
Useless without training scripts.
>>
File: 1754938970794577.jpg (46 KB, 750x1086)
>>106560551
>>
>>106560481
To Mistral's credit, that single model they made is actually the best model for running on a normal PC. Gemma is heavily censored, Qwen's similar sized models are worse at non benchmaxx tasks and everything else is too big unless you're building your PC for running LLMs
>>
>>106560346
>>106560436
>Russia
Case in point: https://en.wikipedia.org/wiki/ABBYY_FineReader
I was informed this used to be SOTA for OCR.
>ABBY ... was founded in the USSR and operated in Russia for nine years before moving to the United States.
>>
>>106560550
>NVIDIA CPU
>AMD GPU
>INTEL RAM
WE ARE MAKING A MEME SYSTEM. OPTANE WILL NEVER DIE.
>>
nvidia not offering a 24gb 50xx card was criminal and i'm tired of pretending otherwise.
>>
>>106560551
This argument has never been refuted
>>
>>106560606
nobody wants to deal with women. if exe is a filter then so be it.
>>
mistral for erp
qwen3 for anything else but erp
>>
>>106560604
Fuck 24GB. The 5090 should have just been cheaper, it's not remotely close to being a proper workstation card and 32GB is too little for anything outside of hobbyist stuff.
>>
>>106560622
It's a gayming gpu. Buy from their workstation lineup if you want professional stuff.
>>
>>106560614
You aren't a woman, though
>>
>>106559044
SSDmaxxbros, maybe our time is finally cuming soon...
>>
>>106560619
But what about sfw rp, is that included in that? Is Qwen 3 smarter than Gemma 3?
>>
>>106560630
no?? really??? I think you're lost bro, this isn't >>>/lgbt/
>>
>>106560619
>anything else but erp
there is nothing else
>>
File: file.png (176 KB, 1425x569)
Am I about to get scammed? I've never seen these under $1000. From Hong Kong.
>>
>>106560634
>gemma3
after all the safety humiliation I got, I will never use it again
>>
I refuse to support any model whose selling point is high context limits. Every llm I've used, from free to paid, is absolute garbage and hallucinates at high context.
>>
>forcing full prompt re-processing due to lack of cache data (likely due to SWA
humiliation ritual
>>
My CLINE prompts are all timing out when I'm trying to use gemma3:12b on a 4070. Do I need a quantized model instead?
>>
>>106560645
No, you're in for a great deal! Buy it quick, there's only one left!
>>
>>106560645
>seller with 0 reviews
Yeah, trust him!
>>
File: 1733603436129628.png (2.24 MB, 2038x1678)
>>106560645
bro no don't do that
buy this one: https://www.ebay.com/itm/325407276138

much better trust me
>>
>>106560809
>Graphcore IPU
what
>>
>>106560693
How do you prevent this?
>>
>>106560814
>intelligent processing unit
lmao
>>
>>106560809
ok ersinc03
>>
>>106560687
You can't trust the actual numbers for context that companies put out, they're always wrong. But it's usually safe to assume that a higher advertised number does mean a higher 'effective' context ceiling.
>>
Can't wait for adobe research to publish an updated study on how all these models go to shit past 32k
>>
>>106559371
>>106559401
>no tits
>shitty reddit memes
You are gay.
>>
>>106560967
>>106560814
IPU/NPUs are a real thing, they're in all the new CPUs from AMD for instance. just not from meme companies like that one.
>>
>>106561079
>central processing unit
makes sense
>graphics processing unit
yup
>neural processing unit
works with neural networks, gotcha
>intelligent processing unit
the fuck is this supposed to be? it sounds like some marketing term
>>
File: Base Image.png (490 KB, 1200x1576)
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
https://arxiv.org/abs/2509.09679
>Rotation-based methods such as QuIP and QuaRot apply orthogonal transforms to eliminate outliers before quantization, using computational invariance. However, these methods use fixed transforms--Hadamard matrices achieving optimal worst-case coherence μ = 1/√n--that cannot adapt to specific weight distributions. We identify that different transformer layers exhibit distinct outlier patterns, motivating layer-adaptive rotations rather than one-size-fits-all approaches. We propose ButterflyQuant, which replaces Hadamard rotations with learnable butterfly transforms parameterized by continuous Givens rotation angles. Unlike Hadamard's discrete {+1, -1} entries that are non-differentiable and prohibit gradient-based learning, butterfly transforms' continuous parameterization enables smooth optimization while guaranteeing orthogonality by construction. This orthogonal constraint ensures theoretical guarantees in outlier suppression while achieving O(n log n) computational complexity with only (n log n)/2 learnable parameters. We further introduce a uniformity regularization on post-transformation activations to promote smoother distributions amenable to quantization. Learning requires only 128 calibration samples and converges in minutes on a single GPU--a negligible one-time cost. On LLaMA-2-7B with 2-bit quantization, ButterflyQuant achieves 15.4 perplexity versus 22.1 for QuaRot.
Links below:
https://github.com/42Shawn
https://github.com/oumi-ai/oumi
Code might be posted on one of those. Might be cool but then again very little results included.
previous paper that looked at butterfly transforms
https://arxiv.org/abs/2302.06646
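No code seems to be up at either repo yet, so here is a minimal PyTorch sketch (mine, not the authors') of what "butterfly transform parameterized by continuous Givens rotation angles" means in practice. The stage structure and the (n log n)/2 parameter count follow the abstract; everything else is an assumption.

import math
import torch

def butterfly_apply(x, angles):
    # x: (..., n) with n a power of two.
    # angles: one tensor of shape (n // 2,) per stage, log2(n) stages total,
    #         i.e. (n * log2(n)) / 2 learnable parameters overall.
    # Each stage applies 2x2 Givens rotations to disjoint coordinate pairs,
    # so every stage (and hence their product) is orthogonal by construction,
    # and the whole transform costs O(n log n).
    n = x.shape[-1]
    stages = int(math.log2(n))
    assert 2 ** stages == n and len(angles) == stages
    y = x
    for s in range(stages):
        stride = 2 ** s
        blocks = n // (2 * stride)
        lead = y.shape[:-1]
        y = y.reshape(*lead, blocks, 2, stride)
        theta = angles[s].reshape(blocks, stride)
        c, si = torch.cos(theta), torch.sin(theta)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((c * a - si * b, si * a + c * b), dim=-2)
        y = y.reshape(*lead, n)
    return y

# toy check: rotations preserve the norm, so this should print True
x = torch.randn(16)
angles = [torch.randn(8, requires_grad=True) for _ in range(4)]
print(torch.allclose(x.norm(), butterfly_apply(x, angles).norm(), atol=1e-5))

The learnable part is just the per-stage angle tensors; gradients flow through cos/sin, which is the whole point compared to fixed ±1 Hadamard entries.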
>>
>>106560089
Anyone serious is deploying for enterprise, not personal use. Normal people don't use local models for personal use, just like normal people don't use 4chan and only use fb/linkedin.
>>
>>106561145
That is very nice, but how does it compare to GGUF?
>>
>>106561127
>it sounds like some marketing term
it basically is. NPU = IPU
in the industry its looking like NPU has won out but AMD at least early in the developments of NPUs in like 2023 referred to it as IPUs as well
>>
>>106561127
Graphics Processing Unit is a horrible term nowadays.
NVIDIA calls the H100 a GPU even though it doesn’t even have a display output and isn’t aimed at graphics processing.
>>
>>106561161
Probably just as shit as Q2 ggufs are.
>>
>>106561168
Nah, ggufs are probably better since they don't mention them.
>>
>>106561166
GPU stands for "General Processing Unit" in nvidia's own terms
>>
2-bit is all you need, you don't need more
>>
>>106561177
My point stands. Q2 is shit. This is literally a competition of who has the nicer looking pile of shit. If you're seriously using a Q2 model you need to reevaluate your life. Also the paper likely doesn't mention GGUFs at all because it's talking about W2A16 which Q2 GGUF can't even map to in practice.
>>
File: 1757508971871431.png (327 KB, 800x982)
>>106561184
https://www.nvidia.com/en-us/about-nvidia/corporate-timeline/
>>
>>106561184
that's a backronym they made up so they can keep using the term everyone would have used anyways
>>
That reminds me, I have vllm installed. Might as well try a quick speed comparison. Tomorrow maybe.
>>
File: k2-0905-perplexity.png (175 KB, 2069x1400)
>>106561204
case in point
>>
>>106561184
Should call them NVIDIA Processing Units to shit into everyone's salad.
>>
RP testing qwen3-next-thinking and it has a completely different reasoning style from 2507, and not in a particularly good way
several times more verbose and EXTREMELY wasteful of tokens - trying out different lines of dialogue over and over again, outputting them in full with minor variations, outputting full drafts of the response, or in one case "let me check the previous messages [proceeds to output EVERY previous turn of the roleplay IN FULL]"... wtf. I get the sense that this is something of a proof of concept model for them (and to their credit, in my limited testing the models do seem smart and pretty good at long context) but they've gotta fix this for 3.5 or whatever their next release is.
>>
>>106561341
post cockbench
>>
>>106561341
Have you tried prefilling the thinking with some guidance on how to think about the RP?
>>
>>106561184
>>106561207
>>106561336
You butt hurt boys are SO silly! :3
>>
File: cute finnish bf.webm (903 KB, 720x1280)
Why is it so hard to get models to undress the finnish catgirl pm?
>>
>>106561347
I APIfagged, sorry anon. I'd expect it to be in line with the 2507 qwens though.
>>106561358
not yet, I'm putting off messing with it more until there are ggufs
>>
>>106560551
> anon comes to a thread where everyone has a fucking doctorate in AI
> sees the word docker
> loses his shit as he's dumb as fuck
> after crying gets his mcdonalds uniform ready for work tomorrow
>>
Not sure what I expected.
What is this called? At the beginning the sentences are long and then it's all short and weird. I saw this before with another sloped model.
>>
>>106561459
I can get better outputs from llama 8B. Holy slop
>>
>>106561476
Sad because this would have been a really cool size.Fast even with offloading.
But at least they try something new.
>>
>>106561506
even the chinks are putting in extreme safety nets. shame. Gemma3 tier slop
>>
File: file.png (211 KB, 602x600)
>>106561367
Consider the following you tranny freak
>>
>>106561459
It kinda communicates pacing.
>>
>>106561459
Somehow way worse than Mistral Small
>>
File: 1692170984443505.jpg (32 KB, 400x400)
I did nothing today
>>
>>106561166
>isn’t aimed at graphics processing
you can have a gpu render something and then display it through an iGPU's display output
i wonder if you could stick a h100 inside a normal desktop PC, install the geforce driver (after doing inf mod) and then just play games on it
>>
>>106561459
It's a qwen3-only problem I think. It tries to mimic the text formatting from the latest response. Also how it was trained could be the culprit, like maybe it was trained with a bunch of Chinese poems.
The pattern I noticed is like this:
1 paragraph -> 2 paragraphs -> 3 -> 4 -> 5 -> Then it ended with one line per paragraph.

So far the only way to control it is by instructing it explicitly in the system prompt. For example I'm using this:
"Respond in multiple standard paragraphs format. Avoid poetic or dramatic elements. "
>>
>>106561768
That helps. But what a weird writing style. Feels like Deepseek on steroids.
>>
>>106560606
pay me
>>
>>106561794
now you've got a pattern of 3 paragraphs of exactly 3 lines.
>>
>>106561855
As god intended it. Proper paragraphs should never exceed 3-4 lines. I learned that in middle school
>>
>>106561077
>>no tits
Perfect.
>>
>>106561861
congratulations for completing middle school anon.
nobody thought you could do it, but you did.
>>
File: 1736083582437887.jpg (64 KB, 1024x1020)
How will qwen 80b-A3B improve my text adventures involving me being a magical kemoshota that cures predators of their fucked up fetishes?
>>
wen qwen ggoofs
>>
y no opera his son?
>>
i think im gonna goof...
>>
Realistically speaking, there haven't been any improvements erp wise since llama3.3-70b and mistral large 2407
>>
>>106561915
Oke doke gay
>>
>>106562070
I've never once used LLMs to goon so I have no idea what this even means
But the obvious solution to get around LLMs not doing what you want is to be agentic
agents aren't only for tool calling and APIs. They can also form complex logic based on natural language, like following and maintaining a story structure despite whatever retarded shit you're trying to pull
>>
File: file.png (175 KB, 420x459)
https://www.washingtontimes.com/news/2025/sep/11/ftc-launches-inquiry-ai-chatbots-acting-companions-effects-children
>>
File: 1754987229502936.png (2.39 MB, 1416x2120)
>>
>>106562070
Air is a direct upgrade for 3-4 3090 VRAMlets
>>
File: 1740718713054690.png (1.9 MB, 1416x2120)
>>
>>106562108
>>106562161
tfw the goofs are nevermore
>>
>>106561944
Seeing how shit it is will make you put even more effort into your RPs using Mistral Nemo
>>
im vibecoding vibevoice for my vibecoded local ai software. what am i in for?
>>
>>106562240
aids
>>
File: 1729918764920247.png (3.01 MB, 1416x2120)
>>
It's up!
>>
>>106562289
>*looks down*
Yes, it is!
>>
>>106559371
Cute miku I like
>>
wheres my fucking ggoofs Daniel
>>
>>106562331
What's happening?
>>
>>106562331
>unsloth
>ever
lmao
>>
>>106562345
upload the qwen3 next ggoofs you goof
>>
>>106559420
vLLM supported it in June via https://github.com/vllm-project/vllm/commit/b69781f107b7ad847a351f584178cfafbee2b32a but it's really hacky and depends on their Extension for Pytorch and some calls in their LLM hacked backend.
The best I've seen from Intel publicly for C++ is their closed pull request inside the main Flash Attention repo.
https://github.com/Dao-AILab/flash-attention/pull/1528
This uses SYCL so yeah, it would be kind of an uphill battle for anyone not an Intel developer to adapt to the existing CUDA code.
>>
Damn phonemizers are a huge bottleneck for TTS because devs use by default the pile of trash that is espeak. On CPU for kokoro it takes almost 8-9s to preprocess a single sentence to IPA phonemes on my laptop while the inference itself is ~6s and that shit grows at O(n) or more (fucking 22s to preprocess a paragraph). Switching to g2p_en for american english + a bunch of heuristics I got from chatgpt achieves the same preprocessing output in 1.5s for a single sentence, growing at ~O(log N). I wish this field focused a bit on efficiency instead of convenience
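Minimal sketch of that swap, assuming the g2p_en API as I remember it (a G2p class you call on text, returning ARPAbet tokens); the ARPAbet-to-IPA table below is a tiny illustrative subset, not the real heuristics:

from g2p_en import G2p  # pip install g2p-en

# partial ARPAbet -> IPA mapping, for illustration only
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "EH": "ɛ", "IY": "i", "OW": "oʊ",
    "D": "d", "K": "k", "L": "l", "M": "m", "N": "n", "R": "ɹ",
    "S": "s", "T": "t", "Z": "z",
}

g2p = G2p()

def to_ipa(text):
    out = []
    for tok in g2p(text):          # ARPAbet tokens plus spaces/punctuation
        base = tok.rstrip("012")   # strip stress markers
        out.append(ARPABET_TO_IPA.get(base, tok))
    return "".join(out)

print(to_ipa("local models"))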
>>
>>106562423
ok nerd
>>
>>106562423
You don't need to pre-process anything.
>>
>>106562430
It's not feeding the raw text to the TTS, it's preprocessing the text to phonemes before feeding them to the model
>>
>>106562423
Shouldn't that just be a database lookup for retarded languages like english where the pronunciation doesn't match the spelling?
>>
>>106562423
They aren't using espeak just because it is easy, it is because it has multilingual support out of the box. G2P is much harder to configure with mappings needed for each language.
>>
>>106560481
They've made other models too, but they're mostly not open-weights. I don't get why they don't start doing MoE models the Qwen way; wouldn't that let them release a wider range of sizes with less compute?
>>
>>106562453
It's not enough, because some words have different pronunciation depending on whether they're a noun or a verb while written the same way, like "use" + other things that are context-dependent
>>
>>106562482
You're describing convenience bro. Espeak is almost twenty years old, it has memory leaks and a lot of issues that won't ever be fixed because of the GPL; no one wants to contribute to this trash.
>>
>>106562450
One thing what will help you regardless - doesn't matter if it gets converted to phonememes or not - is to use contractions module
>import contractions
>cleaned_text = contractions.fix(text)
and then remove surrogates with regex and optionally add abbreviations and optionally clean up any problematic remaining characters (because LLMs always output random shit).
>>
>>106562482
Sounds like an llm whose sole purpose is to take text as input and output ipa is required.
>>
>MUH ERP
go to sleep americlaps and huemonkeys.
productive eurochads are taking over from here.
>>
>>106562546
Give us Miqu 3 already or at least Largestral 3. WTF are you frogs doing?
>>
>>106562543
at that point you might as well take text as input and diffuse the audio directly.
Take in some positive/negative descriptor tokens too.
>>
>>106562542
Thanks, I didn't know it was a thing. I'll add that
>>
>>106562586
Yeah so what I did with piper voice (it's instant tts, takes ~100 mb or less but it's not as robust as vibevoice of course)
>contractions
>surrogates
># remove surrogates (U+D800 to U+DFFF unicode range)
>cleaned_text = re.sub(r'[\ud800-\udfff]', '', cleaned_text)
>Then replace commas, ellipses, "", dash, em dash, and whatever else there is with either empty spaces or periods - this way TTS does not even try to do anything but it'll go straight onward - basically remove and replace everything else except periods. This is sort of trial and error, you'll need to test this and proceed accordingly.
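Rolled into one helper, roughly (a sketch of the steps above, since the exact replacement set is the trial-and-error part):

import re
import contractions  # pip install contractions

def clean_for_tts(text):
    text = contractions.fix(text)                        # "don't" -> "do not"
    text = re.sub(r'[\ud800-\udfff]', '', text)          # drop stray surrogates
    text = re.sub(r'\u2026|\.\.\.', '.', text)           # ellipses -> period
    text = re.sub(r'[,"\u2013\u2014\u201c\u201d]', ' ', text)  # commas/dashes/quotes -> space
    return re.sub(r'\s+', ' ', text).strip()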
>>
>>106562543
There are small transformers for that (T5), but it's even slower than espeak. They're using them for disambiguation, which is fine when you don't care about latency and want the output to be as good as possible
>>
>>106559401
kek



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.