>>106958085
Llama I was running at Q4 on a rented 8xV100 server with some tensors offloaded, getting 0.7 tk/s at 60k context (I also tried Q3 on an 8xL40 server, which fit the whole model and got 3 tk/s); GLM I was consuming through the z-ai API.
But then again, I was using GLM at much longer contexts, so maybe if I had the memory to fit the same context length with Llama it'd start to hallucinate too, idk.
This is the expected syntax:
<tool>
<tool_name>edit_file</tool_name>
<parameters>
<filename>functions/replace_function_body.py</filename>
<old_text>matches = re.search(pattern, content, re.MULTILINE)</old_text>
<new_text>pattern = r'(?s)(?P<function_name>\w+)\s*\(.*?\)\s*\{([^}]*)\}'</new_text>
</parameters>
</tool>
GLM would instead do shit like this, dropping the <tool>/<tool_name>/<parameters> wrapper and using the tool name as the outer tag:
<edit_file>
<filename>functions/replace_function_body.py</filename>
<old_text>matches = re.search(pattern, content, re.MULTILINE)</old_text>
<new_text>pattern = r'(?s)(?P<function_name>\w+)\s*\(.*?\)\s*\{([^}]*)\}'</new_text>
</edit_file>
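If you wanted to paper over it client-side instead of fighting the model, here's a rough Python sketch of a lenient parser that accepts both forms. KNOWN_TOOLS and the tag layout are assumptions based on the two snippets above, not whatever the actual agent framework registers:

import re

# Assumed tool registry: tool name -> parameter tags it takes
KNOWN_TOOLS = {"edit_file": ("filename", "old_text", "new_text")}

def parse_tool_call(text: str) -> dict | None:
    """Accept both the canonical <tool> wrapper and the bare
    <edit_file>...</edit_file> shorthand GLM emits."""
    # Canonical form: <tool><tool_name>X</tool_name><parameters>...</parameters></tool>
    m = re.search(
        r"<tool>\s*<tool_name>(\w+)</tool_name>\s*"
        r"<parameters>(.*?)</parameters>\s*</tool>",
        text, re.DOTALL,
    )
    if m:
        name, body = m.group(1), m.group(2)
    else:
        # GLM-style shorthand: tool name used directly as the outer tag
        for name in KNOWN_TOOLS:
            m = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
            if m:
                body = m.group(1)
                break
        else:
            return None
    # Pull out whichever known parameter tags are present in the body
    params = {}
    for tag in KNOWN_TOOLS.get(name, ()):
        pm = re.search(rf"<{tag}>(.*?)</{tag}>", body, re.DOTALL)
        if pm:
            params[tag] = pm.group(1)
    return {"name": name, "params": params}

Feeding it either of the two snippets above yields the same {name, params} dict, so the rest of the pipeline doesn't have to care which format the model produced.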