/lmg/ - a general dedicated to the discussion and development of local language models.

The Raven Edition

Previous threads: >>106559371 & >>106551921

►News
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106559371

--Paper: ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms:
>106561145 >106561161
--Troubleshooting llama-server.exe performance with multi-GPU configurations:
>106563691 >106563763 >106563772 >106563861 >106563838 >106563879 >106563891 >106563919 >106563941 >106563960 >106564017 >106564070 >106564107 >106564154 >106564411 >106564784
--Qwen3-Next model efficiency and performance analysis:
>106560211 >106560245 >106560248 >106560269 >106560274 >106560283 >106560310 >106560291 >106560294 >106560322 >106560314 >106560338 >106560356 >106560302 >106563929
--Optimizing MoE models via selective tensor offloading:
>106559871 >106559925 >106559938 >106559943 >106559962 >106559979 >106559984 >106560000 >106560056
--Role of LLMs in TTS and image generation via autoregression and embeddings:
>106562827 >106562864 >106562981 >106563064
--Qwen3 model's verbose reasoning issues during roleplay testing:
>106561341 >106561358 >106561391
--Public server setup for Qwen3-Next-80B-A3B-Instruct with 65t/s performance:
>106563343
--TTS phonemizer bottlenecks and optimization:
>106562423 >106562450 >106562542 >106562586 >106562603 >106562493 >106563141 >106562482 >106562515 >106562543 >106562579 >106562608 >106562763 >106564046 >106564012 >106564024
--California bill to regulate AI companion chatbots:
>106563074 >106563109 >106563402 >106563820 >106563680 >106564086 >106563394
--OpenAI optimization techniques boost local transformer efficiency:
>106563608
--FTC probes major tech firms' AI chatbots for child safety risks:
>106562092
--Specialized small LLMs vs multi-purpose models:
>106564203 >106564224 >106564273 >106564280 >106564230 >106564323 >106564409 >106564560 >106564600 >106564607
--Miku (free space):
>106559401 >106562108 >106562161 >106562252

►Recent Highlight Posts from the Previous Thread: >>106559374

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Ramlets, how are we doing today?
Mikulove
>>106566876I compress my ram - gives approx. 2 times more memory.
>>106566903I sparsify my ram - gives 2 times more speed.
>>106566903
I overclock my ram. It costs twice as much because it breaks.
>>106566778If you're using it on API, why are you using Air instead of the big one?
https://www.downloadmoreram.com/
Back in the MS-DOS days, I actually had a driver that compressed my RAM. It broke some things though.
https://research.google/blog/vaultgemma-the-worlds-most-capable-differentially-private-llm/
https://services.google.com/fh/files/blogs/vaultgemma_tech_report.pdf
https://huggingface.co/google/vaultgemma-1b
The future of Google LLMs: models that know nothing about rare information. They use a huge batch size to mitigate memorization, among other things.
>What does this mean in practice? Informally speaking, because we provide protection at the sequence level, if information relating to any (potentially private) fact or inference occurs in a single sequence, then VaultGemma essentially does not know that fact: the response to any query will be statistically similar to the result from a model that never trained on the sequence in question. However, if many training sequences contain information relevant to a particular fact, then in general VaultGemma will be able to provide that information.
>[...] Sequence-level DP provably bounds the influence of any single training sequence (example) on the final model. We prompted the model with a 50-token prefix from a training document to see if it would generate the corresponding 50-token suffix. VaultGemma 1B shows no detectable memorization of its training data and successfully demonstrates the efficacy of DP training.
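For anyone wondering what sequence-level DP means mechanically: it's basically DP-SGD. Clip each sequence's gradient to a fixed norm so no single sequence can dominate an update, add Gaussian noise scaled to that clip bound, then average over a huge batch so the noise washes out. Rough toy sketch below, not from the paper, all names and numbers made up:

import torch

def dp_aggregate(per_sequence_grads, clip_norm=1.0, noise_multiplier=1.0):
    # clip each sequence's gradient so its L2 norm is at most clip_norm
    clipped = []
    for g in per_sequence_grads:
        scale = (clip_norm / (g.norm() + 1e-12)).clamp(max=1.0)
        clipped.append(g * scale)
    # sum the clipped gradients and add Gaussian noise scaled to the clip bound
    summed = torch.stack(clipped).sum(dim=0)
    noised = summed + torch.randn_like(summed) * noise_multiplier * clip_norm
    # average over the (huge) batch; bigger batches drown the noise out, hence the batch size trick
    return noised / len(per_sequence_grads)

# toy usage: three "sequences", each contributing one gradient of the same shape
grads = [torch.randn(8) for _ in range(3)]
print(dp_aggregate(grads))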
>>106566923Maybe he means he tried it, and found it lacking (in addition to the obese one).
>>106566876As a 24GB vramlet I just went back to gemma3 qat and it's still the goat for ST-style RP. The writing is always fresh and pleasant to read, with outstanding vocabulary. Fucked around with offloading glm4.5-air q3 and other drummer finetunes and they seemed broken and inconsistent in their responses. Google needs to release a 80b moe gemma 4
>>106567050Hard to say, can't see the image any longer.
Is this legit or over-hyped?
I find it hard to believe that just 32B is enough to match GPT4 and 200B checkpoints.
With just 32B, you could run it locally on most PCs with a decent quantization that doesn't sacrifice much, having a private GPT4 with no quota limitations... sounds too good to be true.
>>106567118Sounds like a typical marketing department sales pitch.
>>106567118
>sounds too good to be true
Congratulations, you have a brain.
>>106567118
>reasoning
always and has always been a meme
>>106567105>>106566972Welcome to /lmg/ thread google stealth marketing engineer technician saars. Please kindly inform us if the next gemma will be as cucked as the last one so I can decide if I should post funny jeet pictures or not.
>>106566972
>glm4.5-air
>glm4.5-air
It works well for me; right now it's the best model for vramlets, you are doing something wrong. This model is ultra sensitive to temperature at long context: when you get too deep into context it's time to lower the temperature.
>>106566836quoteth the raven.....
>>106565629It's been stuck for four hours...
>>106566944
If this works the way it looks like it works, based on what you quoted and the image, then the model would theoretically be equivalent to stuff like Phi. Maybe a bit better. But ultimately it will have trouble with real world user queries since it would lack OOD skills and knowledge. This technique can only create models for extremely narrow use cases, not general assistant models. So if they do it for all future models, Google would be shooting themselves in the foot and losing all the market share they just spent tons of effort to claw back.
>>106567662
It's probably for the better. WSL is a janky half-measure.
If you want to run windows for your main gaming/home office PC but you want to run linux for LLM stuff, just get a second system and run linux properly.
>>106566952GLM-chan is NOT obese!
>>106567118
Both can be true.
30b's are seeing massive improvements in abilities, but that has to do with coding, physics, chemical equations etc. And who gives a fuck about that. It's a glorified wikipedia. Good for grunt work.
For stuff like writing and other more complex tasks, size is still king and may be for a long time. My LLM needs to know every dragon ball z character and understand that yes, piccolo <could> wipe his ass with goku's face if he wanted to. If you want nuance and complexity, simple trivia is not gonna do it for ya.
>>106567795(stomp stomp stomp)
>>106567118
I'll ignore all the rest of the shit in the post. What called my attention was
>2000 tokens/s on cerebras
>most reasoning models crawl at 200 tok/sec
What the fuck does reasoning have to do with token generation speed? And why the fuck are you paying attention to that retard?
>>106567628ggufs nevermore
>>106567898Assuming that's total throughput over multiple concurrent requests, that guy has skill issues with the other models or cerebras is shit.
>>106567795>>106566952This is just AIR-CHAN, GLM-CHAN wont even fit in the frame.
>>106567707They were already bragging how memorization in Gemma 3 was lower than Gemma 2, so I think that's the direction where things are going.
>>106566944
>adding noise to the model so it doesn't precisely reproduce input data
>reduces training stability, increases computation costs
>today’s private training methods produce models with utility comparable to that of non-private models from roughly 5 years ago, highlighting the important gap our work will help the community systematically close.
Okay, so adding noise to the model makes it significantly worse (as one would expect). They seem to think that's avoidable, but I don't know how.
>>106566944
>make your model retarded
>brag about it
>>106567118I think punching above weight and trading blows with gpt 4 started in 2024.
macchads just cant stop winning
>>106567806
>piccolo <could> wipe his ass with goku's face if he wanted to
That is only at the start of dbz.
is next better than 235B for sex?
macchads just cant stop winning/2
macchads just cant stop winning/3
>>106567977Fake news! GLM-chan is a slender young lady.
>>106568337Differential Privacy is an area of research. Researchers work on it, promising "it'll be good soon" and push out papers. Everyone not working in that field ignores them, unless they need to bring them up for compliance like "we're working on DP, don't worry".
Macfags stopped bragging about glm. It's the one thing they had. They will stop bragging about 80b-a3b soon enough. It's the one thing they'll have for a few days.
Then it's back to just sucking cock.
Qwen Image Edit abuse
anyone else testing qwen 80b right now? it feels scarily close to 235b, long context, barely hallucinates, incredibly fast. their new moe architecture must be getting close to the closed models' sota. (testing the mlx version)
>>106568235That's weird since Gemma 3 still knows a ton of trivia compared to other similarly sized models. If it truly didn't memorize things, then it should be worse than even Qwen. Also that graph is weird. How does 4B have the exact same memorization rate as 9B and 27B?
>>106568659It's honestly been garbage for me, outputs seem comparable or even a little worse than 30B A3B
>>106568645
>needs a lora to do it
Local nano banana when?
>>106568661
you don't need or want the model to memorize single examples verbatim. its supposed to generalize. if the information is presented many times it will get integrated, just not a single example. its just to prevent personal information from getting memorized.
>>106568680If you praise china hard enough, two more weeks. If you dont 4 more months
>>106568674seethe ggoof
>>106568680
>need a lora
wrong mentality, reframe: you can train the model to do new things if desired.
Also it can already do this without one, this just enhances the result faithfulness (fabric texture on fumo, painting style).
>>106568700But I don't want to browse, search for, and maintain a library of loras like I did during the SD days. Not again...
>>106568713
Yeah me neither.
But I also understand that no model, ever, will be able to do enough for what I want to see. Being able to teach a model new stuff is important to me.
Granted this isn't super complex stuff and I can understand why you'd want it out of the box.
>>106568700Let me guess you need 100gb of vram to train a Qwen lora?
>>106568688So is it a bad thing then? Won't the model have more room to learn things when it's not spending time memorizing everything word for word? I mean, could this explain why Gemma seems to know so much for its parameter size?
>>106568743
yeah it's a dire situation there, I had to use prosumer hardware.
still, the results were fantastic on even a tiny dataset, the model's understanding is clever even if the base results are dog.
>>106567628
2 weeks more
/lmg/ still struggling to cope with the fact that apple won local
>>106568645How well does that lora work with fancy outfits?
>>106568747I think it is a valid idea. it shouldn't hurt models at the trillions of training tokens scale. anything important will be seen multiple times from multiple examples. it just won't be able to reproduce some random persons medical information or ssn.
>>106568747
They possibly have very good/long post-training and include general trivia there.
my toaster is already creepin in on the 6 year mark
Planning my next machine to buy on this black friday, will prolly buy an rtx 5090, a ryzen 9950X3D cpu and 256gb of ddr5 memory, what are some of the more memory hungry models I should try then?
What are some must have extensions for SillyTavern?
>>106568821Qwen3-next
why is training a model to make it specialized so fucking hard?
>>106568863a girlfriend
>>106568879I already have one. She doesn't like AI or me using it but she respects my drive to learn and be skilled with computers.
>>106568875
its easier now than it was 10 years ago.
>>106568892does she know you are having sex with it?
>>106568645
>spyware UI
any other options?
qwen goofs???????????????
>>106568916
https://huggingface.co/mlx-community/Qwen3-Next-80B-A3B-Instruct-4bit
>>106568796dunno, you can crank an arbitrary resolution though
>>106566944safetymaxxing niggers
>>106568923
>mlx
I said goofs nigger, im not downloading lm studio or vllm for the awq variant
can I run glm4.5v in llama.cpp? I cant find gguf for it
>>106568875
If you're trying to teach it domain-specific information, it's pretty much impossible with a tiny LoRA and/or without heavily affecting previous knowledge with a huge learning rate and burning the information into the weights.
Using summarized information might work better/faster than entire documents, but good luck training the model in a way to make it truly understand the information without just parroting it (verbatim, even).
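If anyone wants a concrete picture of the knobs involved, here's a minimal peft sketch (model name and numbers are placeholders, not a recommendation): rank r and target_modules are basically what decide whether you're doing a "tiny LoRA" or actually burning information into a big chunk of the network.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# small r + attention-only targets = cheap style adapter;
# bigger r + MLP projections added to target_modules = more capacity to absorb facts,
# but also more interference with what the base model already knows
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()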
>>106568936wait 2 more weeks then faggot
>>106568962
qwen next is officially goated
>>106568680judging by the google studio api outages the chinese are already working on it
>>106569008what happens at 100%?
>>106569008
>80b
>worse than 30b coder
go fuck your herpes infested goat
>>106569008why is thinking variant so much worse than the regular chat version?
>>106569014But distilling never gets the performance of the original...
>>106569029it hasnt been trained on the secret benchmaxx
yeah im starting to think its over
>>106569008
>officially goated
>Lost to Qwen3-Coder-30B
Dense bros we can't stop winning
>>106569072wouldn't that be kind of against the point if they have to train it specifically for the benchmark? That's like one of the big flaws with test driven programming where you make your program fit your test rather than your actual problem.
>>106569075GLM4.5 AIR BROS, WE CANT STOP WINNING!!!
>pull vllm
>follow the same steps I did last time that I wrote down for successfully compiling it, which I had to change and come up with new steps for as the previous ones stopped working at some point
>doesn't work anymore either, giving different errors
Sigh...
>>106568907
>>spyware UI
Uhh what??
so... has anyone modified a onahole to interact with a LLM yet?
Groksirs when is we getting supports?
https://github.com/ggml-org/llama.cpp/pull/15539
@CUDAdev sir please do the needful Vishnu bless yo
>vllm supports gguf guys!!!
>try loading a gguf
>it just errors out
My ass it's supported. Now I'm downloading some safetensors to try again and see if it's a model issue or my build is just fucked for some reason.
>>106569204sends your data to jewgle on startup. API nodes, electron have it but they are optional. the manager calls home
>>106568925>/gig/ on /lmg/Weird colab
https://www.trendforce.com/news/2025/09/11/news-kioxia-reportedly-eyes-2027-launch-for-nvidia-partnered-ai-ssds-with-100x-speed-boost/
>Kioxia, in partnership with NVIDIA, is developing next-generation SSDs aimed at AI servers, targeting commercialization by 2027 with read speeds nearly 100 times faster than current models, reaching up to 200 million IOPS using PCIe 7.0 technology. These SSDs, designed to partially replace HBM as GPU memory expanders, reflect the growing AI-driven demand in storage, with projections indicating that AI-related NAND will comprise 34% of the global NAND market by 2029, adding $29 billion in total addressable market (TAM). As a result, a U.S. investment firm warns of a potential NAND shortage starting in 2026, exacerbated by increased adoption of QLC eSSDs, Nearline SSDs, and high-bandwidth flash in response to tightening HDD supplies and AI infrastructure needs.
SSDmaxxers, 2027 will be your year!
>>106569299
>Kioxia, in partnership with NVIDIA
doa
>>106569268When I tried it, I couldn't get a single moe gguf to load. I was expecting it to be slow and unoptimized, but it didn't even load.
>>106569268just use llama for goofs bro
>>106569268Support for gguf on vllm is ass anyway
>>106569268>he expects corposlop pyshit to "just work" without suffering through dependency hell
>>106569268
Ok I just tried loading a small safetensors model and it also failed. Searching the error on the github issues gives 0 hits. Wtf is wrong with vllm man.
I suppose GPU would probably work fine as I can just download the prebuilt wheels, but the CPU build is not well.
>>106569331
Thanks. I think CPU AND goof support just simply cannot be expected to be stable on vllm. Let alone GPU + CPU inference, which isn't currently supported.
>>106569335
>>106569349
It seems even safetensors don't work on my build kek. They don't truly "support" goofs or CPU either.
you will give me ze best GERMAN low latency TTS or STS model right now!
I'm tired of my models turning into drooling retards when trying to pronounce 25 aka FÜNFUNDZWANZIG!
>fünfffffuuffuhffffzwwffggg 'hick!
Don't make me use jewgle voices... (fuck elevenlabs, they aren't even that good).
Did Drummer finally troon out?
>>106569357
Their github is practically useless and it seems like all support happens through their discord.
What error did you get? Try using the V0 engine. They rushed getting V1 out and making it the default while it was still a broken pile of shit missing more than half the features of V0.
>>106569367Just directly pass your ipa transcription?
>>106569367
VibeVoice
>low latency
oh...
>>106569379What's with this tech to troon pipeline?
>>106569407
>What's with this tech to troon pipeline?
The terminally online mostly fit into two groups, the mentally ill and assholes. If you are weak to praise and groupthink, the former is where you will stay. If you just want to solve problems, you are going to argue, try things out, fix it, come back, and then call everyone a dumbass.
How do you guys usually write character prompts? Do you just write a paragraph describing them, or something more structured?
>>106569413
your logic is a self report
anyway, maybe focus less on people, or do you keep your nose firmly buried up everyone's ass
Since I'm an esl I'd like to know if this prosody using heuristics (kokoro) sounds acceptable for americans: https://vocaroo.com/17z5mdm2a0yUThe sample is complex on purpose so I can test a bunch of heuristics at once: "At 11:30 p.m. on Jan. 3, 2024, the project's lead (Dr. O'Neil) announced, "We'll ship v2.1 to the U.S. and EU by Q2," but a $1250.50 shortfall, a 5% processing fee, and three critical API regressions forced the team to triage, reassign tasks, and reconvene Monday—prioritizing user-facing fixes over backend refactors to preserve product quality."
>>106569441Everyone here just asks GPT5 to write it for them and improve it. Nobody uses local models for roleplay, GPT5 is the current meta.
>>106569471
What isnt a self report? im writing my own opinion, what else am i supposed to write or think
The hell kinda quant method are you supposed to use again? I've seen conflicting reports.
>>106569492>>106569492penis
>>106569391Local or private, show me a conversational STT>TTS method with decent enough latency. Best I found was some hugging face space from the transformers.js dudedev. But it was kinda meh and engerrish only. and it had no dialogue turn system or whatever that interruption mechanic is called. I really cba developing this as I'm more interested in the backend stuff. I'd just use openAI realtime API for prototyping, but fuck me those prices are surreal.
>>106569385
>discord
Ugh.
The error is "TypeError: Cannot instantiate typing.Literal". I guess I could ask my llm about it to see if it possibly has any solutions.
How do I use the V0 engine? I tried the environment variable but it doesn't seem to do anything?
>>106569520Check this https://github.com/Open-LLM-VTuber/Open-LLM-VTuber it has a barge-in system which is the interruption mechanic you're looking for
>>106569474
>https://vocaroo.com/17z5mdm2a0yU
sounds fine
>>106569357
I was actually fucking with the cpu install just to see if I could give next a little test or two, but I could smell it being a migraine the minute I started running into weird dependency mismatches. I'd honestly rather wait the multiple weeks it'll take to get support in llamacpp only to test it and go "yeah, it's pretty shit for writing" anyway. Shame, because small active parameter models are great for cheap context and being relatively fast off the bat. Even jamba with more active parameters is still pretty snappy if you put enough of it into vram, but sperganov has yet to fix the constant reprocessing on new messages for it, or for a couple other models where whatever the fuck they coded causes this.
>>106569503IQ2 is the new meta, really. You will not notice any difference even when using smaller models.
>>106569557yeah that looks promising, will give it a shot. thanks, pedo.
>>106569553
>How do I use the V0 engine? I tried the environment variable but it doesn't seem to do anything?
Read the output on startup, it should tell you which engine is being used.
>TypeError: Cannot instantiate typing.Literal
See if there are any hints in the stack trace before this part. For me, the only success I had when vllm decided to throw errors was upgrading or downgrading (at the cost of model support) the vllm version. Using the V0 engine solved a lot of trouble for me, but once they hit v10, I gave up on messing with it.
>>106569614He says as he spins on his heel and then says how q8 kv cache is disastrous for models or something
>>106569630Yeah I think I'll just stop here if my LLM doesn't solve it. Don't feel like trying out various versions.And honestly I have a feeling the CPU performance is worse than Llama.cpp's anyway, but it'd be nice to actually confirm.
>>106569617break a leg
>>106568925>>106568645damn, wasn't qwen-image and qwen-image edit supposed to be a slopped failure that's not worth it to run?
Is there some kind of model that can act as a sort of twitch chat for when you're playing games by yourself? Like you give it a video feed and it reacts in some way. Just so that it's not so lonely.
>>106569817
im gonna use this idea to become a billionaire
thanks
>>106569822
You'd be lucky to make lunch money. The only billionaire is the owner of whatever streaming platform you use.
>finally found a model with little censoring and pretty competent logic
>leaks chatml all over the place half the time
sigh
>>106569817Having used 30b+ models, it depends. If you start off in a prose setting, then ask it to interject with something like a chat/review section (eg: character reads chat mid-story), it will fuck it up. Off the bat with examples, maybe. As for giving an llm a video feed, I don't think that's feasible at the moment unless you have a shitload of vram or some kind of highly specialized and hand written pipeline
>>106569817>Just so that it's not so lonelyThis is a general for pretend sex with computers and yet this post is one of the most pathetic things I've ever read
Just became a 128gb ramGOD with 24gb vram. What's the best I can run now?
>>106569856
>ram
>What's the best I can run now
You mean crawl?
>>106569817for the millionth time, no we cant build screenreading twitch yet, no we dont know how neuro does it and it cannot be done locally for any sane price
>>106569856Probably glm 4.5 at iq3, with a whopping 9 t/s on empty context
>>106569869Sorry...
>>106569832What model and how does it "leak" "chatml" "all" "over" "the" "place"?
>>106569856qwen235b q4
>>106569856
>128gb ramGOD
sounds like you are a channellet with 2 channels at most
>>106569905Times are changing old man, I could barely fit a llama2 13b at 4k context and now I can run 100b moes and 32b dense models with bare minimum 16k context yet I have not bothered buying new hardware
>>106569817Could you get a primitive version by sending screenshots through a multimodal model?
>>106569923If you don't mind minute+ long latency.
>>106569869>no we dont know how neuro does itIt's still hilarious that some random guy built a better utilization for AI than trillions in VC cash between every major corporation
>>106570004Did you forget Ani exists?
>>106570004
>It's still hilarious that some random guy built a better utilization for AI than trillions in VC cash between every major corporation
Not really. If you dig into anything you will realize it's a very small group of people actually doing anything at all; sometimes it's just one hyperfocused dude who does nothing but that for years cause of autism.
>>106570013I wish I could
>>106570013Someone post the mouth animations
>>106569869
Uhh, techlet?
>stream has a few minutes long delay (this is what most parasites do normally even)
>selected twitch chat entries are redirected to llm
It's not rocket science. He wrote a backend that controls the character and llm and integrates them together, but I can assure I could make a demo if I had more interest.
>>106570036>I can assure I could make a demo if I had more interest.That means you cant, and no one else has cracked it as good and made it available.
>I couldlol
The new fiscal quarter starts in October. As usual, this will be when companies start pushing out new models to look good.
Two more weeks and the big releases start.
>>106570048>no one else has cracked it as good and made it available.What is the incentive to put in that much work just to make it available because you want it? Even if I put in that much effort, I would just make a Neuro knockoff and try to make money off it.
>>106570094Okay thats fair, but still if you can clone it and make money why not? how come none of the 'i made my own neuro' is close to his?
My implementation is cool she's just on the Canadian Github
>>106570048
You are just too stupid and/or underage even. Jesus christ, these ERPers shouldn't even be allowed to post in this thread.
I just did a test of GPU performance with vllm and llama.cpp. With Qwen 4B, original BF16 safetensors, I got around 71 t/s on vllm with empty context, and 45 t/s on llama.cpp with a BF16 GGUF. At 8k context, Llama.cpp got 44 t/s, and vllm got 60 t/s. I also tried an F16 GGUF and it was like 2 t/s faster. These results suggest that at least on my system with Qwen 4B on full precision, there is something bottlenecking Llama.cpp. Maybe it'd be closer with a larger parameter model, but I don't have the VRAM to hold the full precision weights.
>>106570054Problem nigger?
>>106569082So, a general instruct model lost to a model that was specialized for coding at coding, and that's supposed to be a mark against the general instruct model?
>>106569869Nah you coomers are braindead. There are bazillions of projects like these on github https://github.com/kimjammer/Neuro
>>106569817
>>106569869
>>106570343
the one that can play games with ai:
https://github.com/moeru-ai/airi
>>106566836
Question about the -ts option in llama.cpp: when picking a split, should I account for the fact that my main (first) GPU already has VRAM in use from other programs and Windows, or does llama.cpp take that into consideration and balance it properly? Is there any option that just splits the VRAM evenly between two cards without having to tune numbers with -ts? I find myself using odd -ts combos to get an almost even VRAM usage split and I don't know why. For example, currently -ts 24,15 splits it almost evenly between my cards, which makes no sense to me considering my 1st card is using VRAM for other programs and Windows. I just don't like having to reload the model over and over trying different numbers until I find a combo that splits it properly.
What if my computer is over a decade old with no vram? Is local llms the hobby for me.
>>106570396
>>106570369Wait, so it is possible? Why were anons being mean to me? Are they trying to keep this tech all to themselves?
>>106570369>ElevenLabs voice synthesis
>>106570480You talked to clueless retards. Very few here know more than edging to chub cards
>>106570480they're all tsun with little dere here
>>106570488It's the best and will continue to be the best
>>106570796
China will rip off vibe voice and make it better.
I believe
Finally a model that passes this test and it's only 1.7B and open sourced. wild
>>106570867Didn't it get the 8th character wrong?
>>106570867the model alone or the whole stack?
>>106570892well fuck, guess there's always next model
>>106570901just the model
What is a good model for being negative and critical? I hate how they agree with everything. I want to be told I'm wrong or being an idiot.
For those of you who use the models for anything other than porn, what is the best way to let the model browse the web to search for info?
In my opinion the difference nowadays between proprietary models and local is mostly in the tooling integration rather than the actual models.
>>106570964Kimi k2 is the GOAT
>>106571077
>Kimi k2
Is kimi k2 local? can you run it?
>>106571090
No but I understand that one or two people here can run it :)
nta btw
>>106571090yes
>>106571090It's 1T/30A
>>106571094
>one or two people here can run it :)
I wish i was one of them.
>>106571077Can it still talk about medical or mental stuff or does it just shut down?
>>106571105post your full medical history and social security number and i'll ask my buddy kimi
any kokoro voice recs? https://voca.ro/1jAMPLyV0zJA
>>106571216
Bateman is always good.
https://files.catbox.moe/bwv1fc.mp3
>>106571216
Can it do Japanese sex, moaning, and blowjob noises?
If no, it's worthless
>>106571243VibeVoice can, but no api support yet
>>106571243>braindead coomer
There arent any coomers here. We are all using this technology safely and to enhance our lives and work abilities.
>>106571070
>what is the best way to let the model browse the web to search for info?
MCP
>>106571337
gooners are the reason AI has advanced so much
a 4chan holo gooner invented chain of thought
>>106571347>There arent any coomers hereSorry I was offline for a bit, I'm back now.
>>106571466show me your coom
KH music came on in my playlist and I remembered the lyrics poster :)
>>106570892Come on now, let the man rest
I made a Miku for you guys. Feel free to use at your leisure.
Can you guys give me a list of safetymaxxing companies so I know to ignore their model releases?
>>106571835
>>106571836Pretty much everyone else except Mistral and Chinese..
Textless, exploitable version.
>>106571836All of them
Exploitable transparency version.
Enjoy you are images of official /lmg/ mascot Hatsune Miku!
>>106571835>>106571849>>106571853>>106571856fuck off, spammer.
stay, cute normal poster
>>106571876
I'm sorry for contributing OC. Really, I am.
I'll go back to enjoying my chat with official /lmg/ roleplaying model Rocinante 1.1, the best roleplaying model made by the best finetuner, TheDrummer!
>>106571835Cute migu
>>106571916
Yeah I'm happy with how that artist tag blend turned out.
The key is Namori. Namori tag makes anything cute.
>>106569281fork it and edit out the homing beacon
>>106571553
>>106570867
wasn't the mokuro manga thing able to do this already?
>https://github.com/kha-white/mokuro
>>106572018where are you supposed to get the high quality raws for this though
>>106569856>24
>>106568374
>>106568414
>>106568426
You do know there is still transformers, which has all the model support and is where everything lands first, right? Most of the internet only mentions GGUF because people don't want to waste space downloading the raw model, and use AWQ for non-fine-grained 4/8 bit inference, because most people don't overspend on compute and are running <1-2k USD builds for these models.
>>106568789Apple didn't win jack shit when it is slower per dollar and harder to use overall for anything <=128 GB of RAM than AMD's Strix Halo. Maybe their matmul implementation in the A19/M5 is worth a shit but I am leaning towards no unless proven otherwise given how shit Apple is at AI.
w y w ay d h m spo bd g
>>106572139b-but I make up for it with my prompting...
my prompts turn 30B models into 500B behemoths
>>106570867
Haha, nice to see my image still floating around.
8th character like other people said, and also the KA hiragana towards the end.
Damn, 2 years and they all still struggle.
In 2023 I thought we would have a local gaming buddy by now. That I can have in the background translating games with an overlay. At least drummer finetunes are good enough for lunatranslator. That works pretty well most of the time.
I remember the old ATLAS translations back in the 00s. kek
>>106570867It failed though. There is one big error and three small ones.
>>106572459You're absolutely right anon! It really is a testament to your genius to point this out!
what did they mean by this
>>106572569
>everyone picking up mi50s and v100s despite the next major ML stack software releases for their vendors with AMD and Nvidia dropping them.
I don't get it at all. Even if you had to pay double the price, it's still worth having software support over trying to hack things together after that point and praying the Vulkan backend is super optimized one day so you can keep using your card.
>>106572592i meant the little green display but yes the gpu choice is also questionable
>>106572592What could be the reasons for updating your drivers? The last time I assembled my LLM machine was last year and I had to downgrade drivers for better power savings, it still works to this day. The only thing I've heard about these drivers is that they become slower in newer versions, and power efficiency on idle has been broken for more than a year now
And when it comes to AMD drivers, if you find a version that somewhat works, you'd better never touch it again
>>106572601
Oh didn't notice. Yeah, won't comment on that. I still think microATX is way too small to fit cards like that even with blowers but I guess that's why noise is never a factor to consider.
>>106572637
Depends on what card you have. Ada and now Blackwell are still getting performance improvements and fixes. If you locked your hardware stack now especially on Blackwell, you're missing out on Nvidia actually providing value in unbreaking shit, although to be fair, it's shit they broke in the first place. CUDA also does get a bunch of API changes between major releases.
>>106572653
For AMD, you especially want to run nightly ROCm if you can build it yourself.
Of course, that's from a developer/tinkerer standpoint. If you want shit to just work, then okay, you do you in order to keep software stability at all costs.
just don't use AYYMD and you will be happy
>>106570386
Unless someone changed it when I wasn't looking, the default in llama.cpp is to use the GPU with index 0 as "main GPU".
Note that the order in which llama.cpp/ggml receives GPUs is not necessarily consistent with the one reported elsewhere.
>>106572592
Essentially all GPUs you buy are a depreciating asset.
Even if you have to replace them earlier and they end up as e-waste that may have been a better deal than buying and later selling a more expensive GPU.
Though as long as there are drivers I intend to maintain llama.cpp/ggml support for Pascal and Vega (Mi50).
>>106572669
Have you ever experienced a t/s increase after updating nvidia drivers?
>>106572696
People who buy AMD are either desperate enough or in it for the ride. Someone has to finish off that lunatic extra phantasm, you know
>>106570295
>So, a general instruct model lost to a model that was specialized for coding at coding, and that's supposed to be a mark against the general instruct model?
The general instruct usually did better than the previous coder-focused model, yes. Qwen 3 instructs (the general instruct, not coder) are better than 2.5 coder. A new model being worse than the previous one is a sign that the tech is stalling.
What's the best model I can run on my RTX 3060?
I tried Cydonia 22b q5 and Rocinante 12b q8, but im not sure if im using low tier stuff, it's been ages since I last used ai chatbots
>>106572794
>Cydonia
>Rocinante
finetroon users are a lost cause
>>106572883
>mikutroon opinion
discarded
>>106572794
theres a newer cydonia and valkyrie
https://huggingface.co/TheDrummer/Cydonia-24B-v4.1-GGUF
https://huggingface.co/TheDrummer/Valkyrie-49B-v2-GGUF
>>106568645...Will I have to take the comfy troon pill?
>>106568659
>mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit/blob/main/config.json
> "group_size": 64,
dumbasses
>tr**n>tr**n>tr**n>tr**nobesed!
MistralLarge3
>>106573036neber ever
>>106572948I know the default is 128, but I wonder why they changed that
>>106573045
>128
at that point it's literally retarded, you're supposed to quant it to 32gs. lower is better
>>106573036ugh i need it so much
Update on Qwen3-next goofs?
>>106573151
2 weeks more
https://github.com/ggml-org/llama.cpp/issues/15940#issuecomment-3286596522
>>106572714
>Essentially all GPUs you buy are a depreciating asset.
My used 3090 I bought 3 years ago is worth about 30% more now than when I bought it
>>106573171
sir no
>This is a massive task, likely 2-3 months of full-time work for a highly specialized engineer.
this no goods
>>106573171
>MXFP4
>Successfully quantized the Qwen/Qwen3-Next-80B-A3B-Instruct model to the MXFP4 format, with expert layers quantized to MXFP4 and other layers retaining their original precision. The model size has been reduced from 159GB to 45GB.
seems like sama's shitty model was useful after all
>>106573151
>run a prompt with 0 temperature
>get result
>run it again with 0 temperature
>slightly different result
Why is it so fucking hard to make rounding robust in GPU calculations, are they stupid?
>>106573199paging dr cuda dev, drop what you're doing and get on it
>>106573224
>seems like sama's shitty model was useful after all
You can say that once you run benchmarks comparing mxfp4 to other q4 quants on that model.
>>106573226>He didn't fix the seed
>>106573226result from the second one on should be the same
>>106573036What's the point? Dense lost
local is getting more relevant and popular because, spoiler alert, all these cloud NIGGERS are serving comped and quantmaxxed garbage!
>>106573226top-k 1
>>106573271>all these cloud NIGGERS are serving comped and quantmaxxed garbageGood.
>>106573271we're in peak race to bottoms phase
>>106573292This is the only bottom I'm racing to
>>106573324we must get behind this
>>106573270You thought it was going to be dense?
>>106573236
I seeded your mother
>>106573260
>>106573279
I know what the problem is, I literally wrote it in my post. It depends on the order the calculations are made in, as they introduce rounding errors. In real life a + (b + c) = (a + b) + c, but not on GPU. On GPU it will give you two different results and the errors stack up until they flip a token, and from there it's over.
I'm asking if it's fixable in a reasonable manner by people writing inference engines.
>>106573379cuda said it wasn't worth its time iirc
Whats the point of doing all this? Why even have private LLM?
>>106573379temp 0 only weights the tokens. Sampling still happens. top-k 1 picks the first one always, removing the chance for other samplers to interfere. The first token won't change.
>>106573423>private LLMThe clue is in the name.
>>106573425anon...
>>106573435Check your probs for all the tokens you generate. top-k doesn't have that problem.
>>106573423some people like owning things.
>>106573441Obviously I have deterministic samplers. It still flips tokens for the reason I explained above. top-k 1 won't do shit if the top token changes between generations.
>>106573447that sounds awful
>>106573469it really is. it is much more exciting to gamble if my work flow will continue working when the corpos 'update' the models.
>>106573467 (me)
To those who are still confused:
>CPU does the calculations sequentially, GPU splits operations and does them in parallel
>the order in which the calculations are made is more or less random
>because it's random, you don't control how rounding errors are introduced
>(a+b) + c is not equal to a + (b+c), meaning you will get different results depending on GPU whims
>this gives us micro errors that can sometimes flip top tokens
You can see the numerical errors I'm talking about in action if you run this:
a = 1.0
b = 0.00000001
c = 0.00000001
result_1 = (a + b) + c
result_2 = a + (b + c)
print(result_1)
print(result_2)
I still think cuda dev should add an option to force sequential operations in the necessary places to make results reproducible, for the sake of having a baseline for experimentation.
Large Migu's Galore?
>>106573535I prefer them small
>>106573551
>>106573324PANTYHOSE
>>106573551Obviously, small things are good
>>106573519
Reproducibility will murder your t/s. Just read, retard: https://docs.pytorch.org/docs/stable/notes/randomness.html
>>106573610
>Reproducibility will murder your t/s
I'm aware of that
>>106569281>not using a firewallngmi
>>106569281
>sends your data
It just pings an IP. Do you know what that means? Of course you don't, retard.
>>106573519
The ggml CUDA backend is deterministic, all operations are always done in the exact same order.
However, when you use prompt caching this is done by re-running the model evaluation for the last token of the prompt only.
As a consequence the KV cache/logits of the first token are slightly different and you can get different results.
For 100% reproducible results with prompt caching one would have to implement caching the prompt only in chunks of the physical batch size.
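If anyone wants to see the effect outside of llama.cpp, here's a toy numpy illustration (nothing to do with the actual CUDA kernels): reducing the same values in one pass vs in smaller chunks rounds differently, and that tiny drift is the kind of thing the cached KV values pick up.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

full = x.sum()  # reduced in one go
chunked = np.float32(0.0)
for i in range(0, x.size, 512):  # reduced 512 elements at a time, like a smaller batch
    chunked = chunked + x[i:i + 512].sum()

print(full, chunked, full == chunked)  # typically differs in the last few bits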
>>106569281link the line of code that does those things
>>106572734
>>106570295
qwen3 coder 30b is a3b
stupid niggers!
>>106573185i can buy 3090 for 470 euro in my country
>>106573977I can get an RTX 6000 for half that
>>106573982
fine bro ill post the site i dont care if anons buy all 3090s in my country i wont be buying them anytime soon anyways
https://www.kupujemprodajem.com/kompjuteri-desktop/graficke-kartice/inno3d-ichill-rtx-3090-rtx3090/oglas/181677213?filterId=7125152428
https://www.kupujemprodajem.com/kompjuteri-desktop/graficke-kartice/rtx-3090-3080-3070-3060-1070-rx6900-6700-5700-580/oglas/141810340?filterId=7125152428
Trying windows 11... Why does it just instantly quit? I have no idea what's wrong because it tells me nothing.
>>106573977
1.6k aud lmao
>>106574004try -v for verbose output maybe.
>>106574016
>>106574029maybe try renaming the model so it doesn't have spaces and dashes and shit?
>>106573171>I asked GPT5 Codex to get a view of the work to be done, it's monstrous...this is why "social media" style coding (github) is cancerrandos should not be allowed to post on issues or do PRmuch less randos who are just copy pasting shit in chatgpt and pasting back the answer
the silence is deafening
>>106574045
>>106574080maybe install linux?
>>106574080your shit's right fucked mate
i need new sex. is qwen next good?
>>106574080I've had the same problem back when I was on windows and the only solution was to compile the binary myself. The one from github ci just refused to work with my drivers.
>>106574090>a3b
>>106574080wow, okay, I'm all out of ideas. I suppose you could try building it from source or downloading a different version.
>>106574080You do have CUDA 12.4 installed, right?
>>106574071
That's pretty much the same everywhere on the internet. Post a mod on Nexus, for example, and these weird retarded posts immediately come out of nowhere... Why should other people's work be open to comments from strangers anyway, unless they're actually part of the team? Pure demoralising cancer.
>>106574132
You do not need to install cuda on windows.
cudart-llama-bin-win-cuda-12.4-x64.zip provides the necessary runtime files and is distributed on the github releases alongside the binaries for the CUDA version of lamer.cpp
>>106574089
But why? Shouldn't it at least inform me if something's wrong? What kind of code just instantly exits without anything, even with a verbose flag?
>>106574101
I went with kobold cpp... It's still around 15 tk/s. Seems like it's a windows issue with my hardware?
>>106574084
On linux I get over 50tk/s, but I'm not a linux user, and I don't want to have to switch between operating systems every time I want to ask an AI something.
A few threads ago I thought it was maybe windows 10... but windows 11 is also fucked.
I also installed SystemPanic/vllm-windows, but that had a problem with pyzmq, and I couldn't get it to run with multiple gpus. Single gpu works fine, but I never had a problem with the single gpu performance in the first place.
>>106574132
Yeah, cuda 12.4 and driver version 552. Also tested with 12.8, and driver version 571.
>>106574097I am tired of people hating on low A count. Low A count is the future. Attention and context on GPU and a fuckswarm of fucksmall experts on CPU is the future. It is all a training problem and I am not training so corpo nerds have to solve it so I can jerk off in peace.
>>106574196Big models with low active parameters are useless outside of benchmaxxing, a model doesn't need intelligence if it's just pulling complete answers from memory.
>>106574196Anything less than 30B active is too retarded for anything but trivia recall and the most simplest of tasks.
>>106574177windows just isnt for ai, whats
>>106566836Qwen3-Next or GLM Air for storytelling and roleplay?
>>106574225cuda dev gets 80 tk/s on triple 4090s on his windows though
>>106574229Nemo
>>106574208Agreed. I love that model I use to ERP that never pulls complete answers from memory. I forgot the name of it though....
>>106574247
>>106574292Go fuck yourself straight to reddit.
>>106574320t. infiltrator ledditor
>>106574229probably glm. qwen models are really into that thing where they go "It's not X. It's Y."
>>106571849I like this Miku
>>106574292I like this Miku
Is there any point to local LLMs when free llms with better data like deepseek and aistudio exist? It feels like I wasted money on the P40 I got
>>106574618You should have thought of that before buying it
>>106574618
no, you have discovered the hole in the system
you have just fully invalidated all of /lmg/ and the very existence of this general
>>106574618
It's yours, it's offline, and importantly for workflows, it doesn't change. You won't get random drops in performance because the company wants to tweak or save money.
Look what anniversary is coming up in just under two weeks. This is when they will drop Large 3. It's the perfect occasion.
>>106574738
>its yours
This means if something goes wrong you can shift blame to the provider
>its offline
This means you have to pay for cost of hosting and ensuring availability
>it doesnt change
This means you will never be the first to get the new features and fixes
All the positives about local are viewed as negatives by apifags
>>106574738
the only anniversary I will celebrate here is their death anniversary, when the funding dries out
>>106574738Insider here: Mistral is being refactored. This means safety factoring them, but also two new models - Mistral Small 4.0 and Large 4.0.
I just had an idea (that I won't do myself): what if you run one pass through the chat with your model without generating a continuation, and instead just use the attention scores to find out which messages/paragraphs are most important right now, and then make your entire chat history just a RAG source for the second pass?
>>106574786Last news was that they might be bought out by Apple. Presumably because their attempt to relocate themselves to California failed.
>>106574786
Never trust a frenchy.
Never trust anyone else.
I believe anon,but not Anon
Why local when API?
>>106574799
Last news is this:
https://www.asml.com/en/news/press-releases/2025/asml-mistral-ai-enter-strategic-partnership
>ASML, Mistral AI enter strategic partnership
>Companies agree on long-term collaboration deal to benefit ASML customers
>ASML to lead Mistral AI’s Series C funding round investing €1.3 billion
>>106573379It's impossible.
>>106574685so it might be useful in the future when the companies decide to charge us 1000 dollars per month but not now? got it
>>106574819
100% misuse of corporate assets from a French (ASML's CEO is a french faggot) to support another French in what is likely to be corruption.
What a way to throw away a billion.
>>106574857they're just putting in a quick dirty $1.2b so that they can make apple pay them $3b in four months when they finally decide to buy up mistral
>>106574864
https://www.devdiscourse.com/article/politics/3200518-former-french-finance-minister-le-maire-joins-asml-as-adviser
>Former French Finance Minister Le Maire Joins ASML as Adviser
No, you are wrong, and as a French I can guarantee this is yet another affair of corruption on our part.
Not only a French CEO but also one of the biggest pieces of shit of recent political history is taking part in this mess.
The only thing that saves ASML is that they have a monopoly on EUV, otherwise the rot that is currently beginning to eat them at the core is the sort that can kill a corporation.
For a full private waifu stack at good speed you need 6 RTX Pro 6000 and some other 24+GB GPU, right? Four to run GLM-4.5, two for GLM-4.5V for vision and the other GPU for VibeVoice 7B.
I hope 29GB are enough for 128k context.
>5120*2*92*128000 = 112 GB
Fuck. So another RTX Pro 6000, two to be safe, and hope that TP=5 or TP=6 work well.
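Sanity check on the napkin math, treating that product as an element count (so the 112 GB only works out at one byte per element, i.e. a quantized KV cache; fp16 doubles it):

# kv_width * (K and V) * layers * context tokens, numbers straight from the post above
elements = 5120 * 2 * 92 * 128_000
for label, bytes_per_elem in (("q8/fp8 cache", 1), ("fp16 cache", 2)):
    gib = elements * bytes_per_elem / 1024**3
    print(f"{label}: ~{gib:.0f} GiB")  # ~112 GiB and ~225 GiB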
>>106574919
>Four to run GLM-4.5, two for GLM-4.5V for vision
Why not just use GLM-4.5V for both text and vision? 4 RTX Pro 6000 is not worth running GLM-4.5 over Air/V especially when your usecase is waifu talk.
>>106574940Because the full run is much better and within reach.
>>106574900
What did you expect from Macron's government? It's already like this for several industries. Mistral got gov gibs too so they're cashing out taxpayers' money.
t. french
>>106575020isn't all tax money going to boomers in france
how do you run glm4.5v?
>>106575123very carefully
>>106575131but theres no mmproj :( like I thought I could run it with glm air since it's based on it :(
>>106575131Because she's fat and obese.
Ling
>https://huggingface.co/inclusionAI/Ling-mini-2.0
Ring
>https://huggingface.co/inclusionAI/Ring-mini-2.0
16B 1.4A
Here's hoping it's somehow at least as good as Qwen 30BA3.
>>106570867
Wanted a quick sanity check on my memory here so I ran this on Gemma 3 27B Q8 with BF16 mmproj.
I don't know Japanese. I see one definite error near the end. Are the others errors?
>>106575202>>106575202>>106575202
>>106575144So rude.
>>106575181Impressive, it actually got the kanji all other models fail at.
https://youtu.be/7Jzjl3eWMA0?t=117
Women raping gay billionaire werewolf writers sounds unsafe. But their fucked up fetishes are somehow safe. I hate this world.