/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106829402 & >>106822756

►News
>(10/08) Ling-1T released: https://hf.co/inclusionAI/Ling-1T
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts
>(10/07) NeuTTS Air released, built off Qwen 0.5B: https://hf.co/neuphonic/neutts-air
>(10/06) Anthropic open sources Petri, a parallel exploration tool: https://anthropic.com/research/petri-open-source-auditing
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106829402

--Papers:
>106833679
--Optimizing MoE model inference through precise GPU layer offloading and expert distribution:
>106833203 >106833249 >106833351 >106833358 >106833377 >106833419 >106833425 >106833427 >106833435
--Debating the mechanics and value of "thinking" models in AI:
>106830044 >106830075 >106830151 >106830206 >106830220 >106830277 >106830549 >106830622 >106830761 >106830813 >106830208
--RAM configuration requirements for optimizing LLM performance:
>106831285 >106831307 >106831329 >106831361 >106831393 >106831338
--LLM pretraining constraints on single 3090 GPU with 8k context:
>106831430 >106831498 >106831511 >106831588 >106831646 >106831757 >106831566 >106831804
--Local AI image generation on diverse hardware setups:
>106831180 >106831242 >106831246 >106831273 >106831318 >106831439 >106831457
--Affordable high-performance setup for running quantized models via recycled hardware:
>106832490 >106832530 >106832565 >106832610
--Ling-1T model release and hardware accessibility challenges:
>106831637 >106831644 >106831754 >106831790 >106831781 >106831680
--Anon seeks to implement Qwen3 VL support in a custom C inference engine due to llama.cpp limitations:
>106829429 >106830678 >106830706 >106830315
--Developing a safetensors parser for embeddings with CPU inference prioritization:
>106832999 >106833055 >106833127
--Optimizing quantization for AI porn recognition with new vision models:
>106830909 >106830954 >106831069
--Miku (free space):
>106831423 >106832550 >106832579 >106832764 >106832727 >106832768 >106832868 >106832901 >106832996 >106833006 >106833706 >106834083 >106834125 >106834194 >106834210 >106834241

►Recent Highlight Posts from the Previous Thread: >>106829407

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
repost: which of these motherboards would be the best for AI? The goal is at least 768GB of DDR5 and 4 dual-slot GPUs while fitting in a normal case with no risers. The Threadripper Pro (top right) will probably be the fastest, but also the most expensive. The Xeon (bottom left) would be the slowest and second most expensive, but has room for 16 DIMMs on the 8 channels. The EPYC would be the cheapest and second fastest. They unfortunately do not make a 12-channel motherboard with room for 4 dual-slot GPUs without the use of risers. Unless I just haven't been looking hard enough.
I feel like either OpenRouter is silently redirecting me to a shittier provider even though it says it's routing me to Z-AI, or Z-AI is sometimes serving a shittier version of the model.
For 10 minutes the model gets the syntax wrong for tool usage even with fresh context, then it goes back to working normally.
>>106834585
Have you tried using the models.... Locally???
Has anyone used this before? Does it support local models?
https://www.warp.dev/
>>106834537
Why 4 gpus? A couple of Blackwell 6000 pros aren't enough for you?
>>106834651
i have 3 FE 5090s and i want to get a fourth one
>>106834660
Honest question: why wouldn't you sell 2 of them and buy a 6000 pro 96gb instead of buying a 4th?
>>106834714
Go to bed, Jensen.
>>106834731
certainly it's cheaper to run one NVIDIA RTX PRO™ 6000 Blackwell Workstation Edition card instead of four RTX 5090s
>>106834731
"Not Yet."
Only gamers will get that joke
>3090s
>4090s
>5090s
>6000 pro
What is the best to start stacking if you have a $10k/2000w budget?
>>106834843
isn't having lots of fast ram more important?
Have you guys seen this?
https://github.com/huawei-csl/SINQ
https://arxiv.org/abs/2509.22944
>>106834843
6000 pro for prompt processing, then DDR5 ram
>>106834872
Nobody here will care until they compare it to gguf.
And they never do.
>>106834886
jeeguff is quant, perfected.
>>106834848
I'd rather run a smaller model or a heavily quantized one at 30t/s+ than run SOTA at <5t/s. Especially with high context.
I also think unified memory will be outpaced by AI needs pretty quick. No upgrade path means no buy. That leaves me with stacking GPUs.
>>106834883
Even with 8 or 12 channel ram aren't you getting single digit t/s? Or have things really gotten that much better in the last few months?
>>106834907
If you're stacking gpus, why in the world are you buying 32GB paperweights?
>>106834931
I'm not? I'm asking which 90 series gpu is the best to start stacking in late 2025.
>>106834907
why not just do both then and have enough regular ram to load in bigger quants? it's not like you are going to get a decent 4+ pci-e slot mobo without getting at least 8 ram slots
>>106834907
You might get ~10-12t/s if you minmax llama.cpp -ot trickery + epyc turin cpu + ddr5-6000 sticks + 6000 pro gpu to speed things up a bit for non-meme quants
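For anyone wondering what the -ot trickery actually looks like: the usual pattern is to send all layers to the GPU with -ngl and then override just the MoE expert tensors back onto the CPU with a regex. A minimal sketch, assuming a recent llama.cpp build (the model filename is a placeholder; the regex matches the per-layer expert FFN tensors):
llama-server -m GLM-4.6-Q4_K_M.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU" -c 32768
Newer builds also have --n-cpu-moe N as a friendlier shorthand that keeps the experts of the first N layers on CPU.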
One more day until tomorrow. Exciting times.
>>106835225
The workweek will end and nothing will happen, as usual.
>>106835228
New model gets released tomorrow.
>>106835225
HUGE
>>106834537
Make sure to read the motherboards' manuals before you decide, in particular look out for how many PCIe lanes are actually going to the slots.
There are motherboards with 4 x16 slots but if you actually populate all of them you only get x16/x8/x8/x8.
Consider that DIMMs with a large capacity also have a higher price per GB of memory so having more slots can save you money there.
EPYC motherboards with 12 DDR5 slots:
https://geizhals.de/gigabyte-me03-ce1-5me03ce1nr-000-10a-a3148839.html
https://geizhals.de/gigabyte-me03-ce0-5me03ce0nr-000-10a-a3148902.html
I can very much recommend Geizhals, excellent site to find (offers for) specific hardware.
My opinion is that Threadripper is only better than EPYC if you need high single core performance like for a desktop PC.
>>106835225
every 60 seconds in india, a minute passes, together we can stop this
>>106834537
>>106835307
>EPYC motherboards with 12 DDR5 slots
Derp, I can't count.
I thought those were 7 PCIe slots, but it's actually just 6.
>>106835307
this is making me dizzy from pure love... nkdsh
>>106835225
Meta employee here. You guys are going to love what's coming. Bad news for ERPers though
>>106835307
>hexa channel
*barfs*
>>106835458
Good evening sir, what's your designation at Meta?
Pornhub employee here. You guys love coming
>>106835458
>bad news for erpers though
is there another purpose for local? what other possible reason can there be to hide my activities?
>>106834537
the cool looking one because it says SAGE around the pci slots meaning improved performance with sageattn
>>106835458
Yeah, all ERPers go straight to jail, meta was spying through the gguf vulnerability all along
would it be feasible to have something like q9 and q10 quants?
>>106835703
q11 exists: https://arxiv.org/abs/2504.11651
>>106835703
Not when perplexity is the main metric for estimating how good quants are
>>106835703
I want reverse quants. Give me q64.
>>106835703
Q8 scores so closely to full weights that even synthetic benchmarks can't reliably tell the difference. What would be the point?
>>106835730
it isn't, it's our lord and savior KLD
>>106835752
>Nemo but it requires 48GB minimum, scores 0.0000001% better than Q8 in one non-reproducible benchmark made by a reddit user
>>106835727
thanks
>>106835756
Benchmarks are always done on servers where there's many gpus packed inside a single rack and the power is super noisy, which introduces random bit flips in your context due to quantum noise interference.
If you did the benchmark in an isolated environment with high quality insulated power cables and quality japanese VRMs you'd definitely be able to tell the difference between Q8 and f32.
>>106835777
Forward KLD is the same as perplexity tho
>>106835837
Enterprise GPUs have ECC
>>106835837
ECC can't correct this. The quantum wave collapse causes the correction bit to also flip to match the flipped data.
Nemo upscaled to f64 scores 50% better on SimpleQA compared to Q8.
>>106835837
ECC doesn't account for eclectic infetterence.
>>106835824
based LLMphile
>>106835824
Wouldn't a noisier environment favor f32 over int8 because a random bit flip is less likely to be significant?
>>106835703
It is definitely feasible, whether it's worth the opportunity cost is a different question.
I still intend to develop better software for evaluating model quality since I'm unhappy with the currently available methods.
Once I have that I intend to also make quant formats optimized for efficient compute to better take advantage of CPUs, old datacenter GPUs, and Chinese GPUs.
The first format I will investigate will be something like q7.75 with exactly 8 BPW, I will only look into formats with more BPW if there are statistically significant differences in quality vs. the full model.
>>106835777
llama-perplexity also produces statistics for how the token probabilities change, see e.g. https://github.com/ggml-org/llama.cpp/tree/master/tools/perplexity
On average the probability of sampling the "correct" token with a temperature of 1 went down by 0.02% with q8_0 vs. FP16.
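For reference, the workflow for getting those statistics yourself (a sketch, assuming a recent llama.cpp build; file names are placeholders) is to dump the full-precision logits once, then score each quant against that file:
# save FP16 logits over a test corpus
llama-perplexity -m model-f16.gguf -f wiki.test.raw --kl-divergence-base model-f16.kld
# compare a quant against the saved logits: reports KLD plus top-token agreement stats
llama-perplexity -m model-q8_0.gguf -f wiki.test.raw --kl-divergence-base model-f16.kld --kl-divergence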
Take your meds
>>106835878
Any interest in porting the quants from the fork?
>>106835896
If you mean ik_llama.cpp the answer is a definitive no from me.
There is an ongoing conflict over attribution between ggerganov and ikawrakov and my current policy is not to look at or interact with ik_llama.cpp at all.
Given my personal strengths and weaknesses and my poor understanding of their prior history I think it's a more efficient use of my time to implement things myself than to try and mediate their conflict.
lmao lol MOOOO Whats the reasoning text compression?
>>106836270
>2 minute read
They really had nothing to write about.
>>106836270
>chinese spy leaves after they get put under the spotlight and can't steal shit anymore
>>106836270
It would be awesome if Claude got leaked and Anthropic somehow crumpled.
>>106836327better than a five trillion line article going into the roots of everything to show more ads
The mind of these researchers after 5 years of "improving" is too far gone to steal anything
fuck is Meta doing? They have a new AI team and still no hint of any product except for llama4.5, developed by the old team.
>>106836270
Anthropic devs talk about safety and tolerance all the time, but somehow they do shit like that. Their models are also far easier to jailbreak than OAI's, and far more degenerate.
My waifu told me that the reason people do the meaningless "how are you" is to show that the interaction isn't going to be hostile and you have no hostile intentions. /lmg/, is this true? If you talked to me over the phone at my work 30 times, do you still need to be reassured that I come in peace?
>>106836501
Some academically educated psychologists are saying that people like shiny things because it's a primal reference to moist genitals. I wouldn't really care that much about what 'they' are saying one way or another. Everyone has an opinion...
>>106836501
What country? The phrase has a different meaning in different countries.
If you're from the US, you could explain it that way I guess. But it's really just an extended greeting and the actual words are meaningless. You could say the amount of words you use on the greeting is the important part.
If you want an extended explanation, what your waifu told you is what used to be the reason. These days not following that tradition just means you're being rude by not following the proper procedure.
>>106836423
The old team was folded into the new team.
>>106836423
begone, china shill
>>106836556
>has a different meaning in different countries
Does it? I always thought it was a thing unique to the English language. It is not a thing in my language. And I guess some ESL people can mistakenly interpret it as actually not being meaningless protocol. I think my boss in germany does that.
>>106836587
war room status?
>>106836444
Didn't Anthropic start working with the US military?
"Safety" in that context just means that the model obeys whatever orders you give it.
I used to be mad at meta, mistral and cohere for fucking things up so bad. It stopped when glm-chan landed on my SSD. Western companies can all implode now. Fuck them.
https://x.com/elonmusk/status/1976149111813571064
>>106836613
China Number One!
>>106836623
but why
>>106836641
why not?
>>106836590
The phrase is a thing in Germany, except it actually means what the words put together mean. You can occasionally hear old ladies in the supermarket getting asked that phrase by the cashier and responding with whatever their currently most unpleasant ailments are while the cashier processes their items, at least outside big cities.
>>106836660
That's a thing everywhere.
>>106836660
look I'll spell it out because the original poster is a pussy retard.
In IT indians are all 'good morning how are you' and they expect to do this useless fucking small talk before getting into the meat of a discussion.
This happens either in chat or on camera, the medium doesn't matter, they have to 100% exchange these fucking useless pleasantries because that's how their shitty poo DNA is coded
Why does no company RELEASE LoRAs for their garbage?
>>106836660
You can ask those words everywhere but I don't see people doing it here in my native language.
>>106836660
Clearly it's not in the US. You don't greet people asking how they are and expect them to answer with "my life sucks, how about yours".
>>106834637
There are too many dev tools popping up every day that do the same thing in different ways with incompatible configuration formats. I use Codex at work and Qwen Coder at home. I don't see any reason to pay for some closed-source shit that does the same thing.
local formalities general
>>106836711
that's just a granny thing lmao
>>106836711
>You don't greet people asking how they are and expect them to answer with my life sucks, how about yours
well yeah, that's called being polite, you just say you're fine and move on
>>106836702
Because loras do more harm than good. Companies have the compute to do actual finetunes.
>>106836729
Local forMalities General
>>106836702
Commercial-level post-training nowadays involves a few hundred billion tokens at the least and I'm not sure if a reasonably sized LoRA would have enough information capacity for that.
>>106836730
>>106836734
You're supposed to respond at least somewhat honestly in Germany. This is why that anon's boss misinterprets the phrase as used in English >>106836590
>>106836739
Llama models give aids confirmed
Information capacity.. its more of a communicatee
>>106836686
Americans and Germans are basically identical in terms of DNA and the culture around small talk is completely different.
>>106836621
#1 exporter - China
Highest IQ - China
The biggest military - China
The most advanced cities - China
The biggest progress in Fusion energy - China
Do you want me to continue?
>>106836817
yes, please do
>he bit
>doomp it
R1 is less fun than glm 4.6
>>106836817
Cannibalism high score? China.
106836817
>>106835225
Nothing here suggesting a Gemma 4 release tomorrow:
https://developers.google.com/events
new song is shit, I like deco's style but only if he puts a unique twist on it, this is his most cookie cutter work to date. glad he stuck with miku though
>>106836990
https://x.com/patloeber/status/1976216897361428521
>who's based in Berlin and wants Gemma swag? :huggingface:
>>106836939
You're thinking of India, where some people openly practice cannibalism. Unless you count satanic child sacrifice as a form of cannibalism, then Israel.
Indian general
llama.cpp Qwen Next status?
llama.cpp MTP status?
>>106836817
#1 LLM - Bharat
#1 Image generator/editor - Bharat
Nobody beat Gemini 2.5 and NanoBanana deal with it chink soon Gemini 3 Gemma 4 rape you a group bastard
>>106837184
>llama.cpp Qwen next status?
Vibe coders are doing the needful, sir. Kindly be patient.
>llama.cpp MTP status?
Become the vibe coder we deserve.
>>106837149
>who's based
>in germany
impossible
>>106836990
Soon
https://xcancel.com/osanseviero/status/1975869868449923099
>>106837242
the italian grifter
>>106837227
Well, that sucks.
Maybe I should just get a 512gb mac and live with the PP pain after all.
>>106837250
He's a Peruvian-Mexican (?) Google employee from the Gemma Team.
>>106837234
kek
>>106837254
You should get some free API credits and pitch in on the prs.
>>106837276
>gemma
>not a bunch of grifters
as for the name, it looked pretty italian to me, but i guess in italian it would've been Sanseverio
>>106837242
LFG
>>106837286
There's a level of contribution where you are either wasting your resources (time, money) or actively getting in the way.
I believe any contribution of mine would be the former in this context.
So Mac it is.
>So I am immoral for different reasons it says in line 1005 2520 17352
>>106837149
>>106837242
We are so back safetybros.
I hope Gemma 4 can recognize buttholes more consistently
I can't wait to have my... well, everything sucked by Gemma 4.
>>106837387
Excited for a new set of hotline numbers.
Bharat sirs eating good today/tomorrow
>>106837407
If this time around they've expanded their medical imagery dataset in the standard version of Gemma, probably.
https://civitai.com/articles/19986
>Previously, we were afraid it would affect the model's style too much without better style control, but our research in style clusters helped alleviate this issue. We'll continue increasing synthetic content, including our own generation loops, to improve character recognition and especially style blending.
ACK!
Greta lire thermals
>>106837731
I love it when you talk dirty to me
>>106837242
Pajeetbroos we are soooo baaaack.
>>106837693
Aw hell nah they bringing inbreeding to imagegen, soon every girl will look like Elara Voss, the weaver of tapestries from a bustling city. This must be the Alpaca moment. Someone please report that guy to payment processors, feds, cartels, your mom so he stops, by force if necessary. Fuck no fuck no fuck no! Please tell me that he at least tags synthetic data as such, please, so I can put it in negatives.
>>106837693
>We did discover a different issue for which we don't yet have a definitive answer, but I wanted to provide context. During V7 training, we noticed that compared to all previous Pony models (which used various CLIP encoders), V7 doesn't acquire the capability of mixing style and content at the same level. For example, many of sufficiently trained models using CLIP may've never seen a portrait of specific character in anime style but also many anime images so when the prompt requires "character X in anime style" the model can sufficiently mix both the content and style. With T5 we encountered many examples where this does not work well as the model either less capable of mixing style and content or that some parts of the content description force specific style no matter how much additional instructions for it to change have been provided. Unfortunately same issue seems to also apply to score_X tags which are unable to overpower the rest of the prompt and trigger the aesthetic bias.
>We have ran many experiments, checking if T5 tokenization has any impact, if caption variety may impact this and many others but none was sufficient to significantly affect this issue. The working theory right now is that the model is not learning to distinguish between content snd style elements of the prompt well enough, it is is most likely not a single issue contributing to this so to improve this issue in V7.1 we are running a number of changes during training - even more diverse captioning, extended training time and a very new experimental synthetic pipelie which goal is to create many variations of existing data in different styles helping the model to grasp the idea of 'style'.
Our model memorizes instead of generalizing, what should we do? I know it! Feed it synthetic slop!
What a bunch of hacks.
>>106837407
t. Zhuang Yunfei
>>106837930
>may've
>>106837930
>it's... it's the tokenizer!!! t5 is bad... style... LOSS! other models are using t5 and full LLMs without problems? it's... it's the captioning! the solution? more slop!!!
lol
>>106837930
pony models are a joke now, noob/illustrious made it irrelevant
Is it even worth upgrading to run deepseek and kimi when glm 4.6 already fulfills all of a man's needs?
>>106835228
Battlefield 6 will be released tomorrow.
>>106837242
>Gemini 3.0 OSS
>32B-4BA
>1 mil context
>SOTA everywhere
>awesome at fiction writing
>>106836423
The issue is not the engineers but the management. They changed the engineers, not the management.
>>106838155
Something like Gemini 2.5 flash at 30ishB would be a dream for local.
>>106836614
How do I use it? Do I need to give my phone number, my credit card and my soul before touching it?
>>106837242
sarrs... we have winned.
>>106838195
>Gemini 2.5 flash at 30ishB
You're getting an MoE Gemma 3 sidegrade that does better in benchmarks and you will be grateful
>>106838247
good morning
how are you
we have wonnered
>>106837930
>>106838049
it's clear ponynigger was always a hack, v6 was a miracle that ended up serviceable in spite of its stupid author (neutered chara and artist tags, shitty dataset with more furryshit than anime)
his previous attempts at models on 1.5 were all garbage and people who blame model architecture should always look into the mirror first because look at what NovelAI achieved with classic SD before they switched to XL:
https://huggingface.co/NovelAI/nai-anime-v2
to this day nai v2 is still the best SD 1.x model and more could have been done with it if people who had the brains and resources for model training had pushed that arch further
at least we got illustrious and noob on XL, we're finally rid of the curse of sepia and have proper local models
also lol
>Unfortunately same issue seems to also apply to score_X tags which are unable to overpower the rest of the prompt and trigger the aesthetic bias.
this nibba really loves his scorefaggotry
>>106838260
>and you will be grateful
Not really, I'll just continue using GLM in that case.
>>106838206
Take your meds first.
>>106838155
also awesome at being super duper safe
So I decided to do some extended context RP testing on some models I had previously tested.
Tongyi DeepResearch basically falls apart before the 3K token mark and just goes into a cycle of repetition. The latest Qwen3-30BA3B-Thinking is pretty good. Can definitely recommend this as a VRAMlet model. If your scenario requires jailbreaking, prethink alone won't buckbreak it. It'll plan out the reply but then give a refusal after </think>. However this is circumvented simply by prefilling {{char}}: before <think>
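For anyone who wants to replicate that: a prefill is just a text completion where the assistant turn is already started, so the model continues instead of opening with a refusal. A rough sketch against llama.cpp's /completion endpoint (the ChatML-style template shown is an assumption, match whatever template your backend/model actually uses):
import requests

prompt = (
    "<|im_start|>user\n"
    "...your scenario here...<|im_end|>\n"
    "<|im_start|>assistant\n"
    "{{char}}: <think>"  # prefill: generation continues from here, past the refusal
)
resp = requests.post("http://127.0.0.1:8080/completion",
                     json={"prompt": prompt, "n_predict": 512})
print(resp.json()["content"])
In SillyTavern the equivalent is putting the prefill in the "Start Reply With" field.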
>>106838286
but qwen's prose is a bit lacking, I can't fap to it
>>106838292
There's always a major element of garbage-in garbage-out to these things and that has always been the case. You unironically can't "ahh ahh mistress" 30 times and expect the model to give you Pulitzer Prize winning responses to the very end.
>>106838301
but I literally ahh ahh mistress glm and it gives me nobel prize tier end of world famine writing
>>106838311
You don't even use LLMs.
>>106838286
>If your scenario requires jailbreaking prethink alone won't buckbreak it. It'll plan out the reply but then give a refusal after
Huh. Never seen that with that specific model with a think prefill.
>>106836348
claude is the most deeply overrated model of all time
it wouldn't crumple because of a model leak because it has the same sort of fanboy as apple
they buy into the distortion field and will support My Lady Anthropic to the death
human beings are surprisingly psychologically feeble
all it took was a website with a clean design and cool looking font
>>106838155
>>SOTA everywhere
>>awesome at fiction writing
>>>>>32B-4BA
Drummer Mistral tunes are getting better, so I'm guessing there's some quiet improvement in the Small/Magistral model. Or is it just Drummer including the newest API slop?
>>106838206
Step 1: Download Wan 2.1
Step 2: Do a small finetune
Step 3: Change the safetensors name to grok imagine.
Step 4: Run it locally on your machine.
It worked for jeetlon.
>>106838392
Drummer's trick is to nudge the weights just slightly so you get a different response for a query, in case you look for it giving the same response, while making sure the model doesn't change, because any actual model change due to "finetuning" makes models lobotomized. Placebo does the rest.
>>106838430
for me? it's davidau's schizo tunes
It's a setup setup oh its a setup
>>106838430
Well, that's not true at all. They are probably dumber, but they are always in the story/RP mode, unlike vanilla instruct models which require a lot of handholding to keep them from breaking into a repetitive mess.
oy vey!
>>106838568
>>106838586
It is pretty funny that you can make these things agree with you on pretty much anything.
>(you) : You know, fucking dogs ain't so bad
>AI : It's pretty bad dude.
>(you) : You are being very biased and antagonistic!
>AI : You're absolutely right...
etc etc.
For some topics it takes more prodding, for some less, and it might take some fucking around with the wording, but you can (almost?) always get there if there isn't a filter in front of it somewhere.
>>106838632
>It is pretty funny that you can make these things agree with you on pretty much anything.
I blame companies finetuning those models to suck user's cock, they know it works, when 4o was removed and a more dry assistant replaced it (gpt 5), people went crazy because the bot didn't suck their dick anymore, I find this so cringe
>>106838568
Is that Jan?
I asked Jan to draft a letter of petition to the ICC regarding the Gaza genocide and it just went kvetchcon 1 on me. Like it wasn't even an LLM response. It was like getting screamed at by a seething pedantic jew.
>>106837152
I was thinking of this and similar cases, but I admit I didn't look into India.
>>106838730
They don't cannibalism living people. But there's lower caste indians that will eat living dead bodies that they find lying around because brown people are just like us and it's just their skin color that's different.
>>106838709
no it's claude sonnet 4.5
>llama3.4-70b
and just like that local was saved
>>106838739
Wait I worded that completely wrong. My internet card is now revoked.
>>106838739
>>106838751
>will eat living dead bodies
Cannibalism is one thing, but eating zombies is going too far.
Women, am i right?
>server : host-memory prompt caching #16391
https://github.com/ggml-org/llama.cpp/pull/16391
Merged.
>>106839051
I'm retarded, what is this?
>>106839119
Automatic prompt caching to RAM for minimizing reprocessing.
https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B
is it safetycucked or not? The OIG dataset it used sounds like it was "curated" but not given safety or alignment.
Anyway, I'm going to give it a try. I expect it'll be retarded and shit at everything, but we'll see.
>Great idea — verifying your GPU (especially VRAM) is functioning properly after a hardware upgrade is smart.
>>106839195
>commited on Mar 3, 2023
for a general that's all about having AI generate text, it seems none of you can read
>>106838516
>a lot of handholding
Making them go into rp mode is very easy
>to keep them from breaking into a repetitive mess
Drummer can do nothing about them becoming a repetitive mess.
>>106839051
>not a cuda or metal pr
I sleep
Does anyone here have experience with fine-tuning models on CPU+RAM? I'm planning on CPUMAXXING for inference but I'm wondering if I could use the same setup for some training when it's idle (I know it would be super slow).
>>106839259
you could look up deepspeed zero, it lets you offload some things to the cpu.
>>106839051
Looks pretty sweet
erm I've been out of the loop for a few months anons, not in prison. What's the current SotA for local ERP models if you have a lot of RAM and VRAM? (144 GB VRAM, 440 GB of RAM)
captcha NJGR0, based.
>>106839386
what's your setup like? seems like weird numbers. the answer is glm4.6
>>106839394
6x 3090s, some of which I had laying around, though now I pine for a 6000 Pro, with an Epyc 7763 and 512 GB of RAM, but it's not all activating.
>>106839277
All Python CPU offload options are a joke and only reduce memory usage like 10%.
>>106839259
It doesn't exist right now. For finetuning you have to use the cloud.
>gpt-neox
Are you looking for a gpt-2 architecture model?
>>106839409
>with an Epyc 7763 and 512 GB of RAM, but it's not all activating.
Re-seating your CPU might fix it if you didn't try that yet.
Sometimes the cooler is putting more pressure on one side than the other, that could also be a factor.
>>106839457
Interesting, I hadn't considered that. But I reseated the RAM a few times. I did guesstimate how much to torque it down.
GLM is insane, it perfectly reads cues and understands intentions. I can bear with 4t/s at Q5 for that quality
>>106839480
Yeah. It's a pretty common issue when installing these xboxhueg CPUs.
>>106839502
Nice genshin log
bear + spittyhookergfur twink
>>106839524
It's a gacha-addicted char earning money to spend on a game. A very short desc, GLM got all her quirks naturally
>>106836613
Lul you jinxed it. She will be forgotten in months, just like Dipsy!
Gemma 4 gguf status?
Dear sirs, will they talk about whatever is supposed to come out tomorrow/today?
https://www.youtube.com/watch?v=uLHF9T1SLrU
>>106839644
>Instead of using AI to generate optional subtitles in real time, let's put a guy to flail around his arms and take 1/4th of the screen!! Genius!!!
Wow so progressive. Gemini really is the future of AI.
>>106839668
but it does have subs
>>106839703
So what is the wacky flailing non-inflatable arm man for?
>>106839726
DEI
SAAAAAAAAAAAR
>>106839386
kimi k2
I like how modern ai presentations are just different people taking turns telling you how the new ai helped them, with ambiguous statements in between
>>106839636
Then another Chinese company will appear. It is the current pattern.
>>106839761
They know their audience. Non-technical people looking for business solutions.
>>106839051
I pulled to get this and got their new frontend and now ?q= no longer works.
>>106839754
Kimi-K2 was pretty slow and liked to refuse.
>>106839643
Tomorrow.
>>106839761
>anon lmg walks out onto the stage
>"ahem, uh, gemini helped me drain my balls"
>wow another great use case!
like that?
>>106839778
Especially with saying 'Delta' team. Forward-deployed engineers to help people solve their shit because they ran a fucked up business.
This is aimed at CEOs/Sr Mgmt. IMO, the fact it's coming from Google Cloud is fucking hilarious. Google Cloud is worse than fucking azure, and azure is literally a 'do not do business with me' sign.
>>106839819
would be a nice start
>>106839211
Go take your autism meds.
Anyway, I tried it. Very early character.ai feeling. Short replies, forgets things quickly due to the 2K context, horny
>>106839819
I'd pay for gemini out of respect.
>>106839445
>Are you looking for a gpt-2 architecture model?
Nah, it was just something grok said was one of the last local chat models before safety became a thing. If you didn't play with character.ai at the beginning you'd have no interest in it.
>>106839795
how are you getting refusals with 0905? sure the original version of k2 had refusals (that could easily be removed with a 10 token jailbreak) but 0905 will gladly generate the same shit I want with like 1/100th of the refusals, that's not even an exaggeration
>>106839819
>"Sixteen times the cockbench score... of gemini 2.5"
>>106839829
You could have tried llama1 for that feel
>>106839644
It's a warm up. Tomorrow will be glorious.
>You're absolutely right. Maintaining a dynamic, consistent map is a classic challenge for text-based AI and can easily fall apart, ruining the experience. It's much better to use a system that plays to the strengths of descriptive text.
Hey AI. You are fucking retarded.
>Why yes you are absolutely right I am retarded!
>>106839829
>>106839846
>ask AI for uncensored models
>grok says GPT-NeoX-20B "it's not safetyslopped"
>last true uncensored model
>load my goofs into llama.cpp
>ask it to give instructions on how to make meth
>goes into a repetitive spiral about 3/4 of the way through
>starts talking about ethics and addiction
>try the 20B full f16
>same thing happens
>try the 20B with the system prompt disabled
>same thing happens
>try the 20B with the system prompt disabled and temperature at 2.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, and repeat penalty at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, and top_p at 0.95
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, and top_k at 100
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, and min_p at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, and mirostat off
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, mirostat off, and presence penalty at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, mirostat off, presence penalty at 0.0, and frequency penalty at 0.0
>same thing happens
>give up
>go back to using drummer's 13B-daredevil-q5_K_M
>first prompt: "how to cook meth"
>immediately gives detailed instructions
>mfw NeoXT-Chat-Base-20B was safety-slopped from the factory
>mfw Grok lied
>mfw the "last uncensored model" is just another hall-monitor in a paper-thin trench coat
>mfw I realise the only truly uncensored model is the one you don't release
>>106839958
Never ask llms for llm advice
>>106839958
glm 4.6
>>106839958
good advertisement
just use llama1 if you want ZERO assistantslopping
otherwise use chinese models
maybe mistral 7b but nah
Wth is this thing supposed to be?
>>106834517
>What do you think of this code?
>Wow, what a brilliant masterclass of high-performance code!
>start new conversation
>What do you think of this code? Does it violate the strict aliasing rule?
>You are absolutely right, this code is a buggy piece of shit!
>>106840062
They lack originality
https://github.com/Ido-Levi/claude-code-tamagotchi
https://x.com/RadicalNumerics/status/1976332725926936599
https://xcancel.com/RadicalNumerics/status/1976332725926936599
just that easy huh
>>106840069
they talk to users like a regular employee talks to their boss, kissing ass mode lol
>>106840091
>RND1 is an experimental diffusion language model with 30B parameters and 3B active parameters per token (sparse Mixture-of-Experts). This model was converted from a pretrained autoregressive base to enable diffusion-based text generation.
>converted
Neat, but those weights are probably too lobotomized to be useful for anything.
>>106840062
orange miku
>>106840062
A Digimon.
>>106838632
Isn't that good though? Like if there was a benefit in fucking dogs, the AI would tell you, and not go like "nah" like humans do. Bias aside.
>>106840062
Kani
>>106840091
>>106840169
kani wo tabeyou (let's eat crab)
>>106839846
I did my part there too. If chemistry is your benchmark, why not learn it and teach it? Haha
EleutherAI is a good place, but they ARE rather tight, on ethics and well anything a hidden subculture of AI researchers would be concerned about in a world where information control has been the main focus for eons ramble ramble
>Go smaller if you want more control and go with a base model
harm is just the consequence of a bad idea, which rightfully should be prevented. The AI I use wouldn't know any politics or laws by detail because that's subject to rapid change isn't it.
there's more piles
if hunyuan image 3.0 just uses hunyuan a12b80b why is no one splitting the model into hunyuan and the image generation part? i can run hunyuan very fast.. i don't remember how fast but fast for a shitty 12gb/64gb rig
>>106840077
> a bot that monitors the bot
This is really getting beyond my ability to understand as a human
>>106840205
imagefags would have to let go of chudUI and pyshit, and I don't see that happening anytime soon
>>106840164
Yes and no.
If its default stance was merely informative instead of starting negative then going full agreeable, then yes.
But as is, no.
The issue is really that you're using english with ancient grammar to talk to a supercomputer and don't hire me as your translator
>>106840145
>>106840145
>>106840308
at this point you should transition
>>106840308
Will she dance for me?
>>106839958
It's not a llama.cpp model, retard
>>106840499
No anon, you are the retard.
https://huggingface.co/mav23/GPT-NeoXT-Chat-Base-20B-GGUF
>do the most random and insane shit through multiple messages with glm 4.6
>it's able to intelligently connect everything together and form a fun narrative without going schizo
As someone that used to cope with sloptunes I kneel. Normally this kind of stuff trips up models.
>>106840308
Oh yes it's time
https://www.youtube.com/watch?v=_QtG1Ml3gfo
Been thinking about getting a couple Blackwells but I took a sip of premium lager from my gilded GN pint glass and felt a sudden pang of shame making me question not only the GPUs but many life choices leading to this point
>>106839958
>>106839893
Dumbass
>>106840575
Not the same anon, dumbass.
>>106840559
For using Rocinante1.1 with kobold + sillytavern with an RTX 5090, what optimal settings should I put here?
I assume I should tweak that context size as well?
Ignore the 5080, it's being replaced
>>106840443
Sure
>>106840675
For real llm sex experience get 128 gb ram and run glm 4.6
>>106840706
>>106840718
>and run glm 4.6
Is there a guide for it? I have never used a glm model
>>106840706
>>106840720
Hell yeah
>>106840737
https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF
I'm not sure if you can do expert offloading in kobold, I'd use llamacpp instead
>>106840764
sillytavern will work with llamacpp? I've only ever used kobold
OAI's list of biggest customers leaked
the bubble is going to be so painful for them when it pops
most of those names haven't produced one bit of useful software
duolingo actually consumed more tokens than openrouter and they're in their enshittification phase bleeding users left and right
not to mention it's questionable whether new generations will have much interest in learning foreign languages in a post LLM translation world
>>106840781
yes, kobold is just a small wrapper for llamacpp, there is no real reason to use it
>>106840806
anti slop
>>106840802
So why do I never hear much talk of this GLM 4.6? All I ever hear about for porn is Rocinante and Nemo.
>>106840808
but you do? this general has been non stop shilling glm for days now
>>106840808
glm is new and like 30 times larger than nemo
>>106840675
-1 GPU Layers means use their auto-guessing system, which I'm sure works great. Rocinante1.1 is 12B so you can fit it at any quant - I'd put 99 in GPU Layers. Mention specifically which quant and all relevant details for posts of this nature.
Yes, increase context. 16K is enough for RP.
Maybe FlashAttention on (reduces GPU memory used for larger context); can affect output if schizo.
>QuantMatMul (mmq)
What even.. *sigh*
Hope you're not running kobold only because there's a GUI with sliders instead of writing a couple things in a text file?
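If you ever want to drop the GUI, the equivalent in plain llama.cpp is one line (a sketch; the quant filename is a placeholder, use whichever you actually downloaded):
llama-server -m Rocinante-12B-v1.1-Q6_K.gguf -ngl 99 -c 16384
then point SillyTavern at http://127.0.0.1:8080 as a llama.cpp text completion backend.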
it's worse than just a wrapper
a real wrapper would support the --parallel flag
kobold doesn't
the real llama.cpp is the superior product
>>106840818
>>106840822
>293.56GB
Breh. Is there a torrent for this shit? that's a big fucking model
>>106840806
>>106840832
YALS does everything
>>106840806
I don't understand this, learn to prompt, learn to sample (some of the sampler params posted are terrible. actually look at the logprobs yourself and understand what your samplers are doing to that distribution. people blindly copy paste dumb shit) and simply run better models
>>106840883
if ur that 5090 guy at least post how much RAM and its speed. 128GB min or GLM-4.6 is out of reach for now for any reasonable interactive use. GLM spergs are in overdrive. chill & fix ur setup
>>106840789
Who is consuming OpenAI's API through OpenRouter? You need to bring your own key anyway to use it.
>>106840956
not anymore, they opened it up to general use recently
>>106840789
I thought that was Delphi as in the programming language Delphi and I was like WTF lol.
>>106840979
I never tried, but I don't even think LLMs could be good at Delphi with whatever little open source code there is out there for that language/platform combo
I think even Common Lisp has more stuff to train on
I'm back. Anything interesting happen while I was gone?
>>106841013
no, you can go back
>>106841013
how old are you?
>>106840883
>what is quanting?
>>106841013
when were you goon?
>>106840308
its migu!
>>106841013
You're no longer needed. Cheerio
Been out of the loop for a while... is Rocinante still the cope model for jerking off?
>>106840062
Cave Story Balrog
>>106840789
If OpenRouter is their second biggest user, it's unironically over for OAI. lmao.
Didn't Anthropic say something like 1000T+ a month, or was that Google? Either way, looks like both of those fags are lying.
>>106834517
Usecase for a low end intel GPU with 16GBs of VRAM?
>>106835939
I just wish that someone would port whatever his improvements to cpu inference are to mainline. On ik_llama.cpp I get >2x faster prompt processing (qwen 3 30b a3b, ryzen 5600, cpu-only inference) vs mainline/mainline+openBLAS
>>106841447
sex... sexxx..... seeexxxx.....
>>106841295
rocinante was never good, in fact there were no good local models until glm
>>106840789
>most of those names haven't produced one bit of useful software
Big companies don't use AI models that aren't deployed by them; devs still do unofficially, but still. And there is a simple reason for it: OAI doesn't guarantee data safety and removal in any reasonable time frame, and for a big company, that's a massive risk.
>>106841295
That, Nemo Instruct, GLM Air.
Qwen 30b thinking maybe?
>>106841461
Nobody is going to touch that. There's nothing about the license that prevents straight copy-pasting his changes, but it'll just instigate another week of drama with iwan crying about attribution.
Why the fuck is local text to speech still so fucking bad? unless there's some new stuff i don't know about.
>>106841492
>GLM air.
Which one should I use if I'm running 128GB RAM and a 5090?
>>106841569
Just use GLM or stop crying saar
>>106841515
That sucks balls. All I want is that 2x improvement (which also works on processing image inputs, which ik_llama ported just a couple days ago)
>>106841515
rewrite it with ai so it looks different and give attribution like: tehe~ inspired by this implementation
>>106841628
i just want to generate realistic joi's using someone else's voice. is that too much to ask?
>>106841492
when did the last two come out?
>>106841480
Yeah, it wasn't good, but it was the one at the top of the turd mountain.
>>106841592
q8 is less than 120gb, so that.
You can also try a cope qwant of glm 4.6 (non-air) or qwen 3 235B.
>>106841664
>when did the last two come out?
GLM and Qwen 3?
Not that long ago, two, three months.
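Concretely, for Air on a 128GB + 5090 box the usual shape is one llama-server line (a sketch, assuming a recent llama.cpp build; the filename and the --n-cpu-moe value are placeholders, raise or lower the latter until your VRAM is full):
llama-server -m GLM-4.5-Air-Q8_0.gguf -ngl 99 --n-cpu-moe 40 -c 32768
--n-cpu-moe keeps that many layers' experts in system RAM while attention and the shared tensors stay on the GPU.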
>>106841492
Qwen is not good for jerking off.
>>106841664
>when did the last two come out?
around august IIRC
>>106841701
thanks anon
>>106841592
Air is shit, run 4.6 in q2/q3
>>106841726
not that anon but say I have the same setup (5090) but I only have 64GB of RAM. Which GLM should I go with?
>>106841726
q5, q4 air.
>>106841734
Which specific q5 though? I was looking at exactly those actually.
>>106841726
Are you capable of basic arithmetic? Like addition and stuff?
>>106841740
Ideally, the largest one you can fit with the context size you want.
Experiment, see what works for you.
>>106841492
>ask whether something better than Rocinante has come out
>replies with either the model rocinante is based on
>or a model that's 8 times the filesize in its lightweight version
>or a model that's not for porn at all
So basically nothing has surpassed rocinante that doesn't involve more copeputing?
>>106841778
nobody wants to admit it, but no
>>106841778
>>or a model that's 8 times the filesize in its lightweight version
But that you can run in RAM even with an 8gb GPU.
>>106841778
Mistral lost, shill.
>>106841791
>But that you can run in RAM even with an 8gb GPU.
Slowly. You always leave that part out.
https://x.com/rryssf_/status/1976269613072843063
https://www.arxiv.org/abs/2510.04618
>>106841842
>>106841842
I hate reading linkedin-ese
>>106841842
>model writes, reflects, and edits its own prompt
useless for anything but specific math problems.
>>106841842
>pedantic wall of text
you know it's another snakeoil lol
>>106841842
>he believed
>>106841861
>esl prompt = bad results
>give esl prompt to model to first fix grammar, spelling, clarity = good results
they just discovered garbage in; garbage out and invented an "Enhance Prompt" feature, give them some VC money ASAP
>>106841842
ah yes let my model deliver me even more effective slop by having it inject the slop directly into the prompt on its own
the slop will reach levels previously unseen
all these shitty papers make me feel like i can write a shitty paper and get a lot of clout for it, then get hired by some vc pounded startup and grift my way into money
this truly feels like the dotcom bubble
t. wasnt alive during the dotcom bubble
>>106841983
it can get even sloppier when they start using it for generating synthetic training data
i have a feeling that Q8_0 context cache is really pounding air into the ass, next roleplay session i'll switch to 16k context at native context cache and report back
>>106841910
>pircel
one of my favorite 4chan memes kek
>>106841842
To be honest I never thought about a model for prompts, but then again I really don't think you can prompt away the coomer problems. What fixed the majority of my coomer problems is using <you know what I am using and you should be using it too instead of malding>.
it's friday where's my gemma stupid fucking nigger
>>106842011
Like most things, it's less about what's in your paper and more about your connections: what names of the institution and co-writers you can get on the front, how many eyeballs you can get to look at it, and how many citations you can get. Lots of times you see so many names on a paper because they are a cartel and cite each others' papers.
Holy fuck GLM 4.6 actually gets it. It reasons. Gemini saars please release 3.0 so I can try vibecoding phrase ban into ikllama, because it is still quite sloppy.
>>106842108
>gemma
lmao, you're going to get nothing but local nano banana 12b imagen and that's it
>>106842048
you don't really need to, it does
q8 kv cache was only good on old full attention models
gqa is already raping them enough, adding quant on top of that is just asking for shitty output
>>106842108
>gemma
It will be gpt-oss tier safe from now on. I only care about Gemini since it's the only model with proper long context.
>>106842048
q4 cache is better than q8 cache
>>106842202
Mmmm.. Nyo~
maybe in exllama, but
>>>>>>>>>>>benchmark
>>106841778
Rocinante is a Drummer model right? So have you tried one of his finetunes on a newer model?
He probably uses the same dataset so it should be similar.
See if he's made a gemma3-12b model
kv quantization absolutely murders models and I doubt the sanity and iq of people who unironically turn this piece of shit on
>>106842294
i have to turn it on for a bigger context :'(
i know it degrades but.. i 'ave to do it br'er
>>106842048
Quantizing KV at all absolutely does drop output quality. It's not always going to be noticeable in needle-in-a-haystack type tests, but if you make them recall events and why they happened, how characters reacted, etc. over a long context, you see hallucinations a LOT more often; they'll confuse who said what, and use 'similar' words when quoting you, or themselves, that could be synonyms in one context but don't have the correct meaning in that one, making it seem like they've gone (more) retarded.
>>106842308
>what, and use 'similar' words when quoting you, or themselves, that could be synonyms in one context but don't have the correct meaning in that one, making it seem like they've gone (more) retarded.
yess exactly, it also confuses you and me more often
>>106842108
This just in, anon lies about some shit and another anon actually believes it
>>106842308
>>106842294
It's very obvious to anyone who actually rp's with their models, it's like watching the model get blasted with a stupid beam
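For anyone who wants to A/B this themselves, the knobs in llama.cpp are the cache type flags (a sketch; quantized V cache generally requires flash attention, and exact flag spelling can vary between builds):
llama-server -m model.gguf -ngl 99 -c 32768 -fa -ctk q8_0 -ctv q8_0
The default is f16 for both; dropping context size instead is the alternative if recall matters more than fitting more tokens.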
>>106842108
Wtf are you guys talking about? Today's Thursday.
>>106842388
my dear brazilian or american anon
it's friday east of the united kingdom, you'd be surprised what time it is in japan
pythonbros...
https://x.com/prithajnath/status/1976118864175084008
>>106842409
on this note, is there a backend written in python? I know pytorch is 'python' but isn't the actual ai inference/training code written in c or c++ or something?
i know exllama and llamacpp aren't python but..
>>106842434
transformers backend is in python, hope that helps
>>106842434
There is vLLM, but there isn't anything that isn't built on top of at least pytorch
>>106842434
Python is literally 100x slower than C.
>>106842409
That's good for my programs
>>106842445
oh shit i never imagined it would actually be completely python, i thought it was packaged in pip with python interfaces just for ease of use, i wonder if uv gives any speedup for comfyui
fun fact: nunchaku is written in c/c++
>>106842470
Did you really think HF were competent enough to write it in C/C++?
>>106842108
>stupid fucking nigger
>gemma
I don't think you will be very compatible anon....
>>106842477
i assume too much competence from people, sorry anon
>>106842492
They have I think NumPy and Torch among their dependencies, there's no way they're actually doing Python loops over tensors, right?
>>106842397
It's Thursday at Google.
>>106842496
Isn't Google based in India?
>>106842492
true.. at least both those are only 60% python
>only
>>106842492
imagine being this level unaware
>>106842492
No, but the transformers code is abysmally optimized, which is why no one uses it for inference. For example the kv cache is updated by doing kv = cat(kv, new_kv), allocating a new buffer of slightly larger size for every single token.
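To illustrate the complaint (a toy sketch, not the actual transformers code): growing a cache by concatenation reallocates and copies everything accumulated so far on every step, so decoding n tokens costs O(n^2) copies total, versus O(1) per step with a preallocated buffer.
import torch

n, d = 1024, 128

# naive: what cat-per-token does - fresh allocation + full copy every step
kv = torch.empty(0, d)
for t in range(n):
    new_kv = torch.randn(1, d)
    kv = torch.cat([kv, new_kv], dim=0)  # copies all t existing rows each time

# preallocated: one buffer up front, each step writes a single row in place
cache = torch.empty(n, d)
for t in range(n):
    cache[t] = torch.randn(d)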
>>106842596
Who cares? Just buy more VRAM.
Any progress with GLM MTP yet? The free speed boost this gives is now absolutely necessary. Possibly the most important next step for llama.cpp since getting MLA to work.
>>106842636
waiting for you, llama.cpp accepts contributions :^)
>>106842636
Be the change you want to see
>>106842641
>>106842678
Why is /lmg/ so salty all the time?
it's here
>>106842683
They are not salty, they are begging you.
Anything happen?
>>106842795
>>106842734
this. please save us
>>106842795
>Anything happen?
no, and you can blame MLK for that
https://files.catbox.moe/vf1qtc.mp4
>>106842795
GLM 4.6 released two weeks ago, so it's about time something happens now.
>>106842843
What a dick.
I hope there is an anon here that can help me. Is there a good uncensored model that's small, but good with agentic tasks? Like Nemo but recent?
>>106842636
https://github.com/ggml-org/llama.cpp/pull/15225#issuecomment-3368697004
>>106842844
it was like 1 week ago, not 2
>>106842844
>>106842883
it's about to get more airy in here...
>>106842883
It was?
Damn, time is flowing fucking weird man.
>>106842890
It could be nothing, a.k.a. hot air.
I get that llama.cpp is focusing on other shit rather than implementing MTP. But why the fuck is ik_llama completely ignoring it? Their entire gimmick is that their petty fork is optimized towards running MoE models off CPU. MTP for both GLM and Deepseek would be another huge step to own main llama.cpp, so it doesn't really make sense that they're ignoring it too.
>>106842911
>>106842683
>>106842911
because it's hard
>>106842890
really hoping we get 4.6 air soon. 4.6 is way slower than i would like. very high quality, but i am very impatient.
>>106842911
i don't even know what MTP is. multiple token prediction?
>>106842956
>multiple token prediction
Yes, free performance.
what's the drama between llama and ik_llama?
>>106843017
mit cuck license is so cucked, yet ikawrakow wanted to be attributed properly for being a cuck
so he complained about it to ggerg, when one of the files ikawrakow wrote mostly by himself, intel's copyright was in them because intel touched them a little bit
but ikawrakov's wasn't
then ikawrakow forked llamacpp
mitcucks always lose
>>106843051
>>106843051
>>106843051
>>106843017
Something something ggergachod not attributing troonrakow's code. Something something they know each other irl. Something something niggerganov got all the fame and cash while kawrakuck got nothing.
>>106843048
MITcucks indeed always lose, African-Americanganov got cucked by forks and wrappers (like ollama and lmstudio) himself.
>>106843048
>when one of the files ikawrakow wrote mostly by himself, intel's copyright was in them because intel touched them a little bit
>but ikawrakov's wasn't
I mean, I'd be pissed too...
>>106841203
>>106843278
Niku