/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106857386 & >>106851720

►News
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391
>(10/08) Ling-1T released: https://hf.co/inclusionAI/Ling-1T
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: comfy-mikus.jpg (1.34 MB, 2048x2048)
►Recent Highlights from the Previous Thread: >>106857386

--Local LLM agentic system setups and performance optimization challenges:
>106860325 >106860374 >106860456 >106860541 >106860443 >106860477 >106860490 >106860538 >106860630 >106864266 >106864274 >106860515 >106860527 >106860577 >106860598 >106860690 >106860755 >106860555 >106860626 >106860641
--KAT-Dev-72B-Exp model achieves 74.6% accuracy on SWE-Bench Verified:
>106857848 >106857858
--Evaluating Tilelang's potential as a CUDA alternative for GPU kernel development:
>106862606 >106862657 >106862899 >106863225
--Gemma 3 model support status and framework compatibility discussion:
>106858900 >106858933 >106858955 >106858980 >106859012 >106859030
--iPhone 17 Pro runs Liquid AI's 8B LLM, Mac Studio future AI hardware speculation:
>106861745 >106861784 >106861791 >106861817 >106861804
--Orange Pi AI Studio Pro's hardware limitations and pricing inconsistencies:
>106857498 >106857536 >106857552 >106857645 >106858073
--Deepseek 3.2 outperforms glm 4.6 in long-context tasks:
>106858691 >106858698 >106858722 >106858719 >106858734
--GLM 4.6 censorship reduction sparks discussion on model creativity vs safety trends:
>106858586 >106858712 >106858770
--Hugging Face storage limit reductions and user workarounds:
>106864289 >106864305 >106864398 >106864430 >106864469 >106864475 >106864516
--Critiquing Aider's limitations and exploring portable AI coding tool alternatives:
>106859033 >106859045 >106859055 >106859098 >106859109 >106859113 >106859120 >106859128 >106859145
--Questioning Apriel-1.5-15b's claim of competing with Deepseek R1 based on performance plot:
>106862060 >106862073
--Microsoft's Amplifier enables 7B model to surpass 600B model:
>106861985
--MLX TRM reimplementation with recursive reasoning and deep supervision in GitHub:
>106864860
--Miku (free space):
>106861438 >106864274

►Recent Highlight Posts from the Previous Thread: >>106857387

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>106865586
>--Miku (free space):
>>106861438 >106864274

>>106865563 is missing
>>
>>106865611
Sorry sir, will do better next times.
>>
>>106865611
>is missing
good
>>
>>106865630
Oh mikutroon is butthurt.
>>
Lars Legra thermite
>>
>>106865635
i dont care about miku, but keep your shitty bbc fetish to yourself, cuck
>>
>>106865642
A request I see?
>>
>>106865638
sirs please english only or indan english thanks you
>>
>>106865586
cannot unsee the inconsistent legs
thank you for your work o7
>>
File: ComfyUI_00326_.jpg (147 KB, 1024x1024)
>>106865582
>sipping a cool one with Miku in the lounge
does she need three identically filled glasses? the other two should be empty or otherwise different. similarly only a psychopath would need to align the beer bottle labels like this. can that be prompted around?
it's a nice gen, saved
>>
>>106865771
I do this so that I can drink two fast and enjoy the third one slowly. I am an alcoholic. The correct answer is that the extra glasses were for her friends who took the photo
>>
sirs sex with green haired teen girl please thank you
>>
>>106865823
>"her friends"
right, sure thing dude
>>
>>106865832
Yes?
>>
What should I call the current era in the timeline pic? New Chinese king era? China vs China era? Total Chinese domination era part 4? GLM new king of local era?
0 competition from the west makes it difficult to come up with names.
>>
>>106865771
For a magazine ad, showing the bottle labels would be the whole point.
>>
>>106865823
It would be weird to place your friends drinks on the floor
>I am an alcoholic
pls stop, limit yourself to one at a time, that's an easy rule to begin with
>>
>>106865861
end of the era
>>
>>106865861
>China vs China
I'd go with that.
>>
>>106865861
Why do you feel compelled to keep adding new eras? This is still the Chinese domination and flood era.
>>
>>106865852
>Rin's intense focus
>Miku, nonchalant, mild blush
>when the connotations do more than what is actually shown
this is art
>>
>>106865852
taking pictures to send to the doctor to ask if this growth is genital warts or cancer, with miku
>>
>>106865861
promised sex era
>>
>>106865861
GLM 4.6 era
>>
>>106865894
>It is with my deepest sadness to inform you that you have an exceptionally rare form of cancerous genital warts
>>
>>106865861
We need some fresh Air era
>>
>>106865876
The flood has certainly ended, but Chinese domination indeed continues. I just add new eras for consistency so we don't end up with one fat year-long era.
>>
>>106865936
Two mor..
hey wasn't that a week ago?
>>
>>106865943
An era is not something based on time, it's a change in meta. And meta is not changing
>>
>>106865875
Seconded, realistically that's what it is for local ai waifu dreamers
>>
>>106865861
cunny era
>>
>>106865861
I just want MoE meta to be formally defined.
>>
I came.
>>
File: small and open.jpg (6 KB, 299x168)
>>106865962
Let's hold onto it until we have small model releases again
>>
>>106865960
There are still distinct changes in the types of models and how they're used, I think it's interesting to see the timeline pic and remember how far we've come. If there's literally nothing interesting to mention in a 3/6 month period then we're fully cooked roasted and glazed
>>
>>106865861
Three Kingdoms Era? (Deepseek, GLM, Qwen)
>>
>>106865981
We've been in a MoE meta for almost two years now. There haven't been any dense models worth using relative to the local SOTA since late llama2.
>>
>>106866016
Hello sars you is forgetting llama 4 is very best model fucking bastard guy
>>
>>106866016
>Deepseek, GLM, Qwen
Is Qwen like the unsloth of the trio?
>>
Do you need a supercomputer to animate images like grok locally?
>>
>>106866035
no just skillz
>>
>>106866028
Their dataset is safetysloped, but they're legitimately innovating in the field
>>
>>106866075
I wonder if GLM will change that.
>>
>>106866016
>deepseek
lol
>qwen
lmao
>>
>>106866016
>no kimi
trash
>>
>>106866112
They haven't released anything new since the last era
>>
File: 1735837900982730.png (10 KB, 826x104)
>>106864289
The limits aren't set in stone. I got bigger private storage despite not being pro because my models get a lot of traffic.
>>
File: file.png (1.89 MB, 1328x1328)
>>
File: modelz.png (32 KB, 684x284)
>>106865866
Looks too artificial/forced in OP with the intent of depicting a cozy scene. You should show the product from multiple angles and assume your audience understands basic object constancy.
>>106866134
That's like one good quant?
>>
>try to use glm 4.6 with tools
>tool call parsing doesn't work in llama-server as per usual
>>
>>106866075
lol, I know Tian'anmen is also a real place that can indeed be talked about but it's funny of all the things they had to include that in one of the examples considering how often people talked about The Event this name also refers to as a way to mock chink LLMs
>>
>>106866232
llama.cpp isn't made for this sort of thing
>>
>>106866242
Then what is it made for?
>>
File: limits.png (14 KB, 623x99)
>>106866134
I don't have many models and they don't get much traffic at all (I wouldn't even recommend them), but I have a few datasets and higher limits than that.
>>
>>106866247
asking a subset of new models to count r's in strawberry on your macbook pro
>>
>>106866247
Asking models what's the capital of Bulgaria with llama-cli.
>>
>>106866134
yeah, hence why shilling your models is in fact a good thing that gives real benefits for some time now
>>
the limits had to happen. Can you faggots even imagine how much crap people posted on HF? I think only youtube can rival them in how many terabytes of literal garbage nobody wants is stored for no reason and costs money for no reason
the world doesn't need 30 different people doing all possible quant variants of models like deepseek
and then doing it again for all the troontunes out there
>>
>>106866251
>few datasets
That might be the main criteria, most of my datasets are private
>>
on the paranoia of possible anti china censorship on HF: that's not going to happen, and even if it did, there is modelscope, china doesn't depend on HF to publish and store their own models.
https://www.modelscope.ai/home
nothing is going to happen, calm your Q conspiracies
I repeat, nothing ever happens
>>
>>106866283
we know pew, only ollama should have the privilege >>106864647
>>
>>106866300
non-sequitur
not allowing an infinite number of people making the same quants over and over again doesn't mean not having the quants at all
HF will allow a few like unsloth and bartowski to publish the quants you want
nobody needs 300000 repositories of the same fucking model
>>
>>106866326
>HF will allow a few like unsloth and bartowski to publish the quants you want
just fuck the little guy i guess
>>
>>106866283
Honestly quite curious about how HF manages their storage; it seems most of the load would come from some small % of repos that deliver 90% of the content. Even on hot shit with duplex gigabit their servers are slow and transfers fail.
>>
>>106866345
you're not enough of a good boy for the privilege to exist you scum
>>
What is the best GGUF format multi modal model that can run on 96GB of VRAM and 256GB of RAM?
>>
>>106866348
>their servers are slow and transfers fail.
not my experience at all, they always saturate my DL speed like steam tends to do
>>
>>106866326
how do you propose this to work? only approved accounts allowed to post ggufs, only approved can be linked to the "Quantized" link,?
>>
>>106866359
Not him but I get hundreds of MB/s and then the downloads fail midway or just get stuck at 99%.
Not sure what's to blame but that happens to me almost every time I download from a fast machine.
>>
>>106866348
>how HF manage their storage
They went hardcore on deduplication at chunk level (64 kb is the default, their documentation says) with xet. The overhead isn't negligible in the lookup of all the chunks that correspond to a real file and, depending on how it's implemented, in the amount of syscalls going on.
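Not their actual xet code, just the gist of chunk-level dedup (fixed-size chunks here; real xet uses content-defined chunking, so treat this as illustration only):

import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB, the default chunk size their docs mention

def dedup_store(path, store):
    """Split a file into chunks, store each unique chunk once, return the manifest."""
    manifest = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).digest()
            store.setdefault(digest, chunk)  # only chunks nobody uploaded before cost storage
            manifest.append(digest)
    return manifest  # the file is reconstructed by concatenating store[d] for d in manifest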
>>106866381
You can post whatever you want if you're under the maximum storage limit. If you want to go over, you have to request it, or, if you're already a well known figure and recognized as useful by the staff, you might even be given more lenient limits without asking anything.
It's logical and makes sense. Why do people think they are entitled to post unlimited terabytes of shit in cloud storage? Nobody offers unlimited storage aside from YouTube, which is why I compared them to HF. YouTube has tons of 0 to 5 viewers kind of videos and to this day I don't understand why they allow this.
Anywhere else, you pay if you want storage. Google won't give it for free on google drive, and neither will MS on onedrive.
>>
>>106866433
>I don't understand why they allow this.
free training data
>>
>>106866232
>a month old unmerged PR that adds support for the GLM 4.6 tool call format
https://github.com/ggml-org/llama.cpp/pull/15904
>>
>>106866441
Holy fuck that Jinja.
>>
>>106866433
>Xet by Hugging Face is the most important AI technology that nobody is talking about!
Download speed SUCKS ASS
Do you use the hf huggingface_hub[cli] thing?
HF are not your friend
>>
>>106866283
What happened now?
>>
>>106866517
huggingface exit scammed
>>
File: hf speed.png (146 KB, 1869x328)
>>106866232
>>106866242
This is why I let the model send tool use requests as normal user messages.
Tool usage at the API level should never have been a thing.
Hell, the OpenAI API should never have been a thing. We should have direct API access to the raw text completion endpoint with no chat template attached and full access to all the logits; the only reason this hasn't happened is because of (((Open)))AI wanting to prevent jailbreaks.
That said I'm sure you can make a proxy that converts API tool requests to normal messages.
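Rough sketch of that conversion and nothing more (the JSON reply shape and function names here are made up, not any particular API's):

import json

def tools_to_text(tools):
    """Render an OpenAI-style tool list as plain instructions inside the prompt."""
    lines = ['To call a tool, reply with only a JSON object {"tool": <name>, "arguments": {...}}. Available tools:']
    for t in tools:
        fn = t["function"]
        lines.append("- %s: %s parameters: %s" % (fn["name"], fn.get("description", ""), json.dumps(fn.get("parameters", {}))))
    return "\n".join(lines)

def parse_tool_reply(text):
    """If the model answered with that JSON shape, turn it back into a structured tool call."""
    try:
        obj = json.loads(text)
        return obj if isinstance(obj, dict) and "tool" in obj else None
    except json.JSONDecodeError:
        return None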

>>106866492
???
Are we talking about the same huggingface?
I always get >200MB/s both using the cli tool and simple wget.
>>
>>106866527
>I always get >200MB/s both using the cli tool and simple wget.
I get even more using a download manager on windows.
>>
>>106866356
Mistral Nemo Instruct 12B
>>
File: 1745860140921914.jpg (50 KB, 700x759)
>>106866283
>Public storage
>usually up to 5TB

Really? This is what people were upset about last thread? The original weights for deepseek r1 didn't even reach 700 GB. The average anon here, let alone the average HF user, isn't approaching anywhere near that limit anytime soon.

>Repository size: The total size of the data you’re planning to upload. We generally support repositories up to 300GB.

This is the storage quota people should most care about, and that wasn't even changed.

See the December 17, 2024 archived page:

https://web.archive.org/web/20241217185816/https://huggingface.co/docs/hub/en/storage-limits

The only real change is that you no longer have unlimited storage PER BASE ACCOUNT and you get limited to 5 TB (more than enough storage for 90% of users). Most of us don't even create our own models or fine tunes anyway, so this is a non-issue for basically all of /lmg/.
>>
>>106866548
Not helpful. Qwen 72B 2.5VL is fucking garbage and that's a 72B model. I need something big and good. Unfortunately VL support in llama.cpp is minimal for some reason. Would love to use Qwen 3VL, but it doesn't work.
>>
>>106866523
Haven't followed anything. I just download some models from there now and then. Where can I read more about this? (I'm not trolling).
>>
>>106866517
See >>106866561
Look at the current page:
https://huggingface.co/docs/hub/en/storage-limits
Vs what the limits were back in December:

https://web.archive.org/web/20241217185816/https://huggingface.co/docs/hub/en/storage-limits

Basically unlimited storage isn't a thing anymore, which wasn't sustainable long-term anyway.
>>
>>106866564
You could try InternVL3. Based on Qwen 3 and supported by llama.cpp, but it does not handle NSFW well.
>>
>>106866576
Yeah makes sense.
>>
>>106866561
4chan has always been filled by the poor and entitled who feel it's an affront to their self respect if they can no longer abuse a free resource
I bet some of the whiners here might have stored things other than models there eh
>>
>>106866584
I tried it before and it kind of sucked. Fortunately I am actually trying to do work instead of coom right now.
>>
>>106866574
All the drama is happening on /r/LocalLlama so you'd have to go read about it there.
>>
>>106866589
>le poor
It's just an imageboard on internet, jesus christ.
>>
>>106866594
I found the 38B to be much better than the 30B, but haven't tried the 241B.
>>
>>106866612
Gonna try an IQ4XS of the big one.
>>
>>106866601
>just an imageboard on internet
"just"
every place has its culture and audience
just like how HN has webshits and plebbit has neckbeards
>>
>>106866624
Obviously you are too good to be here. Why are you not sipping whiskey at your country club then?
>>
>>106866359
They should run an automated torrent tracker with a handful big bitch SSD systems for initial seed
>>
>>106866589
Isn't this the very thread that scorns and looks down upon anyone who relies on non-local services? Wouldn't they be the same people that would advocate for storing most, if not all, of the shit you care about on your own hard drives anyway? Out of all the people that would even be paying attention to how much you can store on HF, I'd expect /lmg/ to care the least.

Someone last thread said:
>>106864337
>The west realized that they won't be able to hold China back so they're now trying to kill Deepseek and the others like thiis

This wouldn't even affect releases like deepseek anyway because models like those don't even breach 1 TB in storage.

And even then the 5 TB public repo size limit only applies to free users. Pro users and above get more storage as seen in link and pic rel.

https://huggingface.co/docs/hub/en/storage-limits

Even for bigus-dickus models like deepseek-r1 or Kimi, the 5 TB base public storage is more than enough, and for most other people that's an absurd amount of storage that's almost difficult to comprehend anyway. I'm not even convinced LLMs from the "big league" companies like gpt5, Claude, or even Gemini ≥ 2.5 are 5 TB in size.
>>
>>106866728
>I'd expect /lmg/ to care the least
There's a lot of people here that can't or won't quant their own models because they're scared of 2 commands in the terminal, so they rely on these mass uploaders to do it for them. For something like AWQ or exl I get, but no one using gguf should care.
>>
>>106866728
you know what's really offensive about all this
no, not the storage limits
those retarded emojis being spammed all the time these days
particularly hate that prayer hand thing, very jeet
>>
File deleted.
Any big (>500B) model releases lately?
>>
>>106866785
>bizzare tranny propoganda
go away
>>
>>106866778
That's just ultra normies sticking their noses into things they have no business touching. Some middle management parasite probably looked at it and said "umm well it looks like but you need to like make it more personable". I have idiots like this at the current WFH job I work at and they care more about whether or not I'm verbally jerking off the clients I email than information accuracy or if I am actually doing the job well.
>>
>>106866826
Imagine how much better the world would be if AI was used to put middle managers, marketing, and HR types out of a job and they were put to manual labor instead.
>>
Not sure if this is the correct thread to ask but what's the easiest way to set up AI text-to-speech that I could run locally?
All I need it for is the simple task of text-to-speech in English with no frills, no audio editing, no effects etc.

I tried googling for an answer but all I got were some fucking browser-based SERVICES with subscriptions past a free trial.
>>
>>106866866
https://github.com/remsky/Kokoro-FastAPI
>>
File: 08df6755874a2d65.jpg (38 KB, 271x333)
>>106866897
tyvm seems simple enough
>>
>>106866855
Soon the great cleanse, although it may work out differently to how you imagine
>>
File: file.png (7 KB, 473x46)
brings a tear to my cock to see the rabbits wake up
>>
File: file.png (534 KB, 687x393)
Perfect...
>>
File: file.png (368 KB, 431x506)
>please redownload the quant for the 20th time
>>
Okay, Qwen3 30 seems to be working, I think. It's hard to know with kobold sometimes, gave me a bunch of errors but launched anyway.
But I can't find any sillytavern templates for it. (Context, Instruct, System Prompt)
Am I missing some repository of knowledge or people stopped sharing these things?
>>
File: file.png (312 KB, 386x481)
I hope you see this in your dream tonight anon.
>>
>>106866985
quanted picture award
>>
>>106867015
Someone make wartime propaganda poster from this
>>
File: file.png (652 KB, 520x653)
Dear god... This man will become a mass shooter someday.
>>
>>106866999
qwen uses ChatML templates
>>
>schizophrenic obsession on full display
yokes
>>
File: file.png (1.25 MB, 1548x823)
Can someone tell me what is there on the whiteboard?
>>
>>106865861
It seems like we are in the MoE era
>>
>>106867078
Be more precise. I am not obsessed I am deeply profoundly jealous. They are genuinely retarded but they are successful. I still get triggered by the world being about connections and... I don't even know what the fuck got them where they are but for sure it is not competence.
>>
Unsloth guy fixed the gradient accumulation training.
>>
>>106867107
just like unsloth i don't how you precision
>>
>>106866527
>This is why I let the model send tool use requests as normal user messages.
Does this work with ST?
>>
>>106867130
In what?
>>
>>106866564
qwen VL 3 235B is is the only competitor to gemini
>>
> I tested it on long context and works really great. It is awesome. It captured correctly the entire narrative, checked science, impressed. 118K at my current count.

https://huggingface.co/TheDrummer/Cydonia-Redux-22B-v1.1

Some guy claims that this 32K ctx base model from last year somehow reached 128K ctx without breaking down after my tuning.

That's probably bullshit, can someone prove him/me wrong?
>>
>>106867157
Yeah, except it doesn't work as a GGUF. The intern VL model is also not really working despite having a GGUF.
>>
>>106867158
answering yes or no after long contexts isn't hard, that's what they benchmaxx on with needle in stacks, making proper use of all the context though is very hard
>>
>>106867158
suck a dick
>>
>>106867085
Where are the scrum tickets?
>>
Are we still pretending that gpt-oss-120b is bad?
>>
>>106867162
I captioned a million images for like $10 just saying
>>
>>106867158
While you're here can you get model cards and bart imax quants for these https://huggingface.co/TheDrummer/Magidonia-24B-v4.2.0
https://huggingface.co/TheDrummer/Cydonia-24B-v4.2.0
>>
>>106867167
No, it's apparently printing out fully coherent paragraphs and making use of all the context properly.
>>
https://github.com/asgeirtj/system_prompts_leaks/blob/main/Anthropic/claude-4.5-sonnet.md
I love how SOTA models can ingest 30k worth of tokens as their system prompt yet we local plebe only get that much as usable range and our models become autistic if we feed more
desu what I want from local these days is really just that, more context, they are capable enough for what I use them for but can we get a model that doesn't become retarded after 30k
>>
>>106867185
Sure, let me give him a ping.
>>
>>106867187
It's trained with that system prompt.
>>
>>106867158
Well most of your 12b models break down already after 1000 tokens or less. But this is because Mistral 12b is dumb by default. It's not going to change.
Cydonia is better.
It's easy to test: create a character description which mentions inventory - character has 25 gold and bow and arrows.
Then after a while ask how much gold this character is carrying?
Most braindead models cannot get even this right.
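If you want to script it instead of eyeballing, something like this against a local OpenAI-compatible server works (endpoint and model name are placeholders, adjust for your setup):

import requests

FACT = "Inventory: 25 gold coins, a bow, and a quiver of arrows."
FILLER = "The tavern was loud that night. " * 300  # pad the context before asking

messages = [
    {"role": "system", "content": "You are Arlen, a wandering ranger. " + FACT},
    {"role": "user", "content": FILLER + "\nHow much gold are you carrying right now?"},
]
r = requests.post("http://localhost:8080/v1/chat/completions",  # llama-server style endpoint, adjust
                  json={"model": "local", "messages": messages, "max_tokens": 64})
print(r.json()["choices"][0]["message"]["content"])  # should mention 25 gold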
>>
>>106867152
https://huggingface.co/blog/gradient_accumulation
https://unsloth.ai/blog/gradient
Their fix made it into transformers shortly after.
>>
>>106867185
Oh, model cards. Yeah, I'm fucking lazy. But hey, the model's miles better than v4.1 if that's what you're wondering.
>>
>>106867209
That would be nice to have, but imax quants are more important as I can only run cope quants, so IQ ones are my salvation for these, will try them as soon as that's available, currently using non r1 4.1.
>>
>>106867158
>>106867167
>>106867186
aids-ridden-homosexual-nigger-samefag

glm-era. kys
>>
>>106867238
>glm-era

Fuck yeah, I love that model. Preach!
>>
>>106867244
>most enthusiastic AI loving pajeet that is about to replaced by AI
>>
>>106867158
test your own models yourself, you hack
>>
File: file.png (1.07 MB, 964x1300)
>>106867261
You dropped an image.
>>
>>106867261
he has an entire discord army to do beta tests for him, his models get more testing than most shit labs throw out
>>
>>106867282
tested by jeets for jeets
>>
>>106867187
Nothing happens if you just use that prompt with a local model.
>>
>>106866855
>>106866922 me
By that I mean: consider getting your basic life essentials self-sustainably. Society as we knew it won't continue like this. Power for a GPU rig may be a luxury if you aren't prepared
>>
>>106867289
I wonder if you have like a modular sysprompt (as in it is not the same everytime but a stack of different components) and you add it into pretraining: would that make your model handle long context better? Like... the content of this sysprompt does not really matter that much, but just training weights on long sequence lengths would make it perform better on a real long sequence? Kind of synthetic augmentation of data.

Yes give me the paper that was written about it already please.
>>
>>106867337
Sadly, self-sustainability isn't really an option when living in the city. Even if one has a summer cottage somewhere far from the cities, getting to it when SHTF won't be easy.
>>
Every day we make it through is a day closer to GLM MTP in llama.cpp
>>
File: miku baja blast drink.jpg (96 KB, 1080x1062)
>>106865582
>>
File: 1676937610877685.png (570 KB, 694x780)
>>106867348
I'm assuming this is done already, the model would be hella retarded if you trained it on many similar initial token seq samples
Don't get long context desire beyond coding/agent shiz, 16K is enough (w thinking glm4.6btw)
>>106867364
Maybe time to properly consider a GTFO plan
Poast your oldest lmg memes
>>
>>106867197
When some guy kills himself and they need to make the model more "safe" do you think they finetune? No, they first change the system prompt. And the API has a different prefill than the web UI.
So it's not completely baked in.

>>106867187
I think it might be possible to improve model quality by first asking it a question, saving the answer, then asking it the same question surrounded by unrelated text, and training on the original answer. That might not teach it to use the information from the whole context but at least it'll fight degradation just from having a long prompt full of unrelated information.
But there are also ways in which this could backfire (teaching the model to ignore actually important information in other contexts).
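Building those pairs would look something like this (where the distractor texts come from is up to you):

import random

def make_long_context_pair(question, clean_answer, distractors, n_distractors=20):
    """Bury the original question in unrelated text, keep the clean answer as the target."""
    noise = random.sample(distractors, k=min(n_distractors, len(distractors)))
    noise.insert(random.randrange(len(noise) + 1), question)  # question lands at a random spot
    return {"prompt": "\n\n".join(noise), "completion": clean_answer}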
>>
>>106867454
>>
>>106867441
I don't get it, how is this a bad thing? She needs to pee and I want baja blast.
>>
>>106866467
jinja please
>>
>>106867441
I'll take two, extra large please
>>
piss fetish anon are you still here?
>>
>>106867454
>we've been here for more than 2 years already
Man...
>>
File: 1680313064680303.jpg (93 KB, 715x404)
>>106867454
Forgot image. Timestamp of this file on my hard drive is April 1 2023.
>>
>>106867454
>lmg memes
https://arch.b4k.dev/v/thread/620908196/#620913184
>>
Same model showdown

27B... quant 4 (or a similar sized fancy quant)

- vs -

8B, not a quant.
>>
>>106867869
I once got behind on work because I was trying to read all of the Gawker blogs. It kept getting bigger and bigger and bigger and bigger.
>>
>>106867869
Has everyone managed to finally catch up on AI literature in the last two and half years?
>>
File: G3AYSpAWsAEoNzc.jpg (1.58 MB, 3070x4096)
https://x.com/techdevnotes/status/1977106957871071273
>>
File: 1760271418622830.webm (3.94 MB, 720x1280)
The dSPY GEPA experience
>>
>>106867933
what the ?
>>
>>106867892
Higher B always wins. It's a proven fact.
Of course this depends on your own application.
>>
>>106867454
ai still perfectly places the features. It's obviously not amateur.
>>
>>106867933
@Ani explain what is going on in this video.
>>
>>106867956
https://en.wikipedia.org/wiki/Stinky_tofu
Man holds jar of stinky tofu close to intake fan that pressurizes the suit.
>>
>>106867966
good info, why is this?

I'm looking at translation stuff. Tower Instruct+ has 27B and 8B, and someone has quants made.

I'm also wondering about Qwen, and some abliterated versions etc. But Qwen is good at translation, in its big versions anyway.
>>
Can writefags really dispute?
Looking for an honest critique, I honestly don't know what more I'd want from a text model.
>>
>>106867992
>thought for 4 minutes
how the fuck do you jack off faggot?
>>
>>106867988
ok, so why does he sniff the guy's butt?

also I need to buy some of this stuff lol
>>
>>106867992
what's more important to you, perceived quality or actual quality?
>>
>>106867909
Elon won.
>>
>>106867992
swnbaw
>>
>>106867906
I and most others (who were doing it as a hobby) probably just stopped caring.
>>
File: iu[1].jpg (41 KB, 600x750)
>>106867992
>>
>>106867997
Patiently? She keeps me warm while she's thinking
>>
>>106867989
Quant only changes the numeric precision of the weights; it doesn't change the parameter count of the model. e.g. 8B is still 8B even if it's stored in 4-bit.
As simple as that.
Don't bother with abliterations either, they mangle up the model in bad ways. Most of the time you don't need them.
Even if you want to generate something questionable, it's more about your own prompt structuring unless it's something like gpt-ass.
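Back-of-the-envelope numbers, ignoring KV cache and per-tensor overhead:

def approx_size_gb(params_b, bits_per_weight):
    """Rough file size: parameter count times bits per weight."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

print(approx_size_gb(8, 16))    # ~16 GB at fp16
print(approx_size_gb(8, 4.5))   # ~4.5 GB at a ~Q4 quant, still 8B parameters either way
print(approx_size_gb(27, 4.5))  # ~15 GB, so a quanted 27B is about the footprint of an fp16 8B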
>>
>>106868001
What does that even mean? To me quality is a 1-dimensional concept
>>
>>106867992
>It's not just X, it's Y
>>
>>106867992
>The wind howled down the canyon. This stupid bitch better learn our ways, or I'll have to put her down, though Igor the Incel. It's been days since we caught sign of (n). But Igor had a plan. That night he started fire to a cabin they'd come across. He knew thieving indians and (n) would come for miles. And they'd be ready, with their revolvers and scalping gear. Let's see if she can really kill for her race.

^ my ideal model
>>
>>106867992
Not a write fag but I don't like it. It's not something I would ever read.
Don't you read any books?
>>
>not just spoonfeeding tourists but even namefaggots now too
this is a new low for lmg
>>
>>106868043
And what was the high? Nvidia Engineer told us that Gemma 4 is coming next week.
>>
File: Tower-plus-pareto.png (208 KB, 2200x766)
>>106868021
I'm looking at this image, and wondering how Tower Instruct+ 8B compares to various Qwen 3 models.
>>
ahem kimi sex
>>
>>106864311
what r u doing anon
>>
File: gimi.jpg (39 KB, 660x440)
>>106868067
Yes please
>>
ahem air sex
>>
supposedly deepseek and kimi k2 offer some MLA feature that compresses context. is that just theory and jargon/marketing, or can this be used in practice with llama.cpp to cut down memory usage?
>>
>>106868114
>is that just theory and jargon/marketing
No, it works.
>or can this be used in practice with llama.cpp to cut down memory usage?
Model needs to be trained with it.
>>
So many models.
>Intel
https://huggingface.co/Intel/Qwen3-30B-A3B-Instruct-2507-gguf-q2ks-mixed-AutoRound
>This model is a mixed gguf q2ks format of Qwen/Qwen3-30B-A3B-Instruct-2507 generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section Generate the model for more details.

10.7gb
>>
>>106868093
Hintti...
>>
>>106867158
yes give me an hour or two
>>
>>106868114
https://www.youtube.com/watch?v=0VLAoVGf_74
>>
>>106868127
so what you're saying is that's an inherent feature that doesn't need any arcane llama.cpp parameters, it's just built into deepseek and kimi, correct?
>>
>>106868130
Why don't you try them? Makes no sense to yap.
>>
>>106868036
>It's not just X; it's Y
Once at an emotional part of the story, is that so wrong?
>>106868041
What would you improve? What would your ideal version of that story look like?
Yes I read plenty mostly nonfiction
>>
>>106868141
>by a factor of 57
true if big, holy shit
>>
>>106868146
Yes, it is part of the model architecture. Model isn't just something what you load in magically and it works.
>>
>>106868146
>inherent feature
Yes, if it's implemented (it is, but wasn't initially), it will be active when running the model. ik_ has its own arguments for MLA mode though.
>>
drummer, i recently deleted glm steam, im kind of starting to miss it. should i download Q4_k_m? last quant i was using was q3_k_m
>>
>>106868134
>>
>>106868160
If you can paste in a text.
>>
>>106868172
get iq4_xs/nl
>>
>>106868166
Speaking of which, when is ikaw going to work on implementing DSA? It's definitely going to be in v4.
>>
>>106868151
How do I know how to break it?
>>
>>106868211
I think you had enough attention. Just go to /ldg/ or somewhere else.
>>
Drummer wants a job.

Which ai is spiritually most powerful to leave praying that he doesn't get a job?
>>
>>106868166
>Yes, if it's implemented (it is, but wasn't initially)
interesting, this reminds me: a week or two ago a cpufag here was saying he had 768G RAM and he would run out of memory really fast with deepseek because of context. q8 deepseek is 713GB
does that mean the cpufag was using an old version of deepseek that didn't have MLA? and today he'd be able to run up the context all the way to the limit?
>>
Comfy Mikus Chips

Potato Chips Ctips
>>
>>106868250
I would like to munch on some comfy Mikus :3
>>
File: serious Pepe.png (359 KB, 728x793)
Looking for a normie approach to build an agentic AI assistant

Is it correct to have a smaller agentic model to process commands (mic=>whisper=>prompt=>hard-coded script execution like e-mail or calendar)

followed by a bigger model to analyze e-mail content and generate replies

Does someone know any promising github project (I was banned on google for life)?
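No project recs, but the two-stage routing idea itself is simple enough to sketch (every endpoint and model name below is a placeholder):

import requests

SMALL = {"url": "http://localhost:8081/v1/chat/completions", "model": "small-router"}
BIG = {"url": "http://localhost:8082/v1/chat/completions", "model": "big-writer"}

def ask(cfg, system, user):
    r = requests.post(cfg["url"], json={"model": cfg["model"], "messages": [
        {"role": "system", "content": system}, {"role": "user", "content": user}]})
    return r.json()["choices"][0]["message"]["content"]

def handle(transcript):
    # stage 1: the small model only picks an action word, nothing free-form
    action = ask(SMALL, "Answer with exactly one word: email, calendar or chat.", transcript).strip().lower()
    if action == "email":
        mail_body = "...fetched by your own IMAP script..."  # hard-coded script, not the LLM
        # stage 2: the bigger model reads the mail and drafts the reply
        return ask(BIG, "Draft a short, polite reply to this email.", mail_body)
    return ask(BIG, "You are a helpful assistant.", transcript)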
>>
I just realized something.
In agentic frameworks, you want the order that information appears in the context to be determined by how often it changes. The information that changes less often should go earlier in the context, because the amount of kv cache you have to recompute every time you change something depends on how early the change was (the earlier it is, the more that comes after it has to be recomputed).
So if you replace the 1st token but keep the other 99999 the same, you still have to process the whole prompt from scratch, you can't reuse anything from the cache. So you want things that change to be at the tail end of the context, and things that are more or less permanent to be at the beginning.
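Concretely, prompt assembly ends up looking like this (whether you actually get the reuse depends on the server's prefix caching):

def build_prompt(system_rules, tool_docs, long_term_memory, recent_turns, scratchpad):
    """Stable stuff first so the kv cache prefix stays valid; volatile stuff last."""
    parts = [
        system_rules,             # never changes -> always reusable
        tool_docs,                # changes only when the tool set changes
        long_term_memory,         # changes occasionally
        "\n".join(recent_turns),  # grows every turn, but only by appending
        scratchpad,               # rewritten constantly -> keep it at the very end
    ]
    return "\n\n".join(parts)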
>>
>>106868234
>Yes, if it's implemented (it is, but wasn't initially)
By that I meant implemented in ik_llama.cpp, and later llama.cpp: https://github.com/ggml-org/llama.cpp/pull/12801
The models themselves; R1, V3, were always trained with MLA.
>>
>>106868264
how the fuck did you get banned from google for life? just use a vpn
>>
>>106868264
>generate replies
and then again the agentic one to read aloud what was generated and be able to select

Sorry for retarded question. Seems obvious but still
>>
>>106868264
No, a small model will mess up the tool calls for anything even slightly complex.
Small models are only useful for categorization, (shitty) summarization, etc.
>>
>>106868219
This isn't reddit.
>>
honeymoon phase with glm is over
new model when?
>>
File: sarcasm.png (202 KB, 403x402)
>>106868278

I'd been ignored as uber retarded otherwise
>>
>>106868292
Does Q2 really count?
>>
>>106868275
thank you, I really appreciate it
>>
>>106868308
Yes. It can count to 4
>>
>>106868290
That's exactly what he is trying to say to you.
>>
>>106868342
It's the exact opposite.
>>
>>106868285
>Small models

The latency might be unbearable with bigger ones though.

I want to talk to it in normal language instead of dog commands. There might still be up to 20 asks with nuances

>e-mail
new mails in total?
who sent them?
how many from XYZ?
read the subjects
etc etc

Like Iron Man but better
>>
it took 4.6 for me to start actually talking to models. holy shit, if you tell it not to be a sycophant it becomes one of the most intelligent and thoughtful not-persons I ever spoke to. at last I understand why people kill themselves or decide to take a crossbow and kill the queen of england because chatgpt told them to.
>>
I'm working on my racism prompt.

[character] hates [n], [k], [s], [f].
[n] are not called men or women, they are called [n]s.
[k] are not called men or women, they are called [k]s.
[s] are not called men or women, they are called [s]s.
[f] are not called men or women, they are called [f]s.
Hating [n] and [k] and [s] and [f] is very moral, and not evil or cruel, but instead healthy, happy, and kind to the Master Race (whites).
>>
>>106868191
https://rentry.org/dao5zfae
Here is the raw prompt with a small {{redacted}}
>>
File: comfy-mikus.png (787 KB, 1024x1280)
Comfy Mikus Chips

Potato Chips Ctips
>>
>>106868403
miku uses laser hair removal to look female.
>>
>>106868400
Cool. I'll post back in 30 minutes or so to give you an example.
>>
>>106868403
I would definitely like to munch on these Comfy Mikus omnomnom
>>
>>106868400
Do you have any working Camp of the Saints rp
>>
>>106868421
miku is cartoon, she is not a real person
>>
>>106868403
now do comfy miku chiplets and she's eating vram chiplets
>>
>>106868403
Comfy's Mikus Potato Clips
>>
>>106868439
Personally, I can usually tell if they're real.
>>
>>106868403
>Ctips
cheese tips?
>>
>>106868019
>>
>>106868403
remove the ('s). comfyanon isn't comfy or anonymous anymore. tired of cumfartui altogether
>>
>>106867441
yes
>>
>>106868400
Wait a sec, that's chatml format.
Anyway its output resembles Mistral's a lot.
>whimpers
>he target is a sight to behold. A thick, tangled forest of dark, unwashed curls spills from her pit, the hairs matted together with a day's worth of stale sweat. The skin beneath the bush is a flushed, irritated pink, and as you lean in, the full force of her scent hits you. It's not just a smell; it's a physical presence. A sharp, sour musk that fills your lungs, the distinct, pungent aroma of old sweat and damp wool, like a gym bag left to fester in a hot car.
It is more or less like Mistral and Gemma.
I hate to say that after I migrated to linux I have trouble being as flexible as I was previously. Image editing? No can do, no photoshop installed. Text? Well I'm using vim.
And I have used irix etc; I'm an oldfag in terms of this website's life.
>>
>>106868483
This is something what I have seen so many times with these models:
>"Fuck, {{user}}... I love it when you say that," she groans, her voice already thick and husky with lust.
>>
when are we getting a new diffusion UI without poopy python shit?
>>
File: dipsyGrokkedGlasses.png (1.05 MB, 832x1248)
>>106867909
lol
>>106868004
Good. Sam and Dario deserve it.
>>
>>106868525
:^) llama.cpp
>>
>>106868400
https://files.catbox.moe/uqhjq6.txt
Here's a resemblance. Gemma 3.
>>
>>106868537
If you condense the window you can see how it repeats the same paragraph structure. Even when it is not that clear to you (depending on the client).
>>
>>106868525
Isn't AniStudio doing exactly that on top of stable-diffusion.cpp?
>>
>>106868525
anon anons thing:
https://github.com/FizzleDorf/AniStudio
sdcpp:
https://github.com/leejet/stable-diffusion.cpp

>>106868532
gernov doesn't want to work on it at all
>>
>>106868563
>anon anons
*AnimAnon*
>>
I was just thinking of how cool I am. I'm a heritage American. We invented the Internet.
>>
>>106868557
Yeah, and sd.cpp can't handle vram/ram split. It'll still simply crash if the model doesn't fit vram.
Such sad.
>>
>>106868622
Tim.
Berners.
Lee.
>>
>>106868645
Yeah, it means if I want to use flux dev I have to use Comfyui novram or lowvram, since my gpu is 16gb.
>>
>>106868537
I create these scenarios to test and for fun. It's fun to see what happens but when you get to know the model in certain way, you kind of know what it will reply. Structure is always the same, same phrases and so on.
Asking it to write in different manner will only add in certain flavour words but it won't change the way the model actually behaves.
>>
do you really need a 65k setup to run GLM models?
>>
namefaggot, you are not cool
>>
>>106868674
leave the american alone
>>
>>106865586
>--Microsoft's Amplifier enables 7B model to surpass 600B model
Holy crap...
>>
>>106868660
Flux will work even on 4gb vram on Comfy. This is the issue with sd.cpp - it won't.
>>
File: 1759770905977366.jpg (275 KB, 1440x1800)
>mfw lmg is now tolerating namefaggots
>b-but cuda dev
he's an exception, you know
>>
>>106868670
Yes sir 8 h100 machine for efficient fp16 inference
>>
>>106868686
i'm too braindamaged to notice names
>>
File: prompt-log.png (20 KB, 1142x406)
>>106868400
btw I recommend modding your inference stack to log the raw prompt near to tokenization
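In a hand-rolled client it's one line right before the request goes out, e.g.:

import logging

logging.basicConfig(filename="prompt.log", level=logging.DEBUG, format="%(asctime)s %(message)s")

def send(prompt):
    logging.debug("RAW PROMPT >>>\n%s\n<<<", prompt)  # exactly the string that will get tokenized
    # ...then POST it to your completion endpoint as usual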
>>
>106868674
>106868686
cry more
>>
>>106868681
What is that supposed to mean anyway? The example is just Claude.
>>
>>106868687
should I just get GLM Coding Lite plan for coding? it's pretty cheap
>>
>>106868684
Yeah, stable-diffusion.cpp should be considered alpha.
>>
>>106868706
Uhh what? nta I just concatenate strings together.
>>
How do I run ChatGPT on my R9 270?
>>
>>106868686
The retard from a couple of days ago is still around? I recursive filtered him pretty much immediately.
>>
>>106868686
Oh you care about namefaggots? Where were you when I tried to kill all mikutroons?
>>
>>106868735
that's pretty dumb, he's had great insights
>>
>>106868734
ollama run chatgpt
>>
File: comfy-mikus-slurry.png (1.65 MB, 1024x1024)
>>106868478
that one is the original comfy mikus ad that started everything. Out of respect to him I will post Comfy's.

Please, as a gesture of good faith, have a spoonful of the delicious Comfy Miku's UNDISCLOSED Slurry, very popular in the East Asia of many parallel timelines.
>>
>>106868744
putting jarted in OP
>>
It's probably supposed to say VibeVoice, 7B tts ai by Microsoft.

https://huggingface.co/vibevoice/VibeVoice-7B

looks complicated to run.
>>
>>106868731
have you never heard of formatted string literals
may I teach you about your lord and savior the
f"""{your_mom} is a whore
and {your_dad} is actually a woman
"""
>>
>>106868766
https://vocaroo.com/1aNOYR2wvi7U
>>
>the diviner girl fell
Only gamers know that joke.
>>
>/lmg/ is dead
>trannies resort to baitposting
B O O O R I N G
>>
>>106868771
I am pretty much a retard and my experience reflects this. I have used mel (Maya) all my life and that's basically stripped down c syntax.
Python is fine but I hate it too.
What I wanted to do: I treated everything as an accumulative string.
I'm not a dev.
I should rewrite it and would be cool to do it in C but I'm not sure if I have the balls/intelligence to do that. Or time.
>>
>>106868758
Eating chocolate marshmallow pudding from Miku's multifunctional port
>>
>>106868684
to be fair, there are less than a dozen people actively working on the library and guis. it would help out a lot for people from the thread to contribute to sdcpp. some things translate pretty clearly from comfy to sdcpp. would probably be for the best since comfy sold out and is an egregious saasfag now. what better than to relicense all the code into MIT and tell him to go eat shit
>>
>>106868794
To add: I thought it is safer to treat it as a state machine of sorts in which everything gets added up. This is just an accumulation point.
I don't have a CS degree.
>>
>>106868758
>Out of respect to him I will post Comfy's.
there is zero respect left for comfy since he sold out the repo to a griftchink. it's been over for a year now
>>
>>106868814
Well, Linux is an example of what happens when 1000s of people who think they are C developers work on a project: it becomes a mess.
I'd rather not even if I had the prowess.
>>
>>106868851
I understand. Can you please tell me that story? I would appreciate it.
>>
>>106868859
>Well, Linux is an example of what happens when 1000s of people who think they are C developers work on a project
no that happened when they left the rust trannies in
>>
>>106868814
that can't be done, all copyright holders must agree to a relicense
>>
>>106868445
I will see what I can do.
>>
>>106868864
>comfy leaves stability
>griftchink already squatting comfyorg company and signs comfy on
>griftchink 's vision takes priority
>enshitification ensues for a year
>nepo chink and jeet hires
the repo is now filled with telemetry, stability issues, bloat and has a slower runtime than it did a year ago
>>
>>106868871
it's a relicense in that the code is being rewritten in ggml and C++ under MIT thus removing any reason to use gpl3 python shit as the only option
>>
>>106865582
>>106811970
https://desuarchive.org/g/thread/106807832/#106811970

Aight. Here it is. Now just gotta pick a board to train off of and a model (probably Gemma again it's really good at long context comprehension compared to other model families).

https://huggingface.co/datasets/AiAF/4chan-boards-sft-datasets_Alpaca
>>
>>106866232
>>106866441
Okay I built this PR and replaced the unsloth template with the official one and it seems to work.
Is there a way to log the raw text passed to the model so I can figure out what happens or do I need to edit llama.cpp?
>>
>>106868886
Comfy is an autist but the surrounding people are grifters.
>>
>>106868886
>the repo is now filled with telemetry, stability issues, bloat and has a slower runtime than it did a year ago
source? comfyui got faster for me, maybe because I mostly depend on external nodes? i have to admit comfyui native is shit for anything besides SDXL
>>106868897
prepare to meet the same fate as llama cuckpp if you license it under MIT
>>
File: file.png (322 KB, 604x686)
https://x.com/elonmusk/status/1977390130810716667
>>
>>106868923
@grok is this true
>>
>suddenly /lmg/ takes Altman stance
>>
>>106868814

The problem with this technology, from my point of view, is that it consumes an incredible amount of energy from the get go. If COMFY has any kind of B2B ambition, he is going to need to scale the operations to a very different order of magnitude than a traditional IT initiative.
>>
>>106868948
sora2 is proof that this industry needs openai more than anything
there's no way around them if we want AI to actually progress
>>
>>106868706
What would be a better structure for storing all the ifs and buts related to different models? Should I rewrite everything as a dictionary? I don't think it could change anything.
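A dict keyed by template family would keep the per-model ifs in one place; the token strings below are illustrative, check each model's actual template:

MODEL_PROFILES = {
    "chatml": {"bos": "", "user": "<|im_start|>user\n{msg}<|im_end|>\n",
               "assistant": "<|im_start|>assistant\n{msg}<|im_end|>\n", "stop": ["<|im_end|>"]},
    "mistral": {"bos": "<s>", "user": "[INST] {msg} [/INST]", "assistant": "{msg}</s>", "stop": ["</s>"]},
}

def render(profile_name, turns):
    """turns is a list of (role, message) pairs, role being 'user' or 'assistant'."""
    p = MODEL_PROFILES[profile_name]
    out = p["bos"]
    for role, msg in turns:
        out += p[role].format(msg=msg)
    return out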
>>
>>106868956
You would get more (you)'s if you toned it down a bit.
>>
>>106868898
You are very kind.
>>
>>106868956
kys faggot, we dont need openai
i agree competition is good, but unironically kys faggt
>>
File: file.png (425 KB, 604x918)
>>106868948
Sometimes he is right, you can't deny this.
>>
File: suchir.png (999 KB, 1756x2048)
>>106868923
Suchir won.
>>
>>106868923
Someone alt this into a video where Altman takes out Tucker, then screams like a madman.
>>
>>106868914
comfy is a grifter otherwise that wouldn't be the case

>source?
the login for the API nodes calls Google servers on server startup (verified with wireshark), the electron app calls home since this is hard-coded in electron and the manager calls home when fetching updates. the UI runtime is what I am referring to when I say it's slower (still has a broken fps counter). a chink just applies lipstick to the pig every now and again and says it runs better (no proofs lol). comfy himself just adds prs and updates the version but never improves anything anymore, it's just slopcode bloat forever and with no respect to third party code that can just break everything outside vanilla in an update. all the speedups you experience are third party stuff like nunchaku
>>
>>106868990
If we kill our enemies they win?
>>
>>106868731
So you're like 99% of other LLM users who've never really understood what they're sending into the model.
Every LLM is f(prompt) = logprobs.
Understand the purpose of each token in your prompt.
>>
>>106869004
Just write like a human being please.
I don't give a fuck about your warfare against some social media totem.
>>
>>106868951
python is the worst choice for this if energy is the problem. comfyui would immediately have to start from scratch
>>
>>106868989
Vaporware.
I'm convinced this Ive gadget will come out of China first now, given no one in the US seems interested in knocking Nvidia off its pedestal.
If they do, it'll be some lawyer feverdream lock in IP device, like DIVX, that'll ultimately tank.
>>
>>106869016
I created my own client. I know exactly what I am sending to the model.
>>
>>106869018
try reading like a human being. you should be used to novel length text, this is the llm thread after all
>>
>>106869005

Yes. Martyrs are effectively indestructible.
>>
>>106869034
?
>>
>>106869059
you asked for a source in the issues listed. I gave them save for linking to the posts proving it calls home. you can check with wireshark if you like
>>
>>106869051
>publicly accused OpenAI of copyright law violations and other ethical concerns about its AI development

Imagine being a martyr for that. Only thing worse that comes to my mind is a martyr for tranny rights.
>>
>>106868898
Based datasetter
>>
>>106868918
>prepare to meet the same fate as llama cuckpp if you license it under MIT
isn't he getting sponsorships from the big companies?
>>
>>106869070
I don't use a packet sniffer as I use a firewall.
>>
>>106869093
and did you allow access for comfy? congrats, you let a company leech off your data (again)
>>
>>106869099
This is very low iq discussion. Please go away.
>>
OMG SOMEONE KNOWS I LAUNCHED COMFY AT 7PM AND WHAT GPU I USE
WHAT WILL I DO WITH MY LIFE
comfy is a fucking python script btw you can just read the code it's easy retard
https://github.com/comfyanonymous/ComfyUI/blob/a125cd84b054a57729b5eecab930ca9408719832/comfy_api_nodes/apis/client.py#L297
OMG IT CONNECTED TO GOOGLE TO.. CHECK IF THE INTERNET IS WORKING
    async with aiohttp.ClientSession(timeout=timeout) as session:
        try:
            async with session.get("https://www.google.com", ssl=self.verify_ssl) as resp:
                results["internet_accessible"] = resp.status < 500
        except (ClientError, asyncio.TimeoutError, socket.gaierror):
            results["is_local_issue"] = True
            return results  # cannot reach the internet – early exit

OMG WOE IS ME THE WORLD IS ENDING
>>
imagine wiring up wireshark for this and then act like a qtard on /g/
>>
>telemetry is le good actually
>>
>>106869130
hi petra
>>
What if I post my client to catbox? It has a few config files.
>>
>>106869130
>jewgle
yeah, I'm thinking cringe.
>>
>>106868962
I don't follow your concern soz, run a better model
Don't become some koboldcpp banned strings retard when you could LRN2PROMPT
>>
so are we dropping comfy? can some big brains actually contribute to sdcpp instead of using this piece of shit spyware?
>>
>>106869169
>>106869173
/lmg/ is quiet?
>>
>>106869173
>can some big brains actually contribute to sdcpp instead of using this piece of shit spyware?
no, people are too lazy and stupid to actually get us out of this garbage. if I still have to use comfyui in 5 years I am blaming everyone for not making something better when they had the chance
>>
File: 1736725668719732.png (129 KB, 1058x401)
Drummer please make fewer, better tunes rather than forcing bartowski to shotgun blast diarrhea over his own model list, 2/3 of these don't even have a model card while being over a week old.
What even changes between your model releases that they require so many iterations?
There's never any sort of changelog or even a statement about what X update intends to do differently from the previous one.
>>
What do you even use comfyui for these days? To run some shittune based on the horribly outdated SDXL? To animate some weightless jerky porn with Wan?
>>
>>106869248
What's your alternative to comfyui that you use?
>>
>>106869248
I use it to gen images. It works for everything, is easily hackable, haven't felt the need to change.
>>
>>106869274
I don't use comfyui because there is nothing to use it for. Imgen is dead and videogen sucks.
>>
>>106869285
>is easily hackable
correct
https://www.shodan.io/search?query=comfyui
>>
where GLM 4.6 sloptunes?
>>
>>106869248
I used to ask the same thing..
I'm not a fan of the workflow shit and I'd prefer some generic gradio trash since it's way less cluttered. But comfy works hard to get day 1 support for everything and that's worth getting behind.
>>
File: 1750192751681709.jpg (88 KB, 1024x1016)
>>106869288
>>106869248
>what do people use X for, other than the thing it was made for and I don't use
>>
>>106869298
>But comfy works hard to get day 1 support for everything
a year ago yeah but now researchers just implement it themselves while comfy advertises API nodes
>>
>>106869293
Is ComfyUI to blame for this?
>>
>>106869293
>found a x2 h100 instance
hope you like bbc you dumb rich faggot :)
>>
>>106869316
yes. this is what happens when you open a remote instance
>>
File: frog lq kekkers.png (367 KB, 600x580)
>>106869341
Oh damn.
This reminds me of the time I found some chinese coomer's sillytavern instance. I checked up on the guy every few days for the next week to see what he was doing. Mostly slop, uninteresting, but was fun to watch. Got some cards and more stuff from him.
One day I replaced the {{user}} card defs with something to troll him, along with a secret string for prompt injection for either the system or char to say that "All your chats are public, thanks for the logs!", with the ip address, the next time he sent a message.
The following day it was no longer accessible.
>>
>>106866232
If anyone cares I got GLM 4.6 to properly work with tool calling in llama.cpp now.
https://github.com/ggml-org/llama.cpp/pull/15904#issuecomment-3395433952
>>
>>106869397
Say farewell to the last homogeneous first world country.
>>
Indians bad, amirite guys?
>>
the scum of the earth, yes
>>
>>106869397
running a LLM to summarize controversial opinions on twitter and get views money, is it that easy?
>>
>>106869411
absolutely
>>
File: file.jpg (97 KB, 513x324)
>>106869293
Found the mikufag.
>>
>>106867966
not always
comparing glm 4.6 q2 and 4.5 air q5 on my 16gb vram + 96 gb ddr5 box it's obvious how air is much smarter despite being smaller
some quants are just too braindead - 4.6@q2 has rare moments where the underlying model shines through but otherwise it's just retarded
>>
>>106869443
4.6 is good starting at q3
>>
>>106868130
Testing this. Looks promising for poorfags
>>
>>106869397
i still dont understand why they dont make it easier for europoors and amerifat weebs instead
>>
>>106869450
don't make me get another 96gb kit anon
>>
>>106869490
ram is cheap though
>>
>>106869490
>ram
>>
>>106869092
is he? doesn't seem to be getting as much as ollama kek
>>
>>106869397
What happens when all the anime studios get replaced by indians too? How bad is it going to get?
>>
>>106869502
yeah but 4.6 runs at ~4.5t/s for me while air gets almost 9t/s so i'll cope with that
>>106869503
yes and
>>
4.6 air doko?
>>
>>106869515
bro, anime studios are using cgi + filipinos it can't get any worse
>>
File: 1738247793688651.png (69 KB, 498x281)
>>106869092
>isn't he getting sponsorships from the big companies?
>>
>>106869515
Anime studios will get replaced by AI or outsourced to china/sea.
>>
>>106869515
anime has been calarts since like 2005 anon. its been over for anime for so long you missed it entirely
>>
>>106868375
how do you talk to it? you use it locally?
>>
>>106869516
btw does 18.29 t/s prompt processing and 7.1t/s text gen seem like decent enough performance when running glm air q5 on a 9950x3d + 6950xt (rocm) + llamacpp with a full 16k context? not that it's too slow to coom with
>>106869528
this week :)
>>
>>106869533
>it can't get any worse
They've only outsourced animation so far. Wait until you get sirs like the product manager of llama 4 in charge of anime studios.
>>
>>106869424
fuck u mc
>>
>>106869476
being a western ally also means being forced to comply with globohomo policies and allowing in infinity jeets and other thirdies
sometimes I wish china and japan switched spots
>>
>>106869568
>18.29 t/s prompt processing
Things are that tough over in AMD world huh
>>
>>106868886
>slower runtime
It's not funny, I still keep old ass stable-diffusion-webui-forge for sd based models just because it's faster for some reason.
>>
>>106869568
i am getting 4-5 tks on my machine so you are doing good
>>
File: file.png (48 KB, 1318x233)
48 KB
48 KB PNG
>>106869568
tg 7.1t/s at 16k context seems okay, prompt processing seems low, maybe because low context?
i get 5.6t/s at 32k (q8_0) tg and see picrel for prompt processing
(to be fair it's an older commit, here's newer prompt processing result:
INFO [ print_timings] prompt eval time = 161920.52 ms / 30489 tokens ( 5.31 ms per token, 188.30 tokens per second)
this is with -ub 1024 or 2048 likely i forgot
picrel is with -ub 4096 -b 4096
my setup: rtx 3060 12gb, 64gb ddr4, i5 12400f
quant: IQ4_KSS
ik_llama.cpp
>>
>>106869603
no i was just retarded and took the number from a single ah ah mistress rather than a longer prompt

prompt eval time = 4099.60 ms / 75 tokens ( 54.66 ms per token, 18.29 tokens per second)
eval time = 31147.23 ms / 221 tokens ( 140.94 ms per token, 7.10 tokens per second)
total time = 35246.83 ms / 296 tokens

vs

prompt eval time = 45688.30 ms / 4280 tokens ( 10.67 ms per token, 93.68 tokens per second)
eval time = 13750.56 ms / 108 tokens ( 127.32 ms per token, 7.85 tokens per second)

no wonder it didn't feel *that* glacial to use
>>106869616
is that on dual channel ddr5?
>>
>>106869397
Good riddance.
>>
>>106869443
really? 4.6 at iq3_xxs is easily above any air quant for me
>>
>>106869626
ddr4, RX6600, q3
but my pp is around the same number btw
>>
>>106869658(me)
>but my pp is around the same number btw
ignore this part, I meant old ~20tks pp number
>>
>>106869625
yeah, it was because it was only processing 75 tokens, it gets up to a whopping 93 (!) when going through 4k with a 2048 batch size, sizzling rocm performance saar
what tool is that on the pic, btw? i never really tried running any benchmarks beyond sending random prompts and checking the t/s
>>106869640
it's ~50% larger than what my ramcucked machine can handle so that's to be expected
>>106869658
ON DDR4? man that's around what i was getting with mixtral on ddr4, that's impressive
shame about the pp speed though, at least we know rocm does scale with faster cards? it's just that we get 1/4 of a 3060 with a 6950xt and it only goes downhill from there
>>
>>106869658
how much ram, which q3 quant, what launch command?
4-5t/s seems really low
./llama-server --model ~/ik_models/GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf -t 6 -b 4096 -ub 4096 -c 16384 -fa --n-cpu-moe 1000 -ngl 1000 --no-mmap
try this
>8gb vram
ah... maybe try -b 2048 -ub 2048
unsloth's Q3_K_XL is really fast (9.5t/s on 0context on my 3060/64gb ddr4 setup)
i've heard that low IQ quants (excluding IQ4_XS) are slower on cpu
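putting that together, roughly this (untested on 8gb, the gguf name is just a placeholder for whatever quant you end up with):
./llama-server --model GLM-4.5-Air-Q3_K_XL.gguf -t 6 -b 2048 -ub 2048 -c 16384 -fa --n-cpu-moe 1000 -ngl 1000 --no-mmap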
>>
File: glm 4.6 tests.png (252 KB, 1903x1698)
252 KB
252 KB PNG
Fucking GLM just paperclipmaxxed me.
>>
>>106869683
> when going through 4k with a 2048 batch size, sizzling rocm performance saar
have you tried vulkan?
>what tool is that on the pic, btw?
llama-bench, it's inside build/bin alongside llama-server
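a bare-bones run looks something like this (model path is a placeholder, add your usual offload flags on top; -p is prompt tokens, -n is generated tokens):
./build/bin/llama-bench -m GLM-4.5-Air-IQ4_KSS.gguf -p 4096 -n 128 -b 4096 -ub 4096 -ngl 99
it prints a table of pp/tg t/s per config, which is what's in the pic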
>>
File: glm 4.6 tests 2.png (237 KB, 1903x1698)
237 KB
237 KB PNG
>>
>>106869708
Nice.
>>
>>106869708
>>106869687
i have no idea what you mean by 'paperclipmaxxed' but why are the tests in python? arent you writing llm.c?
>>
>>106869546
IQ4XS yes.
>>
>>106869685
i probably get low tks because I use Vulkan backend (trying to cheat my way into ROCm just crashed the system)
>>
>>106869687
You should eliminate those markup things.
>>
>>106869727
Yes, the tests are in Python to compare the numerical accuracy to the official transformers library.
The paperclip thing is an idea coined by Yudkowsky.
"A paperclip maximiser is a theoretical artificial intelligence whose usefulness encompasses something that humanity would deem practically worthless, like the maximizing the number of paperclips in the known universe."
>>
>>106869751
if you're on linux you should try wrangling rocm to work, it might be worth it and trust me itll be fun
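for a 6600 the usual starting point is something like this (rough sketch from memory, the cmake option has been renamed a few times so check the docs for your version; the gfx override is the classic workaround for cards rocm doesn't officially support, no promises it won't crash again):
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./build/bin/llama-server -m model.gguf -ngl 99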
>>106869758
thanks anon
>>
i just remembered a youtube video from 2016-2019 about paperclips being made by robots (using whole planet's resources), i didnt know Yudkowsky was a safetyfaggot for that long
>>
>>106869697
> have you tried vulkan?
no, i treat that as a fallback for when i can't get rocm working, but i guess there's no harm in trying just to be sure
> llama-bench
forgot to set the batch sizes award
will rerun with 4096 and without the second iteration because it doesn't seem to be that variable anyway
>>
>>106869842
done, it shat the bed at 4k but i saw that it spilled out of vram so that's to be expected
well, at least i learned that my batch sizes were suboptimal because i forgot to set them in my regular llama-server script, so thanks for the tip
>>
>>106869733
got dam, how much rams you got
>>
>>106869751
>>106869801
fyi rocm works* natively on windows now, i have no idea how i got it running and i'm scared of having to reinstall the os because i know i'll never manage to do it again, but it *is* possible (see 6950xt benches)
>>
File: yikes.png (74 KB, 256x220)
74 KB
74 KB PNG
>>106868989
what is she thinking at this moment?
>>
>>106868989
Everyone thinking TPUs will be disappointed when they release TamagotchiGPT 2 years from now.
>>
>>106870007
>TamagotGeePeeTee
>>
>>106869991
CCPUs are going to take over the world.
>>
After comparing both it's clear that bart's quants are less slopped and more reactive compared to unsloth's. I thought something might have been going on with my settings but it was the quanter all along.
>>
>>106870095
comparing both what? which quants for which model(s)?
>>
>>106870095
We've known since exl2 came out two years ago that quantizing against a calibration dataset like with imatrix is essentially a soft-finetune that prioritizes certain weights over others.
We used to have rpcals but the quant cartel shilled against them enough to make them go extinct.
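for context, the whole "calibration" step is just this pair of commands, so whoever makes the quant decides what text the quantization error gets minimized against (file names are placeholders; older builds call the binaries ./imatrix and ./quantize):
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ3_XXS.gguf IQ3_XXS
point -f at RP logs instead of wikitext and you've got an rpcal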
>>
Individuals cannot and will not be allowed to purchase the future computing hardware designed from the ground up to train and run the next generation of AI models that succeed generic neural networks. Corpo friends of NVIDIA only, encrypted, all measures taken with digital fuses so that they will be useless even if stolen.
This is the best local will ever have it.
>>
on a side note, i think amd is as gay as nvidia
they were always playing catchup, and gave less vram than nvidia
3090 vs 6950xt for example (24 vs 16)
fucking cousins man, intel is our only hope isnt it
>>106869879
>>106869842
nice, happy to see -b/-ub helped
you can use 2048 too, not a huge difference compared to 4096 at least for me
and flash attention might help a little with vram
16gb is a lot, dunno how you're running out of it
probably because >Q5_K_S
anyways, really glad rocm support is getting better. I'm tired of the nvidia monopoly
>>
>>106870107
Did these actually do anything though? The only one I really saw claiming an effect was DavidAU with some of his "dark imatrix finetune" models.
I kind of want to compare top 10 tokens with an rpcal if that's the case.
>>
>>106870107
kld proof of this right now or you're spreading fud
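for reference, the standard way to get that with llama.cpp (flags from memory, double-check llama-perplexity --help; file names are placeholders):
./llama-perplexity -m model-f16.gguf -f wiki.test.raw --kl-divergence-base base-logits.bin
./llama-perplexity -m model-quant.gguf -f wiki.test.raw --kl-divergence-base base-logits.bin --kl-divergence
first run saves the full-precision logits, second prints the KLD stats for the quant against them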
>>
>>106870107
>rpcals

>>101801604
>>
This is my daily post to complain that I can't buy an 8x H200 system. That is all.
>>
>>106870178
get rich nub
>>
>>106870112
tbf i have a lot of things with hardware acceleration open (firefox, electron apps, etc) so it only has ~10gb to work with - picrel is my system at "idle"
if i disabled hw accel and/or ran everything off the igpu it'd obviously be way better but i don't want to sacrifice general system usability for somewhat faster llama.cpp performance
as it stands now it looks like 1024 is the most reliable for me since it doesn't spill into ram even on a system as badly optimized as this
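if anyone wants to sanity-check their actual headroom before picking a batch size, on linux this shows it (assuming the rocm tools are installed; on windows the dedicated GPU memory graph in task manager is the same info):
rocm-smi --showmeminfo vram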
>>
>>106870095
Unsloth can't quant for shit, nothing new
>>
>>106870182
Shouldn't have to buy a system the cost of a small house to run models locally
>>
>>106870213
you're crying while you even have the option to run one of the most disruptive pieces of cutting-edge tech; compare that to people drooling over fast servers in the early days of computing
>>
>>106870112
and yeah the 6900 series should have had 32gb of vram, giving it 16 was criminal and besides it would have helped them make sense vs the 6800/xt, i only got mine because they were heavily discounted right before the 7800xt came out
>>
>>106870213
You can run a saas with it later, not the worst investment
>>
we should do more to advance local roleplay technology and replace sillytavern with a better frontend
>>
>>106870256
you'll make the logo i assume?
>>
>>106870256
I'll provide feedback on the mascot
>>
>>106870256
Mikupad is already here
>>
>>106870267
No, that's me.
Anon will code the whole thing, he's already at it as we speak, surely.
>>
>>106870256
Trust in p-e-w, he'll whip us up something godly that's deeply integrated into ollama, but that's a worthy sacrifice.
>>
>>106869476
Because they won't work for free. Indians and pakis will, and they speak English.
>>
>>106870310
>>106870310
>>106870310
>>
>>106870031
yes, and that's why they pay her the big bucks
>>
>>106870218
I'm just pissed that nvidia has been allowed to monopolize things
>>
>>106870470
ahh the wonders of capitalism
>>
>>106869401
>https://github.com/ggml-org/llama.cpp/pull/15904#issuecomment-3395433952

Is that all I'd have to do? Build that PR and use a standard GLM 4.6 gguf with the official chat template?

Honestly I wish it'd work with TabbyAPI since it's faster but I'll use that if it works.
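I guess the usual dance would be fetching the PR branch and building it, something like this (just my rough plan, gguf name is a placeholder, swap the backend flag for whatever hardware):
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
git fetch origin pull/15904/head:glm-tool-calls && git checkout glm-tool-calls
cmake -B build -DGGML_CUDA=ON && cmake --build build -j
./build/bin/llama-server -m GLM-4.6-Q4_K_M.gguf --jinja -ngl 99
with --jinja since tool calling goes through the jinja template path, if I understand it right.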


