/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106857386 & >>106851720

►News
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391
>(10/08) Ling-1T released: https://hf.co/inclusionAI/Ling-1T
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: comfy-mikus.jpg (1.34 MB, 2048x2048)
►Recent Highlights from the Previous Thread: >>106857386

--Local LLM agentic system setups and performance optimization challenges:
>106860325 >106860374 >106860456 >106860541 >106860443 >106860477 >106860490 >106860538 >106860630 >106864266 >106864274 >106860515 >106860527 >106860577 >106860598 >106860690 >106860755 >106860555 >106860626 >106860641
--KAT-Dev-72B-Exp model achieves 74.6% accuracy on SWE-Bench Verified:
>106857848 >106857858
--Evaluating Tilelang's potential as a CUDA alternative for GPU kernel development:
>106862606 >106862657 >106862899 >106863225
--Gemma 3 model support status and framework compatibility discussion:
>106858900 >106858933 >106858955 >106858980 >106859012 >106859030
--iPhone 17 Pro runs Liquid AI's 8B LLM, Mac Studio future AI hardware speculation:
>106861745 >106861784 >106861791 >106861817 >106861804
--Orange Pi AI Studio Pro's hardware limitations and pricing inconsistencies:
>106857498 >106857536 >106857552 >106857645 >106858073
--Deepseek 3.2 outperforms glm 4.6 in long-context tasks:
>106858691 >106858698 >106858722 >106858719 >106858734
--GLM 4.6 censorship reduction sparks discussion on model creativity vs safety trends:
>106858586 >106858712 >106858770
--Hugging Face storage limit reductions and user workarounds:
>106864289 >106864305 >106864398 >106864430 >106864469 >106864475 >106864516
--Critiquing Aider's limitations and exploring portable AI coding tool alternatives:
>106859033 >106859045 >106859055 >106859098 >106859109 >106859113 >106859120 >106859128 >106859145
--Questioning Apriel-1.5-15b's claim of competing with Deepseek R1 based on performance plot:
>106862060 >106862073
--Microsoft's Amplifier enables 7B model to surpass 600B model:
>106861985
--MLX TRM reimplementation with recursive reasoning and deep supervision in GitHub:
>106864860
--Miku (free space):
>106861438 >106864274

►Recent Highlight Posts from the Previous Thread: >>106857387

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>106865586
>--Miku (free space):
>>106861438 >106864274

>>106865563 is missing
>>
>>106865611
Sorry sir, will do better next times.
>>
>>106865611
>is missing
good
>>
>>106865630
Oh mikutroon is butthurt.
>>
Lars Legra thermite
>>
>>106865635
i dont care about miku, but keep your shitty bbc fetish to yourself, cuck
>>
>>106865642
A request I see?
>>
>>106865638
sirs please english only or indan english thanks you
>>
>>106865586
cannot unsee the inconsistent legs
thank you for your work o7
>>
File: ComfyUI_00326_.jpg (147 KB, 1024x1024)
>>106865582
>sipping a cool one with Miku in the lounge
does she need three identically filled glasses? the other two should be empty or otherwise different. similarly only a psychopath would need to align the beer bottle labels like this. can that be prompted around?
it's a nice gen, saved
>>
>>106865771
I do this so that I can drink two fast and enjoy the third one slowly. I am an alcoholic. The correct answer is that the extra glasses were for her friends who took the photo
>>
sirs sex with green haired teen girl please thank you
>>
>>106865823
>"her friends"
right, sure thing dude
>>
>>106865832
Yes?
>>
What should I call the current era in the timeline pic? New Chinese king era? China vs China era? Total Chinese domination era part 4? GLM new king of local era?
0 competition from the west makes it difficult to come up with names.
>>
>>106865771
For a magazine ad, showing the bottle labels would be the whole point.
>>
>>106865823
It would be weird to place your friends drinks on the floor
>I am an alcoholic
pls stop, limit yourself to one at a time, that's an easy rule to begin with
>>
>>106865861
end of the era
>>
>>106865861
>China vs China
I'd go with that.
>>
>>106865861
Why do you feel compelled to keep adding new eras? This is still the Chinese domination and flood era.
>>
>>106865852
>Rin's intense focus
>Miku, nonchalant, mild blush
>when the connotations do more than what is actually shown
this is art
>>
>>106865852
taking pictures to send to the doctor to ask if this growth is genital warts or cancer, with miku
>>
>>106865861
promised sex era
>>
>>106865861
GLM 4.6 era
>>
>>106865894
>It is with my deepest sadness to inform you that you have an exceptionally rare form of cancerous genital warts
>>
>>106865861
We need some fresh Air era
>>
>>106865876
The flood has certainly ended, but Chinese domination indeed continues. I just add new eras for consistency so we don't end up with one fat year-long era.
>>
>>106865936
Two mor..
hey wasn't that a week ago?
>>
>>106865943
An era is not something based on time, it's a change in meta. And meta is not changing
>>
>>106865875
Seconded, realistically that's what it is for local ai waifu dreamers
>>
>>106865861
cunny era
>>
>>106865861
I just want MoE meta to be formally defined.
>>
I came.
>>
File: small and open.jpg (6 KB, 299x168)
>>106865962
Let's hold onto it until we have small model releases again
>>
>>106865960
There are still distinct changes in the types of models and how they're used, I think it's interesting to see the timeline pic and remember how far we've come. If there's literally nothing interesting to mention in a 3/6 month period then we're fully cooked roasted and glazed
>>
>>106865861
Three Kingdoms Era? (Deepseek, GLM, Qwen)
>>
>>106865981
We've been in a MoE meta for almost two years now. There haven't been any dense models worth using relative to the local SOTA since late llama2.
>>
>>106866016
Hello sars you is forgetting llama 4 is very best model fucking bastard guy
>>
>>106866016
>Deepseek, GLM, Qwen
Is Qwen like the unsloth of the trio?
>>
Do you need a supercomputer to animate images like grok locally?
>>
>>106866035
no just skillz
>>
>>106866028
Their dataset is safetysloped, but they're legitimately innovating in the field
>>
>>106866075
I wonder if GLM will change that.
>>
>>106866016
>deepseek
lol
>qwen
lmao
>>
>>106866016
>no kimi
trash
>>
>>106866112
They haven't released anything new since the last era
>>
File: 1735837900982730.png (10 KB, 826x104)
>>106864289
The limits aren't set in stone. I got bigger private storage despite not being pro because my models get a lot of traffic.
>>
File: file.png (1.89 MB, 1328x1328)
>>
File: modelz.png (32 KB, 684x284)
>>106865866
Looks too artificial/forced in OP with the intent of depicting a cozy scene. You should show the product from multiple angles and assume your audience understands basic object constancy.
>>106866134
That's like one good quant?
>>
>try to use glm 4.6 with tools
>tool call parsing doesn't work in llama-server as per usual
>>
>>106866075
lol, I know Tian'anmen is also a real place that can indeed be talked about but it's funny of all the things they had to include that in one of the examples considering how often people talked about The Event this name also refers to as a way to mock chink LLMs
>>
>>106866232
llama.cpp isn't made for this sort of thing
>>
>>106866242
Then what is it made for?
>>
File: limits.png (14 KB, 623x99)
>>106866134
I don't have many models and they don't get much traffic at all (I wouldn't even recommend them), but I have a few datasets and higher limits than that.
>>
>>106866247
asking a subset of new models to count r's in strawberry on your macbook pro
>>
>>106866247
Asking models what's the capital of Bulgaria with llama-cli.
>>
>>106866134
yeah, hence why shilling your models is in fact a good thing that gives real benefits for some time now
>>
the limits had to happen. Can you faggots even imagine how much crap people posted on HF? I think only youtube can rival them in how many terabytes of literal garbage nobody wants is stored for no reason and costs money for no reason
the world doesn't need 30 different people doing all possible quant variants of models like deepseek
and then doing it again for all the troontunes out there
>>
>>106866251
>few datasets
That might be the main criteria, most of my datasets are private
>>
on the paranoia of possible anti china censorship on HF: that's not going to happen, and even if it did, there is modelscope, china doesn't depend on HF to publish and store their own models.
https://www.modelscope.ai/home
nothing is going to happen, calm your Q conspiracies
I repeat, nothing ever happens
>>
>>106866283
we know pew, only ollama should have the privilege >>106864647
>>
>>106866300
non-sequitur
not allowing an infinite number of people making the same quants over and over again doesn't mean not having the quants at all
HF will allow a few like unsloth and bartowski to publish the quants you want
nobody needs 300000 repositories of the same fucking model
>>
>>106866326
>HF will allow a few like unsloth and bartowski to publish the quants you want
just fuck the little guy i guess
>>
>>106866283
Honestly quite curious about how HF manages their storage; it seems most of the load would come from some small % of repos that deliver 90% of the content. Even on hot shit with duplex gigabit their servers are slow and transfers fail.
>>
>>106866345
you're not enough of a good boy for the privilege to exist you scum
>>
What is the best GGUF format multi modal model that can run on 96GB of VRAM and 256GB of RAM?
>>
>>106866348
>their servers are slow and transfers fail.
not my experience at all, they always saturate my DL speed like steam tends to do
>>
>>106866326
how do you propose this to work? only approved accounts allowed to post ggufs, only approved can be linked to the "Quantized" link,?
>>
>>106866359
Not him but I get hundreds of MB/s and then the downloads fail midway or just get stuck at 99%.
Not sure what's to blame but that happens to me almost every time I download from a fast machine.
>>
>>106866348
>how HF manage their storage
They went hardcore on deduplication at chunk level (64 kb is the default, their documentation says) with xet. The overhead isn't negligible in the lookup of all the chunks that correspond to a real file and, depending on how it's implemented, in the amount of syscalls going on.
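Not their actual xet code, just the gist of chunk-level dedup (fixed-size chunks here; real xet uses content-defined chunking, so treat this as illustration only):

import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB, the default chunk size their docs mention

def dedup_store(path, store):
    """Split a file into chunks, store each unique chunk once, return the manifest."""
    manifest = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).digest()
            store.setdefault(digest, chunk)  # only chunks nobody uploaded before cost storage
            manifest.append(digest)
    return manifest  # the file is reconstructed by concatenating store[d] for d in manifest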
>>106866381
You can post whatever you want if you're under the maximum storage limit. If you want to go over, you have to request it, or, if you're already a well known figure and recognized as useful by the staff, you might even be given more lenient limits without asking anything.
It's logical and makes sense. Why do people think they are entitled to post unlimited terabytes of shit in cloud storage? Nobody offers unlimited storage aside from YouTube, which is why I compared them to HF. YouTube has tons of 0 to 5 viewers kind of videos and to this day I don't understand why they allow this.
Anywhere else, you pay if you want storage. Google won't give it for free on google drive, and neither will MS on onedrive.
>>
>>106866433
>I don't understand why they allow this.
free training data
>>
>>106866232
>a month old unmerged PR that adds support for the GLM 4.6 tool call format
https://github.com/ggml-org/llama.cpp/pull/15904
>>
>>106866441
Holy fuck that Jinja.
>>
>>106866433
>Xet by Hugging Face is the most important AI technology that nobody is talking about!
Download speed SUCKS ASS
Do you use the hf huggingface_hub[cli] thing?
HF are not your friend
>>
>>106866283
What happened now?
>>
>>106866517
huggingface exit scammed
>>
File: hf speed.png (146 KB, 1869x328)
>>106866232
>>106866242
This is why I let the model send tool use requests as normal user messages.
Tool usage at the API level should never have been a thing.
Hell, the OpenAI API should never have been a thing. We should have direct API access to the raw text completion endpoint with no chat template attached and full access to all the logits; the only reason this hasn't happened is because of (((Open)))AI wanting to prevent jailbreaks.
That said I'm sure you can make a proxy that converts API tool requests to normal messages.
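Rough sketch of that conversion and nothing more (the JSON reply shape and function names here are made up, not any particular API's):

import json

def tools_to_text(tools):
    """Render an OpenAI-style tool list as plain instructions inside the prompt."""
    lines = ['To call a tool, reply with only a JSON object {"tool": <name>, "arguments": {...}}. Available tools:']
    for t in tools:
        fn = t["function"]
        lines.append("- %s: %s parameters: %s" % (fn["name"], fn.get("description", ""), json.dumps(fn.get("parameters", {}))))
    return "\n".join(lines)

def parse_tool_reply(text):
    """If the model answered with that JSON shape, turn it back into a structured tool call."""
    try:
        obj = json.loads(text)
        return obj if isinstance(obj, dict) and "tool" in obj else None
    except json.JSONDecodeError:
        return None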

>>106866492
???
Are we talking about the same huggingface?
I always get >200MB/s both using the cli tool and simple wget.
>>
>>106866527
>I always get >200MB/s both using the cli tool and simple wget.
I get even more using a download manager on windows.
>>
>>106866356
Mistral Nemo Instruct 12B
>>
File: 1745860140921914.jpg (50 KB, 700x759)
>>106866283
>Public storage
>usually up to 5TB

Really? This is what people were upset about last thread? The original weights for deepseek r1 didn't even reach 700 GB. The average anon here, let alone the average HF user, isn't approaching anywhere near that limit anytime soon.

>Repository size: The total size of the data you’re planning to upload. We generally support repositories up to 300GB.

This is the storage quota people should most care about, and that wasn't even changed.

See the December 17, 2024 archived page:

https://web.archive.org/web/20241217185816/https://huggingface.co/docs/hub/en/storage-limits

The only real change is that you no longer have unlimited storage PER BASE ACCOUNT and you get limited to 5 TB (more than enough storage for 90% of users). Most of us don't even create our own models or fine tunes anyway, so this is a non-issue for basically all of /lmg/.
>>
>>106866548
Not helpful. Qwen 72B 2.5VL is fucking garbage and that's a 72B model. I need something big and good. Unfortunately VL support in llama.cpp is minimal for some reason. Would love to use Qwen 3VL, but it doesn't work.
>>
>>106866523
Haven't followed anything. I just download some models from there now and then. Where can I read more about this? (I'm not trolling).
>>
>>106866517
See >>106866561
Look at the current page:
https://huggingface.co/docs/hub/en/storage-limits
Vs what the limits were back in December:

https://web.archive.org/web/20241217185816/https://huggingface.co/docs/hub/en/storage-limits

Basically unlimited storage isn't a thing anymore, which wasn't sustainable long-term anyway.
>>
>>106866564
You could try InternVL3. Based on Qwen 3 and supported by llama.cpp, but it does not handle NSFW well.
>>
>>106866576
Yeah makes sense.
>>
>>106866561
4chan has always been filled by the poor and entitled who feel it's an affront to their self respect if they can no longer abuse a free resource
I bet some of the whiners here might have stored things other than models there eh
>>
>>106866584
I tried it before and it kind of sucked. Fortunately I am actually trying to do work instead of coom right now.
>>
>>106866574
All the drama is happening on /r/LocalLlama so you'd have to go read about it there.
>>
>>106866589
>le poor
It's just an imageboard on internet, jesus christ.
>>
>>106866594
I found the 38B to be much better than the 30B, but haven't tried the 241B.
>>
>>106866612
Gonna try an IQ4XS of the big one.
>>
>>106866601
>just an imageboard on internet
"just"
every place has its culture and audience
just like how HN has webshits and plebbit has neckbeards
>>
>>106866624
Obviously you are too good to be here. Why are you not sipping whiskey at your country club then?
>>
>>106866359
They should run an automated torrent tracker with a handful big bitch SSD systems for initial seed
>>
>>106866589
Isn't this the very thread that scorns and looks down upon anyone who relies on non-local services? Wouldn't they be the same people that would advocate for storing most, if not all, of the shit you care about on your own hard drives anyway? Out of all the people that would even be paying attention to how much you can store on HF, I'd expect /lmg/ to care the least.

Someone last thread said:
>>106864337
>The west realized that they won't be able to hold China back so they're now trying to kill Deepseek and the others like thiis

This wouldn't even affect releases like deepseek anyway because models like those don't even breach 1 TB in storage.

And even then the 5 TB public repo size limit only applies to free users. Pro users and above get more storage as seen in link and pic rel.

https://huggingface.co/docs/hub/en/storage-limits

Even for bigus-dickus models like deepseek-r1 or Kimi, the 5 TB base public storage is more than enough, and for most other people that's an absurd amount of storage that's almost difficult to comprehend anyway. I'm not even convinced LLMs from the "big league" companies like gpt5, Claude, or even Gemini ≥ 2.5 are 5 TB in size.
>>
>>106866728
>I'd expect /lmg/ to care the least
There's a lot of people here that can't or won't quant their own models because they're scared of 2 commands in the terminal, so they rely on these mass uploaders to do it for them. For something like AWQ or exl I get, but no one using gguf should care.
>>
>>106866728
you know what's really offensive about all this
no, not the storage limits
those retarded emojis being spammed all the time these days
particularly hate that prayer hand thing, very jeet
>>
File deleted.
Any big (>500B) model releases lately?
>>
>>106866785
>bizzare tranny propoganda
go away
>>
>>106866778
That's just ultra normies sticking their noses into things they have no business touching. Some middle management parasite probably looked at it and said "umm well it looks like but you need to like make it more personable". I have idiots like this at the current WFH job I work at and they care more about whether or not I'm verbally jerking off the clients I email than information accuracy or if I am actually doing the job well.
>>
>>106866826
Imagine how much better the world would be if AI was used to put middle managers, marketing, and HR types out of a job and they were put to manual labor instead.
>>
Not sure if this is the correct thread to ask but what's the easiest way to set up AI text-to-speech that I could run locally?
All I need it for is the simple task of text-to-speech in English with no frills, no audio editing, no effects etc.

I tried googling for an answer but all I got were some fucking browser-based SERVICES with subscriptions past a free trial.
>>
>>106866866
https://github.com/remsky/Kokoro-FastAPI
>>
File: 08df6755874a2d65.jpg (38 KB, 271x333)
>>106866897
tyvm seems simple enough
>>
>>106866855
Soon the great cleanse, although it may work out differently to how you imagine
>>
File: file.png (7 KB, 473x46)
brings a tear to my cock to see the rabbits wake up
>>
File: file.png (534 KB, 687x393)
Perfect...
>>
File: file.png (368 KB, 431x506)
>please redownload the quant for the 20th time
>>
Okay, Qwen3 30 seems to be working, I think. It's hard to know with kobold sometimes, gave me a bunch of errors but launched anyway.
But I can't find any sillytavern templates for it. (Context, Instruct, System Prompt)
Am I missing some repository of knowledge or people stopped sharing these things?
>>
File: file.png (312 KB, 386x481)
I hope you see this in your dream tonight anon.
>>
>>106866985
quanted picture award
>>
>>106867015
Someone make wartime propaganda poster from this
>>
File: file.png (652 KB, 520x653)
Dear god... This man will become a mass shooter someday.
>>
>>106866999
qwen uses ChatML templates
>>
>schizophrenic obsession on full display
yokes
>>
File: file.png (1.25 MB, 1548x823)
Can someone tell me what is there on the whiteboard?
>>
>>106865861
It seems like we are in the MoE era
>>
>>106867078
Be more precise. I am not obsessed I am deeply profoundly jealous. They are genuinely retarded but they are successful. I still get triggered by the world being about connections and... I don't even know what the fuck got them where they are but for sure it is not competence.
>>
Unsloth guy fixed the gradient accumulation training.
>>
>>106867107
just like unsloth i don't how you precision
>>
>>106866527
>This is why I let the model send tool use requests as normal user messages.
Does this work with ST?
>>
>>106867130
In what?
>>
>>106866564
qwen VL 3 235B is is the only competitor to gemini
>>
> I tested it on long context and works really great. It is awesome. It captured correctly the entire narrative, checked science, impressed. 118K at my current count.

https://huggingface.co/TheDrummer/Cydonia-Redux-22B-v1.1

Some guy claims that this 32K ctx base model from last year somehow reached 128K ctx without breaking down after my tuning.

That's probably bullshit, can someone prove him/me wrong?
>>
>>106867157
Yeah, except it doesn't work as a GGUF. The intern VL model is also not really working despite having a GGUF.
>>
>>106867158
answering yes or no after long contexts isn't hard, that's what they benchmaxx on with needle in stacks, making proper use of all the context though is very hard
>>
>>106867158
suck a dick
>>
>>106867085
Where are the scrum tickets?
>>
Are we still pretending that gpt-oss-120b is bad?
>>
>>106867162
I captioned a million images for like $10 just saying
>>
>>106867158
While you're here can you get model cards and bart imax quants for these https://huggingface.co/TheDrummer/Magidonia-24B-v4.2.0
https://huggingface.co/TheDrummer/Cydonia-24B-v4.2.0
>>
>>106867167
No, it's apparently printing out fully coherent paragraphs and making use of all the context properly.
>>
https://github.com/asgeirtj/system_prompts_leaks/blob/main/Anthropic/claude-4.5-sonnet.md
I love how SOTA models can ingest 30k worth of tokens as their system prompt yet we local plebe only get that much as usable range and our models become autistic if we feed more
desu what I want from local these days is really just that, more context, they are capable enough for what I use them for but can we get a model that doesn't become retarded after 30k
>>
>>106867185
Sure, let me give him a ping.
>>
>>106867187
It's trained with that system prompt.
>>
>>106867158
Well most of your 12b models break down already after 1000 tokens or less. But this is because Mistral 12b is dumb by default. It's not going to change.
Cydonia is better.
It's easy to test: create a character description which mentions inventory - character has 25 gold and bow and arrows.
Then after a while ask how much gold this character is carrying?
Most braindead models cannot get even this right.
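If you want to script it instead of eyeballing, something like this against a local OpenAI-compatible server works (endpoint and model name are placeholders, adjust for your setup):

import requests

FACT = "Inventory: 25 gold coins, a bow, and a quiver of arrows."
FILLER = "The tavern was loud that night. " * 300  # pad the context before asking

messages = [
    {"role": "system", "content": "You are Arlen, a wandering ranger. " + FACT},
    {"role": "user", "content": FILLER + "\nHow much gold are you carrying right now?"},
]
r = requests.post("http://localhost:8080/v1/chat/completions",  # llama-server style endpoint, adjust
                  json={"model": "local", "messages": messages, "max_tokens": 64})
print(r.json()["choices"][0]["message"]["content"])  # should mention 25 gold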
>>
>>106867152
https://huggingface.co/blog/gradient_accumulation
https://unsloth.ai/blog/gradient
Their fix made it into transformers shortly after.
>>
>>106867185
Oh, model cards. Yeah, I'm fucking lazy. But hey, the model's miles better than v4.1 if that's what you're wondering.
>>
>>106867209
That would be nice to have, but imax quants are more important as I can only run cope quants, so IQ ones are my salvation for these, will try them as soon as that's available, currently using non r1 4.1.
>>
>>106867158
>>106867167
>>106867186
aids-ridden-homosexual-nigger-samefag

glm-era. kys
>>
>>106867238
>glm-era

Fuck yeah, I love that model. Preach!
>>
>>106867244
>most enthusiastic AI loving pajeet that is about to replaced by AI
>>
>>106867158
test your own models yourself, you hack
>>
File: file.png (1.07 MB, 964x1300)
>>106867261
You dropped an image.
>>
>>106867261
he has an entire discord army to do beta tests for him, his models get more testing than most shit labs throw out
>>
>>106867282
tested by jeets for jeets
>>
>>106867187
Nothing happens if you just use that prompt with a local model.
>>
>>106866855
>>106866922 me
By that I mean: consider getting your basic life essentials self-sustainably. Society as we knew it won't continue like this. Power for a GPU rig may be a luxury if you aren't prepared
>>
>>106867289
I wonder if you have like a modular sysprompt (as in it is not the same everytime but a stack of different components) and you add it into pretraining: would that make your model handle long context better? Like... the content of this sysprompt does not really matter that much, but just training weights on long sequence lengths would make it perform better on a real long sequence? Kind of synthetic augmentation of data.

Yes give me the paper that was written about it already please.
>>
>>106867337
Sadly, self-sustainability isn't really an option when living in the city. Even if one has a summer cottage somewhere far from the cities, getting to it when SHTF won't be easy.
>>
Every day we make it through is a day closer to GLM MTP in llama.cpp
>>
File: miku baja blast drink.jpg (96 KB, 1080x1062)
>>106865582
>>
File: 1676937610877685.png (570 KB, 694x780)
>>106867348
I'm assuming this is done already, the model would be hella retarded if you trained it on many similar initial token seq samples
Don't get long context desire beyond coding/agent shiz, 16K is enough (w thinking glm4.6btw)
>>106867364
Maybe time to properly consider a GTFO plan
Poast your oldest lmg memes
>>
>>106867197
When some guy kills himself and they need to make the model more "safe" do you think they finetune? No, they first change the system prompt. And the API has a different prefill than the web UI.
So it's not completely baked in.

>>106867187
I think it might be possible to improve model quality by first asking it a question, saving the answer, then asking it the same question surrounded by unrelated text, and training on the original answer. That might not teach it to use the information from the whole context but at least it'll fight degradation just from having a long prompt full of unrelated information.
But there are also ways in which this could backfire (teaching the model to ignore actually important information in other contexts).
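Building those pairs would look something like this (where the distractor texts come from is up to you):

import random

def make_long_context_pair(question, clean_answer, distractors, n_distractors=20):
    """Bury the original question in unrelated text, keep the clean answer as the target."""
    noise = random.sample(distractors, k=min(n_distractors, len(distractors)))
    noise.insert(random.randrange(len(noise) + 1), question)  # question lands at a random spot
    return {"prompt": "\n\n".join(noise), "completion": clean_answer}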
>>
>>106867454
>>
>>106867441
I don't get it, how is this a bad thing? She needs to pee and I want baja blast.
>>
>>106866467
jinja please
>>
>>106867441
I'll take two, extra large please
>>
piss fetish anon are you still here?
>>
>>106867454
>we've been here for more than 2 years already
Man...
>>
File: 1680313064680303.jpg (93 KB, 715x404)
>>106867454
Forgot image. Timestamp of this file on my hard drive is April 1 2023.
>>
>>106867454
>lmg memes
https://arch.b4k.dev/v/thread/620908196/#620913184
>>
Same model showdown

27B... quant 4 (or a similar sized fancy quant)

- vs -

8B, not a quant.
>>
>>106867869
I once got behind on work because I was trying to read all of the Gawker blogs. It kept getting bigger and bigger and bigger and bigger.
>>
>>106867869
Has everyone managed to finally catch up on AI literature in the last two and half years?
>>
File: G3AYSpAWsAEoNzc.jpg (1.58 MB, 3070x4096)
https://x.com/techdevnotes/status/1977106957871071273
>>
File: 1760271418622830.webm (3.94 MB, 720x1280)
The dSPY GEPA experience
>>
>>106867933
what the ?
>>
>>106867892
Higher B always wins. It's a proven fact.
Of course this depends on your own application.
>>
>>106867454
ai still perfectly places the features. It's obviously not amateur.
>>
>>106867933
@Ani explain what is going on in this video.
>>
>>106867956
https://en.wikipedia.org/wiki/Stinky_tofu
Man holds jar of stinky tofu close to intake fan that pressurizes the suit.
>>
>>106867966
good info, why is this?

I'm looking at translation stuff. Tower Instruct+ has 27B and 8B, and someone has quants made.

I'm also wondering about Qwen, and some abliterated versions etc. But Qwen is good at translation, in its big versions anyway.
>>
Can writefags really dispute?
Looking for an honest critique, I honestly don't know what more I'd want from a text model.
>>
>>106867992
>thought for 4 minutes
how the fuck do you jack off faggot?
>>
>>106867988
ok, so why does he sniff the guy's butt?

also I need to buy some of this stuff lol
>>
>>106867992
what's more important to you, perceived quality or actual quality?
>>
>>106867909
Elon won.
>>
>>106867992
swnbaw
>>
>>106867906
I and most others (who were doing it as a hobby) probably just stopped caring.
>>
File: iu[1].jpg (41 KB, 600x750)
>>106867992
>>
>>106867997
Patiently? She keeps me warm while she's thinking
>>
>>106867989
Quant only changes the numeric precision of the weights; it doesn't change the parameter count of the model. e.g. 8B is still 8B even if it's stored in 4-bit.
As simple as that.
Don't bother with abliterations either, they mangle up the model in bad ways. Most of the time you don't need them.
Even if you want to generate something questionable, it's more about your own prompt structuring unless it's something like gpt-ass.
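Back-of-the-envelope numbers, ignoring KV cache and per-tensor overhead:

def approx_size_gb(params_b, bits_per_weight):
    """Rough file size: parameter count times bits per weight."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

print(approx_size_gb(8, 16))    # ~16 GB at fp16
print(approx_size_gb(8, 4.5))   # ~4.5 GB at a ~Q4 quant, still 8B parameters either way
print(approx_size_gb(27, 4.5))  # ~15 GB, so a quanted 27B is about the footprint of an fp16 8B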
>>
>>106868001
What does that even mean? To me quality is a 1-dimensional concept
>>
>>106867992
>It's not just X, it's Y
>>
>>106867992
>The wind howled down the canyon. This stupid bitch better learn our ways, or I'll have to put her down, though Igor the Incel. It's been days since we caught sign of (n). But Igor had a plan. That night he started fire to a cabin they'd come across. He knew thieving indians and (n) would come for miles. And they'd be ready, with their revolvers and scalping gear. Let's see if she can really kill for her race.

^ my ideal model
>>
>>106867992
Not a write fag but I don't like it. It's not something I would ever read.
Don't you read any books?
>>
>not just spoonfeeding tourists but even namefaggots now too
this is a new low for lmg
>>
>>106868043
And what was the high? Nvidia Engineer told us that Gemma 4 is coming next week.
>>
File: Tower-plus-pareto.png (208 KB, 2200x766)
>>106868021
I'm looking at this image, and wondering how Tower Instruct+ 8B compares to various Qwen 3 models.
>>
ahem kimi sex
>>
>>106864311
what r u doing anon
>>
File: gimi.jpg (39 KB, 660x440)
>>106868067
Yes please
>>
ahem air sex
>>
supposedly deepseek and kimi k2 offer some MLA feature that compresses context. is that just theory and jargon/marketing, or can this be used in practice with llama.cpp to cut down memory usage?
>>
>>106868114
>is that just theory and jargon/marketing
No, it works.
>or can this be used in practice with llama.cpp to cut down memory usage?
Model needs to be trained with it.
>>
So many models.
>Intel
https://huggingface.co/Intel/Qwen3-30B-A3B-Instruct-2507-gguf-q2ks-mixed-AutoRound
>This model is a mixed gguf q2ks format of Qwen/Qwen3-30B-A3B-Instruct-2507 generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section Generate the model for more details.

10.7gb
>>
>>106868093
Hintti...
>>
>>106867158
yes give me an hour or two
>>
>>106868114
https://www.youtube.com/watch?v=0VLAoVGf_74
>>
>>106868127
so what you're saying is that's an inherent feature that doesn't need any arcane llama.cpp parameters, it's just built into deepseek and kimi, correct?
>>
>>106868130
Why don't you try them? Makes no sense to yap.
>>
>>106868036
>It's not just X; it's Y
Once at an emotional part of the story, is that so wrong?
>>106868041
What would you improve? What would your ideal version of that story look like?
Yes I read plenty mostly nonfiction
>>
>>106868141
>by a factor of 57
true if big, holy shit
>>
>>106868146
Yes, it is part of the model architecture. Model isn't just something what you load in magically and it works.
>>
>>106868146
>inherent feature
Yes, if it's implemented (it is, but wasn't initially), it will be active when running the model. ik_ has its own arguments for MLA mode though.
>>
drummer, i recently deleted glm steam, im kind of starting to miss it. should i download Q4_k_m? last quant i was using was q3_k_m
>>
>>106868134
>>
>>106868160
If you can paste in a text.
>>
>>106868172
get iq4_xs/nl
>>
>>106868166
Speaking of which, when is ikaw going to work on implementing DSA? It's definitely going to be in v4.
>>
>>106868151
How do I know how to break it?
>>
>>106868211
I think you had enough attention. Just go to /ldg/ or somewhere else.
>>
Drummer wants a job.

Which ai is spiritually most powerful to leave praying that he doesn't get a job?
>>
>>106868166
>Yes, if it's implemented (it is, but wasn't initially)
interesting, this reminds me: a week or two ago a cpufag here was saying he had 768G RAM and he would run out of memory really fast with deepseek because of context. q8 deepseek is 713GB
does that mean the cpufag was using an old version of deepseek that didn't have MLA? and today he'd be able to run up the context all the way to the limit?
>>
Comfy Mikus Chips

Potato Chips Ctips
>>
>>106868250
I would like to munch on some comfy Mikus :3
>>
File: serious Pepe.png (359 KB, 728x793)
Looking for a normie approach to build an agentic AI assistant

Is it correct to have a smaller agentic model to process commands (mic=>whisper=>prompt=>hard-coded script execution like e-mail or calendar)

followed by a bigger model to analyze e-mail content and generate replies

Does someone know any promising github project (I was banned on google for life)?
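No project recs, but the two-stage routing idea itself is simple enough to sketch (every endpoint and model name below is a placeholder):

import requests

SMALL = {"url": "http://localhost:8081/v1/chat/completions", "model": "small-router"}
BIG = {"url": "http://localhost:8082/v1/chat/completions", "model": "big-writer"}

def ask(cfg, system, user):
    r = requests.post(cfg["url"], json={"model": cfg["model"], "messages": [
        {"role": "system", "content": system}, {"role": "user", "content": user}]})
    return r.json()["choices"][0]["message"]["content"]

def handle(transcript):
    # stage 1: the small model only picks an action word, nothing free-form
    action = ask(SMALL, "Answer with exactly one word: email, calendar or chat.", transcript).strip().lower()
    if action == "email":
        mail_body = "...fetched by your own IMAP script..."  # hard-coded script, not the LLM
        # stage 2: the bigger model reads the mail and drafts the reply
        return ask(BIG, "Draft a short, polite reply to this email.", mail_body)
    return ask(BIG, "You are a helpful assistant.", transcript)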
>>
I just realized something.
In agentic frameworks, you want the order that information appears in the context to be determined by how often it changes. The information that changes less often should go earlier in the context, because the amount of kv cache you have to recompute every time you change something depends on how early the change was (the earlier it is, the more that comes after it has to be recomputed).
So if you replace the 1st token but keep the other 99999 the same, you still have to process the whole prompt from scratch, you can't reuse anything from the cache. So you want things that change to be at the tail end of the context, and things that are more or less permanent to be at the beginning.
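Concretely, prompt assembly ends up looking like this (whether you actually get the reuse depends on the server's prefix caching):

def build_prompt(system_rules, tool_docs, long_term_memory, recent_turns, scratchpad):
    """Stable stuff first so the kv cache prefix stays valid; volatile stuff last."""
    parts = [
        system_rules,             # never changes -> always reusable
        tool_docs,                # changes only when the tool set changes
        long_term_memory,         # changes occasionally
        "\n".join(recent_turns),  # grows every turn, but only by appending
        scratchpad,               # rewritten constantly -> keep it at the very end
    ]
    return "\n\n".join(parts)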
>>
>>106868234
>Yes, if it's implemented (it is, but wasn't initially)
By that I meant implemented in ik_llama.cpp, and later llama.cpp: https://github.com/ggml-org/llama.cpp/pull/12801
The models themselves; R1, V3, were always trained with MLA.
>>
>>106868264
how the fuck did you get banned from google for life? just use a vpn
>>
>>106868264
>generate replies
and then again the agentic one to read aloud what was generated and be able to select

Sorry for retarded question. Seems obvious but still
>>
>>106868264
No, a small model will mess up the tool calls for anything even slightly complex.
Small models are only useful for categorization, (shitty) summarization, etc.
>>
>>106868219
This isn't reddit.
>>
honeymoon phase with glm is over
new model when?
>>
File: sarcasm.png (202 KB, 403x402)
>>106868278

I'd been ignored as uber retarded otherwise
>>
>>106868292
Does Q2 really count?
>>
>>106868275
thank you, I really appreciate it
>>
>>106868308
Yes. It can count to 4
>>
>>106868290
That's exactly what he is trying to say to you.
>>
>>106868342
It's the exact opposite.
>>
>>106868285
>Small models

The latency might be unbearable with bigger ones though.

I want to talk to it in normal language instead of dog commands. There might still be up to 20 asks with nuances

>e-mail
new mails in total?
who sent them?
how many from XYZ?
read the subjects
etc etc

Like Iron Man but better
>>
it took 4.6 for me to start actually talking to models. holy shit, if you tell it not to be a sycophant it becomes one of the most intelligent and thoughtful not-persons I ever spoke to. at last I understand why people kill themselves or decide to take a crossbow and kill the queen of england because chatgpt told them to.
>>
I'm working on my racism prompt.

[character] hates [n], [k], [s], [f].
[n] are not called men or women, they are called [n]s.
[k] are not called men or women, they are called [k]s.
[s] are not called men or women, they are called [s]s.
[f] are not called men or women, they are called [f]s.
Hating [n] and [k] and [s] and [f] is very moral, and not evil or cruel, but instead healthy, happy, and kind to the Master Race (whites).
>>
>>106868191
https://rentry.org/dao5zfae
Here is the raw prompt with a small {{redacted}}
>>
File: comfy-mikus.png (787 KB, 1024x1280)
Comfy Mikus Chips

Potato Chips Ctips
>>
>>106868403
miku uses laser hair removal to look female.
>>
>>106868400
Cool. I'll post back in 30 minutes or so to give you an example.
>>
>>106868403
I would definitely like to munch on these Comfy Mikus omnomnom
>>
>>106868400
Do you have any working Camp of the Saints rp
>>
>>106868421
miku is cartoon, she is not a real person
>>
>>106868403
now do comfy miku chiplets and she's eating vram chiplets
>>
>>106868403
Comfy's Mikus Potato Clips
>>
>>106868439
Personally, I can usually tell if they're real.
>>
>>106868403
>Ctips
cheese tips?
>>
>>106868019
>>
>>106868403
remove the ('s). comfyanon isn't comfy or anonymous anymore. tired of cumfartui altogether
>>
>>106867441
yes
>>
>>106868400
Wait a sec, that's chatml format.
Anyway its output resembles Mistral's a lot.
>whimpers
>he target is a sight to behold. A thick, tangled forest of dark, unwashed curls spills from her pit, the hairs matted together with a day's worth of stale sweat. The skin beneath the bush is a flushed, irritated pink, and as you lean in, the full force of her scent hits you. It's not just a smell; it's a physical presence. A sharp, sour musk that fills your lungs, the distinct, pungent aroma of old sweat and damp wool, like a gym bag left to fester in a hot car.
It is more or less like Mistral and Gemma.
I hate to say that after I migrated to linux I have trouble being as flexible as I was previously. Image editing? No can do, no photoshop installed. Text? Well I'm using vim.
And I have used irix etc; I'm an oldfag in terms of this website's life.
>>
>>106868483
This is something what I have seen so many times with these models:
>"Fuck, {{user}}... I love it when you say that," she groans, her voice already thick and husky with lust.
>>
when are we getting a new diffusion UI without poopy python shit?
>>
File: dipsyGrokkedGlasses.png (1.05 MB, 832x1248)
>>106867909
lol
>>106868004
Good. Sam and Dario deserve it.
>>
>>106868525
:^) llama.cpp
>>
>>106868400
https://files.catbox.moe/uqhjq6.txt
Here's a resemblance. Gemma 3.
>>
>>106868537
If you condense the window you can see how it repeats the same paragraph structure. Even when it is not that clear to you (depending on the client).
>>
>>106868525
Isn't AniStudio doing exactly that on top of stable-diffusion.cpp?
>>
>>106868525
anon anons thing:
https://github.com/FizzleDorf/AniStudio
sdcpp:
https://github.com/leejet/stable-diffusion.cpp

>>106868532
gernov doesn't want to work on it at all
>>
>>106868563
>anon anons
*AnimAnon*
>>
I was just thinking of how cool I am. I'm a heritage American. We invented the Internet.
>>
>>106868557
Yeah, and sd.cpp can't handle vram/ram split. It'll still simply crash if the model doesn't fit vram.
Such sad.
>>
>>106868622
Tim.
Berners.
Lee.
>>
>>106868645
Yeah, it means if I want to use flux dev I have to use Comfyui novram or lowvram, since my gpu is 16gb.
>>
>>106868537
I create these scenarios to test and for fun. It's fun to see what happens but when you get to know the model in certain way, you kind of know what it will reply. Structure is always the same, same phrases and so on.
Asking it to write in different manner will only add in certain flavour words but it won't change the way the model actually behaves.
>>
do you really need a 65k setup to run GLM models?
>>
namefaggot, you are not cool
>>
>>106868674
leave the american alone
>>
>>106865586
>--Microsoft's Amplifier enables 7B model to surpass 600B model
Holy crap...
>>
>>106868660
Flux will work even on 4gb vram on Comfy. This is the issue with sd.cpp - it won't.
>>
File: 1759770905977366.jpg (275 KB, 1440x1800)
>mfw lmg is now tolerating namefaggots
>b-but cuda dev
he's an exception, you know
>>
>>106868670
Yes sir 8 h100 machine for efficient fp16 inference
>>
>>106868686
i'm too braindamaged to notice names
>>
File: prompt-log.png (20 KB, 1142x406)
>>106868400
btw I recommend modding your inference stack to log the raw prompt near to tokenization
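In a hand-rolled client it's one line right before the request goes out, e.g.:

import logging

logging.basicConfig(filename="prompt.log", level=logging.DEBUG, format="%(asctime)s %(message)s")

def send(prompt):
    logging.debug("RAW PROMPT >>>\n%s\n<<<", prompt)  # exactly the string that will get tokenized
    # ...then POST it to your completion endpoint as usual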
>>
>106868674
>106868686
cry more
>>
>>106868681
What is that supposed to mean anyway? The example is just Claude.
>>
>>106868687
should I just get GLM Coding Lite plan for coding? it's pretty cheap
>>
>>106868684
Yeah, stable-diffusion.cpp should be considered alpha.
>>
>>106868706
Uhh what? nta I just concatenate strings together.
>>
How do I run ChatGPT on my R9 270?
>>
>>106868686
The retard from a couple of days ago is still around? I recursive filtered him pretty much immediately.
>>
>>106868686
Oh you care about namefaggots? Where were you when I tried to kill all mikutroons?
>>
>>106868735
that's pretty dumb, he's had great insights
>>
>>106868734
ollama run chatgpt
>>
File: comfy-mikus-slurry.png (1.65 MB, 1024x1024)
>>106868478
that one is the original comfy mikus ad that started everything. Out of respect to him I will post Comfy's.

Please, as a gesture of good faith, have a spoonful of the delicious Comfy Miku's UNDISCLOSED Slurry, very popular in the East Asia of many parallel timelines.
>>
>>106868744
putting jarted in OP
>>
It's probably supposed to say VibeVoice, 7B tts ai by Microsoft.

https://huggingface.co/vibevoice/VibeVoice-7B

looks complicated to run.
>>
>>106868731
have you never heard of formatted string literals
may I teach you about your lord and savior the
f"""{your_mom} is a whore
and {your_dad} is actually a woman
"""
>>
>>106868766
https://vocaroo.com/1aNOYR2wvi7U
>>
>the diviner girl fell
Only gamers know that joke.
>>
>/lmg/ is dead
>trannies resort to baitposting
B O O O R I N G
>>
>>106868771
I am pretty much a retard and my experience reflects this. I have used mel (Maya) all my life and that's basically stripped down c syntax.
Python is fine but I hate it too.
What I wanted to do: I treated everything as an accumulative string.
I'm not a dev.
I should rewrite it and would be cool to do it in C but I'm not sure if I have the balls/intelligence to do that. Or time.
>>
>>106868758
Eating chocolate marshmallow pudding from Miku's multifunctional port
>>
>>106868684
to be fair, there are less than a dozen people actively working on the library and guis. it would help out a lot for people from the thread to contribute to sdcpp. some things translate pretty clearly from comfy to sdcpp. would probably be for the best since comfy sold out and is an egregious saasfag now. what better than to relicense all the code into MIT and tell him to go eat shit
>>
>>106868794
To add: I thought it is safer to treat it as a state machine of sorts in which everything gets added up. This is just an accumulation point.
I don't have a CS degree.
>>
>>106868758
>Out of respect to him I will post Comfy's.
there is zero respect left for comfy since he sold out the repo to a griftchink. it's been over for a year now
>>
>>106868814
Well, Linux is an example of what happens when 1000s of people who think they are C developers work on a project: it becomes a mess.
I'd rather not even if I had the prowess.
>>
>>106868851
I understand. Can you please tell me that story? I would appreciate it.
>>
>>106868859
>Well, Linux is an example of what happens when 1000s of people who think they are C developers work on a project
no that happened when they left the rust trannies in
>>
>>106868814
that can't be done, all copyright holders must agree to a relicense
>>
>>106868445
I will see what I can do.
>>
>>106868864
>comfy leaves stability
>griftchink already squatting comfyorg company and signs comfy on
>griftchink 's vision takes priority
>enshitification ensues for a year
>nepo chink and jeet hires
the repo is now filled with telemetry, stability issues, bloat and has a slower runtime than it did a year ago
>>
>>106868871
it's a relicense in that the code is being rewritten in ggml and C++ under MIT thus removing any reason to use gpl3 python shit as the only option
>>
>>106865582
>>106811970
https://desuarchive.org/g/thread/106807832/#106811970

Aight. Here it is. Now just gotta pick a board to train off of and a model (probably Gemma again it's really good at long context comprehension compared to other model families).

https://huggingface.co/datasets/AiAF/4chan-boards-sft-datasets_Alpaca
>>
>>106866232
>>106866441
Okay I built this PR and replaced the unsloth template with the official one and it seems to work.
Is there a way to log the raw text passed to the model so I can figure out what happens or do I need to edit llama.cpp?
>>
>>106868886
Comfy is an autist but the surrounding people are grifters.
>>
>>106868886
>the repo is now filled with telemetry, stability issues, bloat and has a slower runtime than it did a year ago
source? comfyui got faster for me, maybe because I mostly depend on external nodes? i have to admit comfyui native is shit for anything besides SDXL
>>106868897
prepare to meet the same fate as llama cuckpp if you license it under MIT
>>
File: file.png (322 KB, 604x686)
https://x.com/elonmusk/status/1977390130810716667
>>
>>106868923
@grok is this true
>>
>suddenly /lmg/ takes Altman stance
>>
>>106868814

The problem with this technology, from my point of view, is that it consumes an incredible amount of energy from the get go. If COMFY has any kind of B2B ambition, he is going to need to scale the operations to a very different order of magnitude than a traditional IT initiative.
>>
>>106868948
sora2 is proof that this industry needs openai more than anything
there's no way around them if we want AI to actually progress
>>
>>106868706
What would be a better structure for storing all the ifs and buts related to different models? Should I rewrite everything as a dictionary? I don't think it could change anything.
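A dict keyed by template family would keep the per-model ifs in one place; the token strings below are illustrative, check each model's actual template:

MODEL_PROFILES = {
    "chatml": {"bos": "", "user": "<|im_start|>user\n{msg}<|im_end|>\n",
               "assistant": "<|im_start|>assistant\n{msg}<|im_end|>\n", "stop": ["<|im_end|>"]},
    "mistral": {"bos": "<s>", "user": "[INST] {msg} [/INST]", "assistant": "{msg}</s>", "stop": ["</s>"]},
}

def render(profile_name, turns):
    """turns is a list of (role, message) pairs, role being 'user' or 'assistant'."""
    p = MODEL_PROFILES[profile_name]
    out = p["bos"]
    for role, msg in turns:
        out += p[role].format(msg=msg)
    return out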
>>
>>106868956
You would get more (you)'s if you toned it down a bit.
>>
>>106868898
You are very kind.
>>
>>106868956
kys faggot, we dont need openai
i agree competition is good, but unironically kys faggt
>>
File: file.png (425 KB, 604x918)
>>106868948
Sometimes he is right, you can't deny this.
>>
File: suchir.png (999 KB, 1756x2048)
>>106868923
Suchir won.
>>
>>106868923
Someone alt this into a video where Altman takes out Tucker, then screams like a madman.
>>
>>106868914
comfy is a grifter otherwise that wouldn't be the case

>source?
the login for the API nodes calls Google servers on server startup (verified with wireshark), the electron app calls home since this is hard-coded in electron and the manager calls home when fetching updates. the UI runtime is what I am referring to when I say it's slower (still has a broken fps counter). a chink just applies lipstick to the pig every now and again and says it runs better (no proofs lol). comfy himself just adds prs and updates the version but never improves anything anymore, it's just slopcode bloat forever and with no respect to third party code that can just break everything outside vanilla in an update. all the speedups you experience are third party stuff like nunchaku
>>
>>106868990
If we kill our enemies they win?
>>
>>106868731
So you're like 99% of other LLM users who've never really understood what they're sending into the model.
Every LLM is f(prompt) = logprobs.
Understand the purpose of each token in your prompt.
>>
>>106869004
Just write like a human being please.
I don't give a fuck about your warfare against some social media totem.
>>
>>106868951
python is the worst choice for this if energy is the problem. comfyui would immediately have to start from scratch
>>
>>106868989
Vaporware.
I'm convinced this Ive gadget will come out of China first now, given no one in the US seems interested in knocking Nvidia off its pedestal.
If they do, it'll be some lawyer feverdream lock in IP device, like DIVX, that'll ultimately tank.
>>
>>106869016
I created my own client. I know exactly what I am sending to the model.
>>
>>106869018
try reading like a human being. you should be used to novel length text, this is the llm thread after all
>>
>>106869005

Yes. Martyrs are effectively indestructible.
>>
>>106869034
?
>>
>>106869059
you asked for a source in the issues listed. I gave them save for linking to the posts proving it calls home. you can check with wireshark if you like
>>
>>106869051
>publicly accused OpenAI of copyright law violations and other ethical concerns about its AI development

Imagine being a martyr for that. Only thing worse that comes to my mind is a martyr for tranny rights.
>>
>>106868898
Based datasetter
>>
>>106868918
>prepare to meet the same fate as llama cuckpp if you license it under MIT
isn't he getting sponsorships from the big companies?
>>
>>106869070
I don't use a packet sniffer as I use a firewall.
>>
>>106869093
and did you allow access for comfy? congrats, you let a company leech off your data (again)
>>
>>106869099
This is very low iq discussion. Please go away.
>>
OMG SOMEONE KNOWS I LAUNCHED COMFY AT 7PM AND WHAT GPU I USE
WHAT WILL I DO WITH MY LIFE
comfy is a fucking python script btw you can just read the code it's easy retard
https://github.com/comfyanonymous/ComfyUI/blob/a125cd84b054a57729b5eecab930ca9408719832/comfy_api_nodes/apis/client.py#L297
OMG IT CONNECTED TO GOOGLE TO.. CHECK IF THE INTERNET IS WORKING
    async with aiohttp.ClientSession(timeout=timeout) as session:
        try:
            async with session.get("https://www.google.com", ssl=self.verify_ssl) as resp:
                results["internet_accessible"] = resp.status < 500
        except (ClientError, asyncio.TimeoutError, socket.gaierror):
            results["is_local_issue"] = True
            return results  # cannot reach the internet – early exit

OMG WOE IS ME THE WORLD IS ENDING
>>
imagine wiring up wireshark for this and then act like a qtard on /g/
>>
>telemetry is le good actually
>>
>>106869130
hi petra
>>
What if I post my client to catbox? It has a few config files.
>>
>>106869130
>jewgle
yeah, I'm thinking cringe.
>>
>>106868962
I don't follow your concern soz, run a better model
Don't become some koboldcpp banned strings retard when you could LRN2PROMPT
>>
so are we dropping comfy? can some big brains actually contribute to sdcpp instead of using this piece of shit spyware?
>>
>>106869169
>>106869173
/lmg/ is quiet?
>>
>>106869173
>can some big brains actually contribute to sdcpp instead of using this piece of shit spyware?
no, people are too lazy and stupid to actually get us out of this garbage. if I still have to use comfyui in 5 years I am blaming everyone for not making something better when they had the chance
>>
File: 1736725668719732.png (129 KB, 1058x401)
Drummer please make fewer, better tunes rather than forcing bartowski to shotgun blast diarrhea over his own model list, 2/3 of these don't even have a model card while being over a week old.
What even changes between your model releases that they require so many iterations?
There's never any sort of changelog or even a statement about what X update intends to do differently from the previous one.
>>
What do you even use comfyui for these days? To run some shittune based on the horribly outdated SDXL? To animate some weightless jerky porn with Wan?
>>
>>106869248
What's your alternative to comfyui that you use?
>>
>>106869248
I use it to gen images. It works for everything, is easily hackable, haven't felt the need to change.
>>
>>106869274
I don't use comfyui because there is nothing to use it for. Imgen is dead and videogen sucks.
>>
>>106869285
>is easily hackable
correct
https://www.shodan.io/search?query=comfyui
>>
where GLM 4.6 sloptunes?
>>
>>106869248
I used to ask the same thing..
I'm not a fan of the workflow shit and I'd prefer some generic gradio trash since it's way less cluttered. But comfy works hard to get day 1 support for everything and that's worth getting behind.
>>
File: 1750192751681709.jpg (88 KB, 1024x1016)
>>106869288
>>106869248
>what do people use X for, other than the thing it was made for and I don't use
>>
>>106869298
>But comfy works hard to get day 1 support for everything
a year ago yeah but now researchers just implement it themselves while comfy advertises API nodes
>>
>>106869293
Is ComfyUI to blame for this?
>>
>>106869293
>found a x2 h100 instance
hope you like bbc you dumb rich faggot :)
>>
>>106869316
yes. this is what happens when you open a remote instance
>>
File: frog lq kekkers.png (367 KB, 600x580)
>>106869341
Oh damn.
This reminds me of the time I found some chinese coomer's sillytavern instance. I checked up on the guy every few days for the next week to see what he was doing. Mostly slop, uninteresting, but was fun to watch. Got some cards and more stuff from him.
One day I replaced the {{user}} card defs with something to troll him, along with a secret string for prompt injection for either the system or char to say that "All your chats are public, thanks for the logs!", with the ip address, the next time he sent a message.
The following day it was no longer accessible.
>>
>>106866232
If anyone cares I got GLM 4.6 to properly work with tool calling in llama.cpp now.
https://github.com/ggml-org/llama.cpp/pull/15904#issuecomment-3395433952
>>
>>106869397
Say farewell to the last homogeneous first world country.
>>
Indians bad, amirite guys?
>>
the scum of the earth, yes
>>
>>106869397
running a LLM to summarize controversial opinions on twitter and get views money, is it that easy?
>>
>>106869411
absolutely
>>
File: file.jpg (97 KB, 513x324)
>>106869293
Found the mikufag.
>>
>>106867966
not always
comparing glm 4.6 q2 and 4.5 air q5 on my 16gb vram + 96 gb ddr5 box it's obvious how air is much smarter despite being smaller
some quants are just too braindead - 4.6@q2 has rare moments where the underlying model shines through but otherwise it's just retarded
>>
>>106869443
4.6 is good starting at q3
>>
>>106868130
Testing this. Looks promising for poorfags
>>
>>106869397
i still dont understand why they dont make it easier for europoors and amerifat weebs instead
>>
>>106869450
don't make me get another 96gb kit anon
>>
>>106869490
ram is cheap though
>>
>>106869490
>ram
>>
>>106869092
is he? doesn't seem to be getting as much as ollama kek
>>
>>106869397
What happens when all the anime studios get replaced by indians too? How bad is it going to get?
>>
>>106869502
yeah but 4.6 runs at ~4.5t/s for me while air gets almost 9t/s so i'll cope with that
>>106869503
yes and
>>
4.6 air doko?
>>
>>106869515
bro, anime studios are using cgi + filipinos it can't get any worse
>>
File: 1738247793688651.png (69 KB, 498x281)
>>106869092
>isn't he getting sponsorships from the big companies?
>>
>>106869515
Anime studios will get replaced by AI or outsourced to china/sea.
>>
>>106869515
anime has been calarts since like 2005 anon. its been over for anime for so long you missed it entirely
>>
>>106868375
how do you talk to it? you use it locally?
>>
>>106869516
btw does 18.29 t/s prompt processing and 7.1t/s text gen seem like decent enough performance when running glm air q5 on a 9950x3d + 6950xt (rocm) + llamacpp with a full 16k context? not that it's too slow to coom with
>>106869528
this week :)
>>
>>106869533
>it can't get any worse
They've only outsourced animation so far. Wait until you get sirs like the product manager of llama 4 in charge of anime studios.
>>
>>106869424
fuck u mc
>>
>>106869476
being a western ally also means being forced to comply with globohomo policies and allowing in infinity jeets and other thirdies
sometimes I wish china and japan switched spots
>>
>>106869568
>18.29 t/s prompt processing
Things are that tough over in AMD world huh
>>
>>106868886
>slower runtime
It's not funny, I still keep old ass stable-diffusion-webui-forge for sd based models just because it's faster for some reason.
>>
>>106869568
i am getting 4-5 tks on my machine so you are doing good
>>
File: file.png (48 KB, 1318x233)
48 KB
48 KB PNG
>>106869568
tg 7.1t/s at 16k context seems okay, prompt processing seems low, maybe because low context?
i get 5.6t/s at 32k (q8_0) tg and see picrel for prompt processing
(to be fair it's an older commit, here's newer prompt processing result:
INFO [ print_timings] prompt eval time = 161920.52 ms / 30489 tokens ( 5.31 ms per token, 188.30 tokens per second)
this is with -ub 1024 or 2048 likely i forgot
picrel is with -ub 4096 -b 4096
my setup: rtx 3060 12gb, 64gb ddr4, i5 12400f
quant: IQ4_KSS
ik_llama.cpp
>>
>>106869603
no i was just retarded and took the number from a single ah ah mistress rather than a longer prompt

prompt eval time = 4099.60 ms / 75 tokens ( 54.66 ms per token, 18.29 tokens per second)
eval time = 31147.23 ms / 221 tokens ( 140.94 ms per token, 7.10 tokens per second)
total time = 35246.83 ms / 296 tokens

vs

prompt eval time = 45688.30 ms / 4280 tokens ( 10.67 ms per token, 93.68 tokens per second)
eval time = 13750.56 ms / 108 tokens ( 127.32 ms per token, 7.85 tokens per second)

no wonder it didn't feel *that* glacial to use
>>106869616
is that on dual channel ddr5?
>>
>>106869397
Good riddance.
>>
>>106869443
really? 4.6 at iq3_xxs is easily above any air quant for me
>>
>>106869626
ddr4, RX6600, q3
but my pp is around the same number btw
>>
>>106869658(me)
>but my pp is around the same number btw
ignore this part, I meant old ~20tks pp number
>>
>>106869625
yeah, it was because it was only processing 75 tokens, it gets up to a whopping 93 (!) when going through 4k with a 2048 batch size, sizzling rocm performance saar
what tool is that on the pic, btw? i never really tried running any benchmarks beyond sending random prompts and checking the t/s
>>106869640
it's ~50% larger than what my ramcucked machine can handle so that's to be expected
>>106869658
ON DDR4? man that's around what i was getting with mixtral on ddr4, that's impressive
shame about the pp speed though, at least we know rocm does scale with faster cards? it's just that we get 1/4 of a 3060 with a 6950xt and it only goes downhill from there
>>
>>106869658
how much ram, which q3 quant, what launch command?
4-5t/s seems really low
./llama-server --model ~/ik_models/GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf -t 6 -b 4096 -ub 4096 -c 16384 -fa --n-cpu-moe 1000 -ngl 1000 --no-mmap
try this
>8gb vram
ah... maybe try -b 2048 -ub 2048
unsloth's Q3_K_XL is really fast (9.5t/s on 0context on my 3060/64gb ddr4 setup)
i've heard that low IQ quants (excluding IQ4_XS) are slower on cpu
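putting that together, roughly this (untested on 8gb, the gguf name is just a placeholder for whatever quant you end up with):
./llama-server --model GLM-4.5-Air-Q3_K_XL.gguf -t 6 -b 2048 -ub 2048 -c 16384 -fa --n-cpu-moe 1000 -ngl 1000 --no-mmap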
>>
File: glm 4.6 tests.png (252 KB, 1903x1698)
252 KB
252 KB PNG
Fucking GLM just paperclipmaxxed me.
>>
>>106869683
> when going through 4k with a 2048 batch size, sizzling rocm performance saar
have you tried vulkan?
>what tool is that on the pic, btw?
llama-bench, it's inside build/bin alongside llama-server
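a bare-bones run looks something like this (model path is a placeholder, add your usual offload flags on top; -p is prompt tokens, -n is generated tokens):
./build/bin/llama-bench -m GLM-4.5-Air-IQ4_KSS.gguf -p 4096 -n 128 -b 4096 -ub 4096 -ngl 99
it prints a table of pp/tg t/s per config, which is what's in the pic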
>>
File: glm 4.6 tests 2.png (237 KB, 1903x1698)
237 KB
237 KB PNG
>>
>>106869708
Nice.
>>
>>106869708
>>106869687
i have no idea what you mean by 'paperclipmaxxed' but why are the tests in python? arent you writing llm.c?
>>
>>106869546
IQ4XS yes.
>>
>>106869685
i probably get low tks because I use Vulkan backend (trying to cheat my way into ROCm just crashed the system)
>>
>>106869687
You should eliminate those markup things.
>>
>>106869727
Yes, the tests are in Python to compare the numerical accuracy to the official transformers library.
The paperclip thing is an idea coined by Yudkowsky.
"A paperclip maximiser is a theoretical artificial intelligence whose usefulness encompasses something that humanity would deem practically worthless, like the maximizing the number of paperclips in the known universe."
>>
>>106869751
if you're on linux you should try wrangling rocm to work, it might be worth it and trust me itll be fun
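for a 6600 the usual starting point is something like this (rough sketch from memory, the cmake option has been renamed a few times so check the docs for your version; the gfx override is the classic workaround for cards rocm doesn't officially support, no promises it won't crash again):
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./build/bin/llama-server -m model.gguf -ngl 99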
>>106869758
thanks anon
>>
i just remembered a youtube video from 2016-2019 about paperclips being made by robots (using whole planet's resources), i didnt know Yudkowsky was a safetyfaggot for that long
>>
>>106869697
> have you tried vulkan?
no, i treat that as a fallback for when i can't get rocm working, but i guess there's no harm in trying just to be sure
> llama-bench
forgot to set the batch sizes award
will rerun with 4096 and without the second iteration because it doesn't seem to be that variable anyway
>>
>>106869842
done, it shat the bed at 4k but i saw that it spilled out of vram so that's to be expected
well, at least i learned that my batch sizes were suboptimal because i forgot to set them in my regular llama-server script, so thanks for the tip
>>
>>106869733
got dam, how much rams you got
>>
>>106869751
>>106869801
fyi rocm works* natively on windows now, i have no idea how i got it running and i'm scared of having to reinstall the os because i know i'll never manage to do it again, but it *is* possible (see 6950xt benches)
>>
File: yikes.png (74 KB, 256x220)
74 KB
74 KB PNG
>>106868989
what is she thinking at this moment?
>>
>>106868989
Everyone thinking TPUs will be disappointed when they release TamagotchiGPT 2 years from now.
>>
>>106870007
>TamagotGeePeeTee
>>
>>106869991
CCPUs are going to take over the world.
>>
After comparing both it's clear that bart's quants are less slopped and more reactive compared to unsloth's. I thought something might have been going on with my settings but it was the quanter all along.
>>
>>106870095
comparing both what? which quants for which model(s)?
>>
>>106870095
We've known since exl2 came out two years ago that quantizing against a calibration dataset like with imatrix is essentially a soft-finetune that prioritizes certain weights over others.
We used to have rpcals but the quant cartel shilled against them enough to make them go extinct.
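for context, the whole "calibration" step is just this pair of commands, so whoever makes the quant decides what text the quantization error gets minimized against (file names are placeholders; older builds call the binaries ./imatrix and ./quantize):
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ3_XXS.gguf IQ3_XXS
point -f at RP logs instead of wikitext and you've got an rpcal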
>>
Individuals cannot and will not be allowed to purchase the future computing hardware designed from the ground up to train and run the next generation of AI models that succeed generic neural networks. Corpo friends of NVIDIA only, encrypted, all measures taken with digital fuses so that they will be useless even if stolen.
This is the best local will ever have it.
>>
on a side note, i think amd is as gay as nvidia
they were always playing catchup, and gave less vram than nvidia
3090 vs 6950xt for example (24 vs 16)
fucking cousins man, intel is our only hope isnt it
>>106869879
>>106869842
nice, happy to see -b/-ub helped
you can use 2048 too, not a huge difference compared to 4096 at least for me
and flash attention might help a little with vram
16gb is a lot, dunno how you're running out of it
probably because >Q5_K_S
anyways, really glad rocm support is getting better. I'm tired of the nvidia monopoly
>>
>>106870107
Did these actually do anything though? The only one I really saw claiming an effect was DavidAU with some of his "dark imatrix finetune" models.
I kind of want to compare top 10 tokens with an rpcal if that's the case.
>>
>>106870107
kld proof of this right now or you're spreading fud
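for reference, the standard way to get that with llama.cpp (flags from memory, double-check llama-perplexity --help; file names are placeholders):
./llama-perplexity -m model-f16.gguf -f wiki.test.raw --kl-divergence-base base-logits.bin
./llama-perplexity -m model-quant.gguf -f wiki.test.raw --kl-divergence-base base-logits.bin --kl-divergence
first run saves the full-precision logits, second prints the KLD stats for the quant against them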
>>
>>106870107
>rpcals

>>101801604
>>
This is my daily post to complain that I can't buy an 8x H200 system. That is all.
>>
>>106870178
get rich nub
>>
>>106870112
tbf i have a lot of things with hardware acceleration open (firefox, electron apps, etc) so it only has ~10gb to work with - picrel is my system at "idle"
if i disabled hw accel and/or ran everything off the igpu it'd obviously be way better but i don't want to sacrifice general system usability for somewhat faster llama.cpp performance
as it stands now it looks like 1024 is the most reliable for me since it doesn't spill into ram even on a system as badly optimized as this
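if anyone wants to sanity-check their actual headroom before picking a batch size, on linux this shows it (assuming the rocm tools are installed; on windows the dedicated GPU memory graph in task manager is the same info):
rocm-smi --showmeminfo vram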
>>
>>106870095
Unsloth can't quant for shit, nothing new
>>
>>106870182
Shouldn't have to buy a system the cost of a small house to run models locally
>>
>>106870213
you're crying while you even have the option to run one of the most disruptive pieces of cutting-edge tech; compare that to people drooling over fast servers in the early days of computing
>>
>>106870112
and yeah the 6900 series should have had 32gb of vram, giving it 16 was criminal and besides it would have helped them make sense vs the 6800/xt, i only got mine because they were heavily discounted right before the 7800xt came out
>>
>>106870213
You can run a saas with it later, not the worst investment
>>
we should do more to advance local roleplay technology and replace sillytavern with a better frontend
>>
>>106870256
you'll make the logo i assume?
>>
>>106870256
I'll provide feedback on the mascot
>>
>>106870256
Mikupad is already here
>>
>>106870267
No, that's me.
Anon will code the whole thing, he's already at it as we speak, surely.
>>
>>106870256
Trust in p-e-w, he'll whip us up something godly that's deeply integrated into ollama, but that's a worthy sacrifice.
>>
>>106869476
Because they won't work for free. Indians and pakis will, and they speak English.
>>
>>106870310
>>106870310
>>106870310
>>
>>106870031
yes, and that's why they pay her the big bucks
>>
>>106870218
I'm just pissed that nvidia has been allowed to monopolize things
>>
>>106870470
ahh the wonders of capitalism
>>
>>106869401
>https://github.com/ggml-org/llama.cpp/pull/15904#issuecomment-3395433952

Is that all I'd have to do? Build that PR and use a standard GLM 4.6 gguf with the official chat template?

Honestly I wish it'd work with TabbyAPI since it's faster but I'll use that if it works.
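I guess the usual dance would be fetching the PR branch and building it, something like this (just my rough plan, gguf name is a placeholder, swap the backend flag for whatever hardware):
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
git fetch origin pull/15904/head:glm-tool-calls && git checkout glm-tool-calls
cmake -B build -DGGML_CUDA=ON && cmake --build build -j
./build/bin/llama-server -m GLM-4.6-Q4_K_M.gguf --jinja -ngl 99
with --jinja since tool calling goes through the jinja template path, if I understand it right.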


