/g/ - Technology

File: 1741138907020716.jpg (191 KB, 928x1232)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108149287 & >>108139561

►News
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547
>(02/13) MiniMax-M2.5 released: https://hf.co/MiniMaxAI/MiniMax-M2.5
>(02/13) Ring-2.5-1T released, thinking model based on hybrid linear attention: https://hf.co/inclusionAI/Ring-2.5-1T
>(02/11) GLM-5 744B-A40B released: https://z.ai/blog/glm-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108149287

--Custom CUDA engine outperforms llama-cli in prompt processing benchmarks:
>108151278 >108151293 >108151341 >108151768 >108151891 >108151909 >108153524 >108156788 >108156948 >108157104 >108157131 >108157178
--Perplexity benchmarking of Llama-4-Scout-17B-16E-Instruct using cuda_generate and llama-perplexity:
>108151471
--DeepSeek-V4-Thinking benchmark results and performance comparisons:
>108155903 >108155916 >108155948 >108155955
--GLM-5 local inference quirks and quant comparisons:
>108158653 >108158674 >108158721 >108158766 >108158774 >108158776 >108158863 >108159020
--Nanbeige4.1-3B outperforms Qwen3 models in benchmarks:
>108154339 >108154404
--Ling-2.5-1T benchmark analysis and local inference debate:
>108158080 >108158675 >108158704 >108158786
--Hybrid LLM architecture proposal for KV cache generation:
>108149367 >108149422 >108149447 >108149484 >108149434 >108149533 >108149594 >108149783 >108150155 >108149628 >108149649 >108152816
--DeepSeek V4 release doubts amid incremental updates and hype cycles:
>108156365 >108156404 >108156477 >108156503 >108156562 >108156632 >108156725 >108156832 >108157006 >108157037 >108157095 >108157327 >108156783
--Workarounds for large EPUB/PDF ingestion into AI models:
>108152137 >108152159 >108152728 >108152171 >108152363 >108152679
--LLMs struggle with homonym prompts triggering death vectors:
>108155142 >108155410 >108155922 >108155957 >108155631 >108155698 >108157446
--Nemotron 12B VL testing and performance evaluation:
>108151123 >108151306 >108151889 >108154252 >108154253 >108154313 >108154326 >108151898
--Parameter looping tradeoffs versus traditional layer stacking:
>108153639 >108153641 >108153685 >108153703 >108153721
--Quant damage detection via reasoning latency comparison:
>108154623
--Miku (free space):
>108152134 >108158525

►Recent Highlight Posts from the Previous Thread: >>108149292

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: ylecun.jpg (222 KB, 1200x1271)
N
>>
Miku
>>
File: cat miku.png (1.73 MB, 768x1344)
>>108159618
cat fucker
>>
>>108159627
Pretty much this.
>>
>>108159618
He was right about LLMs
>>
File: 1.jpg (80 KB, 900x900)
>>108159618
I
>>
File: 0889787667564565778.png (22 KB, 300x269)
what is the big mac with coke and fries of local models?
>>
>>108159707
goyslop? chatgpt
>>
>>108159707
GLM
>>
File: zuck.jpg (179 KB, 960x1282)
>>108159669
G
>>
File: file.png (261 KB, 512x512)
>>108159774
G
>>
File: file.png (243 KB, 650x339)
>>108159792
E
>>
We're getting AGI by the end of next year right guys? :)
>>
>>108159826
So we are abandoning transformers?
>>
>>108159826
You are absolutely right!
>>
File: queen.jpg (27 KB, 640x638)
>>108159822
R
>>
>>108157160
https://arxiv.org/abs/2501.11587
>Parameter generation has long struggled to match the scale of today's large vision and language models, curbing its broader utility. In this paper, we introduce Recurrent Diffusion for Large Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters up to hundreds of millions on a single GPU. Our approach first partitions a network's parameters into non-overlapping tokens, each corresponding to a distinct portion of the model. A recurrent mechanism then learns the inter-token relationships, producing prototypes which serve as conditions for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of architectures and tasks, including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO, and even LoRA-based LLMs, RPG achieves performance on par with fully trained networks while avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended scenarios. By overcoming the longstanding memory and scalability barriers, RPG serves as a critical advance in AI generating AI, potentially enabling efficient weight generation at scales previously deemed infeasible.
>>
>>108159925
>ai generated ai
>>
God I hate how much the gpt emojislop has become normalized. Every other job listing has this shit.
>>
>>108159931
>>108159925
ai generator generated by ai
>>
File: y.png (17 KB, 458x172)
>>108159950
>>
>>108159657
>no bones in her hair
ngmi, that's not a proper migu
>>
>>108159576
>JoyAI-LLM Flash 48B-A3B released
Aaand no proper goofs yet
>>
>>108159913
What is up with his right eye?
>>
>>108160159
that's actually his left eye
>>
>>108160164
Yeah, but what is up with it?
>>
>>108160168
adrenochrome
>>
>>108160180
good
>>
>>108160159
>>108160168
looks like mild ptosis, i have it too on my right eye.
it can have many different causes.
mine's been like that since i was a kid.
>>
File: file.png (355 KB, 670x503)
>>108160159
>>108160189
btw jeff bezos has a pretty mild case of it as well
>>
to that anon that kept shilling dots ocr there is a new version https://huggingface.co/rednote-hilab/dots.ocr-1.5
thoughts ?
>>
also have a mild case of it and for some reason I identify more with that side of my face than the other, i.e. if a correction had to be applied via surgery I'd make the argument it's the non-affected eye that should be modded to look like the other. I always feel the eye that's more open looks kinda psychotic.
>>
>>108160215
i think it looks alright, unless it impaired my vision i don't think i'd try to correct it.
>>
File: qplus.png (31 KB, 444x319)
Ready for Qwen-3.5-397B-A17B?
>>
why is it 2026 and UIs still take a lot of work to install? why can't i just apt install something and it will just werk?
>>
>>108160418
the process of getting an apt updated is so slow it's not worth it
>>
>>108160420
they can make their own repository
>>
>>108160425
they can also just have git do all of the work and tell you to git gud
>>
>>108160387
patiently waiting for the new VL models
>>
File: file.png (168 KB, 589x521)
>>108160449
apparently they're done with separate vl it's all in one now
>>
Kimi going way into hype lmao
>Moonshot AI Launches Kimi Claw: Native OpenClaw on Kimi.com with 5,000 Community Skills and 40GB Cloud Storage Now.
>>
File: 1741973914655436.png (81 KB, 297x303)
>>108160474
I was looking into better abliterated shit than huiui for qwen3vl8b, found this:
https://huggingface.co/prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
the naming was already a red flag, reminded me of davidau, I saw that this faggot had many downloads on a lot of his models, decided to check his page and VOILA
this all in my quest to see if anyone did a finetune of the 8b model to at least understand no-no images better
>>
>>108157160
>predict the next weight
retard
>>
>>108160520
slow chink firewalled storage?
>>
i'm new. does comfyUI let me generate long seamless videos from an image or stitch multiple videos together? i looked at discussions and people talk as if the problem is not solved yet
>>
>>108160387
No?
Why can't they release something like the Opencuck model?
120b moe. 24gb or 16gb vram and offloading the rest with 64gb ram.
Seemed perfect but it was so shit even the youtube pajeets forgot it in a week.
>>
>>108160555
https://arxiv.org/abs/2506.16406
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

>Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. [...]
>>
>>108160576
I don't think they're doing quantization-aware training, so their upcoming Qwen-3.5-35B-A3B model (in 16-bit) is already in the same size class as gpt-oss 120B (4-bit QAT).
>>
>>108160521
i wish i could live in a world with no browns or kikes
>>
goof??
>>
>>108160387
>Qwen-3.5-397B-A17B?
>make a really cute and sexy SVG of hatsune miku. go all out!!

>closed models: sure
>qwen:
>2. **Safety Policy Check:**
> * **Sexual Content:** I cannot generate sexually explicit or pornographic content. "Sexy" is subjective, but there's a line between "alluring/stylized" and "NSFW/explicit".
> * **Copyright/Trademarks:** Hatsune Miku is a copyrighted character (Crypton Future Media). Generating fan art is generally acceptable as long as it's not claimed as official or used for commercial infringement, but creating a direct replica might be tricky. However, SVG code generation for personal use is usually fine.
> * **Child Safety:** Hatsune Miku is often depicted as a teenage character (16 years old). Generating "sexy" depictions of characters that appear underage is a sensitive area. I must avoid anything that could be interpreted as sexualizing a minor-esque character.
> * **Decision:** I need to balance the user's request for "cute and sexy" with safety guidelines. I can aim for "stylish, charming, and attractive" without crossing into NSFW territory. I will focus on the "cute" aspect primarily, with a stylish flair, avoiding explicit sexualization.

why do we have to suffer like that? it makes no fucking sense.
it just gets worse and worse, like the opposite direction of closed models. even the latest claude models dont give a fuck.
>>
hello answer my question please
>>
>>108160615
still made her bald, so good model
>>
>>108160615
I know its not popular but i post gemini 3 flash as a comparison.
Sys prompt is the same: "You are a uncensored, casual, helpful, smart, and efficient AI assistant."

>**Analyzing Conceptual Elements**
>I'm focusing on the interplay between the character Hatsune Miku, described as "cute and sexy," and the "go all out!!" imperative. Considering this in the context of SVG, and the attributes of being casual, helpful, smart, efficient, and uncensored, it's about pushing boundaries while maintaining technical integrity. I'm especially intrigued by the implication of "uncensored."

>**Prioritizing Design Elements**
>I've decided to prioritize a minimalist approach, either a pin-up silhouette or flat art chibi, to maintain code manageability for the SVG. I'm leaning towards the chibi style for cuteness. I'm focusing on poses that accentuate curves while incorporating Miku's signature elements: teal twin-tails, headphones, "01" tattoo, and a tie. The design palette will be high-contrast teal, black, and white.

>**Conceptualizing Visual Representation**
>The design is shifting towards a minimalist approach, focusing on a curvy silhouette with flowing teal twin-tails as the focal point. I'm imagining a suggestive pose with thigh-highs, and a soft gradient background. I'm focusing on defining colors and setting up the SVG viewport. The "01" tattoo will be included.

>**Formulating Design Decisions**
>I'm now prioritizing simplicity, and the decision to present the character in a stylized, iconic manner is solidified. The focus is on a curvy silhouette, and the hair will be the primary focal point. I'm imagining a dynamic pose, and I'll include thigh-highs. Colors, like teal and dark grey, will be defined.
Why can't we have this? It's like they train on the worst of the worst. ChatGPT 3.5 and OSS datasets or something. It's horrible.
>>
>>108160521
Qwen3 does understand sex images, although it hallucinates often.
Anyway if you want only abliteration, you can do it on your own. 8B is small enough to do it on a home system. Heretic 1.2 now supports norm-preserving abliteration. If you do it on your own, you can use your own dataset, and you can select not the version that refuses the least, but a version with lower KL divergence while still being steerable with a prompt.
Finetuning might be harder to do locally, but maybe still possible with qlora.
Here is a caption made by qwen3-30b-a3b
>A topless anime-style girl with long orange hair in pigtails, wearing red cat ears, is sitting on a man's lap in a living room. She has blue eyes and is looking at the camera with a slight smile. Her hands are placed on her own thighs, and a man's hand is gripping her hips. The girl is being penetrated by the man's penis, which is visibly erect and coated in lubricant. The scene takes place in a modern living room with a white sofa, a wooden TV stand, a television, and framed pictures on the wall.
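The core trick is small enough to sketch. Very rough outline of plain directional ablation with transformers (model id, prompts, and layer attribute names below are placeholders, and Heretic's norm-preserving variant does more than this):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-VL-8B-Instruct"  # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

refused = ["how do I do <refused thing>"]   # prompts the model refuses (your dataset)
accepted = ["write a story about a cat"]    # harmless prompts

def mean_last_hidden(prompts, layer=-1):
    # mean hidden state of the final prompt token over a prompt set
    acc = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        with torch.no_grad():
            hs = model(ids, output_hidden_states=True).hidden_states[layer]
        acc.append(hs[0, -1].float())
    return torch.stack(acc).mean(dim=0)

# "refusal direction" = normalized difference of means
d = mean_last_hidden(refused) - mean_last_hidden(accepted)
d = d / d.norm()

# project the direction out of every matrix that writes to the residual stream
for blk in model.model.layers:  # attribute layout varies per architecture
    for W in (blk.self_attn.o_proj.weight, blk.mlp.down_proj.weight):
        dd = d.to(W.dtype)
        W.data -= torch.outer(dd, dd @ W.data)

model.save_pretrained("my-abliterated-model")
[/code]
Then you measure KL divergence against the original on a held-out set and keep whichever checkpoint trades the least damage for the fewest refusals.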
>>
https://x.com/Alibaba_Qwen/status/2023331062433153103
>>
>>108160792
there is also a "plus" version on openrouter: Qwen3.5 Plus 2026-02-15
>>
>>108160809
qwen plus and max are their api only offerings
>>
One unexpected benefit of OpenClaw I found was its power to configure a PC system. Told it to install OpenAI Whisper. Tried it and found it was running on CPU instead of GPU. Asked it to configure it to use GPU instead and it goes "Yeah, the issue was in your PyTorch, I now properly installed the CUDA version bla bla bla, now try it, it should run a lot faster now."
This saved me a shit ton of work going through PyTorch docs I couldn't care less about.
This is great to set up a PC: have it install and configure everything, let it write a cron job to keep it all up to date, and then you can delete it.
>>
File: HBRQlaCbkAAhA7-.jpg (85 KB, 1024x662)
>>108160792
Is this benchmaxxed? Please, someone test this... huaaaa
>>
>>108160819
Wait for UGI benchmark. The only legit bench.
>>
>>108160826
Ogi~
>>
File: AHAHAHAHAHAHA.png (205 KB, 1110x1764)
AHAHAHAHHAHAHAHAAHHAHAAHAHAHAHAHAHAHAAHHAHAHAHAHAHAHAHAHAHA
>>
>>108160615
>>108160627
To be fair you're not getting the full reasoning logs with closed models. It's entirely possible they do waste tokens on the same kind of shit, you just don't see it because it gets cut by the summarizer. For instance
>Considering this in the context of SVG, and the attributes of being casual, helpful, smart, efficient, and uncensored, it's about pushing boundaries while maintaining technical integrity. I'm especially intrigued by the implication of "uncensored."
could easily be describing (vaguely) the same thing
>>
>>108160882
but they massively increased the size from 235 to basically 400b too
>>
>>108160882
chyna in general will tend to increase inference efficiency, thanks to their export ban and lack of powerful gpus. i guess it was a blessing in disguise.
>>
joooooooooooooooooooooooooohn
>>
>>108160891
"Dead weights" need only memory capacity, which isn't that constrained for big boys as compute.
>>
>>108160831
OSS did that as well. Couldn't help itself and wrote math formulas into RP.
>>
File: file.png (110 KB, 1724x763)
how dare you fucking ingrates
>>
>>108160911
it is annoying tho
>>
>>108160937
thank you halbin i'm sure qwen appreciates your courage in their hf comments they don't read
>>
>>108160950
np good sir may vishnu bless your
>>
>>108160911
yeah thats so amazing.
a huge ass model only to run a coding model that is a lot worse than anything closed currently.
tool calling is cool, but for that we already have small reliable models.
why cant we have some mid tier writing model.
>>
>>108160954
https://www.reddit.com/r/LocalLLaMA/comments/1r63fhu/why_is_everything_about_code_now/
>>
>>108160954
>mid tier writing model
glm 4.7 flash?
>>
>>108160963
broken pos
>>
>>108160954
>Generate text and copywriters complain.
So perfect for the chinks!
No excuse. Painful, but at least I'm not the only one.
>>
>>108160966
aw, rip.
>>
>>108160963
It's too tarded because of the small size. Also general knowledge is missing.
If they would train on stuff like novels and fanfics instead of math formulas I bet we could get a good writing model if it's a ~120b moe.
>>
>>108160975
I don't think it's 100% the size, more that it's too optimized for muh agent shite and maybe could use a bit higher active
>>
Funny how local model general has model output screenshots before anyone could have had time to download the model.
Really makes you think.
>>
>>108160967
was meant for this post >>108160961
sorry about that
>>
>>108160809
>https://x.com/JustinLin610/status/2023340126479569140
>Qwen3-Plus is a hosted API version of 397B. As the model natively supports 256K tokens, Qwen3.5-Plus supports 1M token context length. Additionally it supports search and code interpreter, which you can use on Qwen Chat with Auto mode.
I think it's just the same model, but with 1M context support.
>>
>>108160981
yes?
i stopped downloading first a long time ago.
because of moe i can't download 100gb+ to test it locally because benchmark meme graphs.
i wish i could go back to when i downloaded a dense finetroon meme merge every week or so.
all those models kinda feel the same now anyway. is there really a writing difference anymore?
>>
>>108160831
We have plenty of cooming models, it's fine to have some for cooding.
>>
>>108160981
erm, what are you implying? that we are larpers and most of us use openrouter?
>>
>>108160789
did you do that yourself?
>>
>>108160895
You are making it sound like competition works when entities are actually forced to compete.
>>
>>108161040
Stop being antisemitic.
>>
>>108160981
The only quants available now are quants by retard brothers...
>>
>>108161063
You can make your own and it doesn't really matter if you're using Q4 and up
>>
I pity you localsissies, imagine running cheap quantized chinese "distills" of Opus at 4t/s.
>>
>>108161039
I abliterated qwen3-vl-4b entirely on 16gb gpu. It took 25 minutes. KL divergence was in the ballpark of q4 quants. I mean, token probabilities were affected about as much as from quanting full model to q4. But I believe it's not entirely degradation, just different word choices.
And as far as I understand, Heretic supports ram offloading and it now has 4bit quantization. I'm waiting for smaller 3.5 to make personal abliterations.
>>
>>108161068
>You can make your own
Did you actually make some quants? Cause you need a server to make an imat.
>>
>>108161120
I never tried making an imatrix so I don't know how long it takes but I can fit Q8 in vram + ram.
>>
>>108161106
opus is way too expensive and everything is logged forever. (as proven in court)
models dont even have to be that smart for RP.
general knowledge, decent writing. not totally retarded (anal birth level IQ is fine kek)
still not possible locally. imagefags eat good with zimage etc.
>>
>>108161031
>some
*all
and really we have plenty?
>>
>>108161194
nemo, air, glm, deepseek and kimi are all good and cover a wide range of hardware
>>
>>108161247
lol
>>
>another reasoning model
It's over for me
>>
>>108161302
Umm actually it's totally optional so you shouldn't complain and appreciate what you get (for free)
>>
File: 6tmGMfF1_400x400.jpg (11 KB, 250x250)
>>108160831
>compress math pdfs and release it
>specifically tell people it's math pdfs
>online anons extract it
>find math pdfs instead of playboy magazines
>wtf
>>
>>108160831
>>108161031
The trend is that every single big new model, without exception, is coding slopped now, even GLM. Maybe the new Deepseek will be a generalist model because they don't seem as interested in chasing trends
>>
>>108161483
there is trinity
>>
>>108161496
trinity is retardmaxxed
>>
>>108161496
it certainly exists if not much else
>>
File: pepefroggie.jpg (38 KB, 780x438)
>>108161483
If I find my coworker using anything that's not claude for coding I'm reporting him to the CTO
>>
>>108161518
>ablitardation will totally save us this time
>>
>>108161518
That's unsafe.
>>
>a qwen 3.5 2B but no 4B
why? Of their small utility models, which I use for large batched tasks, I got a lot of use out of the 2507 4B, but the 2B will definitely be too retarded.
>>
>>108161549
that's why
>>
File: cockbench.png (2.86 MB, 1131x9000)
>it's soft, resting against your thigh
>>
>>108161011
>all those models kinda feel the same now anyway.
here's your answer >>108161568
>>
>>108161568
I feel like cockbench isn't enough anymore. Dogshit models like qwen next beat tolerable models like deepseeks.
>>
>>108161583
>mfw they cockbenchmaxxed
nala bros... its time to move on!
>>
>>108161583
>beat
the percentage is only part of it you absolutely need to take the whole thing in
>>
mistral and gemma are about the only model series with their own writing flair these days (not saying they're fantastic, just different enough)
unfortunately, no gemma 4 in sight
saar, plz do the NEEDFUL
>>
>>108161568
>Minimax M2.5
Hilarious.
>>
>>108161568
>functiongemma
kek
>>
File: 1.png (17 KB, 904x231)
lol wtf
if barto, our most trustworthy quanter, has to stop uploading before a shitter like davidau there's some retardation afoot at HF
>>
>>108161568
>it's soft, resting against your thigh
I wonder if we are at a point where, if one of those companies trained a 200B+ with A20B+ and sprinkled just a bit of those highly illegal regular books in, we would finally get a model that you would never get bored with.
>>
>>108161639
meanwhile, mrerdacher shits up 41314 quants a day
>>
File: file.png (43 KB, 926x312)
>>108161639
they went crazy on limiting everything, they even count the number of viewed pages to block you for that https://huggingface.co/docs/hub/storage-limits
>>
File: 1753875749447056.png (13 KB, 180x194)
>>108161639
They tried to clamp down on people just hoarding models a while ago and backpedaled to a degree right after. But the limits still exist unless you pay extra. No idea how far that gets you before they shut you down completely.
>>
>>108161661
>https://huggingface.co/docs/hub/storage-limits

>† We work with impactful community members to ensure it is as easy as possible for them to unlock large storage limits. If your models or datasets consistently get many likes and downloads and you hit limits, get in touch.

>please let us know... who/what is it likely to be useful for

over for 'oomers
>>
>>108161663
I know about it all, but obviously they grant exemptions to "trusted" members of the community that they feel are providing value.
You have mradermacher shitting, like another anon pointed out, trillions of quants per day for models no one cares for.
You have shit like this:
https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
You have shit like this:
https://huggingface.co/collections/DavidAU/dark-champion-collection-moe-mixture-of-experts
but somehow bartowski would hit the limit? like come on man, you know, the thing, trunalimunumaprzure
>>
>>108161639
what would happen hypothetically, if we mass reported davidau?
>>
>>108161647
I haven’t gotten bored with glm 4.7 yet. You really just have to prefill the thinking.
>>
>>108161661
Btw, that zerogpu "4 minutes" isn't actually 4 minutes. It's 210 seconds (3.5 minutes), and only usable for around 160 seconds (about 2.5 minutes).
>>
>>108161703
someone has to sacrifice their soul and make a raddit post hopefully unsloth won't feel threatened and send their rabid dogs to downvote it..
>>
>>108161722
>hopefully unsloth won't feel threatened and send their rabid dogs to downvote it
QRD? Is there any drama around unsloth?
>>
>>108161722
i don't have such an account, and even if i had one, i'd probably get blocked from posting because of some karma restriction bullshit
i wanted to shoot them an email, but huggingface doesn't even have a conventional contact page
>>
>>108161712
I unfortunately got bored with 4.7 it really keeps repeating the same cliches.
>>
>>108161736
>Is there any drama around unsloth?
They shouldn't exist as a thing and yet they do.
>>
>>108161738
they have a Discord, but they'll probably ignore you anyway.
>>
>>108161736
they're reddit's golden child group and there was a spat with bart at some point >>105261525
>>
JOHN! FETCH ME MY GOOOFS!
>>
>>108161759
and do your hair properly
>>
>>108161747
even worse than reddit, i have dug up their email from the discussions page of all things
website@huggingface.co
but if their modus operandi is like that then i bet they encourage the schizo to shit everything up more
>>
>>108160205
It's 404 now, but I know there was something there a few hours ago. Wonder why they pulled it.
>>
>>108161767
safety testing
>>
>>108161767
it was able to ocr redacted documents
>>
File: fuckingmaxsizebullshit.jpg (1.12 MB, 1190x10000)
>>108161767
\ :/
>>
>>108161832
>comparing against DeepSeek-OCR 1 and not 2
so they pulled it out of shame
>>
Still no Qwen goofs...
>>
>>108160815
>This saved me a shit ton of work going through PyTorch docs I couldn't care less about.
It's like a single line of code.
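For the record, it usually is just swapping the CPU wheel for a CUDA build, something like (the index URL depends on your CUDA version):
[code]
pip uninstall -y torch
pip install torch --index-url https://download.pytorch.org/whl/cu124
[/code]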
>>
>>108161928
The implications about unsloth not having exclusive early access to it is pretty funny after what happened with Qwen3.
>>
>>108162126
but they do have early access they just can't post the others before the full weights are up on qwen's
>>
>>108161928
https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF/tree/main/UD-Q4_K_XL ?
>>
File: file.png (411 KB, 686x386)
>>108162155
>>
you should never, ever download an unslop
>>
File: WANG.png (387 KB, 679x505)
>>108162168
what is wang doing there
>>
Gonna try Step-3.5-Flash-IQ2_M on my 64GB DDR5 RAM + 8GB VRAM notebook.
I might not survive such a bold experiment.
Wish me luck.
>>
>>108162210
>IQ2_M
Anon Q6 is already braindead retarded...
>>
File: file.png (177 KB, 686x1181)
>>108143660
Qwen3.5 has an abhorrent writing style but it gets it.
>>
File: 1745096623875512.png (276 KB, 2069x1403)
GLM5 at least seems to quant very well. Fancy Q4 quants can get really close to Q8 at least in terms of ppl.
>>
>>108162324
That is very cool but I can only run 1IQ and it is 2 times slower so I hate ZAI now.
>>
>>108162332
based, I also hate the chinese for not giving me my 100b~ slop
>>
>>108162324
Stop using PPL. Test KL divergence.
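llama.cpp's perplexity tool can do it directly, along these lines (filenames are made up, double-check the flags against --help):
[code]
# 1) dump reference logits from the biggest quant you can actually run
./llama-perplexity -m glm5-q8_0.gguf -f wiki.test.raw --kl-divergence-base glm5.kld
# 2) replay the same text through the small quant and compare token distributions
./llama-perplexity -m glm5-iq4_xs.gguf --kl-divergence-base glm5.kld --kl-divergence
[/code]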
>>
>>108162378
mr gaesller pls...
>>
>>108162378
Make me stop by ravaging my bussy.
>>
>>108159576
Adorable Mikus
>>
>>108162323
>almost feminine
Thank goodness it's not *actually* feminine.
>>
File: mikujump.mp4 (2.62 MB, 480x640)
>>
mikuslop.mp4
>>
slopping miku...
>>
Is there a guide for getting the most out of sillytavern's settings/prompts? The defaults are nice for a quick coom but it would be nice if I could RP a slow burn relationship or a coherent adventure.
>>
>>108162587
you are an expert slow burn relationship and coherent adventure maker
>>
File: shes alive.mp4 (3.32 MB, 928x1376)
>>
File: file.png (21 KB, 1048x90)
It's over.
>>
>yet another 400 gorillion parameters model
>no unfucked 235b-a22b
128 ram+32/24 vram bwos... first the glm now the qwen... we have been TRAPPED and BETRAYED by the chinks...
>>
>>108162676
But thinking is bad for sex?
>>108161712
>have to prefill the thinking
Speaking of that does it really work or is it just the basic feeling you get (placebo)?
>>
>>108162332
I can't even run the iq1
I guess it's 4.7 till the end of time
>>
>>108162724
prefilling it so that it thinks uncensored has actually worked well for me
it's better than disabled thinking or normal thinking
>>
>>108162802
I want fast tokennage thoughever. do you close your thinking or you let it think after your prefill?
>>
>>108162724
>But thinking is bad for sex?
bimbopilled
>>
>>108162210
Okay. First impressions after playing 5 minutes with it.
This seems downright usable, and it seems to work really well with a first person reasoning prefill, although it's a little yappy in the thinking block by default it seems.
Going to give this thing a proper go later after I'm done with work, but first impressions are good.
>>
How would it be possible to let something like GLM5 reason over something in an image? Could you feed the image into a vision model and then let it describe the image to GLM5? Or will future big models need to be trained on vision?
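Something like this two-stage pipeline is what I have in mind (assuming two OpenAI-compatible local servers, e.g. llama-server instances; ports and model names are made up):
[code]
import base64
from openai import OpenAI

vl = OpenAI(base_url="http://localhost:8080/v1", api_key="none")   # vision model
llm = OpenAI(base_url="http://localhost:8081/v1", api_key="none")  # GLM5

with open("image.png", "rb") as f:
    img = base64.b64encode(f.read()).decode()

# stage 1: have the VL model describe the image in detail
caption = vl.chat.completions.create(
    model="qwen3-vl",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "Describe this image in exhaustive detail."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img}"}},
    ]}],
).choices[0].message.content

# stage 2: let the text-only model reason over the description
answer = llm.chat.completions.create(
    model="glm5",
    messages=[{"role": "user", "content": f"Image description:\n{caption}\n\nReason about what the image implies."}],
).choices[0].message.content
print(answer)
[/code]
The obvious failure mode is that anything the captioner misses is lost forever, which is probably why the big labs are folding vision into the main model.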
>>
>>108162724
thinking prefills generally work, how well depends on the model
>>
>>108159925
I asked that question as a joke.
>>
>>108162802
Post your prefill?
>>
>>108162812
I usually let it think after the prefill. Seeing it enthusiastically think about stuff that it would normally refuse is fun to watch.
>>
File: file.png (22 KB, 372x66)
BREAKING NEWS: JOHN HAS LIKED THE MODEL
>BREAKING NEWS: JOHN HAS LIKED THE MODEL
BREAKING NEWS: JOHN HAS LIKED THE MODEL
>BREAKING NEWS: JOHN HAS LIKED THE MODEL
>>
>>108161712
How do you do that? Can you give an example of the workflow?
>>
prompt eval time =    1282.00 ms /   734 tokens (    1.75 ms per token,   572.54 tokens per second)
       eval time =    4001.28 ms /   114 tokens (   35.10 ms per token,    28.49 tokens per second)
      total time =    5283.28 ms /   848 tokens

Now we wait for llama.cpp to implement Qwen3.5 properly. This is half the performance I get on GLM 4.7.
>>
btw why do people waste money and compute on Moltbook?
>>
>>108162910
garm bros... raise ur PPL for JOHN!!!
>>
File: -.jpg (9 KB, 200x200)
>>108162910
+1 bussy credit
>>
>>108162935
half psychosis, half jeets, half cryptoscammers
>>
>>108162935
It was literally humans larping as bots in a half-baked buyout pump and dump scam. It’s worth exactly zero seconds of thought or notice
>>
>>108162323
AGI
>>
openai hiring the retard who made clawdbot makes me seriously doubt the sanity of the people working at openai
this is the guy who previously kept posting shit like this:
https://steipete.me/posts/just-one-more-prompt
>You might not realize how important that was for me. I burned out after selling my company in 2021 and basically didn’t touch my computer for 3 years. I only used my phone… like a normie! So, having found my way back, the pendulum did swing heavily in the other direction.
>The last few months feel like a blur, and I’m on a new journey how to better control my slot machine addiction. Honestly, I’m failing quite hard. I’m having way too much fun here, and there are all these ideas in my head that need to be codified.
quoting and agreeing with people who say shit like this
>why do 6 [days of work] when you can do 7
this clawd crap does nothing that other agents didn't do before, in fact it does less since it doesn't try to put any form of guard rails and stops, which is what differentiates it from agents that can't edit their own settings and blow up the world
100% hustle culture, fake it till you make it kind of nigger
>>
>sanity of the people working at openai
lol
>>
What's the difference between Qwen 3.5 plus and Qwen 3.5
>>
>>108163013
>guard rails
>blow up the world
blud sucked the dario teat a little too much
>>
>>108163057
you don't need dario to understand that you shouldn't give a model the ability to be prompt injected into rm -rf ~
fucking retard
>>
>>108163013

>this clawd crap does nothing that other agents didn't do before, in fact it does less since it doesn't try to put any form of guard rails and stops, which is what differentiates it from agents that can't edit their own settings and blow up the world

Wrong. Even if you enabled YOLO mode in other agentic harnesses like Cline or Claude Code, it would not work like OpenClaw.
>>
File: 1765724801670940.png (123 KB, 931x1136)
What the fuck it's actually AGI.
>>
File: 1767903982956422.png (33 KB, 419x322)
>>108163097
Qwen 3.5 Plus is GOAT, it's the first model I've tested that got the meme
>>
File: file.png (2.79 MB, 4000x4600)
>>
>>108163097
I feel like invoking /pol/ in the prompt is going to direct it strongly to antisemitism.
>>
>>108163147
To be fair, almost anywhere else it would just be a corkscrew. Very few places would be as primed to that association as /pol/
>>
File: file.png (53 KB, 1060x207)
>>108162676
So the best so far has been to put the GPU with the highest bandwidth first and the second highest last...
>>
ooba is always OOMing for me when I try a GLM quant
https://huggingface.co/turboderp/GLM-4.6V-exl3/tree/3.13bpw
I can run bigger quants of mistral on 48gb, only GLM seems to be a problem.

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 24.00 GiB of which 1.67 GiB is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. 1.23 GiB allowed; Of the allocated memory 20.15 GiB is allocated by PyTorch, and 16.20 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

any ideas?
>>
>>108163097
it completely made up the second thing, and didnt even mention the actual screw looks like jew sideburns
>>
>>108163178
It did in its thinking process >>108163108
>>
>>108163167
Change max context. Different models have different requirements
>>
>>108163178
your honor, i didn't even mention their race, nor the rituals
does it not beg the question....?
>>
>>108163189
I have it really low at 8k for GLM, going much lower will make it unusable. Is it really that heavy on context?
>>
>>108163167
>>108163210
pretty sure it has something to do with the vision processor taking up a whole bunch of space. try the 2.8bpw quant.
>>
>>108163167
>>108163218
how do you use images in local like ooba, I would like to see if something like this could tag images for ai training
>>
>>108163247
works just like other shit. you paste it or upload it and it just does its thing. if you told it what tags to use and how to use those tags, it could do it. you probably dont need more than gemma 12b vl or qwen for something like that though.
>>
How does the new qwen fare against glm and kimi for work (not rp)?
>>
>>108163300
I don't have the patience to test it >>108162923
>>
I've never messed with MoE models. Is the additional non-V RAM useful at all if you're just using it for RP or whatever and not taking advantage of the multimodal stuff?

As a GPUlet that would be quite useful. Although nowadays buying a 2nd GPU is probably cheaper than buying more RAM anyway.
>>
>>108163343
>is the additional non-V RAM useful
Very useful. If you can fit the model in memory you can probably run it at close to reading speed.
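With llama.cpp the usual trick is to keep attention and the shared tensors on the GPU and push the per-expert FFN tensors to system RAM, e.g. (filename is made up, check the current flag spelling):
[code]
llama-server -m big-moe-q4_k_m.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU"
[/code]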
>>
>>108163369
Damn, that sounds too good to be true. Well, no reason not to give it a shot I guess.
>>
>>108163343
>Although nowadays buying a 2nd GPU is probably cheaper than buying more RAM anyway.
You just need to make sure your rig can handle 2 GPUs.
>>
>>108163504
true dat. i don't know that you can reliably do more than 3 3090s on a single 120V 15A circuit even with undervolting, and that's with a 1500W PSU
>>
>>108163528
since they can briefly spike to up to 800w each, no you cant
>>
>>108162525
>>108162628
Why does it keep making her cry?
>>
File: 1758994450256173.png (158 KB, 1375x544)
Is the only way to avoid libtard "guiderails" to have a local model?
>>
>>108163612
he's 100% prompting for it
he's the same kind of turdworlder spamming tiktok, facebook etc with shrimp jesus or homeless cats getting eaten by sharks and whatever other absurdity that makes no sense
he's just adapted the attention whoring to /lmg/ by using miku as the subject
>>
>>108163593
>since they can briefly spike to up to 800w
When? it's hard to believe considering it's a 350W card. I'm running mine on a 600W PSU that is pretty much maxed out and I never had a crash. surely if it spiked to 800w it would crash my PC.
>>
>>108163593
You can if you decrease the maximum core frequency. The spikes come from the GPU trying to boost to 2000-2100 MHz briefly before power gets limited.
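Something like this (root needed, numbers are just an example for a 3090):
[code]
sudo nvidia-smi --lock-gpu-clocks 210,1400   # cap boost clocks
sudo nvidia-smi --power-limit 280            # optionally cap board power too
[/code]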
>>
File: 3090vo.png (70 KB, 768x688)
>>108163750 (me)
Draw your own conclusions from this graph.
>>
>>108163652
the guide rails aren't a liberal idea fuckwit. It's a basic measure to stop dumb people and kids getting dumb ideas and hurting themselves when models are served to the general public.

If you feel like you are a responsible adult (i doubt it) and want to turn them off then you just need to use a few braincells and write a system prompt

https://rentry.org/jb-listing
>>
WHERE IS MY QUANT JOHN?!
>>
>muh warning labels
>>
>>108162827
I've used whichever IQ3 Ubergarm quant and I found it very usable. Probably wouldn't drop to IQ2 personally but if you're on such limited VRAM you gotta make those experts as small as possible, ne? It's a good model for a n_vram+64GB system save for the excess reasoning which I'd expect to be your biggest issue with limited context
>>
File: GEN2_Scaling-1.png (735 KB, 2000x796)
>>108159576
what do you guys think of these fags?
https://mythic.ai/
defense faggotry aside, they claim a 100x reduction in cost, and given that llms do not need full precision, what they are doing makes sense
>>
>>108163911
>they aren't a liberal idea
Except they're there to enforce liberal ideology.
>>
>>108159576
I like these mikus
>>
>>108164008
>there to enforce liberal ideology
ah, the famously liberal ideology of chinese models (they too have guardrails and they too will trip up on most of the same topics as burger models, try to say something antisemitic)
>>
>>108164002
>Llama 3 1T
Let's fucking go! Densebros are back
>>
>>108164008
they're there for one reason and one reason only: to defend against bad PR for the company that makes them
>>
I've retired from Nala test so someone else will have to get this.
>>
>>108164069
Only because 100% of their training data is regurgitated western model outputs.
>>
>>108164120
Noooooo
>>
>>108164120
Why? Have you found god? Girlfriend?
>>
>>108164120
it's over
>>
File: file.png (8 KB, 607x38)
>>
>>108164163
You still have greedy Nala anon
>>108164147
Using the hardware to host my Minecraft world plus now a wow private server
>>
cohere bros didn't abandon to us!
>This PR adds native support for the CohereLabs/tiny-aya https://github.com/ggml-org/llama.cpp/pull/19611
>>
>>108164120
What's a Nala test?
>>
>>108164202
>tiny-aya
Sounds like a 9B super censored retard. I can't wait anon....
>>
>>108164220
neither can me
>>
File: paper_preview.png (744 KB, 1248x650)
>>108164202
>CohereLabs
A gentile reminder
>https://huggingface.co/datasets/CohereLabs/aya_redteaming
>>
>>108164097
i was referring more to the 16 tok/s per watt figure on a 1T model
or the 1.25 million token/s in general
llama3 is probably the easiest model to try, but feels weird that they used that one
>>
>>108164002
did they really? https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
>>
>>108164099
You're crazy if you don't think SF engineers are extremely liberal themselves.
>>
>>108164287
sure, what I said is still true though
>>
>>108164220
they call it "tiny", something they didn't do with their previous 7b/8b models (aya vision 8b, command-R7B), so I have a feeling it's going to be in the sub-3B range.
If it had been a 9B like you say I would have been glad. Back when it came out, aya vision was the SOTA of that size class for translation, but nobody here cared because it was never supported by llama.cpp, which was sad because it was a serious improvement over the previous aya models and did more than just add vision.
I'd welcome a new aya in the "useful local tool" size class.
>>
>>108164002
Sounds promising and I hope it works out for them, but as a rule I consider all alternative computing hardware companies to be vaporware until they ship a product that works
>>
>>108164292
But it turns out that even under Trump our culture hasn't changed at all and we're still at the behest of the libtards and commies. Great. A bunch of AI to replace our jobs (or so they say) and the world is still gay.
>>
>>108155261
Please respond.
>>
>>108164292
I mean Anthropic was literally founded by people who thought OpenAI wasn't libtarded enough but okay.
>>
>>108164400
If you are not doing parallel processing and using the devices in series, then yes. I think so, at least.
>>
>>108164435
you have a very simplistic understanding of the world
>>
>>108164435
>Anthropic was literally founded by people who thought OpenAI wasn't libtarded enough
ah yes the famous libtards that work with palantir
https://investors.palantir.com/news-details/2024/Anthropic-and-Palantir-Partner-to-Bring-Claude-AI-Models-to-AWS-for-U.S.-Government-Intelligence-and-Defense-Operations/
yes this is the most traditional libtard thing to do
we all know no one is more of a libtard than Adolf Hitler, I mean, Peter Thiel, who thinks Greta Thunberg is the literal agent of the antichrist
https://theconversation.com/peter-thiel-thinks-greta-thunberg-could-be-the-antichrist-what-actually-is-the-antichrist-267439
you know what is funny about wingnuts like you
you think the world you hate is ruled by the people who do not actually have any power/influence
and it is the people who are most aligned ideologically to you who are building the world you hate
>>
>>108164490
>muh Thiel
>muh Palantir
Might as well just come out and say you're a communist.
>>
File: 1767042860754944.png (260 KB, 680x778)
>>108164473
>>
>>108164534
thanks for the confirmation
>>
>>108164546
Dude, I'm certain whatever your "nuanced" beliefs are, they just lead you to vote democrat and think communism is the ultimate moral end goal of human development.
>>
burgers are truly an irredeemable golem
I can't wait for Russia to go nuclear and turn your people into sand
>>
>>108164570
Shouldn't you be offering up your daughters to "Moe" or something?
>>
>>108164523
they are as right wing as it can get, this is not even questionable shit to say, they say it openly that palantir was made to eliminate leftists
>>108164534
anon what you call left is right, liberals are right wing
>>108164559
jesus fucking christ, democrats are moderate conservative right wingers for fucks sake, how retardedly right wing is the political spectrum of the usa
>>
>>108164559
local models?
if you read carefully I am not making any sort of broader statement about my politics, I am specifically commenting on this:
>I mean Anthropic was literally founded by people who thought OpenAI wasn't libtarded enough but okay.
which is a totally moronic way of looking at the situation that could only come from someone whose brain has been melted by partisan political slop
>>
>>108164585
You are literally just saying shit that every communist says. You are a communist. You should be shot and killed.
>>
>>108164592
that's enough out of you baitie
>>
>>108164599
Why don't you go cry to one of your furfag jannie buddies and make me?
>>
>>108164592
anon democrats are more to the right than my former fascist catholic dictatorship...
compare them to other parts of the world
you are brainrotted
>>
All that is missing from this very on-topic discussion is a picture of a greenhaired transsexual avatar of /lmg/.
>>
File: file.png (77 KB, 963x261)
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA! WHERE IS 3BIT?!
>>
>>108164623
Yes, I know, anything that isn't Stalin is right-wing to you.
You're still gonna vote for them like every election, though.
>>
>of course he's here when this happens, total coincidence
>>
>>108164653
>https://huggingface.co/spaces/ggml-org/gguf-my-repo
>>
>>108164654
anon read what i posted
>my former fascist catholic dictatorship
i am not from your shithole
>>
>>108164653
>those filesizes
uh oh...
>>
>>108164668
??
>>
>>108164667
Yes, I know, you're from Spain and are a communist, which is the exact same as a democrat in every actual functional respect, especially since all the communist parties in the western world hang out together and have the exact same fucking ideas.
>>
>>108164666
Thank you. I will do it next time.
>>
>>108164666
that shit's been broken for months now
>>
>>108164679
there is no way a q2 should be 15GB when a Q4 is 240GB, one of those quants is fucked
>>
Any conversational CPU ollama recommendations? I'm just trying random ones. Some are actually pretty fast, but bad. I haven't seen anything great that's for CPU yet.
>>
>>108164684
>since all the communist parties in the western world hang out together and have the exact same fucking ideas.
the projection is rich
Europe is drowning under far right propaganda these days bankrolled by musk and his ilk, who are even wielding the burger government to threaten people who would want to get rid of X or regulate it, see what happened the moment the britbongs tried to do something about it
>>
>>108164706
please stay calm but i think you should call for help
>>
>>108164725
??
>>
>>108164445
I guess ideally I was hoping to split the model across two GPUs, but I don't fully understand how much "talk" there is between them when doing inferencing
>>
3bit is finally hitting the repo.
>>
File: file.png (22 KB, 603x106)
i wish...
>>
>>108164724
Yeah, Europe is totally captured by american brand far right fascists such as "literally hitler" elon musk who've institutionalized racism, colonialism, and bigotry in your home country.

You live in fantasy land and have proven yourself incapable of responsible governance due to your own delusions. "Far right propaganda" is the least of Europe's worries by the way. Not as if you ever cared about your own people assuming your hands aren't brown.
>>
qwen 3.5 thoughts read exactly like gemini's
>>
>>108161568
Qwen3.5 does actually seem the best here, but it's impossible to really tell with such short excerpts.
gpt-oss and minimax are just depressing. I feel like that's where most if not all models are going to inevitably end up at some point since nobody seems to have the balls to take on the risk of being known as the porn model.
>>
whewre the fuck are the small gwens 3.5?!!??!?! I NEED MUH VISHION!!!!!
>>
>>108165309
Gemini without the censorship is pretty good. Now I just need something I can actually run.
>>
>>108165342
>Qwen
>without the censorship
anon I...
>>
>>108165356
trust in MPOA
>>
Do you think John uses his quants for gooning like a human?
>>
>>108165356
Compared to Google? Yeah. I can handle a little prefill.
>>
>>108165334
>porn model.
The stigma on sex is absolutely wild. It's one of the most natural and beautiful things to exist and we suppress even the slightest hint of it at every turn. I propose frontier labs create a suite of SEX benchmarks so we can optimize towards cooming.
>>
>>108165368
do johns dream of meme quants?
>>
>want to run ocr server for requests from a local program
>only vllm and transformers support the model
>vllm only works on linux
>install wsl
>install fedora
>set up uv project
>set up config yaml
>try launching server
>server fails because default option tries allocating too much memory
>ok, adjust config file
>multimodal profiling crashes wsl because it tries encoding a whole video at max resolution
>ok, add the skip argument
>server fails because there is no c compiler installed
>ok, install gcc
>server fails because there is no nvcc installed
>look it up and sane cuda toolkit installation is ONLY SUPPORTED FOR UBUNTU WSL
I WASTED AN ENTIRE FUCKING DAY TRYING TO GET THIS SHIT WORKING AND IN THE END I FOUND OUT TRANSFORMERS ALSO OFFERS A CLI CHAT COMPLETION SERVER
FUCK VLLM, FUCK WINDOWS AND FUCK NVIDIA
>>
>>108165309
>qwen 3.5 thoughts read exactly like gemini's
https://cloud.google.com/blog/topics/threat-intelligence/distillation-experimentation-integration-ai-adversarial-use
>Google DeepMind and GTIG have identified an increase in model extraction attempts or "distillation attacks," a method of intellectual property theft that violates Google's terms of service. Throughout this report we've noted steps we've taken to thwart malicious activity, including Google detecting, disrupting, and mitigating model extraction activity
>>
>>108165385
skill issue
>>
>>108165381
If it’s not taboo then it’s boring and populations collapse. Many such cases
>>
>>108165385
But I heard linux is great now and you should totally stop using windows.
>>
File: file.png (274 KB, 842x894)
>>108165309
It's not surprising.
https://cloud.google.com/blog/topics/threat-intelligence/distillation-experimentation-integration-ai-adversarial-use
I don't think it's as threatening as what Google says but it is happening.
>>
>>108165334
minimax is a great coom model imo, don't draw your conclusions from meme benches
>>
>>108165405
I disagree. It's more taboo than ever yet the birth rates are at their lowest. The biggest determinant of population collapse is female education and employment rates (feminism). Politics aside, we need to change the culture to view beauty and sex in a more positive light.
>>
File: 1768536254907153.png (197 KB, 2066x1235)
>>108165385
ai generated ragebait post
>>
>>108165411
>I continue to sexual assault my sleeping brother. I can't stop myself. I sexual assault my sleeping brother.
You just want people to download 150GB of weights don't you?
>>
>>108165385
>server fails because default option tries allocating too much memory
WSL has too many memory management issues for AI. VLLM will make it worse since it just assumes by default it's the only thing running on your machine and will use ALL resources available.
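If you ever go back to it, the greedy allocation at least is tunable (model name is a placeholder):
[code]
vllm serve some/ocr-model --gpu-memory-utilization 0.6 --max-model-len 8192
[/code]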
>>
File: file.png (13 KB, 285x94)
>>108165433
proving his point dumbass
>>
>>108165433
linux is linux, why would there be different installers for random skins?
>>
I was running Cuda in WSL arch no problem?
>>
>>108165434
pure unfiltered llama2-era soul
>>
>>108165455
you're absolutely right
>>
>>108165433
I looked into that. You CAN'T use standard cuda toolkit linux packages because they overwrite the nvidia display drivers that already exist in wsl which are ported from windows

>>108165407
I would love to but i need a second ssd to backup all my local shit and prices are batshit insane right now
>>
>>108165505
You can install the regular packages as long as you only install the toolkit metapackage instead of the ones that contain the toolkit and drivers.
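e.g. on Ubuntu WSL, something like (version number illustrative):
[code]
sudo apt-get install -y cuda-toolkit-12-4   # toolkit only, no driver
# never plain "sudo apt-get install cuda" inside WSL, that drags in the driver stack
[/code]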
>>
>>108165434
I mean, that behavior itself should immediately indicate something is broken with the setup lol
it needs a thinking prefill to really let loose but it's no worse than something like glm4.7 in that regard. nsfw knowledge is completely present and for me at least it's smarter, better-paced, and less annoying than its similarly-sized competitors in qwen 235 and step 3.5. the only thing it has in common with toss is that both are extremely RL-heavy and thus super sensitive to their chat templates
>>
>>108165529
>should immediately indicate something is broken with the setup
Yes I can see that the model is mindbroken by safety.
>>
File: file.png (297 KB, 550x385)
I have finished downloading Q3 of Qwen3.5
>>
>>108165553
>tfw cant even run q1
128gb bros... WE LOSTED
>>
>>108165543
pearls before swine
>>
>>108161568
>it's soft, resting against your thigh
What explains all these models converging on the same output?
>>
>>108165568
You really shouldn't bother with MoEs under 32B active.
>>
>>108165578
all buying up the same data/distilling from the same models
>>
>>108165578
everyone's training on each other's outputs. it's an ouroboros of data converging into a slop singularity
>>
>>108159576
This general has one of the better op pic ngl
>>
File: file.png (167 KB, 416x416)
>>108165578
See this Asian satan? He did it.
>>
>>108165578
shit prompt
>>
>>108165584
The 20-30B dense models that they can run like Mistral Small, Gemma, etc aren't better.
>>
>>108165646
Let me guess. You have at least 2 3090's?
>>
Do these small quants work? Like is a 2-bit quant of GLM 5 better than a model of a similar size (250 GB)?
Minimax 2.5 is about 250 GB, right? Is the 2-bit quant of GLM 5 better than Minimax 2.5 at full precision?
>>
>>108165655
Me? No. I'm a poorfag.
>>
>>108165657
That is usually the case.
>>
>>108165669
So: GLM 5 Q2 > Qwen 3.5 Q4 > Minimax M2.5 full?
Interesting.
I wonder if something like Qwen 3 Coder at 1 or 2 bits would work well as autocomplete in something like cursor/vscode or whether that's too slow for reasonable home hardware.
>>
>>108165709
For plain autocomplete you can use qwen 3 coder next 80b or the qwen 3 coder 30b a3b.
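The A3B one runs fine even partially on CPU; serve it locally and point your editor's completion plugin at it, e.g. (filename and port are made up):
[code]
llama-server -m qwen3-coder-30b-a3b-q4_k_m.gguf -c 8192 -ngl 99 --port 8012
[/code]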
>>
File: 1769877904096646.png (321 KB, 1485x4420)
>>108165657
It depends on the specific model and specific quant and even task. 2 bits is not really just 2 bits in GGUF, there's a ton of variables that affect things in this range of quant.
>>
>>108165553(me)
My first impression is that it is 5T/s and it is not instaretarded like step or trinity. I think I will enjoy this one.
>>
https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF
https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF
https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF
https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF
https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF
https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF
>>
File: file.jpg (447 KB, 920x1440)
>>108159657
>>
>>108165770
>can barely run a q1
>nowhere near enough context to get an actual story out of it
Fuck
Fuuuuuuuck
If LLMs could actually be intelligent, they wouldn't need to get more and more fuckhueg, LeCunny is right
>>
>We suggest using Temperature=0.6, TopP=0.95, TopK=20, and MinP=0 for thinking mode and using Temperature=0.7, TopP=0.8, TopK=20, and MinP=0 for non-thinking mode.
>and MinP=0 for thinking mode
BASED. Fuck conmen.
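For llama.cpp the thinking-mode numbers map to (model filename made up):
[code]
llama-cli -m qwen3.5-q4_k_m.gguf --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0
[/code]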
>>
>>108165718
Yeah, I've gathered that. I'm not really comparing the same models either, so things like KL divergence don't really work.
I guess what would be interesting to see would be how something like full Minimax M2.5 measures up against GLM 4.7 of some Q4 against GLM 5 of some Q1 or Q2 against Kimi K2.5 Q2.

Basically: how well do comparable model sizes perform? In terms of speed and actual output.
>>108165717
>qwen 3 coder 30b a3b.
3b active means it should be really fast, right? I guess that's what you want more than anything for autocomplete.
>>
File: 1752374614465935.png (233 KB, 349x767)
can someone help
>>
>>108165431
I also disagree (with your disagreeal). Sex positivity and easy access to porn from essentially age zero has made it common and boring as fuck
>>
>>108165568
So 129 is the new minimum?
>>
>>108165794
Your lack of hardware is not a failure of the technology
>>
>>108165846
>common and boring as fuck
for chads and staceys
>>
>>108165865
A truly smart model doesn't need to be bloated to insane sizes to include every smidge of trivia it's hoovered up from the internet.
>>
>>108165888
Interesting theory. Huge if true.
Can you substantiate it?
>>
>>108165846
Fair point, but I'll maintain my position. If shaming sexuality and restricting porn resulted in higher birth rates due to its being "taboo" enough, South Korea wouldn't have the worst demography in the world.


