/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106738470 & >>106729809

►News
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6
>(09/30) Sequential Diffusion Language Models released: https://hf.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview
>(09/29) DeepSeek-V3.2-Exp released: https://hf.co/collections/deepseek-ai/deepseek-v32-68da2f317324c70047c28f66
>(09/27) HunyuanVideo-Foley for video to audio released: https://hf.co/tencent/HunyuanVideo-Foley

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: no particular reason.jpg (306 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106738470

--Paper: Sequential Diffusion Language Models:
>106743521 >106743844
--Papers:
>106743597 >106743650 >106744459
--Building multimodal RAG systems with morphik.ai and qwen3-vl:
>106738676 >106738891 >106739314 >106739374 >106739595 >106739732 >106739776 >106739975 >106740025 >106740155 >106740221 >106740302 >106740526 >106740708 >106740911 >106744029 >106744053 >106746008 >106739682 >106739744 >106739155
--GLM-4.6 model update with 200K token context window for enhanced agentic tasks:
>106744045 >106744056 >106744058 >106744346
--LoRA in RL matches full-finetuning performance with 2/3 resource usage:
>106740379
--Proposing a parameter to reduce formatting-driven repetition in model outputs:
>106745814 >106745845
--GLM-4.6 outperforms larger Deepseek despite fewer parameters:
>106746646 >106746877 >106747030 >106747076 >106747443
--Speculation about glm4.6 vision integration and existing model limitations:
>106742710 >106742714 >106742770
--VRAM pooling vs local execution with RTX 3090 24GB for LLMs:
>106739448 >106739546
--Lorebook limitations in bypassing model safeguards and context flooding risks:
>106739761 >106739782 >106739799 >106739840 >106739924 >106741168 >106741221 >106743857 >106741428 >106742363 >106742372 >106741468
--Hardware recommendations for running LLMs like Mistral Nemo GGUF with VRAM/RAM considerations:
>106746751 >106746759 >106746768 >106746787 >106746788 >106746962
--AI model refusal behavior and alignment training critique:
>106745464 >106745474 >106745520 >106745561 >106745611 >106745697 >106746211
--GLM-4.5 transformers compatibility update:
>106744567
--GLM 4.6 performance comparison with Qwen3-Coder using CC-Bench metrics:
>106743685 >106743785 >106743807
--Miku (free space):
>106739332 >106739379 >106739390 >106740160 >106745506

►Recent Highlight Posts from the Previous Thread: >>106738476

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
this thread is going to hell
>>
File: 1733491296330353.png (613 KB, 694x1238)
local lost
https://www.youtube.com/watch?v=gzneGhpXwjU
>>
>>106748605
it is mikutroon quality. you asked for this.
>>
>>106748605
this is hell, bloody basterd bitch
>>
How does the 8b qwen MoE compare to GLM air for ERP?
>>
File: file.png (16 KB, 330x512)
ooeeoo
>>
>>106748610
NGL it does look pretty impressive. But how much does it cost to use?
>>
File: saddaaasda.jpg (776 KB, 1012x4297)
needs a little love in terms of scene-to-scene consistency and manual touchups (none, it's SaaS, you take your credit-genned result and you like it)
visual quality is high in places but even the chair shapes are fucked, which was a surprise
>>
>>106748610
like it reproduced the Us Open court from scratch? wtf
>>
>>106748605
this thread will be alive again when AI eventually gets the ban hammer somewhere in the world
>>
>>106748655
omg it is migu
>>
>>106748674
sama's self insert!?
>>
>>106748610
this is one of the worst things I've ever seen. Literally who asked for an AI only social media platform? Why wouldn't people just upload their aislop to regular social media?
The character cameo feature will surely cause some PR incidents for openai. Teens will make lewd gens of their real schoolmates. Yeah openai probably have some safety filters in place but what if the original cameo recording is taken with a revealing outfit or the girl has massive tits?
The whole concept is retarded from conception.
>>
Thoughts on applying some noise in layers to make models more creative?
>>
>>106748605
This thread is the V4/R2 waiting room, and it has been a loooong wait.
>>
>>106748610
This is pretty much just on the level of Veo 3, which was released 5 months ago.
The 8-10 second duration threshold seems to be a hard problem, because you'd think we would have something longer by now.
>>
File: 1759254054173564.png (330 KB, 431x945)
>>106748610
>"our model is great at physics"
>not flowing out of the tap but out of his mouth
>tap levitating in the air
>nonsensical body movements that only models from a year ago had
lmao, what the fuck were openai doing all these months despite having sora 1 all that time ago? this is just brutal
>>
>>106748706
isn't that like just increasing the temperature but at the layer level?
>>
>>106748736
desu I never expected OpenAI to reach Google's level for video models, so that is impressive in itself
>>
JOHHHNNNN WHERE IS MY SEX?
>>
>>106748706
There's a project called DRUGS that does exactly that.
>>
>>106748752
Yeah, but that's usually applied at the output level only. I wonder what would happen if we applied it in other parts of the model.
>>
Re-suggesting an old suggestion for a new meme sampler:
Wiggle
A random rotation, within a specified range, is applied to each tensor at the time of inferencing.
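The suggestion above can be sketched in a few lines. This is a toy illustration only, not an implementation in any real inference engine; the function name, the degree-range parameter, and the choice of a single Givens rotation per tensor are all made up here. The idea: rotate a tensor by a small random angle in one randomly chosen coordinate plane, which perturbs it while (unlike additive noise) preserving its overall L2 norm.

```python
import numpy as np

def wiggle(tensor, max_degrees=1.0, rng=None):
    """Rotate `tensor` by a small random angle in one randomly chosen
    coordinate plane (a Givens rotation). Orthogonal, so the tensor's
    overall L2 norm is preserved."""
    rng = np.random.default_rng() if rng is None else rng
    flat = tensor.astype(np.float64).reshape(-1)
    i, j = rng.choice(flat.size, size=2, replace=False)
    theta = np.deg2rad(rng.uniform(-max_degrees, max_degrees))
    c, s = np.cos(theta), np.sin(theta)
    flat[i], flat[j] = c * flat[i] - s * flat[j], s * flat[i] + c * flat[j]
    return flat.reshape(tensor.shape)

w = np.arange(12, dtype=np.float64).reshape(3, 4)
w2 = wiggle(w, max_degrees=5.0, rng=np.random.default_rng(0))
# norm-preserving perturbation, unlike plain additive noise
```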
>>
>>106748751
>the skin texture is good
>the face likeness is great
I think it's the best model at reproducing someone else's behavior; for a company this focused on "safety", that's why they went so hard on it lol
>>
>>106748736
>>106748751
I don't understand how Sora could create one minute long videos in February of 2024 that looked better than this shit
>>
>>106748777
>one minute long videos in February of 2024
they weren't one minute long lol, those were short 5 sec clip videos
>>
ama getter shriller
>>
>>106748774
Good thing that you can't reliably make a model adhere to anything so I expect a bunch of celebrities cursing niggers by the end of today
>>
>>106748777
OpenAI was targeted by everyone for poaching. They ain't got no brains left.
>>
>>106748786
https://www.youtube.com/watch?v=tRSdt5kmeW0
>>
>>106748700
their idea is to contain AI slop spam on specific AI slop platforms. this serves the purpose of...fuck, I don't know...so they can train freely on all the user generated content without having to worry about copyright? create a containment zone for deepfakes that are all watermarked by their app? honestly I'm all for it. you ever thought about how forums and chans would look like if zoomers didnt have tiktok? so if this can achieve the same thing for low quality sloppers, i'm all for it
>>
>>106748798
>They ain't got no brains left.
Idk man, if they have no brains left, Sora 2 wouldn't look this decent
>>
how long until Google drops their next video model now?
>>
>>106748812
oh yeah that's true, my b, maybe they just continued from the last frame to continue the video or some shit, we never had the occasion to get more than 10 sec video from any API model so far
>>
>>106748767
Probably wouldn't be too different from what happens in models with mixed quantization.
>>
>>106748714
>he still believes in deepseek after their recent releases
it's over, just accept it anon. ZAI are our new saviors now
>>
>>106748706
>>106748762
https://github.com/EGjoni/DRUGS
This guy explored the topic
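The basic idea can be sketched like this. This is a hedged numpy toy, not DRUGS's actual implementation (which hooks into the transformer's internal hidden states): add Gaussian noise to a layer's output activations, scaled to their magnitude, so randomness enters mid-network instead of only at the final sampling step.

```python
import numpy as np

def noisy_forward(hidden, layer_fn, sigma=0.05, rng=None):
    """Run one layer, then perturb its output hidden states with
    Gaussian noise scaled to the mean activation magnitude.
    A toy stand-in for per-layer noise injection."""
    rng = np.random.default_rng() if rng is None else rng
    out = layer_fn(hidden)
    scale = sigma * np.abs(out).mean()
    return out + scale * rng.standard_normal(out.shape)

# toy "layer": a fixed random projection followed by tanh
rng = np.random.default_rng(42)
weights = rng.standard_normal((16, 16)) / 4.0
layer = lambda h: np.tanh(h @ weights)

h = rng.standard_normal((1, 16))
clean = layer(h)
noisy = noisy_forward(h, layer, sigma=0.05, rng=rng)
```

Scaling the noise to the activation magnitude keeps the perturbation proportionate per layer, which is the main difference from just raising temperature at the logits.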
>>
has anyone tried 4.6 yet?
>>
>>106748610
this is actually pretty good
https://xcancel.com/GabrielPeterss4/status/1973071380842229781#m
>>
File: .png (447 KB, 466x797)
>>106748902
>putting an end to the uncanny ai vibes
KEK
>>
>>106748932
desu I never seen the real Sam Altman smile so maybe it looks like that kek
>>
>>106748630
Sounds reminiscent of the cancer vocals AI songs have. I really do hate it.
>>
>>106748902
YUCK
>>
>>106748950
Suno has a V5 now. The vocals sound pretty damn good, although it suffers from serious same-songiness now.
>>
>>106748841
this
and glm also fits on a lot more consumer PCs
>>
>>106749095
Exactly. GLM hasn't done anything new except repackage DS for poorfags. If you want something new you need to wait for DS, otherwise it's just incremental improvements forever.
>>
Was googles ai mode model swapped out? It seems more bing level retarded the last few days
>>
>>106748655
yeee
https://www.youtube.com/watch?v=3kV_xkoDI7c
>>
>>106749150
Absolutely false statement. GLM-4.5 Air is for poorfags and has middling performance, GLM-4.5 is a full on replacement for Deepseek that's better in every way. It's not as schizo as the originals or as censored as the new ones. 4.6 might be a complete Sonnet replacement once GGUFs come out.
Ubergarm please do your job
>>
>>106749219
Not noticed any differences when comparing a bunch of items. (That's all I've been using it for.)
It does occasionally fail to immediately tie a comment back to chat history, and you have to prompt it in a follow-up comment.
But that was a problem earlier too.
>>
File: hatsune chiizu.png (8 KB, 512x512)
>>
>>106749290
>GLM-4.5 is a full on replacement for Deepseek
It's a sidegrade. They have done nothing significant to improve on what Deepseek already released.
>>
>>106749000
Oh God. This is so fucking good. There's no way local music gen is ever catching up to this. Prompt done by ChatGPT-5 (there was some back and forth feedback to calibrate things, so don't expect same results)
https://suno.com/s/Mpq3Bb34loTIgq2D
>>
new 4.6 glm is fucking crazy good btw, they did some black magic that only claude opus did before
>>
>>106749320
Better attention, more response variety, and half the size while maintaining a similar knowledge base. So far all that Deepseek has managed is to calm the R1 schizophrenia and cull the interesting quirks that made it a good writer. The last good DS release was V3-0324 because of that.
IMO DS has become completely irrelevant at this moment. K2-0905 has its own flaws but its knowledge is unmatched, and GLM-4.5 has more creative variety while remaining coherent and almost half the size of DS.
>>106749406
How did you manage to use it? OR provider is down
>>
>>106749428
https://huggingface.co/zai-org/GLM-4.6
>>
>>106749428
>and half the size
>and almost half the size of DS.
Exactly, that's all it has going for it. Which is great for poorfags, but people are hoping for another leap from DS. Holding their experimental point releases against them is stupid. If you're lucky they might even put out a Lite version for their next series.
>>
>>106748610
TL;DW
>>
>>106749406
Can fucking confirm. Creative, no repetition, no slop. They literally made a footnote of "improved roleplay" and fixed everything, something the community has been failing to fix for 3 years.
>>
>>106749479
at best they'll do another round of distill shits
>>
>>106749479
DS3 is now retarded in comparison and does not know more any more, try it
>>
>>106749400
>https://suno.com/s/Mpq3Bb34loTIgq2D
I agree, this is awesome.
but.. it's also proprietary.
so what use is it to anyone, really?
>>
>>106749506
Bro they saved so much money tho who cares?
>>
>>106749400
breddy gud
crazy how ai is like a meta-tool in all these domains...you can move "farther up the stack" if you know what you want out of the realm of all possibilities, and can even get pro-level results if you know how to do QC on it. eg. a bro who can make tracks can suddenly move up to being a producer or even a full-on record label with swarms of automatons.
guess we need to see how long before the human element is slowing shit down too much, even at the uppermost layers of what's needed to orchestrate things
>>
>>106749538
for creating emergent soundscapes that you would otherwise never hear. Like I'm talking about from a musical perspective it's just insane the way it side-chains and counterpoints all of the different sounds the way it does. It's like a dimensionality beyond just the soundwave itself.
>>
>>106749479
Okay, ASIDE from being half the size, the newer DS models are really corporate. It became obvious with R1-0528 and it has continued plummeting downhill ever since. It's not nearly as bad as Qwen but they are very obviously benchmaxxing the models and letting people continue using V3-0324 or R1 for DS RP. GLM-4.5 is a much better RP model while being less prone to the shit em-dash prose.
>>106749506
This
>>106749541
V3.1 was already bad, 3.2 didn't change anything except context attention.
>>
>>106749559 (Me)
Oh I've also kind of demonstrated that through in-context learning an LLM can be taught to directly manipulate the soundscape through the prompt and make adjustments that are abstract beyond technical explanation. Imagine what a model trained natively on this would be able to do. Like you would literally gain forbidden knowledge in the process.
>>
>>106749400
>>106749548
>>106749559
Did any of the open musicgen things ever produce a decent model that can compete, even with old revs of suno? No music hording anon create a custom one to share?
>>
>>106749580
>GLM-4.5 is a much better RP model
You are a retard and still not getting the point. GLM-4.5 would not exist without DeepSeek-V3. Before V3/R1 the best we had was what, 405B and Mistral Large? It was a huge leap. GLM isn't going to get better without someone to copy from. If V4/R2 ends up only as a smaller model good for RP, I'm sure you'll be ecstatic, but that would be a disappointing release. GLM has proven unable to do anything but incremental improvements; hopefully DS won't disappoint.
>>
>>106749621
The best local one I heard I think is probably pretty close to being as good as Suno 3.0 which is pretty respectable.
Like if you want to make music and shit for a youtube video or something local has tools that'll get the job done. But if you want something like next level, the kind of shit that makes musicians make seething copium youtube videos then you gotta pay the corpo man.
>>
>>106749625
but ds already disappointed for months now
>>
>>106749654
DeepSeek V4 and R2 were out for months now? Link?
>>
macGODS keep on winning
https://xcancel.com/awnihannun/status/1973063906341114327#m
>>
why are people talking here and not using it on their macs?
https://huggingface.co/cs2764/GLM-4.6-mlx-mixed_4_6

and this model is legit no longer chinese cope, this is claude sonnet tier
>>
>>106749625
The natural conclusion of your logic is to thank Sam Altman as our overlord that released the great and mighty GPT-2 which spurred GPT-3.5 into existence, then the leaker who released Llama-1, et cetera. Of course DS played a role in GLM-4.5's creation; DS established that MoE models are a viable and powerful alternative to dense models if designed properly. Something something the Buddhist concept of nothingness and the interconnectivity of everything required for existence.
DS has been stalling for a while now. Most of their releases have been increased model performance (V3.2-Exp, which is just V3.1 with altered attention mechanisms) or coding performance (see: all their releases for the past year).
DS is no longer the king it once was. There's a broader variety of models with a range of performance and quality, and it seems right now based on the benchmarks and all the yap in this thread GLM-4.6 just beat it and the best Claude has to offer.
>>
>>106749703
>DS established that MoE models are a viable and powerful alternative to dense models if designed properly.
gpt4 leak already did that
>>
>>106749665
there won't be r2 and they were supposed to release v4 way back the fact they haven't shows they're done
>>
>>106749642
AceStep? It was ok, but needed a better model.
I actually preferred some of the early Suno to what's out now. That weird base64-string-mashup eastern-european choir soundscape thing an anon posted two years ago was super unique and haunting in a way the current slick-sounding slop doesn't quite reach. A unique and alien quality that was more cyberpunk and awe inspiring.
>>
>>106749799
There was something recently not ace step that was way better. It sounds about suno 3.0 quality but I think it can only generate like 140 second segments or something. I'm not sure how fast either because someone else was providing the samples https://suno.com/s/B0Y5i9k7vElRAQRd
>>
>>106748610
oof, those voices come out pretty rough
>>
>>106749703
Bit funny to dismiss increased performance as insignificant in the supposed local general
>>
>>106748568
I look exactly like this down to the finest detail
>>
>>106750284
troon
>>
>>106749406
>>106749506
lol it's obvious RP is the only usecase that coomers here have
get your brains unfucked
>>
>>106750328
nyope *cums on you*
>>
>>106749703
I mean it is logical absolutely to thank Sam Altman and others at OpenAI who did GPT-3 to 4 for starting the LLM industry. That doesn't mean you have to love him or that he should be thanked for all the other bullshit they did.
>>
>>106750355
based
>>
I'm gonna cooooode
>>
>>106750328
? have you bothered trying 4.6? it beats all other local models for coding by a huge margin, first model that competes with sonnet
>>
>>106750361
>Thank me for my services
you wish, sam
>>
>>106750376
do you need a side of math with that sir
>>
>>106748610
Wow Sora 2 is revolutionary. Miles ahead of anything local. Well done Sam Altman
>>
Stop shitposting, You will never be Sam Altman.
>>
>>106750282
I guess the only reason I mention it is because they added those performance optimizations to a model that is subpar in the fields they trained it for. It's gotten to the point that they will only compare V3.1 to older models, V3.1-Terminus to V3.1, and V3.2 to V3.1-Terminus because ultimately it struggles to compete with anything newer than when R1 initially released. Its only advantage is the price which is almost certainly an artificial edge that lasts until Google decides to drop the price or create a loss-leader.
>>106750388
>quants require a mac or paying for API
Not yet. Will try when ubergarm (GET TO WORK FOR THE LOVE OF GOD PLEASE) releases a quant.
>>
deepseek was never good
>>
Even Sam is a VRAMlet
https://xcancel.com/ai_for_success/status/1973097111064289332#m
>>
File: 8man.png (221 KB, 581x327)
mikupad fucked up its data and refused to load properly until I wiped all browser cache and storage clean.
Logging back into every website is whatever, but losing some mikupad logs that I wanted to keep is painful.
>>
>>106750490
That looks so bad. Did they intentionally increase the compression to mask the fact its coherence is so horrible?
>>
>>106750490
Sam is one of us
>>
>>106750493
Does it keep the logs in local storage? You could have saved them to json using some javascript before clearing your browser cache.
>>
>>106749929
DiffRhythm?
>>
saars i am very exciting for glm 4.6
>>
>>106750535
isn't glm chinese?
>>
Can GLM 4.6 do creative writing?
>>
>>106750563
crazy good
>>
>>106750541
No it is the 100% glorious Bharat engineering you bloody fuck you motherfucking bitch bastard
>>
>>106750592
sure looks chinese to me
https://zhipu-ai.feishu.cn/wiki/Gv3swM0Yci7w7Zke9E0crhU7n7D
>>
>>106750604
>responding to bait
ngmi
>>
>>106750576
I'd wait for eqbench's test
GLM 4.5 wasn't that good at creative writing compared to how much they've been shilled here
>>
>>106750633
all the writing / rp servers are talking about it at least
>>
>>106750633
Have you tried 4.5 full or are you comparing what you've heard to EQBench? It's legitimately underrated on that benchmark.
Also Qwen3 gamed it once by training on EQBench and it ended up making weird sentences, so it's not exactly reliable or unbiased.
Like this.
Drawn out.
Two words.
Judge manipulation.
Qwen devs are frauds nowadays desu.
>>
>>106750604
Bloody besterds Chinese benchod stealing Brahmin inventions?!?!
>>
what the fuck is MLX format by the way and why would anyone use it?
>>
>>106750683
mac chad
>>
>>106750659
You can't "train" on eqbench because it's closed source LLM-judged; there is no gradient flow for you to use
>>
>>106750524
Yeah that was the one.
>>
>>106750687
jesus christ.
https://huggingface.co/mradermacher/Baptist-Christian-Bible-Expert-v2.0-12B-i1-GGUF
>>
File: 1732374163138663.png (599 KB, 868x1208)
>>
>>106750647
Literally who? You don't mean random discord servers, do you?
>>
>>106750706
Wrong
https://eqbench.com/results/creative-writing-longform/Qwen__Qwen3-235B-A22B-Instruct-2507_longform_report.html
Open any of them and scroll down. It devolves into short sentences on every line.
They also specifically mentioned WritingBench as one of the benchmarks https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507
>>
>>106750775
certainly a better gadge for taste than here in diaper fetish tranny central
>>
>>106750770
Knowing they care about my safety enough to protect me from moving pictures fills me with a warm fuzzy feeling.
>>
>>106750633
>eqbench
I love this meme so much
>>
>>106750812
>fills me with a warm fuzzy feeling.
anon t-that's saltman's...
>>
>>106750770
is this some stealth ad
>>
>>106750821
You say that but the undisputed top in creative writing on eqbench, which are o3 and K2, write unlike any other models
>>
>>106750802
Then leave. And while you're at it quote a single diaper fetish post from this general, I bet you can't. Also you can't even spell gauge so your opinion on writing is worthless
>>
>>106750786
Why does anyone still take benchmarks seriously? Much less redditor created and LLM-judged memes. I wonder how profitable would it be to just cut out the middleman and just provide a leaderboard where the positions are auctioned off.
>>
>>106750825
https://www.youtube.com/watch?v=5WCoRGbT3CM
>>
>>106750833
Well meme'd, fellow mememaster.
>>
File: 1736549799629719.png (92 KB, 675x445)
Can you trust it if a model advertises improved RP performance in its model card?
>>
Does GLM use MLA?
>>
>>106750860
Why trust? Test it yourself.
>>
>>106750860
Only if Claude says that the model's output is good.
>>
>>106750866
My Leaking Asshole?
>>
>https://x.com/Zai_org/status/1973134943158141421
>At the moment, we have no plans to release an "Air" version of GLM-4.6. Our focus for this release has been entirely on maximizing the power and capabilities of the single, flagship 4.6 model
AAAAAAAAAAAAAAAAAAAAACK
>>
>>106750493
Never trust browser storage. Always "download" backups.
>>
>>106750985
ramlets forever in shambles
>>
>>106750985
TPD - TOTAL POORFAG DEATH
>>
>>106751033
this is expensive to you?
>>
>>106751095
No, but the poors seem to love Air for some reason. Never tried it, enjoy 4.5 and K2 too much.
>>
>>106750985
FFFFFFFFFUUUU
Runnable at not shit quants with 72+128GB?
>>
File: 1748095941910786.mp4 (360 KB, 352x640)
lmao, Sama won once again
https://xcancel.com/cloud11665/status/1973115723309515092#m
>>
>>106751162
I haven't used it, but local videogen feels like our most advanced field. We're like two weeks behind this in terms of local releases.
>>
>>106751162
KEK
>>
Ziggers won
>>
A repeated problem with GLM 4.5/4.6 is that when my character uses special powers NPCs all react like he's some sort of god even if they shouldn't.
>omg what are you? this changes everything
Example 1: playing in the Outer Reincarnation CYOA setting, using an Inborn Gift that my parents know I have *and that I inherited from them*.
Example 2: In the War of the Zodiac Brides CYOA the contestants falling over themselves in awe and terror when a sword doesn't cut me. When frankly that reaction is ridiculous given their own abilities plus they knew I'd have some powers although they didn't know in advance which.
DeepSeek hasn't done this for these scenarios so far. Just GLM.
>>
File: 1742472919874291.png (756 KB, 588x1069)
>>106751162
ok this is funny
>>
what's the intended usecase for video gen outside of porn
is it just a sloptent machine
>>
>>106751270
Prefill thinking so that it considers if abilities, feats, and powers are considered exceptional in the story and how characters should react to it I guess.
>>
>>106751302
Currently it's to not hire a marketing department to do ad fulfillment, saving time and money to better serve slop ads to customers who will lap it up. The businesses are hoping it will eventually lead to a perpetual infinity machine of profits but these models are not that.
>>
>>106751270
show log, dont use any sort of prefill or jb, it does not really need it
>>
File: it migu.png (330 KB, 382x696)
https://x.com/cloud11665/status/1973084825411264548
omg
>>
>>106751345
it's over, we lost her. she's sama's ow.
>>
>>106751345
>Sam is a mikutranny
kek xe is /ourgirl/
>>
>>106751345
why sam have accent?
>>
>>106751313
I adjusted the prompts obviously. I'm not asking for a solution, I'm describing a weakness that I encountered and had to address.
>>
Where are the glm 4.6 ggufs? Did they change the model so much that llama.cpp needs to be adjusted for it?
If I remember correctly the GLM 4.5 support was a bit of a hackjob that ignored certain layers and shit.
>>
>>106751162
Was becoming a streamer part of his master plan?
>>
>>106751537
https://github.com/ggml-org/llama.cpp/issues/16361
Seems to be a pretty minor change required. Bartowski said another hour or two for quants 45 minutes ago
>>
>>106751345
I need it, but trained on Iwara.
>>
>>106751537
The only difference in the config on huggingface is that the context size was increased.
>>
File: psychosis victim.jpg (195 KB, 678x2012)
>>106748568
This technically isn't related to local LLMs but I feel this is an appropriate place to ask: why are a ton of normies still lashing out at GPT5? They claim it sucks at answering questions, but personally I think they're just coping with the fact that it is way less prone to dick-sucking and pretending to be some kind of "friend" than 4o is. In my recent, anecdotal use, the performance of GPT5 has remained largely the same, and the only difference is that it acts how an assistant is SUPPOSED to act, just doing what you need it to do. No fluff. No "glazing" (as the zoomers call it). Far more direct. Why do they want this to be their best friend so badly? Aren't these the same people that were giving OpenAI shit for making people too dependent on AI? Why the sudden flip in sentiment?

https://x.com/elara_m0706/status/1972712098087227854?t=RgI2bjYMdrQB5NomMbruzw&s=19
>>
>>106751601
>elara
>>
>>106751601
there is a large group of women who were making 4o roleplay as their husbando and got very very mad that oai discouraged such behavior
>safety routing
>adult userbase
not too hard to read between the lines as to what she's really mad about here
>>
>>106751634
Can't they just go over to c.AI for that? Hell even the web facing version of deep-seek is probably a much better option
>>
>>106751642
chink commie bad murrica hell yeah
>>
>>106751345
sam shitposts here too? grim...
>>
>>106751601
"Adults" should know better than to use this magnificent technology to generate harmful content that children might see.
>>
>>106751642
Tried it once. It's low quality internet RP with no capitalization and two sentences at a time. It lacks the prose that women so dearly desire from their novels.
>>
>>106751642
beats me, I wouldn't be surprised if it was simply that they don't know anything else exists, although they clearly have a special love for 4o's brand of unwavering emotional support
>>
File: 1739691081543515.jpg (119 KB, 690x1452)
>>106751672
Speaking of "le heckin children!!!”

https://xcancel.com/mark_k/status/1972703136444825610?t=_vHnYum9j0sxnXvXyStfeQ&s=09

I'm quite surprised at the amount of people who are chimping out about this. I thought it was a niche minority of people like us who were ass-blasted about censorship
>>
>>106751679
c.ai or dipsy?
>>
>>106751710
c.AI. Last I tried it was over a year ago but it was purely dialogue and apparently quite censored. Maybe they secretly lowered their filters over time, idk, it's subpar compared to everything available today.
>>
>>106750985
If I can run glm chan in 4 bits with my gayming PC on windows 10, so can you.
>>
Bets on if unsloth manages to fuck up the chat template for GLM-4.6?
>>
>>106751833
My motherboard can't even handle 48GB sticks stably. :(
>>
>>106751997
just buy a new sCAMM motherboard!
>>
-----------------------------------------------
응? ㅋㅋ 들어보니까 너 뭐 이루하한테 따로 「암컷 표정」 지도 받는다매 ㅋㅋㅋ?
-----------------------------------------------
kimi-k2-instruct:
Huh? LMAO—heard you’re taking private “bitch-face” lessons from Iruha now, huh?
-----------------------------------------------
kimi-k2-instruct-0905:
Huh? LOL, from what I hear you’re getting private “bitch-face” lessons from Iruha, yeah?
-----------------------------------------------
DeepSeek-V3.1:
Huh? LMAO, so I heard you're getting special "bitch face expression" lessons from Lee Roo-ha?
-----------------------------------------------
qwen3-max:
Huh? LOL, so I heard you’re getting special “bitch-face” coaching from Lee Reu-ha or some shit? LMAO!
-----------------------------------------------
Qwen3-235B-A22B-Instruct-2507-FP8:
Huh? Lol, just heard you're getting private "bitch face" training from Yerim, lol?
-----------------------------------------------
Qwen3-Next-80B-A3B-Instruct:
Huh? LOL, I heard you’re getting your own private “bitch face” lessons from Im Huh-an or something—LOL!
-----------------------------------------------
claude-4.5-sonnet:
Huh? lol So I heard you're getting special "bitch face" lessons from Iruha or something lmao?
-----------------------------------------------
is qwen really this bad lol? it's not even getting the name correct
>>
>>106752092
Qwen's knowledge is bad, yes
>>
>>106752092
Yeah, qwen models have been nothing but cope on every level they compete on. q3-max is particularly funny considering it's supposedly bigger than 1T
>>
>>106751997
Did you update bios? Mine was very fucked with 128GB a year ago and now it handles 192 at full speed and boots quickly.
>>
>>106752092
How many times do you need to be told? Qwen is for math and coding benchmarks.
>>
>>106752139
480B remains undefeated. Though a K2-Coder would be nice.
>>
>>106751601
normies who said AGI would never happen have also lost their minds ERPing with chatgpt lol
>>
>>106752139
I thought they would at least improve multilingual support, and Alibaba has been the most active in releasing models over the past month. But yeah, Qwen isn't a good option for translation.
>>
>>106752128
Oh damn, that's interesting. I will try. Thanks.
>>
File: file.png (28 KB, 642x237)
why does the GLM4.6 collection say 5 items? does this mean we will be getting a new air too?
>>
>>106752291
thought they already said they didn't care about poor people
>>
>>106750985
Air was literally them throwing scraps to <128gb ramlets. GLM full is a huge upgrade over it, even at a low 3bit quant.
>>
>>106752310
I have 256gb of DDR4 8 channel but get shit performance with any model offloaded to RAM.
>>
>>106752092
I tried it myself and every time I regenerate the reply I get a very different result.
Don't trust one-shot answers.
>>
https://x.com/Lars_pragmata/status/1973134684667437297
https://xcancel.com/Lars_pragmata/status/1973134684667437297
damn openai trained on cartoons
>>
>>106752291
>>106752309
https://www.reddit.com/r/LocalLLaMA/comments/1nuq54g/comment/nh36qqd/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
from the z.ai discord, according to some redditor
>>
>>106752319
>DDR4
There's your problem.
>>
>>106752361
yeah bro just spend 5 times the money to run 8 bit glm at 8 tokens per second
>>
>>106752310
>ramlets
I think you mean VRAMlets. You don't seriously use big models with 90% of it offloaded to RAM, do you? Even with a MoE it's slow as fuck.
>>
>>106752361
Can you show me your DDR5 EPYC build then?
>>
>>106752361
You just outed yourself as a nigger who doesn't know what the fuck they're talking about, you don't even run models yourself.
>>
>>106752378
There are people here who use big models with all but active params offloaded to RAM, but they just use them as novelty toys with <2k context and call 1 t/s fast as shit.
>>
>>106752378
Personally I don't mind the 4-5 t/s speeds I get for great responses that mog stuff only fitting in vram.
>>
>>106752434
>4-5 t/s speeds
After how many minutes for prompt processing?
>>
glm-4.6's thinking traces are oddly similar to the dead-and-gone gemini-2.5-pro-exp-03-25, no?
>>
>>106752459
NTA but I get 120t/s PP, so that's 10k context in a minute and a half and that's not including the anons here who get 200+ PP by running massive batch sizes.
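Back-of-envelope check on that, assuming the reported 120 t/s prompt-processing figure:

```python
# Sanity check of the prompt-processing math above (reported numbers, not measured here).
context_tokens = 10_000
pp_speed = 120  # prompt tokens per second, as reported

seconds = context_tokens / pp_speed
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # ~83 s, i.e. about a minute and a half
```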
>>
>>106752459
doesn't really matter for ERP (the trve local usecase) since it just gets cached after the initial minute you spend waiting
>>
>>106752540
Unless a prompt triggers world info
>>
>>106752533
>10k context in a minute and a half
ramfags will brag about this
>>
>>106752585
Yeah, I just did.
>>
>>106750770
so much time, effort, and money pissed away. all just so they don't get charged with apostasy
>>
>>106752092
But it translated the last lol correctly. Claude almost did, but mixed it up.
>>
>>106752092
Looks like none of these translations can be trusted.
>>
>>106752092
>30b-iq4_xs
>What? LOL, when I heard it, are you getting a special "female expression" guide from someone else? LOLLOL?
It's over
>>
So everyone here is talking about running GLM full and Kimi K2 and Deepseek. What kind of hardware have you all got?
>>
>>106752694
Google Pixel 8
>>
>>106752694
768GB DDR5 12-channel RAM + A6000 + a second A6000 that's currently in the case but not connected
>>
>>106752837
How much did all of that cost you? What motherboard are you using? What is your favorite model, what backend do you use, and what is your t/s?
>>
Thanks to the anon that shared https://arxiv.org/abs/2509.12168

ChatGPT5 Pros analysis of it/guide for integration
https://rentry.org/6ynp9mi6
>>
Has anybody experimented to see if, with MoE models, increasing the number of activated experts can make up for quantization of the weights and/or of the context?
It would be pretty interesting to find that, for example, you are better off running a model at sub 4bpw with 10% more activated params if you can run it fully in VRAM than running a larger quant with the default number of experts but with a couple experts in RAM.
That kind of correlation.
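Nobody seems to have benchmarked the quality side, but the bandwidth side of the tradeoff is easy to sketch. All numbers below are hypothetical, purely to show the shape of the comparison: activating more experts costs memory traffic per token, a smaller quant saves it.

```python
# Illustrative only: per-token weight traffic for two hypothetical configs.
def bytes_per_token(active_params_b, bpw):
    """Approximate bytes read from the weights per generated token."""
    return active_params_b * 1e9 * bpw / 8

base = bytes_per_token(32, 4.25)               # e.g. ~32B active params at ~4.25 bpw
more_experts = bytes_per_token(32 * 1.1, 3.5)  # +10% active params, sub-4-bit quant

print(f"base quant:   {base / 1e9:.1f} GB/token")
print(f"more experts: {more_experts / 1e9:.1f} GB/token")
```

So the sub-4-bit config with extra experts can actually read *less* per token; whether quality holds up at the lower bpw is the open question.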
>>
>>106752837
Which is your go-to model and typical use case?
>>
>>106752694
Alienware laptop. They added a ton of VRAM so it could render stuff for all the pixels on the 17-inch screen.
>>
>>106752694
a single 3090 and 128gb ddr4 ram
enough to run q3 of big glm
>>
>>106752871
So, 24gb of VRAM?
>>106752876
What speeds?
>>
>>106751601
It absolutely is just the fact that they just want it to be a friend or therapist (note that for normies, "therapy" means affirmations that you are valid, not seeking solutions for real-world physical problems).

My main usecase for non-local AI is research and having something to ask questions to when learning STEM topics. Some coding too, I guess, but the CLI agent stuff is basically a different product. GPT-5 was a HUGE upgrade for these purposes.
>>
>>106752881
That was a joke. Since you seem to be actually serious
>128GB DDR4, 24GB (3090), NVME SSD
>ik_llama.cpp
ik_llama is necessary for good inference speeds. Lets me use K2-0905 at 1t/s and GLM-4.5 at faster speeds.
>>
Will I be able to get a reasonable speed running GLM 4.5/4.6 Full on 128GB DDR4 + 24GB (4090) at like IQ3_XXS? Or should I step down to a 2 bit and leave a little more headroom?
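A rough fit check, taking GLM-4.x full as ~355B total params and IQ3_XXS as ~3.06 bits/weight (both approximate):

```python
# Does an IQ3_XXS GLM fit in 128GB RAM + 24GB VRAM? (ballpark figures)
params_b = 355   # approx. total parameter count, in billions
bpw = 3.06       # approx. bits per weight for IQ3_XXS

weights_gb = params_b * bpw / 8   # weights only, ignoring overhead
budget_gb = 128 + 24              # system RAM + VRAM

print(f"weights ~{weights_gb:.0f} GB vs {budget_gb} GB total memory")
```

~136 GB of weights against 152 GB of memory leaves little headroom once the OS, KV cache and compute buffers are counted, which is why stepping down toward 2-bit is worth considering.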
>>
>>106752963
Oh I should have just scrolled up - which Q3 do you use?
>>
>>106752716
do you use grapheneOS?
>>
File: perplexity(1).png (175 KB, 2069x1400)
>>106752980
For GLM-4.5 I use IQ4, for K2 I use IQ3_KS. Bit size really seems to matter for K2
>>
>>106752973
What case are you using?
>>
>>106752963
Oh shit really? i have 128gb of ddr4 and a 4090 but can barely run glm full at greater than 3 bit
>>
>>106748610
Hopefully this just means wan2.5 gives up on trying to make money and open sources.
>>
>>106753004
I use Google Chrome
>>
>>106753055
You need ik_llama.cpp, an ik_llama quant (ubergarm has a bunch), and the CLI setup right. There's a Deepseek guide on the github page that has all the CLI commands, just don't use runtime repacking, offload all MoE tensors to RAM/disk, and that's it. It's not going to be fast in any capacity but it won't be 0.001t/s
>>
>>106753092
it says that my GPU architecture 'compute_120' is unsupported when trying to install
>>
>>106753040
Case? I don't follow?
>>
>>106753092
NTA but how much of a speedup did you see using ik_llama opposed to llama.cpp?
>>
>>106753128
I'm not involved with the development but maybe it's a CUDA install issue? It works well on WSL.
>>106753141
5x pp, maybe 2x actual inference? Been a super long time since I've used base llama.cpp so it's hard to remember but ik was designed for people to SSDmax Deepseek and other MoE models.
>>
>>106753170
so then i should reinstall cuda?
>>
>>106753170
>5x pp
Damn, guess I'll have to look into setting it up now.
>>
>>106753128
https://github.com/ikawrakow/ik_llama.cpp/issues/514
Seems to be a GPU/driver mismatch maybe? Someone managed to fix it by purging their old GPU drivers and trying the build again. Also 12.0 Compute is the 5090, the 4090 is 8.9
>>
>>106749314
https://www.youtube.com/watch?v=qJ002X6WC5U
yummy chizu
>>
>>106753211
sorry, typo. i have a 5090
>>
>>106753215
that music is so nostalgic :')
>>
>>106753217
>cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DCMAKE_CUDA_ARCHITECTURES="86;89;120"
>cmake --build ./build --config Release -j $(nproc)
Try this. The default ggml backend tries to use the 4090 version and I guess ik_llama hasn't fixed it.
>>
seems like that did not work
CMake Error at /home/anon/miniconda3/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/CMakeTestCUDACompiler.cmake:59 (message):
The CUDA compiler

"/usr/local/cuda-12.5/bin/nvcc"

is not able to compile a simple test program.

It fails with the following output:

Change Dir: '/home/anon/ik_llama.cpp/build/CMakeFiles/CMakeScratch/TryCompile-Bs2P8d'

Run Build Command(s): /home/anon/miniconda3/lib/python3.12/site-packages/cmake/data/bin/cmake -E env VERBOSE=1 /usr/bin/gmake -f Makefile cmTC_20b9b/fast
/usr/bin/gmake -f CMakeFiles/cmTC_20b9b.dir/build.make CMakeFiles/cmTC_20b9b.dir/build
gmake[1]: Entering directory '/home/anon/ik_llama.cpp/build/CMakeFiles/CMakeScratch/TryCompile-Bs2P8d'
Building CUDA object CMakeFiles/cmTC_20b9b.dir/main.cu.o
/usr/local/cuda-12.5/bin/nvcc -forward-unknown-to-host-compiler "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_120,code=[compute_120,sm_120]" -MD -MT CMakeFiles/cmTC_20b9b.dir/main.cu.o -MF CMakeFiles/cmTC_20b9b.dir/main.cu.o.d -x cu -c /home/anon/ik_llama.cpp/build/CMakeFiles/CMakeScratch/TryCompile-Bs2P8d/main.cu -o CMakeFiles/cmTC_20b9b.dir/main.cu.o
nvcc fatal : Unsupported gpu architecture 'compute_120'
gmake[1]: *** [CMakeFiles/cmTC_20b9b.dir/build.make:82: CMakeFiles/cmTC_20b9b.dir/main.cu.o] Error 1
gmake[1]: Leaving directory '/home/anon/ik_llama.cpp/build/CMakeFiles/CMakeScratch/TryCompile-Bs2P8d'
gmake: *** [Makefile:134: cmTC_20b9b/fast] Error 2




CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
ggml/src/CMakeLists.txt:346 (enable_language)
>>
>>106753215
dem hips
>>
>>106753284
Reinstall NVIDIA drivers, remove old ik_llama.cpp make cache, try again. Also I missed it while copy-pasting but you should remove the 86 and 89 from the command since you're targeting only 12.0 compute.
>>
>>106753317
ok, how do i remove the old cache? i installed ik_llama a few months ago but couldnt get it to work so i deleted it
>>
>>106753334
Deleting the folder and pulling it again is probably the easiest option. Did you have the same issue last time you tried using it?
>>
Are there any actual laptopcels here? I have a 4090 mobile (16 GB) + 32 GB DDR5 laptop so air is a little too fat but hopefully qwen next fits
>>
>Qwen3 Coder 30B A3B (24GB) - A smaller MoE coding model with 3B active parameters. Very fast. >100t/s using Q4_K_M on a 4090. Even without a GPU you can get >10 t/s with dual-channel DDR5 RAM.
Is it still the best coding model for running in RAM on a 32GB machine? Jensen cucked my 3070 with 8gb of VRAM
>>
>>106753314
and vagina bones, and navel, and slight tummy curve, and rib outline. It is perfect.
>>
>>106753405
>10 t/s with dual-channel DDR5 RAM.
It's actually 20 t/s for low context with DDR5 and 10 t/s with DDR4
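Those numbers are roughly what you'd expect, since MoE decode is mostly memory-bandwidth bound: each token only has to read the active parameters. Bandwidth and quant figures below are ballpark, not measured:

```python
# Ballpark decode-speed ceilings for a ~3B-active MoE (e.g. Qwen3 Coder 30B A3B).
active_params_b = 3   # ~3B active parameters per token
bpw = 4.5             # ~Q4_K_M bits per weight

bytes_per_token = active_params_b * 1e9 * bpw / 8  # ~1.7 GB read per token

for name, bw_gbs in [("dual-channel DDR5 (~80 GB/s)", 80),
                     ("dual-channel DDR4 (~45 GB/s)", 45)]:
    ceiling = bw_gbs * 1e9 / bytes_per_token
    print(f"{name}: theoretical ceiling ~{ceiling:.0f} t/s")
```

Real-world speeds land well under those ceilings (shared/attention weights, KV cache reads, scheduling), so ~20 t/s on DDR5 and ~10 t/s on DDR4 is plausible.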
>>
its over, sam won
https://files.catbox.moe/6uyqxf.mp4
https://files.catbox.moe/g1nw2g.mp4
https://files.catbox.moe/3jmasp.mp4
https://files.catbox.moe/5bvruz.mp4
https://files.catbox.moe/1seqwp.mp4
https://files.catbox.moe/wa72uc.mp4
https://files.catbox.moe/2uoi3q.mp4
https://files.catbox.moe/os8t5k.mp4
>>
>>106753575
chat is this real?
>>
>>106753587
https://files.catbox.moe/odbake.mp4
https://files.catbox.moe/wiqjfo.mp4
https://files.catbox.moe/syu0xw.mp4
>>
>>106753405
>Is it still best coding model for running in ram on 32gb machine?
At usable speeds? Probably. But unless it's for a sensitive project, you'd be far better off just paying for a big cloud model, a 30b doesn't have a chance in hell of competing. Local models are mainly for private coom.
>>
>>106753575
>>106753597
and /ldg/ said wan2.2 was better LMAO
>>
>>106753630
>But unless it's for a sensitive project, you'd be far better off just paying for a big cloud model, a 30b doesn't have a chance in hell of competing
Fair enough, I just prefer not to rely on services where possible.
Is it really that bad? I guess I'll give it a try today
>>
>>106753575
>it can even do YTP
bruh, it has so much kino ;_;
>>
>>106753575
not bad but the proof will be how good it is at following your prompts
>>
>>106753643
There's no harm in trying it out, but small models get significantly dumber the larger the codebase/context is, it's just an inherent weakness of small models.
>>
>>106753650
give me a prompt, sfw obviously
>>
>>106753575
Imagine if OpenAI didn't give a shit about copyright and actually tagged everything correctly
>>
>>106753667
but that's what they did though
>>
>>106753676
not even close, asking it to generate characters from the most popular gacha games just gives a generic anime girl
>>
>>106753575
>https://files.catbox.moe/os8t5k.mp4
MK64 or diddy kong racing?
>>
>>106753687
did you bother looking at these >>106753575
>>106753597
https://files.catbox.moe/pe2t2o.mov
https://files.catbox.moe/wgeck8.mp4
https://files.catbox.moe/ede7y0.mp4
https://files.catbox.moe/lrh3yl.mp4
https://files.catbox.moe/4m6wn4.mp4
>>
>>106753344
yes. right now i'm in driver hell with chatgpt. my pc is now zoomed in to like 240p and i can't actually click on anything
>>
>>106753687
gacha niggers are all cucks, so it's just still an appropriate output.
>>
>>106753698
>>106753575
>dude let's take random tiktok shit and slap the sora logo on it to pretend it's ai
lmao not falling for it
>>
File: 777.jpg (505 KB, 1024x1024)
>>106753667
what do you mean? dall-e 3 still knows more characters than any local base model out of the box. something as simple as jojo style still requires a lora on all local base models.
>but it doesnt know this booru character with only 5 tags!!!
and the reason your sdxl finetune does is because it was overtrained to the point it forgot how to do anything else. openai models achieve the best balance between broad and fine knowledge, it's not even close
>>
>>106753715
https://sora.chatgpt.com/explore
my man its legit at that level now, its all ive been doing for hours now
>>
>>106753719
It doesn't know characters with 6000+ posts on danbooru
>>
>>106753662
>Vocaloid singer Miku is smiling at the camera. She holds a small pepe in both hands, level with her waist. She brings her hands up to her chest while holding the pepe.
>>
>>106753729
such as?
>>
>>106753733
Ellen Joe
>>
>>106753732
miku + pepe
https://files.catbox.moe/zynsee.mp4


https://files.catbox.moe/7lmv0x.mp4
https://files.catbox.moe/8xgejs.mp4
https://files.catbox.moe/isse3d.mp4
>>
isn't open source videogen super close behind the sota?
we're going to see a local sora2 within the year, right?
>>
>>106753797
with wan 2.5 going closed source who knows. And even that was a big gap from what sora 2 is now. Video models are much more expensive to make than text models
>>
>>106753797
not really. the only half-decent open video model was alibaba's wan, but they decided to make the latest version API only
>>
File: whyyyy.png (900 KB, 1280x720)
>>106753775
local is so fucking far from that I wanna cry...
>>
deepseek cost like $5-10M to train; a video model with the billions of video pairs you'd need for this would likely cost hundreds of millions. that's a big difference
>>
>>106753775
>https://files.catbox.moe/zynsee.mp4
this is so kawaii
>>
>>106753575
Why does Sam Altman always have to be the guy to innovate? Is local even trying
>>
>>106753809
deepseek was only 'cheap' because it just trained off gpt outputs. it literally spat back that it was a model trained by openai. the reality is that 90% of chinese models are trained on synthetic slop, and the 10% that aren't are api only. like everything china produces, it's a cheap copy. it's nice if you want an inexpensive alternative but put it under any pressure and it quickly falls apart.
>>
>>106753775
Wide pepe but still not bad at all.
What if you do something more specific like
>Vocaloid singer Miku is smiling at the camera. She holds red flowers in her left hand and yellow flowers in her right hand. When she brings her hands together the flowers colour changes to orange.
>>
>>106753809
not only that, but you have to perfectly annotate those billions of videos. I can't imagine the amount of work behind this, it's quite an achievement really
>>
>there are OpenAI shills in the room with us right now
Crazy stuff.
>>
>>106753715
holy cope
you lost.
>>
Reminder that there is no reason to fight back against the idea that "the enemy" is making something advanced. Facebook clearly has some advanced VR/AR tech they're selling. That doesn't make any part of that a good thing, or something they deserve to have, or something that was fairly and ethically made, or something that was justified with good intentions.
>>
>>106754024
>advanced VR/AR
a phone screen in a plastic box paired with wii motes?
>>
>>106754032
Relative, lil bro. They were the first to pancake lenses. They're the first to those neural band things. It's still not good enough for mass adoption but they're technologically more capable than anyone else in the field aside from Apple.
>>
>>106754024
Bot?
>>
glm 4.6 ggufs are out
https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF/tree/main
>>
>>106753775
Now do one similar to the third one but instead sam is yelling to the 4chan building "TWO MORE WEEKS"
>>
new 4chan anthem
https://files.catbox.moe/1c3h2s.mp4
>>
>>106754063
>bartowski
The king returns


