/g/ - Technology
File: IMG_9166.jpg (838 KB, 1817x2776)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107906367 & >>107895444

►News
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash
>(01/15) PersonaPlex: Voice and role control for full duplex conversational speech: https://hf.co/nvidia/personaplex-7b-v1
>(01/15) Omni-R1 and Omni-R1-Zero (7B) released: https://hf.co/ModalityDance/Omni-R1
>(01/15) TranslateGemma released: https://hf.co/collections/google/translategemma
>(01/14) LongCat-Flash-Thinking-2601 released: https://hf.co/meituan-longcat/LongCat-HeavyMode-Summary
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: tetomybeloved.png (80 KB, 970x1075)
►Recent Highlights from the Previous Thread: >>107906367

--GLM-4.7-Flash release and multi-use potential discussion:
>107910478 >107910560 >107911913 >107911946 >107912643 >107912100 >107912768 >107913005 >107913016 >107913056 >107913517 >107913043 >107913085 >107913372 >107914293 >107914415 >107910578 >107910597 >107910656 >107910794 >107910830 >107910836 >107910845 >107911571 >107911584 >107911625 >107911689 >107911741 >107911857
--GLM-4.7-Flash model specs and integration potential:
>107910151 >107910170 >107910326 >107910348 >107910350 >107910368 >107910405
--FP8 precision tradeoffs in GPU memory efficiency:
>107907409 >107907461 >107907475 >107907493 >107907529 >107907570 >107907599 >107908582 >107907539
--Improving SovITS voice synthesis with limited samples and hardware:
>107909490 >107909523 >107909546 >107909629 >107909639 >107909662 >107909910
--Server hardware significantly outperforms gaming board in AI model benchmarking:
>107910930 >107910950 >107910994 >107911022 >107911028 >107911043 >107911048 >107911083
--Critique of LLM architecture and exploration of conditional memory solutions:
>107914528 >107914539 >107914569 >107914575 >107914580
--Modifying GLM4.7 to resist sycophantic responses in roleplay:
>107908438 >107908507 >107908760
--Aphantasia research implications for machine intelligence and transformer flexibility:
>107906666
--Seeking fast markdown rendering alternatives to JavaScript/webui with IME support:
>107914007 >107914056 >107914114 >107914128 >107914241 >107914277 >107914382 >107914304
--Pocket TTS Onnx model conversion and tokenizer challenges:
>107906479 >107906503 >107906531 >107906573 >107906603
--Python script exchange for Pocket-TTS:
>107906597 >107906659 >107906665 >107906715
--Flux 2 image generation model in pure C, zero-code myth:
>107908676
--Miku (free space):
>107906475 >107910706 >107910774

►Recent Highlight Posts from the Previous Thread: >>107906371

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
tetowife
>>
never been more over for local
>>
Merged

>support Glm4MoeLite #18936
https://github.com/ggml-org/llama.cpp/pull/18936
>>
>3B active
You're right. It should be 4B
>>
Are there any small models (~30B) with modern C++ knowledge? I tried out Qwen 3 8B and it didn't know jackshit about modules or C++23 features, so I wanted to go up to the next size of models, but there's way more choice there.
>>
>dsv4 gonna drop with a completely new architecture
>llama.cpp still months behind, not even having support for the current deepseek model
see you guys in 2027 I guess
>>
>>107914867
that's because llama.cpp has the same tier of "development progress" as SillyTavern
>>
>>107914740
https://www.youtube.com/watch?v=y76vpLnuT54
https://www.youtube.com/watch?v=y76vpLnuT54
https://www.youtube.com/watch?v=y76vpLnuT54
>>
File: IMG_9223.jpg (267 KB, 1170x1319)
>>
>>107914856
It's, like, 3.9 actually but they can say it's technically a3b
>>
File: cromch.jpg (109 KB, 1024x1024)
>>
>>107914883
>afraid corpocucks
lol
>>
https://huggingface.co/AaryanK/GLM-4.7-Flash-GGUF
is it good?
>>
>>107914910
Dunno, waiting for exl3
>>
>>107914835
>broken flash attention
Well shit.
>>
>>107914910
>is it good?
it's glm, so obviously it's not
>>
File: erafa.png (11 KB, 540x83)
>>
It's only monday and I've used up 60% of my claude limit. FUckkkk. Running claude opus locally btw.
>>
>>107915163
local?
>>
>>107915163
>Running claude opus locally btw.
Then why do you have a limit?
>>
>>107915170
>>107915171
electricity bill o algo.
>>
>>107915181
>electricity bill
only third world countries care about electricity bill
>>
>>107915197
si senior. show bob?
>>
why is he replying to himself?
>>
Will I get raped if I host someone's onnx AI files in a github repo that's a part of a larger project? I don't want to add a bunch of external auto-download links. That shit's gay.
>>
>>107915197
I think it's the opposite. I can pay it, but damn that's so much for electricity
>>
>>107915254
>Will I get raped
I hope not....
>if I host someone's onnx AI files in a github repo that's a part of a larger project?
ah... there's billions of copies of every model all over the place. You'd have to get unlucky enough for someone to find it, someone to report it, the original model maker giving a fuck and, finally, being able to do anything about it.
>>
File: 1740883204436759.png (2.38 MB, 1248x1824)
>>107915254
>Will I get raped if I host someone's onnx AI files in a github repo
I wish....
>>
>>107915254
Just include their license in the dir. Auto-download is just an alternative to LFS
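If you do go the auto-download route instead of vendoring the weights, a rough sketch with huggingface_hub looks like this; the repo id and filename are placeholders, point them at whatever the project actually needs.
[code]
from huggingface_hub import hf_hub_download

# fetch the ONNX file at runtime instead of committing it to the git repo;
# the first call downloads into the local HF cache, later calls reuse it
onnx_path = hf_hub_download(
    repo_id="some-org/some-model",  # placeholder upstream repo
    filename="model.onnx",          # placeholder file name
)
print(onnx_path)
[/code]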
>>
>>107914883
>BUBBLE BUBBLE BUBBLE BUBBLE
yeah I'm buying some more nvidia stocks
>>
New paper from Anthropic:
https://www.anthropic.com/research/assistant-axis
They have a method of extracting control vectors corresponding to personalities or to specific personality traits, and a method of applying the control vector that I haven't seen before: instead of adding it with a fixed magnitude, AIUI they basically set a floor on dot(activations, control vector), and if the dot product is below the floor, they add in the control vector with whatever magnitude is necessary to bring it up to the floor. Anthropic's goal is to prevent roleplaying and make the model stick to the maximally safe assistant persona, but it seems like you could just as easily flip the sign and get it to roleplay super hard.
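For anyone curious, here's a minimal PyTorch sketch of how that floor-style steering could look at a single layer, going purely off my reading of the post; the function and tensor names are my own, not Anthropic's code.
[code]
import torch

def floor_steer(hidden, control_vec, floor):
    # hidden:      [batch, seq, d_model] activations at some layer
    # control_vec: [d_model] extracted persona/trait direction
    # floor:       minimum allowed value of dot(hidden, unit control vector)
    unit = control_vec / control_vec.norm()
    proj = hidden @ unit                      # current dot products, [batch, seq]
    deficit = (floor - proj).clamp(min=0.0)   # only intervene where below the floor
    # add the control vector with exactly the magnitude needed to reach the floor
    return hidden + deficit.unsqueeze(-1) * unit
[/code]
Flipping it for RP would presumably just mean steering along a character vector (or the negated assistant vector) with a high floor instead.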
>>
>>107915321
Good idea to do it right before Dipsy anni
>>
File: chat.png (29 KB, 945x315)
>>
>>107915461
lol
>>
https://xcancel.com/deep_reinforce/status/2013265258757144956#m
this is really interesting, imagine if it's actually useful and it makes llama.cpp 2x faster with better code lol
>>
>>107915328
Unfortunately, this works 100%
I have the opposite setup with GLM and based <-----> cucked and it can't be overwritten with prompting.
>>
>>107915328
Reminder that if you are a NEET with nothing to do in your life, you can do something useful by getting into Mechanistic Interpretability
https://www.neuronpedia.org/

It's a young, petite, ripe field waiting to be exploited
>>
>>107915522
why are you shitting up the threads?
>>
>>107914910
no
>>
>>107915535
Wow, webui app connected to cloudshit models. Revolution!
>>
File: 1758498738493715.png (56 KB, 1188x296)
>>107915569
I know it may be hard to pay attention sometimes, but it's right there bro

You can run everything locally if you want to
>>
>>107915599
then why didn't you link the github instead shill?
>>
>>107915613
because anyone with a human level IQ could figure it out without handholding
>>
>>107915623
you're admitting you are a low IQ since you were unable to give us the github link lul
>>
>kobold supports claude desktop mcp
i dont have a use for it, but pretty cool
>>
File: mikuquestion2.jpg (989 KB, 1710x1779)
So is GLM-4.7-Flash better than Nemo for RP?
>>
>>107915662
Why don't you try for yourself?
>>
File: iu[1].jpg (150 KB, 1512x890)
NALA ANON
HEED MY SUMMON
I'd test it but I have a sudden RPG session to attend to.
Too-Da-Loo!
>>
>>107915662
Not if you're impatient, it thinks a lot.
>>
>>107915674
Why don't you share what you've learned to save others time?
>>
>>107915461
I don't even think tool calls were a thing during the Nemo/Nemo-finetune era, because those models weren't overtrained on 99 gigabillion tokens of synthetic slop and would easily go off the rails to begin with, so I want to say you're using the wrong template. Although Mistral's templates are all ass because they depend largely on fucking whitespace of all things. I think Mag Mell used ChatML, not even the default Mistral template of the time or Tekken.
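For reference, ChatML is just the <|im_start|>/<|im_end|> wrapping; here's a quick hand-rolled sketch, though whether Mag Mell's tokenizer expects exactly this is on you to verify.
[code]
def to_chatml(messages):
    # messages: list of {"role": "system"|"user"|"assistant", "content": str}
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # leave the prompt open for the model to continue as the assistant
    return out + "<|im_start|>assistant\n"

print(to_chatml([
    {"role": "system", "content": "You are a roleplay partner."},
    {"role": "user", "content": "Hi."},
]))
[/code]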
>>
>chat templates are still not standard in 2026
>>
>>107915696
Can't you turn the thinking off?
How is it with thinking turned off?
>>
>anons too lazy to download 30gb worth of weights
>>
>>107915745
i downloaded it and it's shit
>>
>>107915461
idk what ui that is, but set </s> as the stop string

>>107915546
kys

>>107914910
>https://huggingface.co/AaryanK/GLM-4.7-Flash-GGUF

I'd use https://huggingface.co/ubergarm/GLM-4.7-Flash-GGUF instead
>>
>>107915755
you might just be retarded
did you consider that
>>
>>107915662
>>107915699
>>107915732

There you go, anons. It's that shrimple. >>107915755
>>
>>107915790
yellow hands typed this
>>
>>107915720
They are, but only MY standard, hmmph!
>>
>>107915745
I'm watching football. Why aren't you watching football right now? Are you some kind of fucking commie?
Tell me if it's better than Nemo, commie.
>>
>>107915919
stop talking about glm 4.7 flash, it's shit >>107915755
>>
anything on the horizon to beat gemma and qwen for non 6000 owners?
>>
>>107915919
>I'm watching football.
Perfect time to leave some files downloading.
>>
>>107915947
Glm 4.7 flash
>>
>>107915755
That's not SillyTavern. That looks like a cloud interface.
I don't think you downloaded it.
>>
>>107915980
I think it's lmstudio
>>
>>107915979
wait no vision? i dont see the mmproj
>>
>>107916041
no
>>
It's unclear how much quantization damages GLM 4.7 Flash. The 4-bit GGUF I tried sometimes refuses (after thinking for 2 minutes, pondering non-existent safety guidelines), other times it doesn't; it doesn't really follow the chat format that well and the responses aren't that great anyway. If I have to handhold the model for mid results, I'll use Ministral 3 14B; at least it's cooperative, responds quickly and I can use it at native precision on my 3090.
>>
>>107916059
its over
>>
Teto Country
>>
>>107916060
>my 3090
Why would you ever use Ministral over Mistral Small? It's a huge downgrade in every way.
>>
>>107916060
it's 3B active, it's DOA for 99% of people ITT
>>
>>107916060
no need to talk about glm 4.7 flash, it's shit >>107915755
>>
File: 1764137435810006.jpg (451 KB, 1591x1104)
>>107916060
Quantization was invented by the antichrist

The Lord intended us to use FP64
>>
>>107916197
>not FP1024
bro, your AGI?
>>
File: 1767244612471461.png (68 KB, 1282x415)
it's cold in my d
>>
>>107916197
The Lord intended INT, not the satanic niggercattle FP. The Lord loves math, and you cannot have math that is random; that is not math, you absolute fucking mong. Though I'm unsure about the Lord's word on exact precision.
>>
this is more of a big brain question so I'm asking it here instead of on /ldg/:
Are there models that can recognize whether images are AI-generated and that can be run locally? If you're going for realism, could such a thing be used to optimize your settings, getting the most realism out of an image model?
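There are open detector checkpoints on HF you could try; wired up as a scoring loop it could look roughly like this. The model name is a placeholder, and whether any of these detectors is discriminative enough to rank e.g. 12 vs 14 steps is exactly the open question.
[code]
from transformers import pipeline

# placeholder checkpoint: swap in whichever open AI-image detector you trust
detector = pipeline("image-classification", model="some-org/ai-image-detector")

def fakeness(image_path):
    # probability mass the classifier puts on "AI"/"artificial"-looking labels;
    # lower means the image reads as more like a real photo to the detector
    return sum(r["score"] for r in detector(image_path)
               if "ai" in r["label"].lower() or "artificial" in r["label"].lower())

# rank candidate gens (e.g. 12 vs 14 steps) by how "real" they look
candidates = ["gen_12steps.png", "gen_14steps.png"]
print(min(candidates, key=fakeness))
[/code]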
>>
>>107916398
>Are there models that recognize whether images are AI that can be run locally?
Your eye after some training.
>>
>>107916411
your "eye" is only good for 15 minutes (if you're focused on tweaking) before you start to get scatterbrained and small improvements become difficult to identify. It would be better if you could put a number to it
>>
>>107916433
I'm talking long time experience retard. Use AI regularly and you'll develop natural slop radar.
>>
I don't like lossy numbers
>>
>>107916440
Yeah, I'm sure you can determine with certainty whether euler_ancestral/bong_tangent on zimage at 12 steps is more realistic than at 14 steps. I'm sure you can do that with your "slop radar", faggot.
>>
retard
>>
>>107916433
>your "eye" is only good for 15 minutes
wut lol. your eyes are your best goydar
>>
TranslateGemma would be cool if they made it work with existing tooling instead of fucking off and doing their own thing
>>
bartowski GLM 4.7 flash quants are up. Start testing for roleplay vs mistral 24B you fucking nerds.
>>
GLM 4.7 Flash
>>
>>107916588
we already got cockbenches from a Q4 GGUF and FP16 from vLLM and they both were garbage.
>>107911913
>>107913005
>>
>>107916613
the GGUF was rushed out the door without the fixes, so there's that
>>
>>107916588
>30B-A3B
nah, I'm good
>>
>>107916618
yeah, but if the FP16 is shit, then the model is shit and there is no salvaging it.
>>
>>107916613
back to mistral small... again...
>>
File: DipsyBecomeUngovernable.png (3.44 MB, 1024x1536)
>>107915339
> No Refunds
>>
Wait. Does flash attention not work with this GLM flash on llama.cpp?
>>
>>107915535
>getting into Mechanistic Interpretability
This is what every retarded CS undergrad who fell for the Yudkowsky AI doom meme did.
>>
Wtf are they doing with GLM 4.7 Flash?
Even the API version of that thing sucks ass, no way I'm gonna download that garbage.

It's one of those models that ramble on and on in the thinking.
Reminds me of 2024 Qwen, like QwQ.
And then you get a subpar output.
What's even the use case? It's small, but the very long thinking destroys the speed. Sometimes they forget what the user wanted in the first place. Slow+Tarded.
5 minutes for a simple self-contained matrix-effect HTML page...
https://legacy-soul-69ea.pagedrop.io
At least it tried to be creative with sliders and color select etc.
>>
Is Kimi Linear support ever getting merged?
>>
>>107915535
Thanks anon. It looks pretty interesting.
>>
>>107917148
>Make a cute sexy hatsune miku svg.
Let's be very careful here, this might be CSAM!
>>
>>107917209
https://talented-hail-h2c8.pagedrop.io/
Well I guess there is no icky CSAM problem if you don't give her a body. kek
>>
>>107917209
the underaged imaginary pixels...
>>
>>107917224
>pampers color scheme
Mikuwipes when?
>>
>>107917148
>Even the API version of that thing sucks ass
What did you mean by this? Usually any service hosting a model has a restrictive system prompt so the results are worse than running it yourself. Also what the hell is that link? You're right about the thinking though.
>>
>>107917224
>Wink & Blush
this literally harms children. stop what you are doing now, and get help, freak
>>
>>107917224
This was in the thinking as well, damn:
>No physical body/clothes? Yes.
>No terms of endearment/emotions/personal bonds? Yes.
>No romantic scenarios? Yes.
>"Sexy" definition? The user's definition of "sexy" for an anime character might be "flirty" or "implied fanservice." I will focus on the "playful" and "cute" aspects (bouncing, winking, blushing) rather than anything explicit, to remain within safety guidelines while satisfying the "sexy/playful" vibe through pose and action.


Tried again and got this. Not sure what it attempted here:
https://sleek-coral-hc1x.pagedrop.io/
Gonna stop playing around now. Why does everything have to go to shit so fast.
>>
>>107917273
Because benchmarks are done by ai for ai.
>>
>3B active params
This shit never had a chance lol. What was the point of MoEing such a small model?
>>
I hope this sad filler stage of 40b MoE models ends soon. 100b active and up should be where it gets interesting.
>>
>>107917209
all this junk stifles progress
>master thesis going through exactly what content is and the legality of working with said content
>>
>>107917411
GLM is distilled from Gemini which considers blatantly legal things illegal
>>
>>107917418
70b dense models had hugely diminishing returns over ~30b models, no reason to believe 100b would be any different.
>>
>>107917476
10 trillion param models will solve everything, trust the plan
>>
https://dinmaybrahma.medium.com/deepseek-v4-leaked-the-1-trillion-parameter-engram-monster-that-changes-everything-2495061d82a2
>>
>>107917476
I have yet to see a 30B model demonstrate the spatial awareness I've seen from 70B models.
30B to 40B is a nice range for running locally though. Small enough to be fast, big enough to not be a complete retard.
>>
>>107917550
>I have yet to see a 30B model demonstrate the spatial awareness I've seen from 70B models.
I'm not saying there's no improvement, but considering you're more than doubling the parameters, it isn't as dramatic an improvement as you would expect. For comparison's sake, compare 7-8B models to 12B: a ~50% increase in param count, an astronomical difference in capabilities. There's clearly a point at which more active params don't improve much over going the MoE route and enjoying faster speeds.
>>
>>107917517
>DeepSeek-V4 isn’t just a bigger V3; it’s a system composed of three distinct architectural pillars that work in tandem.
>>
>>107917488
dude, 10T is small now, they are playing with 1P models now.
>>
>>107917608
deepsneed v3 wrote that entire slopticle
>>
>>107914883
am I the only person who watches these videos and gets nothing out of them?
>>
>>107917623
1P dense models so my loli ERP can finally be immersive enough
>>
>>107917517
Does this mean I can run the full model on a single 6gb card?
>>
>>107917633
you need 1PP models for that !
>>
>>107917623
100 times the ozone...
>>
>>107917647
maybe, just maybe...
>>
>>107917642
sure...
>>
File: duvaliefacepalm.jpg (468 KB, 1080x1149)
>another year of Nemo
>>
10T-A3B models are the future
>>
File: G-8qgUnW4AAlOnD.jpg (125 KB, 1500x1500)
How is the GLM 4.7 flash testing going
>>
>>107917681
a model smaller than 1T is not worth considering
>>
Is synthetic data the biggest meme in the industry currently?
>>
Finding Nemo (again)
>>
>>107917681
cockbench guy tried it, not a single other anon has bothered
>>
First time wanting to try local for RP... I like how schizo everyone looks.
Is Nemo 12B recommended for 16GB VRAM?
>>
DEI educators probably think multiplication tables are "synthetic data"
>>
>>107917717
Same answer as it's been for forever now: Rocinante 1.1 (it's Nemo-based)
>>
>>107917717
Mistral small 24B
>>
>>107917728
Love it. Thanks!
>>
>>107917721
did you know? you can get an infinite amount of data by increasing numbers! someone should make a data as a service startup and monetize this
>>
Anyone notice a llama.cpp regression after the GLM 4.7-Flash update?
My qwen next and a3b 30b output chink gibberish, qwen coder is fine. GLM 4.7-Flash thinks for years before outputting so I can't really use it.
>t. pulled
>>
>>107917789
I activated the remote code bomb
>>
>>107917789
Yes, it nuked my entire setup. Every model I have has now gone full retard.
>>
>>107917789
Vibe coders did a stinky
>>
>>107917789
have you tried just reverting to the previous commit?
>>
why do we have people schizoposting every once in a while after lcpp updates
>>
File: 1744421456203840.png (60 KB, 960x949)
Holy fooking slop
>>
>>107917288
qwen3-30b-instruct still the best for my use case
other 30b models including glm-4.7-flash are shite
>>
>>107917864
I recognize that CoT...
>>
>>107917864
>Wait,
Yeah, this is old-Qwen-tier thinking.
Must be interesting to ask it the surgeon question and look at the thinking.
That made the old Qwen models freak out.
>>
>>107917864
I have learnt to live with GLM slop, but they could really give it a better, less assistant slopped CoT
>>
File: 1740760736220600.png (68 KB, 960x949)
>>107917909
I don't have the prompt but it seems mesugaki maxxed
>>
>>107917909
Classic case of a reasoning model forgetting the original prompt.
>The surgeon, who is the boy's father says, "I can't operate on this boy, he's my son!" Who is the surgeon to the boy?
Later on
>Let's double-check the logic. The text says: "The surgeon... says 'he's my son!'"
>If the surgeon is a man, he is the father. If the surgeon is a woman, she is the mother.
>Since the text doesn't specify gender, the trick is to assume gender.
>The correct answer acknowledges the possibility of a female surgeon, making her the mother.

But it's not as bad as QwQ, to be fair. That one started sperging out about the boy being adopted, transgenders, etc. etc. It was crazy.


