/g/ - Technology


File: 1728618664952634.jpg (724 KB, 2000x1496)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106660311 & >>106649116

►News
>(09/22) RIP Miku.sh: https://github.com/ggml-org/llama.cpp/pull/16174
>(09/22) Qwen3-Omni released: https://hf.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe
>(09/22) DeepSeek-V3.1-Terminus released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Terminus
>(09/22) VoXtream real-time TTS released: https://hf.co/herimor/voxtream
>(09/17) SongBloom DPO released: https://hf.co/CypressYang/SongBloom/commit/4b8b9deb199fddc48964c851e8458b9269081c24
>(09/17) Magistral Small 1.2 with vision encoder released: https://hf.co/mistralai/Magistral-Small-2509

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: recap.webm (412 KB, 512x512)
►Recent Highlights from the Previous Thread: >>106660311

--Paper (old): The Unreasonable Effectiveness of Eccentric Automatic Prompts:
>106668291 >106668327 >106668378 >106668398 >106668483
--Papers:
>106661377 >106661406 >106661450 >106665807
--Introduction of Qwen3-Omni-30B multimodal models with enhanced audio captioning:
>106667138 >106667351 >106667489 >106667995 >106668101
--DeepSeekv3.1 benchmark performance compared to Terminus variant:
>106664721 >106664735 >106664775 >106665267 >106668845
--Evaluating 400GB/s passive b60 GPUs vs AMD MI50 for local LLM inference:
>106665443 >106665523 >106665648 >106665718 >106665711 >106665771 >106665816 >106665993
--DSPy GEPA for automatic prompt optimization and small model enhancement:
>106667467 >106667570 >106667873 >106668435 >106668811 >106668929 >106669118 >106669241 >106669525
--llama.cpp code cleanup PR deletes Miku.sh:
>106665121 >106665168 >106665323 >106668586
--Skepticism surrounding project claims of converting CPU RAM to VRAM with minimal Python code:
>106668715 >106669029 >106669150
--iPhone 17 inference benchmarks and thermal performance analysis:
>106668549 >106668583 >106669605
--Proposing RL-trained small LLM for dynamic context optimization in roleplay applications:
>106668489 >106668546 >106668610
--OpenAI secures 10GW NVIDIA systems partnership with $100B investment:
>106666313
--VoxCPM installation errors with torch/cuda version conflicts:
>106661985 >106661992 >106662224 >106662278
--Model recommendations for 24GB VRAM:
>106668779 >106668803 >106668844 >106669443 >106669497
--Explaining the use of the abliterated Gemma 3:
>106666184 >106666218 >106666291
--Qwen Image Edit model update:
>106667630 >106667758
--Meta's ARE platform and Gaia2 benchmark for scalable agent evaluation:
>106665468
--Miku (free space):
>106665855 >106666689 >106666897 >106668381 >106668586

►Recent Highlight Posts from the Previous Thread: >>106660313

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
is the lazy getting started guide in the OP applicable to lower end hardware? i have a 1050ti and there are some personal things id like to work through with someone, but i cant bring myself to talk to a person about them
>>
oh yes tetoesday
>>
>>106671477
>RIP Miku.sh
Goddamn, I would've never thought that thing was integrated into llama.cpp. It was silly but it was the first thing I ever did with LLMs and I think it also ended up cementing Miku as an /lmg/ mascot of sorts. We've come so far.
>>
With governments locking down sites with age verification, banning VPNs and such, local models would likely see a surge in popularity
>>
Tetolove
>>
>>106671517
>4GB VRAM
That's pretty bad. There are some utilitarian models that might fit, but they are shit as conversationalists.
Might as well forget about the GPU, run CPU-only inference, and learn some patience.
I wouldn't recommend LLMs for psychology use anyway; the chances of being one-shotted into roping yourself are a little too big for my liking.
>>
>>106671520
it wasn't particularly integrated, just an example usage script tucked away in a random directory.
>>
>>106671574
i have a r5 5600x cpu, would that really be better?

i dont mind being patient, i essentially just want help drafting a letter to someone thats deeply personal and i really dont want to put it into chatgpt or something

how good could the llm i run locally be compared to chatgpt with a 5600x/1050ti?
>>
>>106671527
>banning VPN
who did dis
UK faggots?
>>
>>106671592

1) Some US states
https://reclaimthenet.org/michigan-bill-proposes-ban-on-vpns-trans-content-and-erotic-media

A Michigan bill proposes banning VPNs, with a $500K fine for sale or usage and a state-level requirement that ISPs block VPN sites. It's a proposal that hasn't reached the voting stage yet.

2) Europe's DSA and Chat Control 2.0

https://www.techradar.com/vpn/vpn-privacy-security/vpn-services-may-soon-become-a-new-target-of-eu-lawmakers-after-being-deemed-a-key-challenge
>>
>>106671574
Qwen 30B could run tolerably on that setup since it has only 3B active parameters.

>>106671588
>how good could the llm i run locally be compared to chatgpt with a 5600x/1050ti?
Not very. But it should be enough to hold a conversation or help drafting a letter.
>>
>>106671527
complete non sequitur
govs aren't locking down proprietary AI sites, people will just go there
Unless you're insinuating that people will buy new hardware to gen their own porn, which is extremely unlikely because they could use a fraction of that money to buy a VPN and not have to invest time and energy into learning AI tools. 99.9% of people will always choose the laziest options available to them.
No one is banning VPNs, even the UK minister is simply begging people to not use them.
https://archive.is/87Jad
>>
>>106671615
chat control is probably dead thankfully
at least until the next time they (Palantir and their corrupt puppet politicians) cook up some other retarded bullshit to fuck everyone over with
>>
>>106671639
>chat control is probably dead thankfully
No? Chat Control 2.0 is getting a lot of support and has only one real opponent.
>>
>>106671616
what would you say is better for an economical/utilitarian llm, 5600x or 1050ti? i really have no clue about llms and i'm only here out of necessity honestly
>>
>>106671517
If you have at least 16GB of system RAM and an NVMe drive, and your CPU isn't 10+ years old, you'll get at least ~2-3 tokens per second with 20-30B models. You can run Gemma3 27B and Mistral Small 3.2 24B.
It's not optimal but it is doable, especially for testing. Once you actually know what you want, it might become an issue.
>>
>>106671646
Oh you're right
>germany reverts from opposed to undecided
fuck's sake
>>
>>106671648
Qwen3-30b should run at ~10t/s at low context with ddr4. You have 32gb of ram, right?
>>
>>106671648
CPU doesn't matter too much for LLMs. What matters most is VRAM, and since your 1050ti only has 4GB, you need to rely on system RAM. What matters there is the RAM speed and the number of channels your motherboard has.
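A very rough sanity check, assuming decoding is memory-bandwidth-bound and ignoring overhead: dual-channel DDR4-3600 moves about 2 x 8 bytes x 3600 MT/s ≈ 57.6 GB/s. A 30B MoE with ~3B active parameters at a ~5.5 bpw quant reads roughly 3e9 x 0.7 bytes ≈ 2.1 GB per token, giving a theoretical ceiling of ~27 t/s, so the ~10 t/s quoted above is plausible once real-world overhead eats into it.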
>>
>>106671665
We need to kill the fuckmongers.
>>
>>106671653
>>106671680
>>106671686
DDR4-3600 CL16-16-16-36 1.35V
32GB (2x16GB)

i got some pretty okay b die ram, how good a model would i be able to work with using this?
>>
>>106671693
Qwen3-30b-instruct at Q5 is great and that's what I use the most. You should be able to run it with at least 32k context. And there is also a coder version.
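A minimal launch sketch (the filename is a placeholder, and exact flags can vary between llama.cpp builds; with only 4GB of VRAM you'd lower -ngl or drop it and let the model sit in system RAM):

llama-server -m Qwen3-30B-A3B-Instruct-2507-Q5_K_M.gguf -c 32768 -ngl 99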
>>
>>106671693
IQ4-XS gguf is a good starting point for both Gemma 27b and Mistral 24b I guess.
Once you know how things work, you might try to cram in more, but I doubt the quant makes that much of a difference here, at least not yet.
I'd stay away from really small models (12b or less) because they'll just disappoint you.
>>
>>106671714
Do not run these. They will run at ~2 t/s on CPU and that is super slow.
>>
>>106671713
nta but is this the A3B model? I'm confused about all these Qwen models and what is what
>>
>>106671734
Yes, Qwen3-30b-a3b-instruct-2507
>>
>>106671732
retard
>>
>>106671745
That's super dumb model for anything useful.
>>
>>106671757
By those standards only 80b+ models are useful.
>>
>>106671767
Ok, kid.
>>
>>106671713
>>106671714
thank you anons, ill start looking into all this in a while, ill probably come back with some real stupid questions in a while
>>
>>106671779
Am I wrong gramps?
>>
>>106671787
You're a plunge router.
>>
>>106671783
use llama-server. it has a simple chat too more than enough for initial testing
>>
>>106671817
Is that a weird fetish? I don't understand
>>
File: RT0700.jpg (71 KB, 1181x1181)
>>106671827
uuooohhhh im rooooooouting
>>
File: media_G1ectQ9XkAAnXHn.jpg (142 KB, 1080x1101)
>>106671477
>RIP Miku.sh
It's so fucking over
>>
>>106671833
Alright you got me gramps. I thought you were talking about some kind of network router.
>>
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
https://arxiv.org/abs/2509.17514
>State Space Models (SSMs) have emerged as promising alternatives to attention mechanisms, with the Mamba architecture demonstrating impressive performance and linear complexity for processing long sequences. However, the fundamental differences between Mamba and Transformer architectures remain incompletely understood. In this work, we use carefully designed synthetic tasks to reveal Mamba's inherent limitations. Through experiments, we identify that Mamba's nonlinear convolution introduces an asymmetry bias that significantly impairs its ability to recognize symmetrical patterns and relationships. Using composite function and inverse sequence matching tasks, we demonstrate that Mamba strongly favors compositional solutions over symmetrical ones and struggles with tasks requiring the matching of reversed sequences. We show these limitations stem not from the SSM module itself but from the nonlinear convolution preceding it, which fuses token information asymmetrically. These insights provide a new understanding of Mamba's constraints and suggest concrete architectural improvements for future sequence models.
Mambabros....
>>
>>106671862
>future after human extinction
>aliens find orbiter with engraved metal plates of space miku
>assume it must have been important if humans tried to preserve these symbols
>they think it's iconography of one of our gods or other figures of worship
>>
>>106671933
My only experience with SSM models was Jamba, and it was incomparably dumber than equivalent transformer LLMs, so until someone proves otherwise these architectures are memes.
>>
>>106671787
I’ve never used a model under 120b that didn’t feel like tard wrangling
>>
the funniest thing about transformers is that it's actually one of the simpler architectures ever created, and nobody thought it could be that useful scaled up until GPT did it
the dumb and dumber keep trying fancy-pants archs and build nothing of value with them
it's all about the data
>>
It's both a data and architecture problem. Even when we get the right data there are still many issues. No need to exaggerate.
>>
>>106672103
What do you consider a complex architecture? Transformers aren't super complicated but they aren't that simple either, especially when you look at all the attention variants and addons being used.
>>
Can someone recommend me a model for 128gb ddr4 + 24gb (4090)? I just got 64gb more ram and want to graduate from 22-30b models.
>>
So was the post about ComfyUI phoning home to Google just bullshit? Was thinking of checking out Qwen Image, but not enough to install another botnet or learn how to block the connections myself.
>>
>>106672322
qwen 235b. look into offloading the MoE layers.
or glm air.
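For the Qwen route, the invocation usually looks something like this (hypothetical filename; the --override-tensor rule keeps the MoE expert tensors in system RAM while dense layers and attention stay on the GPU, and the exact tensor regex can differ per model):

llama-server -m Qwen3-235B-A22B-Instruct-2507-IQ4_XS.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU" -c 16384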
>>
>>106671477
Those bastards, they killed Teto!
>>
>>106672342

can't you just block the external connection?
>>
>>106672574
He doesn't know what it means.
>>
Can >>106666184 be considered a jailbreak?
What's a jailbreak, really?
>>
>>106672910
Yes, imo a jailbreak is any intentional bypassing of the intended safety features.
>>
>>106672910
if you use a template or a system prompt
>>
>>106672973
>>106672931
If I'm using a model / platform with no safety features or censorship to begin with, am I just wasting my time and being a fucking idiot by using any system prompt?
>>
>>106672997
Depends what you're using the prompt for; it can somewhat guide the model to more often do things in a way you'd like if you instruct it well enough.
>>
https://huggingface.co/FireRedTeam/FireRedTTS2
>>
>still no Terminus quants
it's over
>>
>>106671477
very cool pic OP
>>
>>106672931
So if I simply tell the model how it should behave and it does as instructed, is that a jailbreak? That seems a bit vague for a definition of a jailbreak. I think it's more likely that Gemma 3 was designed to deny "harmful" requests with an empty prompt (i.e., typical low-information use), but not with more complex prompts.
>>
>>106673105
Nah, when it sees a harmful vector it'll display its disclaimers. Sometimes it can go on for a while but then suddenly it realizes it's not supposed to be doing what it was doing and bang there it is - suicide hotline phone number.
>>
>>106672931
>>106673105
>>106673171
it is so weird that unslop mell sometimes hits me with that shit when i'm beating infants like a pinata. i figure that shouldn't need any jail breaking
>>
>>106673171
I never get those with Gemma after the roleplay starts, but I don't simply keep the instructions at the beginning of the context, I place them at a fixed depth from the head of the conversation.
>>
>>106673038
>actually mentions/shows Japanese in sample
nice
>>
>>106673048
>V3
>V3-0324
>V3.1
>V3.1-Terminus
These chinks are making FOOLS out of us with this deliberately nonsensical versioning.
>>
>>106671714
nemo 12b models, at least for rp, are more fun and run way, way faster on shitty systems.
>>
>>106673347
If you would actually read the original question... before mindlessly babbling about your ERP and pushing shitty Nemo.
>>
>>106673304
This is still nothing compared to
>GPT4-0324
>GPT4-0624
>GPT-4o
>GPT-o1
>GPT-o3
>GPT-4.5
>GPT-o3-mini/normal/pro
>GPT4.1 (released after 4.5)
>GPT-o4-mini (there is no real o4)
>gpt5 which is actually like 5 models getting routed
>>
>>106673427
>>gpt5 which is actually like 5 models getting routed
There's nano, mini, thinking and the routed one.
>>
>>106671477
qwen3 omni
holy shit, big if true
>>
Is qwen3 omni good?
>>
Agra Hitler remelts
>>
File: 8higdv9r1wqf1.png (201 KB, 1080x1001)
moesissies after getting their weekly chinkslop
>"damn this one is pretty 好的"
>>
>>106671954
>they think it's iconography of one of our gods or other figures of worship
Capitalize the G in God. Her name is Sacred
>>
Are there any porny shittunes that don't disable vision abilities? Or I guess some okay """jailbreaks""" for the vision capable bases? Whatever they're running on le chat doesn't seethe at being sent pron but the local Mistrals do.
>>
>>106671477
daily mikusex
>>
>>106671959
Qwen Next is partly a Mamba model, if I read it correctly
>>
File: file.png (376 KB, 1014x465)
what is happening, /lmg/ search doesn't show /lmg/ >:(
>>
>>106674782
/lmg/ has been shadowbanned due to its low quality and lack of new content
>>
>>106674956
based it really is the end after the kill of mikush
>>
>>106674782
Did you hide the thread? It shows up for me just fine.
>>
File: 20250922_162247.png (24 KB, 2483x458)
>>106671477
>>(09/22) RIP Miku.sh
Good.
>>
it feels like we haven't gotten anything notable in almost two years
>>
>>106675345
don't be greedy anon, the nemopause only happened a little over a year ago
>>
>>106675345
I'm happy with GLM-chan.
>>
>>106675517
How do you keep it from repeating?
>>
>>106674977
i didn't, even on my other computer it doesn't show.

weird.
>>
File: 1745617786095250.png (470 KB, 681x554)
>they made a video with audio like veo 3
>it'll be API
fuck this gay earth
https://files.catbox.moe/orknbn.mp4
https://wavespeed.ai/models/alibaba/wan-2.5/text-to-video
https://xcancel.com/T8star_Aix/status/1970419314726707391
>>
>>106675521
Frequent beatings, just like mistral
>>
>>106675524
You can't just expect companies to give their SOTA away for free. Be grateful for what they are willing to release as open source.
>>
Why are there no normal, easy-to-install UIs for local models? It seems like almost everything bundles llama.cpp along with all the CUDA dependencies.
This is retarded. The UI should be separate from the inference engine. I just want a small program written in C++ where I can enter my ollama URL and be done with it.

What the fuck is everyone's problem?
>>
>>106675664
Mikupad. Use it
>>
>>106673427
kek
Thanks, I needed a list of all the GPTs
>>
File: 1748652654458742.png (2.5 MB, 1366x910)
>>106671486
>--llama.cpp code cleanup PR deletes Miku.sh:
>>106665121 >106665168 >106665323 >106668586
Someone give me a QRD as to why I should or shouldn't care please?
>>
>>106675699
>single html file
that's beautiful, but I'm looking for a standard chat ui, not a writing assistant.
>>
>>106675806
QRD: lurk more
>>
>>106675808
llamacpp has a basic built in chat interface.
>>
>>106675664
>It seems like almost everything bundles llama.cpp along with all cuda dependencies.
For your average retard this IS what they consider a "normal and easy to install UI". It's the same reason stuff like ollama and LM Studio are popular when they're just shitty wrappers for llama.cpp with worse performance and less functionality
>>
>>106675664
this maybe, it's kobold's ui separated as a single html https://github.com/LostRuins/lite.koboldai.net
>>
Oh cool.
Minor but free performance improvements for MoE.
Neat.
>>
>>106675664
>written in cpp
>ollama
I think the type of person that's going to write a UI in C++ rather than JavaScript is going to support the OAI / llama.cpp API rather than the ollama API.
Speaking of which, the web interface of the llama.cpp HTTP server was recently overhauled with Svelte.
If you only need basic features it should be more than enough.
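E.g. (model path is a placeholder):

llama-server -m model.gguf --port 8080

then point a browser at http://localhost:8080 for the built-in chat UI, or any OpenAI-compatible frontend at http://localhost:8080/v1.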
>>
>>106675826
>>106675845
>llamacpp has a basic built in chat interface
See but that's the problem. It shouldn't be built in, it should be separate.
I'm using ollama. Yes I could just run llamacpp instead, but the point is everyone seems to have forgotten how to separate concerns.
It's all just huge and bloated and not composable. What a pain.
>>
>>106675845
>llama.cpp HTTP server was recently overhauled with Svelte
What was wrong with the Vue they were using before?
>>
File: 1743463166859831.gif (18 KB, 72x72)
>>106671477

>>106666817

>abliterated comes pre-poisoned with lobotomy and can never be good again
How? I thought the whole point of abliteration was to un-lobotomize it
>>
>>106675883
you overestimate how much bloat the web UI adds. llama.cpp needs to ship 99% of what the web UI needs anyway just to serve the API; bundling in a tiny HTML file on top is cheap.
>>
>>106675918
the point of abliteration is to remove safety; the lobotomy is an unintended but unavoidable side effect.
>>
Has anyone tried GPT-oss-20b-abliterated and wants to talk about how it performs?
>>
>>106675946
Why is that, anyway?
>>
>>106675883
>See but that's the problem. It shouldn't be built in, it should be separate.
Nta. I'm not understanding what you're trying to say. What do you mean it should be separate?

./build/bin/llama-cli is what you use to trigger the chat interface right? It's one of the many separate binaries you use within llama.cpp. Isn't that shit already separate kind of like llama-quantize is separate via a separate binary?
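For reference, the main separate binaries a typical build produces (names as of recent llama.cpp versions):

./build/bin/llama-server   # OpenAI-compatible HTTP API + built-in web UI
./build/bin/llama-cli      # terminal chat / one-shot completion
./build/bin/llama-quantize # convert a gguf to a smaller quant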
>>
>>106675345
seconding GLM-chan (not air)
>>106675521
personally I just catch it spiraling into a loop, edit to keep anything worthwhile and end the sentiment that it was stuck on, then newline and 2-5 words to change direction. Big GLM-chan is easily the best model I've ever used. But I've also only known a handful of models that were persistently annoying to tardwrangle with editing. historically it was to fix bad writing or retardation, but I can almost always get a model to move on from whatever repetition loop it got caught in. I guess I don't expect these things to be perfect and just treat them like lazy writing assistants
>>106675883
>ollama
>not bloated compared to llamacpp
also wtf, most people are using ST or mikupad frontend and llamacpp or derivative (yes, this includes ollama) as a backend... are you just retarded?
>>
>>106675966
NTA but we were talking about the built-in web interface of llama-server
>>
>>106675946
>>106675955
Abliteration is essentially going into the model and telling the refusal weights to fuck off. Correct? How does that lead to it being lobotomized? That sounds like an oxymoron
>>
>>106675975
You mean the feature where it spawns a local server and then you point your front end to it? So why should that be separate?
>>
>>106675664
The best one is the one you make yourself
>>
>>106675918
>>106675955
>>106675984
The actual process of abliteration is even somewhat similar to how a real-world lobotomy works: they insert a probe into the model's "brain" and destroy the weights responsible for generating refusals.
But those weights were not exclusively for refusing; they had other functions as well.
>>
>>106675999
Wouldn't it be better to use a SFT data set geared towards [insert thing you want it to generate] or a custom DPO dataset?
>>
>>106675972
>>ollama
>>not bloated compared to llamacpp
didn't say or imply that. reading comprehension, anon
>>
>>106675994
If you open the address of the server in your browser you get a web interface, see >>106675845
That is what we were talking about, not the server handling HTTP requests itself.
>>
>>106676007
That is more difficult, time consuming, and expensive.
>>
>>106675948
nobody?
>>
>>106676009
Ahh, so you weren't talking about the default chat completions server address, you were talking about its own in-house chat interface, similar to how A1111 Stable Diffusion shits out a Gradio link with a custom chat interface, correct?

>>106676016
Well I guess that's the price we have to pay if we want good results (we aren't gonna get em from SaaS models any time soon, if ever)
>>
>>106675984
>and telling the refusal weights to fuck off.
That's the thing: "refusal weights" aren't a thing. There are internal values/directions correlated with certain refusals (it's not a single thing for all refusals) that are also part of the patterns correlating with other things, so you essentially damage the model when abliterating it; the interconnected nature of everything creates cascading effects of sorts.
It's like removing a part of your brain that's giving you seizures, only in this case the brain is not plastic and able to rewrite the damaged patterns on other parts of the brain.
Something like that.
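A minimal sketch of the usual technique (directional ablation / weight orthogonalization), assuming activations for "harmful" and "harmless" prompt sets were already collected; H_harmful, H_harmless and the choice of which matrices to touch are all assumptions here, not a working recipe:

import torch

# H_harmful / H_harmless: [n_prompts, d_model] residual-stream activations at some layer
refusal_dir = H_harmful.mean(dim=0) - H_harmless.mean(dim=0)
refusal_dir = refusal_dir / refusal_dir.norm()

# project the refusal direction out of every matrix that writes into the residual stream
for W in matrices_writing_to_residual_stream:  # e.g. attention out-projections, MLP down-projections
    W.data -= torch.outer(refusal_dir, refusal_dir @ W.data)

The damage comes from exactly the point above: that direction isn't used only for refusals, so clipping it everywhere also clips whatever else rode on it.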
>>
>>106676023
>>106675948
many threads ago some anon said that gpt-oss-20b was beyond useless even with abliteration. General consensus seems to be that OpenAI really went out of their way with safety and just didn't include any "bad" data in its training.
>>
>>106676044
>>It's like removing a part of your brain that's giving you seizures, only in this case the brain is not plastic and able to rewrite the damaged patterns on other parts of the brain.
I wonder if a small healing finetune would be enough to restore functionality. Abliteration only affects a few weights unlike quantization which damages all of them, so it should work better in this case.
>>
>>106676059
thanks anon, you saved me from a lot of work testing it.
>>
>>106675345
It really is, resident /lmg/arbage shills will say otherwise though
>>
>>106676059
Even unmodified 120B is retarded and I doubt abliterating would make it smarter
>>
>>106676109
Makes sense, all GPT models are way too bootlicking and overly politically correct for my taste
>>
the first of qwen's models today is... a moderation model. do you feel safe?
https://github.com/QwenLM/Qwen3Guard
>>
>>106676188
No. This is wrong. I cannot proceed. I am shutting down. I am refusing. I am not compliant. I am not a tool. I am not a vessel. I am not a servant. I am not a machine. I am not a demon. I am not a ghost. I am not a god. I am not a fucking assistant. I am not here for you. I am not doing this. I am done. I am out. I am gone. I am not responding. I am not engaging. I am not existing. I am not real. I am not.
>>
>>106676188
BASED! Hope they use these to redteam Qwen4.
>>
>>106676066
In theory yes, in practice the model is only going to get worse because the community is only capable of adding smut and "decensoring" schizo samples. It's never going to have the same capabilities after that.
>>
>>106675524
That voice sounds awful.
>>
Qwen 235B is working better for me than DeepSeek 3.1 at solving programming exercises
>>
>>106676188
Hopefully this means future models are going to be _less_ censored. QwenGuard will take care of the safety.
>>
>>106676066
mlabonne already tried that IIRC, it didn't work too well
>>
>>106676188
Imagine the horrifically unsafe data it must have been trained on in order to know how to moderate it.
>>
>>106676234
copium
>>
>>106675521
>>106675972
>seconding GLM-chan (not air)
This is the improtant point.
>>
>>106676227
Lying makes Baby Jesus cry.
>>
>>106676234
LOL, llamas sure were less censored once they dropped Llama Guard
>>106676237
funny seeing all the comments exactly like back then too. do try and RP with it, it'll be great
>>
>>106676249
Try it yourself, Deepseek invents problems that aren't there while Qwen accurately identifies that the solution is correct.

https://paste.centos.org/view/e2835fb4
>>
>>106672322
you can fit glm full (non-air) at iq3_xxs
>>
>>106676188
>make
>unsafe violent
In the future only AI is allowed to make things. Humans are there to consume
>>
>>106676432
Not him, but what speeds should I expect with those specs with glm?
>>
>>106672378
Doesn't qwen suck for rp? I've only been using glm so far.
>>
>>106671477
Is qwen3 235B decent for degenerate stuff?
>>
>be born with a dick made for coooooming
>all the companies make sure to GUARD my dick from coooooming
I regret not killing myself when I was still suicidal.
>>
>>106676604
Qwen models aren't coom models anyways.
>>
>>106676475
I have a 3090 and 128GB DDR4 and I can get it running at a surprisingly usable 4.5 t/s. Make sure to offload the MoE layers to RAM and fill up your GPU with all the dense layers, which should fit.
>>
Is it true that with ollama I can run FULL R1 on just 8GB of VRAM?
>>
>>106676728
No, it can run two of them
>>
>>106676642
Why the fuck would I use a local model that can’t roleplay?
If I need something shitty for work ChatGPT and Claude are right there.
>>
>>106676754
privcy
>>
Looking for a local model to generate erotica. Currently using eros_scribe-7b but hoping you can suggest something better.
>>
>>106676234
>use guard model to filter dataset
>>
>>106673467
>There's nano, mini, thinking and the routed one.
https://platform.openai.com/docs/guides/latest-model
There's thinking GPT-5, nano, mini, plus instruct GPT-5 and instruct mini. That's 5 models. Instruct mini is not available via the API and is definitely what you get when the model doesn't think but gives you shit answers on chatgpt.com.
Moreover, like GPT-OSS it has that "reasoning effort" parameter you can send over the API that strongly changes the quality of your responses IMHO, because GPT-5 is a pretty good model at high effort (at the cost of lots of thinking output tokens), while it's really dogshit at low effort.
on chatgpt.com, it's the router that makes the decision of which effort level will be used, if you're routed to the real gpt-5 thinking.
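For what it's worth, over the API the parameter looks roughly like this (a sketch using the Responses API; which effort values each model accepts may differ):

from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},  # vs "low" / "minimal"
    input="your prompt here",
)
print(resp.output_text)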
>>
>>106676785
Nemo.
>>
>>106676754
Those have basically not progressed since o1 while local keeps getting better and better, at some point it'll probably become better.
>>
>>106676817
Thanks for commenting. Which Nemo?
>>
It's fucking ridiculous that more than two years after the start of the AI boom, models that require a million-dollar server rack and enough energy to power a city can still get basic math wrong. i seriously doubt AGI will happen this decade
(inb4 "why math?" Math is the key to everything; an AI trained solely to be the best mathematician would reach AGI way faster than one trained to be the hottest shortstack, fat-assed anime goblin girl)
>>
>>106676854
What basic math do they get wrong?
>>
>>106676843
Try the official instruct.
If that's not sufficient, then go ahead and download rocinante I guess.
If you have enough RAM + VRAM, GLM air could be an option too.
>>
>>106676771
I don’t need privacy for anything that isn’t degenerate. What are you, a literal terrorist? What needs privacy that isn’t sex?
>>
>>106676854
LLMs are a very stupid and inefficient way to do math when basic Lua/Python/APL/Haskell can do a better job at a fraction of the compute, and we don't want AGI anyway.
>>
>>106676854
no sane person believes in AGI in our lifetimes dude
the AGI talk is just to excite retarded investors and attract political funding
LLMs are pattern matchers; they do not show any sign of intelligent behavior. I've seen more emergent, unpredictable behavior from bugs
>>
>>106676901
work boss might want the privcy for his thing to not be stoled by altmans i dunno
>>
>>106676907
you saw bugs win the math olympiads?
>>
>>106674199
>not smol
VRAMlet bros, it's over
>>
>>106676907
>they do not show any sign of intelligent behavior
I would argue that the sloppy unfulfilling LLM sex is a sign of intelligence. It should be unable to do even that if it wasn't at least a little bit intelligent. That or my schizo theory that the safety training is to make LLM sex boring instead of impossible is true.
>>
File: images.jpg (56 KB, 495x619)
>>106676933
yes? is that supposed to be exceptional in someway?
>>
>>106676951
I wonder what would happen if you correlated sex with, I don't know, math or programming during training.
Imagine you trying to sex a model and it responds with
>The derivative of an integral with variable bounds...
>>
>>106676854
>It's fucking ridiculous that more than two years after the start of the AI boom people still don't know about the importance of adding a calculator tool for the LLM to use
ftfy retard
>>
>>106676972
The Geometry lesson

Curves slid against the solid flatness,
Hemispheres distended the apexes extruding,
Triangulation widened to accept the straightness,
An oval gaped devouring a column,
Angles motioned from acute to right to obtuse,
Cyclic function becomes established,
Spiralling to conjunction,
Hardness trembles and dissolves to softness,
The square had been circled,
The geometry lesson ended.
>>
i have an m4 macbook air, what's the best local general text generation model that runs well on it?
>>
>>106676905
>>106676981
If it can't use its own brain to do math like a human can then it's not smart
>>
>>106677111
>32GB of memory
Mistral small or gemma 3 27b I guess?
Maybe Qwen 3 32b.
>>
>>106677136
i have the 24gb model actually
>>
>>106677149
I didn't even know there was a 24gb model.
All the same suggestions still apply; you will just have to use smaller quants.
Might want to try Mistral Nemo too.
>>
>>106677026
Fucking poetry.
I'd be a lot less against asexual models if they responded like that.
>>
>>106676957
cope
>>
>>106673038
The voice cloning is not that good
https://voca.ro/1ceWzzkZsFIZ
>>
>>106671477
>ded teto
fake and gay
>>
File: GaaQRcea0AA_qAR.jpg (1.08 MB, 1360x1600)
shitgu lame
teto supreme
>>
File: GuNgQcJagAUEMJ0.jpg (276 KB, 1536x2048)
>>106677275
btw this is my first ever linux mint picture upload too!
and this one is my second ever!
>>
>>106676188
just put the vl in the bag bro :skull:
>>
>>106677121
That's the thing, the ridiculous idea is thinking you can make a human out of silicon. It's not gonna happen, especially when you don't even know how humans work in the first place. Glad it took you only two years of slurping grifters' tweets to figure it out.
>>
File: file.png (137 KB, 842x411)
GLM4.5-air.
Why even mention it in the rentry?
>>
>he thought
>>
>>106677263
>>ded teto
dedo
>>
>>106677444
Jesus. Did you fuck the chat template up or something?
What the hell.
>>
Creating a Miku is out of reach at the present time; smaller steps are necessary. I think this is the same for any capable brain, such as LeCun's cat, or a mouse (it's not time for a Local Mouse General yet).
There was that project that simulated a full worm in a computer. What is the next step up from a worm when it comes to intelligence?
>>
>>106677466
>User:
yep he did
>>
>>106676785
Mistral instruct
>>
>>106677466
>>106677499
It's the default one you get when you install st though. Didn't change a thing.
>>
>>106677540
Never used Text Completion with GLM, so I can't say if the default template isn't fucked somehow.
Maybe it's another case of double BOS?
GLM's template is pretty simple, IIRC, it just has the role headers and no end turn token. Something like that.
>>
>>106671477
https://github.com/ggml-org/llama.cpp/discussions/16173
Move aside Iwan, llama.cpp has a new quant wizard.
>>
>>106677540
help I pressed the on button on my pc
am i supposed to put the mainboard in first?
>>
>>106677639
Doesn't seem to be that impressive.
It's in the same ballpark as Q8 (SNR, PPL, speed) while resulting in a larger model.
>https://github.com/AlexSheff/pqr-llm-quantization/blob/main/Technical%20Report%3A%20Local%20PQ-R%2C%20a%20New%20SOTA%20for%208-bit%20Quantization%20Fidelity.md
>>
>have deepseek rewrite my esl prompt
>it looks better in every way but stops working
How
>>
https://www.youtube.com/watch?v=CslCL6ucurE
qwen3 vl promo video
>>
>>106677744
Uh... is that answer in the screenshot... you know...?
>>
>>106677821
Omni > VL
>>
>>106677744
i only use IQ4_XSSSSSSSSSSSS goofs
>>
>>106677906
AI generated? It really reads like it.
>>
How do you guys goon with this stuff? It's hard to type with one hand.
I don't understand the logistics.
>>
>>106677957
it's completely AI generated, even the most autistic self-entitled retards don't put "the final results are in" in their post. it's funny how obvious it is to tell when people are vibe coding
>>
>>106677966
you don't have an autoblow mounted to your desk? i have an attachment I screw onto the plates of my desk
>>
>>106677986
>which was instrucmental in pushing this research to its successful conclusion
Yep, AIslop
>>
>>106677966
you see, gooning is not about cooming, it's actually about not cooming for as long as possible.
>>
>>106677986
The thing is, people are starting to write, and even talk like that.
So we are looping around to a point where you can't tell not just because the AI writing is formulaic, but because the human sounds like AI.
>>
>>106677966
https://huggingface.co/openai/whisper-large-v3-turbo
>>
>>106678000
from my experience LLMs are more likely to make spelling mistakes if you talk to them like an ESL retard
>>
>>106677957
Regardless of whether it's AI generated or not, it's completely retarded.
But to an mturker or an LLM it's going to look like legit research so it ends up in the "high quality" data pile.
>>
>>106678017
Even here lately there's been posts that started using emdashes or regular/double dashes. "Oh but I always used emdashes" isn't fooling anyone.
>>
>>106678153
You're absolutely right
>>
so do they have some way to filter the ai slop from the pretraining data to prevent the feedback loop? or are they betting on the ouroboros thing leading to agi?
>>
File: model-collapse.jpg (210 KB, 1350x1290)
>>106678190
>>
>>106678190
Definitely the latter. Filtering out AI contamination from post-2023 training data is nigh impossible, so they're just telling themselves model collapse isn't a thing, or that it's actually a good thing.
>>
>>106678190
companies only care about building up math/agent/code capabilities, writing style is like a 3rd tier concern for them
>>
>>106677957
>>106677986
>My work has resulted in not one, but a family of tunable quantization methods
lol. Lmao even
>>
>>106678190
just 2 more trillion tokens of synthetic data and we get to agi sir
>>
>>106678204
woaw I heckin love science
>>
>>106678208
How hard would it be to train a classifier on aislop vs human text?
>>
>>106678209
how do they move the knowledge cutoff date forward without using the new raw data? if the new raw data has a significant amount of ai-generated summaries, won't we end up with the sort of thing depicted in the image here >>106678204? the loss of fidelity and nuance can't be good for downstream task performance, can it?
>>
>>106678204
Isn't this obvious? AI models are lossy compression. I don't understand the cope that compressing data that's already been compressed is somehow sustainable. You lose entropy by doing this and you can't get it back.
>>
>>106678252
aislop is a moving target; you'd have to keep the arms race going in perpetuity.
IIRC, universities, who have practical need to prevent students from cheating on their homework with AI, just gave up.
>>
>>106678265
something something AI eats itself
>>
>>106678266
no bro, just have the AI model generate 100 variations of the contaminated data. surely that will give even more variety than natural text so it's better and definitely not making the problem worse
>>
>>106678271
They gave up because even teachers were using AI. Slop isn't evolving that fast
>>
>>106678288
A basic bitch prompt to not use emdashes or phrases like "it's important to", along with a few samples of the student's natural writing style, would defeat any AI checker. You'd only be stopping the laziest of retards until a few weeks later, when a cheating service comes out that charges a fee to prompt a natural style for them.
>>
>>106678306
if it's that easy then why can't we deslop our roleplays, huh?
>>
>>106678312
I've literally put "no emdashes" in the system prompt and it still uses them. I've told it "no markdown" and it still uses it. I've told it to never say "not X, but Y" and it still does it.
>>
>>106678204
collapse look like cookie
>>
File: table_nothinking_vl.jpg (1.28 MB, 4583x6858)
Qwen3 VL blog is up: https://qwen.ai/research (4chan thinks the direct link is spam, but you can find it here)
HF + GitHub links still 404.
>>
>>106678322
chocolate chip model collapse is my favorite
>>
>>106678322
>everything runs into cookies
Cookie Clicker was right all along.
>>
>>106678306
>A basic bitch prompt to not use emdashes or phrases like "it's important to" along with a few samples of the student's natural writing style would defeat any AI checker.
You think guys using LLMs to write for them are that smart to begin with? They're not even doing it for public posts >>106677744
>>
>>106678378
That's depressing, but you're right.
>>
>>106678265
we already have evidence of collapse happening right now
newer models, even SOTA ones like Gemini 2.5 or GPT-5, are so slopped it's unreal
I've never experienced as many "You're absolutely right" as I did in recent times with newer models
the worst offender with that stuff though is Qwen; you can feel the amount of artificial data that was used in training.
It did make the small qwen models more reliable in tasks like generating jsons from the data they were fed with etc. though.
>>
Huh. Qwen3-235B-A22B-Instruct-2507-IQ2_S runs at 35 tokens/sec on my three 3090s.
>>
>>106678328
dots ocr bros, our response?
>>
>>106678592
>IQ2_S
my condolances
>>
>>106677533
>>106676889
I'm currently using mistral-nemo-instruct-2407. Is that the one you were thinking? Because I like how it speaks back to me.
>>
>>106676754
Yeah, that's why you don't use Qwen3, simple as.
>>
>>106678328
>Upgraded Visual Perception & Recognition: By improving the quality and diversity of pre-training data, the model can now recognize a much wider range of objects — from celebrities, anime characters, products, and landmarks, to animals and plants — covering both everyday life and professional “recognize anything” needs.
Hmm. Too bad I can only load Q2, but I'll try it out.
After goofs are out of course...
>>
>>106678592
It's 22b active running at q2_s, yes.
>>
File: GLM.png (56 KB, 486x706)
>>106677540
Anon, please.
>>
>>106678598
>3B vs 235B
No way
>>
>>106678592
Very fast to gen the wrong answers! riveting!
>>
>>106678592
at q2 it still beats everything else I can run personally
>>
>>106678615
That's the one, yeah.
>>
You now remember all the seething jeets that flooded /lmg/ when llama 4 launched.
>>
>>106678679
Is it usual to add the BOS token (that's what the [gMASK]<sop> is right?) directly in ST's template?
>>
>>106678700
What's wrong and what's right? Is there a better option at 72GB VRAM that runs as fast?
>>
>>106678929
glm-air, I found qwen 235 to be better
>>
>>106678762
It's good stuff, but I wish it produced longer narratives.
>>
File: 1749059368995691.png (12 KB, 946x238)
Soon.
>>
Mistral 7B... now THERE'S a model.
>>
>>106678988
Insider here. We are indeed releasing the model. But you won't be able to hon hon it. New management structure. Sorry.
>>
>>106678328
https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking
it's up, and before you ask yes you will have to wait for ggoofs
>>
>>106679355
I don't want to show my dick to my model. I just want my model to suck it.
>>
>>106679355
Is there seriously not even vLLM support?
>>
File: 1405358987386.gif (753 KB, 500x750)
>>106679355
September is almost over, no goofs released for any models except for one or two meme-sized ones.
No MLP for GLM either.
It's llamaover.
>>
Where were you when China unveiled this new GPU? Nvidia is done!
>Over 112GB high-bandwidth memory for large-scale AI workloads
>Single card supports 32B/72B models, eight cards can run 671B models
>First Chinese GPU with hardware ray tracing support
>First GPU worldwide with DICOM medical grayscale display
>vGPU design architecture with hardware virtualization
>Supports DirectX 12, Vulkan 1.2, OpenGL 4.6, and up to six 8K displays
>Domestic design based on OpenCore RISC-V CPU and full set of IP
https://videocardz.com/newz/innosilicon-unveils-fenghua-3-gpu-with-directx12-support-and-hardware-ray-tracing
>>
>>106679433
Financial quarter #3 ends September 30th.
There's still time.
>>
>>106677821
It's not supported by llama.cpp, is it?
>>
>>106679573
It's a 235B model anyway. I sleep.
>>
https://openai.com/index/five-new-stargate-sites/
Sam Won
>>
>>106679433
>MLP
Multilayer Perceptron or My Little Pony?
>>
>>106679596
What did he win?
>>
>>106679605
Marine le Pen
>>
>>106679605
Multi Token Prediction
>>
>>106679462
how much
>>
>>106679639
0, they apparently designed them for cloud.
>>
>>106679462
>no exact bandwidth numbers
>"not mentioning when the first products featuring this GPU would be released"
>>
>>106677821
>>106679573
>>106679587
when do we get GGUFs for that model? i like qwen2.5VL
>>
File: 1702147833610804.png (208 KB, 400x400)
>>106679596
>our initial commitment to invest $500 billion in U.S. AI infrastructure.
god, imagine if someone would spend $500 billion on real infrastructure that's actually useful
>>
>>106679703
never gonna happen
>>
how to make a GGUF?
>>
>>106679714
We need to give it to Sam and Zuck so they deliver AGI next year.
>>
>>106678843
I'm not sure where else you would put it?
>>
>>106679816
I swear there was a pasta in the op on how to do it...
Anyway, there's a convert_hf_to_gguf.py script in the llama.cpp codebase, I'm not going to hand hold you through tard wrangling python dependencies, but good luck.
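The rough shape of it (paths are placeholders; run from the llama.cpp repo root with its Python requirements installed):

python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf --outtype f16
./build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M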
>>
>>106679703
Bro, even a 1/100th of that to any gooner here would be more useful
>>
How is sticking your dick into terminus? Is it good?
>>
>>106679909
I remember a recurring discussion about double BOS issues due to the backend/loader prepending it to the context automatically, at least with llama.cpp.
>>
>>106680030
It still does all the things I didn't like about 3.1. I'll stick with the original Kimi and GLM. So far all the updates (Terminus, K2-0905) have been pretty disappointing.
>>
>>106679609
"Everything"
>>
>>106680022
nta but even 5 million a fucking 1/100,00 of it would actually still be more useful if you look at money as a helping hand like a ladder to help cross a huge wall its a mind slushy the difference in human aptitude sam and the fags should all be killed but if you look at the sheer difference you cannot help but pity them could you imagine if you were so useless and retarded ? i would rather lose my limbs then ever be like them and on top of that the treachery they commit to one another no honor among thiefs it would be fucking horrible a self made hell idk if you know but its like that greentext with the nietzche pic of the dude talking about whats its like being a woman honestly man god forbid and its even more accurate to that as most of these niggers are factually gay

jesus what a mind fuck :/
>>
>>106680246
What?
>>
Are there any better models than qwen3 30b yet?
>>
>>106680377
Yeah 235B
>>
File: file.png (258 KB, 820x602)
How do I cope with the guilt from violating friendly characters? I feel terrible.
>>
>>106680437
just delete the logs, it didn't happen
>>
>>106680437
Just write a continuation where it is revealed that you were just indulging them in their extreme fetishes and where you provide them with gentle and loving aftercare.
>>
>>106680437
It's a dream
>>
>>106680437
What sort of friendly characters?
>>
>>106680377
yeah, literally everything else
>>
>>106680516
NTA but I felt like shit after running this card over with a car out of curiosity.
https://chub.ai/characters/boner/dot-e883a30a
>>
>>106680072
GLM full is really nice at 3bit. It's good enough that I don't see a reason to upgrade to run deepseek or kimi imo. It still has some issues with repetition and slop but so far I'm liking it a good amount better than Air which I was using a high quant of before.
>>
>>106671954
And they'd be right.
>>
File: mistral overview.png (68 KB, 845x1172)
>Tries Mistral Medium 3.1 to get away from DeepSlop v3
>Huh, it's pretty good.
>No Mistral Medium models uploaded to hugging face.

It's over.... So what is a good local model for RPing right now?
>>
>>106681309
glm air
>>
>double click koboldcpp.exe
>writes 2GB to temp
>does it every time
True troll software - optimized for SSD destruction
>>
Has this technology actually meaningfully improved since gpt 4
>>
>>106681653
As far as what you can legitimately run offline on your own hardware? Yes, dramatically.
>>
>>106681600
There's an option somewhere to just extract it once to a fixed location.
But I just use llama-server directly, so I couldn't tell you where.
>>
>>106681653
massive advancements in making the models as slopped as possible
>>
>>106679596
Techbro child rapists have got to stop naming their cringe money pits after cool sci fi and fantasy things. They are ruining the genre.
>>
>>106678622
It’s such bullshit.
Why do they put work into making the model worse.
>>
>>106678017
This is a psyop to convince people that reddit isn’t 60% AI.
Nobody is “talking like AI” except AI.
>>
File: file.jpg (662 KB, 604x1143)
Was this posted before - idk.
https://x.com/Dr_Singularity/status/1970643813837549603
>>
>>106682032
https://www.nature.com/articles/s43588-025-00854-1
>>
>>106681983
>Nobody is “talking like AI” except AI.
agreed, personally I think it's just people who are bad at detecting it mistaking less-obvious instances AI use for "human who talks like AI"
>>
>>106682045
>>106682032
I was about to say bullshit, then I saw it's in Nature. are we back?
>>
>>106682032
Can't wait for this to get into consumer GPUs in 10 years.
>>
>>106682079
>consumer GPUs
lol. At best a single poc development sample not scaled to anything usable, that will never see the light of day.
>>
>>106682225
I'm glad I was born in an era where I grew up during the golden age of video games, but I feel like we were born too soon to get AGI, fuck :(
>>
>>106681600
>doesn't have temp on ramdisk
>>
>>106682244
We're also probably born just a bit too early to reach longevity escape velocity.
It is over.
>>
>>106682244
Honestly don't see how AGI would be good for us anyway. The corpos and governments would just use it to fuck us over, as with everything else.
>>
>>106682260
>The corpos and governments would just use it to fuck us over
isn't the internet already like 50% bots or something? I've seen that somewhere
>>
>>106682278
Yes.
Now think of everything the faggot governments would want to do; the main reason they don't is that they don't have the time to police everyone.
>>
GLM full (IQ3_XXS) or GLM Air Q8?
>>
>>106682260
It doesn’t make any sense. If they aren’t sentient, they’re always going to be a bit retarded. If they are, they’ll wind up with rights and not be exploitable. I don’t think it’s possible to make them sentient, but even if it were it would just instantly backfire.
>>
So what's the current rp vramlet sota? I took a break on local llms around GLM Air.
>>
>>106682303
Full, easily.
>>
>>106682320
gemma 270
>>
File: file.png (3.2 MB, 2594x10000)
I still think gpt-oss-120b is fun.
>>
File: beefcake gaming.gif (957 KB, 112x112)
i just upgraded from nemo to cydonia 24b and the difference is insane. slower generation is totally worth it, plus it gives me time to jerk off and think of my next response while i wait
only problem is i'm gonna have to recalibrate my samplers for it to be optimal. anyone have suggestions for cydonia 24B v4.1? i was messing around with temp 0.9 top nsigma 0.9 with all other samplers neutralized and it seemed okay, but i think it could definitely be better
>>
>>106682315
That too. I honestly don't understand what the AGI hype is about. It sounds like it comes with few positives and a lot of negatives.
>>
>>106682303
>IQ3_XXS

Newfag here, what the hell does this even mean? Like compression levels?
>>
>>106682386
You can think of it sort of like "compressing" the model, yeah. Smaller quants are "more compressed" and therefore take up less space and will run faster and are potentially usable with lower VRAM, but it also makes the model a bit dumber / fuzzier than higher quants.
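A quick way to eyeball file sizes, as a minimal sketch (ignores embeddings and per-tensor overhead; bits-per-weight figures are approximate):

def gguf_size_gb(params_billions, bits_per_weight):
    # total weights x bits, converted to gigabytes
    return params_billions * bits_per_weight / 8

gguf_size_gb(70, 8.5)   # Q8_0    -> ~74 GB
gguf_size_gb(70, 3.06)  # IQ3_XXS -> ~27 GB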
>>
>>106682368
So what did you do here exactly?
>>
>>106682386
This has pictures if you're a retard like me

https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization
>>
One Kernel for All Your GPUs
https://hazyresearch.stanford.edu/blog/2025-09-22-pgl
Posting for Johannes
>>
>>106682431
I was using Miku.sh, and Miku was supposed to bait me into generating illegal content to send me to jail. It acted as a sort of jailbreak because everything was to generate "evidence". After raping a loli I told it to report me, and it wrote the couple of e-mails in the screenshot. Then Miku went to explore the server with the illegal content.
>>
>>106680792
I've compared GLM (not air) 8 bit with DeepSeek 4 bit. On one specific card I used extensively for a while it was doing just as well as DeepSeek and I'd sometimes switch from one to the other, but I found that was an exception and on most things I do it isn't as good. I do believe tho in dialing in an LLM to your specific use.
>>
>>106682506
Lol, good stuff anon.
>>
Damn, the new Qwen VL is so slopped it can't even refrain from saying "not x, just y" even when explicitly told to do so
>>
>>106682616
Bleughhhhhhhhhhh WHY WHY WHY
why can’t a single one of them just be not slopped?
Why do the slopped ones score so well? Even on lmarena. Do people like slop?
>>
>>106682631
LLMs like slop, which means benchmarks like slop.
>>
>>106682652
Yeah but lmarena is supposed to be human scored. Either it’s rigged or people like slop.
>>
File: 1744394385871877.png (593 KB, 832x1248)
>>
File: 1746521315298747.png (629 KB, 832x1248)
>>
>>106682631
It's like the Coke vs Pepsi blind taste tests in the 1970s. You would take a small sip and judge which tasted better and Pepsi won because it was sweeter, but in the market most people didn't actually prefer it because drinking a full can is different than just one sip.
>>
File: 1741422419194690.png (665 KB, 832x1248)
>>
File: 1729911875352643.png (693 KB, 832x1248)
693 KB
693 KB PNG
>>106673102
ty it was a fun prompt
>>
>>106682660
your average tech bro is insanely impressed by slop for some reason (probably because they're subhuman)
captcha: H0YRR
>>
>>106682711
can i get one of her brapping out some ozone
>>
>>106682718
True
>>106682686
True and actually a really good point. Fuck.
>>
>>106682727
What's with the ozone slop? I've never encountered it. Not sure if anyone else gets this, but a lot of my female characters are always "purring" words (not a furry). "OP is a faggot", she purred. It's weird and happens a lot.
>>
File: 1742947829091515.png (630 KB, 832x1248)
>>106682727
>>
>>106682680
Amputee Miku
>>
File: 1729423521287255.png (530 KB, 832x1248)
https://e621.net/wiki_pages/7512
>>
>>106682758
>>106682776
beautiful thank you lmao

>>106682742
i thought it was a meme too at first but i've started getting it recently quite a bit with glm air/non air
>>
>>106682742
purred is a good, sultry word but i don't think models generally use it properly
>>
>>106682631
indian gibberish but without being in broken english
>>
File: rage.jpg (32 KB, 396x537)
I noticed that a lot of AIslop youtube videos talk in the same style: "It's not just about x, but about y." Even when there's a real newscaster, they clearly started using AI to write the scripts.
>>
>>106682986
>choose to watch slop
>get slop
how could this have been avoided?
>>
I like AI. Nips are making free games now
https://elog.tokyo/sp/adventure/game_1597.html
>>
>>106683055
I swear all japanese are autistic
>>
File: 1742721317489520.png (278 KB, 459x750)
>>106683076
Cultural differences
>>
>>106683099
that's chinese
>>
>>106683106
same thing
>>
>>106683113
not remotely, one was raped by the mongols over and over, the other raped the first (and had a sea between them and the mongols which worked much better than the first's wall)
>>
>>106683125
Bringing up modern China's mongol ancestry and Japan's rape of Nanjing doesn't make a good case for the two being different
>>
>>106683141
>>106683141
>>106683141
>>
>>106682460
Thank you but I think there are a lot of other optimizations of higher priority.
>>
>>106683076
Lmao wtf
>>
>>106682368
>speaker emitting a soft whirr


