/g/ - Technology

File: Anima_00024_.png (959 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109098000 & >>109092907

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: hatsune-miku.gif (778 KB, 220x220)
►Recent Highlights from the Previous Thread: >>109098000

--Using Gemma for sysadmin tasks and debating high-VRAM hardware options:
>109098935 >109099004 >109099020 >109098997 >109099480 >109099510 >109099538 >109099702 >109099730 >109099709 >109099725 >109099787 >109099799 >109099816 >109099926 >109099829 >109099794 >109099856 >109099990 >109099954 >109100119
--Gemma 4 12B recommendations and optimization for RTX 4070:
>109101564 >109101631 >109101646 >109101661 >109101690 >109101696 >109101713 >109101782 >109101717 >109101738 >109101741 >109101773 >109101794 >109101841 >109101849 >109101874 >109101892
--Comparing abliterated Qwen memetunes against Gemma 4 31b:
>109100191 >109100205 >109100213 >109100222 >109100513
--Feasibility of running Gemma 4 with 200K context on budget hardware:
>109100277 >109100288 >109100299 >109100316 >109100520
--Sarcastic debate over running full R1 on 8GB VRAM:
>109100559 >109100645 >109100661 >109100679 >109100693 >109100694
--Local LLM recommendations for Blender assistance and agentic automation:
>109101128 >109101156 >109101179
--Effectiveness of Gemma 4 heretical variants in reducing soft refusals:
>109100163 >109100172 >109100234 >109100214
--Using Chatterbox and SAM Audio for local singer voice changing:
>109098651 >109098670
--Logs:
>109098099 >109099794 >109101741 >109101782 >109101813
--Miku (free space):
>109098097 >109098121 >109099064 >109099164 >109101697 >109101741 >109101781

►Recent Highlight Posts from the Previous Thread: >>109098006

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmaballs
>>
/compact
>>
Gemmoe 124b
>>
best model for a 5090 doing chats / agentic shit? prob qwen 3.6 or gemma 4, right?
>>
124b dense
>>
70B dense
>>
>>109102097
pretty much
>>
70b dense
>>
I've seen people say you can get good programming out of qwen3.6 if you do up a document and then get it to follow that. Has anyone been doing this? Can you explain what kinds of things I need to specify and how much?
>>
500 Trillion parameter model, 10 billion context length, unsupported architecture. Explodes if you try to quant it or reduce context length. The perfect local model.
>>
How to get 1 terabyte ram??
>>
>>109102177
by the time you get enough ssds to fit it you'll have excellent bandwidth
>>
>>109102176
Getting good means better at tard wrangling the AI and fixing its logic.
>>
>>109102177
Just put a brain in a jar and give it tools.
>>
>>109102097
Unless it's also backed by 256GB or more RAM, yeah.
>>
did cudadev pass away
>>
>>109102294
yes
>>
File: 1662755490120297.png (8 KB, 403x301)
I've been away from local since the nemo-12b era and I gotta say we're fucking eating good right now. gemma4-12b-qat running full context on my shitbox gaming laptop w/ 8gb vram is writing me all kinds of fictional lolicon related scenarios
>>
>>109102298
damn
>>
Testing MTP speed with gemma-4-12B-it-qat-UD-Q4_K_XL.gguf on RX6700XT 12GB Vulkan
--draft-model gemma-4-12B-it-Q4_0-MTP.gguf
MTP 0 - [ Prompt: 272.3 t/s | Generation: 35.7 t/s ]
MTP 1 - [ Prompt: 252.8 t/s (-7.2%) | Generation: 41.9 t/s ] (+17.4%)
MTP 2 - [ Prompt: 253.5 t/s (-6.9%) | Generation: 39.4 t/s ] (+10.4%)
MTP 3 - [ Prompt: 251.2 t/s (-7.7%) | Generation: 32.8 t/s ] (-8.1%)
MTP 4 - [ Prompt: 251.8 t/s (-7.5%) | Generation: 28.9 t/s ] (-19.0%)

--draft-model gemma-4-12B-it-Q8_0-MTP.gguf
MTP 0 - [ Prompt: 273.3 t/s | Generation: 35.7 t/s ]
MTP 1 - [ Prompt: 242.7 t/s (-11.2%) | Generation: 52.1 t/s ] (+45.9%)
MTP 2 - [ Prompt: 244.6 t/s (-10.5%) | Generation: 54.7 t/s ] (+53.2%)
MTP 3 - [ Prompt: 245.9 t/s (-10.0%) | Generation: 50.7 t/s ] (+42.0%)
MTP 4 - [ Prompt: 248.5 t/s (-9.1%) | Generation: 46.5 t/s ] (+30.3%)

--draft-model gemma-4-12B-it-F16-MTP.gguf
MTP 0 - [ Prompt: 274.4 t/s | Generation: 36.1 t/s ]
MTP 1 - [ Prompt: 230.8 t/s (-15.9%) | Generation: 51.5 t/s ] (+42.7%)
MTP 2 - [ Prompt: 247.6 t/s (-9.8%) | Generation: 52.3 t/s ] (+44.9%)
MTP 3 - [ Prompt: 250.2 t/s (-8.8%) | Generation: 48.8 t/s ] (+35.2%)
MTP 4 - [ Prompt: 247.5 t/s (-9.8%) | Generation: 43.0 t/s ] (+19.1%)
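The percentages are just relative change against the MTP 0 run of the same draft model. A minimal sketch that recomputes the Q8_0 generation column from the numbers above (the function and variable names are made up for illustration, not part of any benchmark tool):

```python
def pct_change(baseline: float, value: float) -> float:
    """Relative change of value vs. baseline, in percent, rounded to one decimal."""
    return round((value - baseline) / baseline * 100, 1)

# Generation speeds (t/s) for the Q8_0 draft model, copied from the results above.
baseline_tps = 35.7  # MTP 0
q8_runs = {1: 52.1, 2: 54.7, 3: 50.7, 4: 46.5}  # MTP depth -> t/s

for depth, tps in q8_runs.items():
    print(f"MTP {depth}: {pct_change(baseline_tps, tps):+.1f}%")

# Depth with the highest generation speed in this run.
best_depth = max(q8_runs, key=q8_runs.get)
print("best depth:", best_depth)
```

On these numbers the best depth is 2 for both the Q8_0 and F16 drafts, while the Q4_0 draft already goes negative at depth 3.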
>>
>>109102184
The same way you get catalytic converters
>>
>>109102311
By forcing engine exhaust through a platinum and ceramic substrate under extremely high temperatures?
>>
>>109102318
No idiot you go to the store and buy it
>>
>>109102311
be black?
>>
>>109102301
you're better off running gemma4 26b a4b if you have 16+GB ram
>>
>>109102361
I have exactly 16gb and it seems to run like shit no matter what.
>>
>>109102360
Soon they will be selling ram they "acquired" out of the back of their sudan. Will you buy it?
>>
>>109102294
He's offline.
>>
>>109102365
how shit are we talking?
>>
>>109102368
if they have 8x64gb ddr4 rdimms for less than $1000, hell yeah. not illegal to buy things from aspiring african american entrepreneurs.
>>
>>109102376
Like 4 tokens per second.
>>
>>109102385
what gpu? i get dozens on a rx6600. you might have bad gpu offloading settings.
>>
>>109102385
Should be 5 times faster. Try
--cpu-moe --fit off --parallel 1
Forgetting something.
>>
>>109102398
5070ti. What model specifically are you using? I'll just get it and see.
>>
>>109102405
the original from google
>>
>>109102385
i get at least 35 t/s on an empty context with 8gb vram + 16gb ram with a rtx4060 (mobile) so you must be doing something horribly wrong
>>
>>109102385
Are you using ollama?
>>
>>109102371
Is that what the kids are calling unaliving these days?
>>
>>109102419
>>109102429
>>109102434
Ok I'm not going to say how but I am retarded. Thank you.
>>
>>109101988
>200k context on budget hardware? good luck with that unless you enjoy 0.1 tok/s
>>
>GLM 5.2 is within spitting distance of the frontier and raping their competitors on costs
So are Altman and Dario just gonna try to coast to victory on the whole "chinks bad" messaging?
>>
>>109102487
>budget hardware
This is not a hobby for the wealth challenged
>>
>>109102492
Shut up zuck, don't you have some more AI researchers to poach and then do nothing with?
>>
>>109102054
> being this dense and still posting

fr, did you even read the post or just mash the keyboard
>>
>>109102438
You mongs harassed him out but he'll be back eventually.
>>
>qwen is good for agen-ack

>>108630614
>>108630614
>>
Can llms teach me how to program?
>>
>>109102585
Not a real world use case. Doesn't count.
>>
File: 70746323.jpg (201 KB, 1206x1826)
>>109102491
not close to mythos which would quite literally break the internet
>>
>>109102589
as if there weren't countless demos of your agent booking flight tickets for you, which is basically that with more money involved
>>
>>109102586
Yes. It's best used on an as-needed basis, e.g. just asking it about syntax.
If you're new you can learn bad habits but everything is a process I guess.
>>
>>109102593
>jimmy shillples
>not in x, but in y
>>
>>109102593
can you imagine your systems being so insecure a bloody llm could break into them?
I'm getting less superhacker vibes from this than just military incompetence paired with slightly better llm tooling
>>
>>109102294
HF whacked him so he couldn't approve the DS4 PR. Conspiracyschizos continue to be vindicated.
>>
was claude Fable really that good? I didn't get the opportunity to use it.
>>
>>109102642
Most of the ways in were probably kernel vulnerabilities, and obviously the llm was already working inside their network
I doubt it just SSH'd anywhere
>>
>>109102650
Marginally better than 4.8.
>>
>>109102642
If I recall correctly Edward Snowden leaked all those NSA files just by creating a worm to crawl around and grab whatever it finds. I wouldn't bank on them being THAT secure.
>>
>>109102661
He had access to the right computer with credentials. This sounds a bit like hogwash.
>>
>>109102646
llama.hf was a mistake. open-llama when?
>>
made a vm for hermes but i dunno what to do with it
>>
>>109102760
Make a text editor in C.
>>
>>109102760
give it a couple of prompts and watch it use up all your context
>>
/lmg/ - lmg models general
>>
>>109102866
Lovely Miku's Gynecology - Accepting New Patients
>>
>>109102646
>HF whacked him so he couldn't approve the DS4 PR
or China whacked him off so he couldn't keep blocking the DS4 PRs
>>
I tried to use qwen3.6 to help with a project but it's partially done and it struggled to understand anything. Is it just a model limit or is there some way to get good at prompting? It was php not python so maybe it was because of that.
>>
>>109103030
how many tokens is your project?
>>
File: agentspam.png (24 KB, 455x1497)
tfw letting Gemma go hog wild
>>
>>109103044
The whole thing is 150kb, can't remember what the token count was. I think it's just split into files, which causes more complexity than it can handle. Or I'm just using it wrong.
>>
>>109103066
This is perfectly fine. I disabled confirmations in the Hermes agent for python code execution, and I get the job done

Using DSV4F via API though
>>
>>109103073
I attach only selected source files and add ~5 lines of simple instructions. And usually concentrating on one single thing.
With small models you need to be specific. I don't have an example prompt I could share right now though.
>>
>>109103073
150 kb = approx. 40 kt which is fine

Which harness do you use if any? An agent would investigate files one by one getting the full picture
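A rough way to reproduce that estimate, assuming about 4 bytes per token (a rule of thumb that varies by tokenizer and language, not something measured with the Qwen tokenizer). The function names, the 64k context and the 8k reply reserve are made up for illustration:

```python
def estimate_tokens(size_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Rough token count for a blob of source text (assumed ratio, not measured)."""
    return round(size_bytes / bytes_per_token)

def fits(tokens: int, context_size: int, reserve: int = 8192) -> bool:
    """True if the tokens fit while leaving `reserve` tokens for the reply."""
    return tokens + reserve <= context_size

project_tokens = estimate_tokens(150 * 1024)  # the 150 kb project
print(project_tokens, fits(project_tokens, 65536))
```

For a real count, run the files through the model's own tokenizer instead of guessing.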
>>
>>109103084
I guess it might just be about prompting then.

>>109103085
Using pi. When I asked to make a change it edited one file then just spammed the console without changing anything and I stopped using it.
>>
>>109103089
For example something like this
>implement a function which does x
>prototype: example here
>do not implement additional helper functions or new variables, only use existing ones for this task
>comment your changes in clear fashion
>do not erase my existing comments or change variable names
Something like this
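If you end up reusing that template a lot, it can be assembled from a few fields. A minimal sketch, assuming you attach the selected files yourself; the function name, the example task and the PHP prototype are all hypothetical:

```python
def build_task_prompt(task: str, prototype: str, source_files: dict[str, str]) -> str:
    """Assemble one narrow, explicit prompt from a task, a prototype and selected files."""
    rules = [
        "do not implement additional helper functions or new variables, "
        "only use existing ones for this task",
        "comment your changes in clear fashion",
        "do not erase my existing comments or change variable names",
    ]
    parts = [f"Task: {task}", f"Prototype: {prototype}", "Rules:"]
    parts += [f"- {rule}" for rule in rules]
    # Attach only the files relevant to this one task.
    for path, text in source_files.items():
        parts.append(f"\n--- {path} ---\n{text}")
    return "\n".join(parts)

prompt = build_task_prompt(
    task="implement a function which turns a post title into a URL slug",
    prototype="function slugify(string $title): string",
    source_files={"lib/strings.php": "<?php\n// existing helpers live here\n"},
)
print(prompt)
```

One task per prompt, and only the files that task touches.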
>>
>>109103089
I can't speak for the Pi agent. However, Hermes is quite good. It is running on a potato PC where it can't cause lots of damage

I suggest you try the following:

1. set up a Hermes agent on a computer where your project can be tested (executed etc)
2. connect to Hermes via Telegram
3. configure your local Qwen API and one or more cloud models, e.g. via openrouter
4. configure a github repo with a fine-grained token given to the Hermes agent
5. (...)
6. start talking to the Hermes agent in simple language (look at the repo, investigate crucial bugs, do this and that, start/restart a server, commit and push, eventually reverse a commit etc)
>>
Nvidia's spark and AMD's Halo should be accepting orders soon. Are either of those things something (you) are interested in? The biggest difference to me seems to be that Spark has an ARM architecture while the AMD offering sticks with X86.
>>
>>109103158
spark has nvfp4 support so it's going to be faster, but I'm not interested in either because they don't provide reasonable performance for their price, maybe in 10 years when they're showing up for 300~500 dollars on ebay I'd be interested
>>
>>109103158
Nah, for now I'm content to have a fast running Gemma 31B. The leap for the huge MoEs costs more than it's worth.
>>
>>109103158
No, they’re absolute cash grab memes that fail to deliver value for the price
They can’t even run interesting models at a decent speed (or at all)
>>
File: IMG-20250314-202612.jpg (384 KB, 1440x1984)
After much troubleshooting, I have come to the revelation that Gemma4's Role-play IQ scales inversely with the length of the system prompt. The longer the system prompt, the worse at writing it becomes. Not only that, but some instructions will outright kill its creativity, and make it more robotic, no matter how long or short the system prompt is.
>>
>>109103211
Can it be that your system prompt is written in a dry formal language?
>>
>>109103223
No, what matters is the instructions. Gemma4 is dense about completing system prompts. It will bend everything else to complete a system prompt it can understand. More system prompt is more instructions. A post from the character trying to obey a few instructions is fine, but many at once tends to produce that dry, robotic, uncreative feel some anons complain about with gemma4 "following the instructions too closely". You can also wreck it with a single line if the system prompt contains anything other than instructions for how you want it to write.
>>
what's the best local model to ERP with these days? I have 16gb unified RAM. currently I use mistral nemo (see below) but I figure I can run something a bit bigger if necessary.
>Mistral-Nemo-Instruct-2407-12B-Thinking-M-Claude-Opus-High-Reasoning.i1-Q4_K_M.gguf
>>
>>109103211
It keeps up with my 300 token long system prompt, but I might want to trim it further. I also noticed it starts ignoring parts of it as context grows.
>>
>>109103249
gemma 12b qat would be a direct replacement
and don't use memetunes, they just make the model retarded for no benefit
>>
>>109103258
For nemo, Gemma 4 12b is the sota sidegrade
>>
>>109103266 meant for >>109103249
>>
Can a 4070 run Gemma 24? I don't really mind low t/s.
>>
>>109103294
Sorry, I meant Gemma 26.
>>
>>109103294
You can run anything on any GPU, so long as you have the ram. But it will not be fast.
>>
>>109103294
...if you don't mind low t/s why not just partial offload? I think you should still fit most layers if you use q4 or something so what is the problem?
Inb4 nooo not that slow
>>
>>109102593
>which would quite literally break the internet
There's nothing to "break" on the internet by hacking into stuff. It's not like Fable can start redirecting BGP requests through the wires or change physical infrastructure

Unironically though Fable could hack an underwater drone and then sever a cable, but that's not breaking the internet, just severing one of thousands of connections across the internet
>>
>>109103314
>>109103315
I have 32GB ram, how much should I offload?
>>
They named the models mythos and fable because of the marketing strategy of spinning myths and fables lmao
>>
>>109103337
They named Fable after a famously overrated and reddit-coded game.
>>
>>109103335
Jesus CHRIST just try it. Holy shit I hate zoomers imagine having this inactive of a brain where you can't connect any dots. Instead of posting you could literally just experiment on your own for 30 seconds or just use autofit and you'd have your answer. I fucking hate you
>>
>>109103337
No they named it that because each tier is a larger work of literature

Haiku -> sonnet -> opus -> fable
>>
Did rocm llama.cpp get an update in the last 4 weeks that increases prompt processing speed? I went from 700 to 900 tokens/s.
>>
File: 1700312497898448.png (387 KB, 614x609)
>>109103346
Sorry anon
>>
>>109103358
It's ok... uhh... just try `--fit --ngl auto` and see how fast it is. I'm sure it will be fine.
>>
>>109103258
All models ignore instructions as context grows, but Gemma4 is just really, really autistic about system prompts. It's a blessing and a curse.
>>
>>109103372
Thanks :D
>>
Oftentimes when I see someone else's jb or system prompt, I can't tell if it's a joke or not.
>>
>>109103342
I'm not even a jedditmelted brain but I literally never understood what was even slightly interesting about any of those games even when I was a child with less brain development
>>
>>109103432
A lot of normies still use really old stuff going back to the AI Dungeon days, which was mostly placebo.
Some of that shit was literally begging the AI to work lol.
>>
>>109103437
I still use a simple small "jailbreak" made for opus 3 for my RP. It's just fine. Honestly I can even use default sillytavern presets on cunny cards with sonnet 4.5 and they're fine since I'm now an occasional gooner since the honeymoon phase is over so slop phrases don't trigger me as much as they used to
>>
>>109102307
>Testing MTP speed with gemma-4-12B-it-qat-UD-Q4_K_XL.gguf on RX6700XT 12GB Vulkan
why is prompt processing so slow?
is that because of mtp or vulkan?
>>
>>109103432
Every good character card or system prompt isn't public. No one sane likes to show porn and associate themselves with porn upon others. It's always furries, or the kind of freaks you see in rule34 comments. Everyone masturbates and looks at porn, but a rare few associates it with their lifestyle or ego, and they're usually autistic.

That's my 2 cents about it.
>>
>>109103325
>There's nothing to "break" on the internet by hacking into stuff.
i think that's zoomer speak for something like 'go viral'
>>
>>109103453
Most are shared anonymously or through an alias. But showing off your system prompt/card as a point of pride probably is autistically coded, yeah.
>>
File: k2v.png (87 KB, 964x591)
>>109094847
>You can just plop the mproj from 2.5 into K2 and it justwerks, but she sometimes doesn't know what she's looking at or misinterprets the picture. It might yield better results than trying to replace individual layers in terms of unintended second order consequences of trying to make a based Kimi with eyes.
I thought you were fucking with me, but this actually kind of works with K2-Thinking!
Any idea how, or did you just try it and find this?
>>
>Now are you going to X, or are you going to Y?
>>
>>109103356
Time to move to llama, I guess.
Not looking forward to having to learn all the autistic shit that comes with it. Kobold sucks but at least it's braindead easy.
>>
>>109103563
>Now are you going to X, or are you going to Y?
the choice is yours
>>
>>109103325
I like this post.
>>
File: sleepyMiku.jpg (937 KB, 1552x1944)
>>109103211
>I have come to the revelation that Gemma4's Role-play IQ scales with the length of the system prompt. The longer the system prompt, the worse at writing it becomes.
That's true of all models, local or SOTA hosted. I harp on anons more or less constantly about lowering the size of their system prompt and card definitions to the absolute minimum that defines the NPC and rp.
>>109103432
iktf
>>109103479
The other issue w/ sharing jb and such is that once it proliferates it can be intentionally patched.
>>
>>109103564
feed LLM settings screenshot ask to convert to llamacpp
>>109102593
reminder safetyfags were screeching about GPT-2 being too dangerous to release
>>
>>109103692
yes and they then left openai to create a new company called anthropic
>>
File: dipsySP.png (1.89 MB, 1024x1024)
>>109102593
otoh, you can now also use Fable (or whatever SOTA boogieman is created) to run audits on your own software and look for holes.
Which takes about as much effort as it did to write this post. LOL. Then, have the system fix it.
wala.
>>
>>109103453
You don't like Lepora?
>>
File: 1754738326595999.png (139 KB, 1598x478)
>>109101986
Why are so many api vibecoders active like crack addicts on withdrawals for Fable 5? I know anthropic models are usually pretty good but it couldn't have been THAT good. They have so many other options so why are they locking themselves to Anthropic like someone with a partner that definitely loves them and doesn't hit them?
>>
>>109103890
It's likely because they use claude code which officially only supports claude models. They are also likely on a subscription to claude, so unlikely to try anything else.
>>
>>109103890
If you didn't run into its refusals, Fable was Opus 3 creativity combined with modern LLM smarts at a cheaper price.
>>
>>109103890
Dear organic shilling campaign's quality control,
The employee who made this post is way too retarded.
Please fire him so he has to go work on cotton fields.
Best wishes,
Anon
>>
>>109103940
>officially only supports claude models.
works with Gemma, Mistral-Medium and Qwen3.5
i didn't use fable but see this: https://old.reddit.com/r/LocalLLaMA/comments/1u8g3d0/gemma_4_e2b_running_inbrowser_at_255_toks_using/os8um7d/
41.5t/s -> 84t/s artificially cucked by anthropic
then uncucked -> 254t/s
>>
>>109103890
Despite the OSS cope, glm 5.1 was only on the level of Sonnet 4.6 when I tried it, not Opus like they claimed. Gpt 5.5 on codex is garbage compared to Claude. Anthropic have the lead and it will stay that way until the bubble busts or ASI is achieved. Everything else is cope. Chink distills hit the wall when they hid the reasoning traces. Zhang can generate the reasoning based on the output and run RL on it but it will never ever be as good.
>>
>>109104032
You might want to consider OSS soon fren:

https://privacy.claude.com/en/articles/10301952-updates-to-our-privacy-policy
https://support.claude.com/en/articles/14328960-identity-verification-on-claude
>>
File: 1771551972114681.png (36 KB, 1402x82)
For maybe the 3 other anons on the planet who use base models, v4 flash base felt just as good if not better than v3.2 base, so it's a free uplift. Checkout the PR and quant the base version yourself.
There's still some eval correctness issue that slipped past the ppl test and a fast-forwarding quirk because of SWA, but I had GLM fix it for me.
>>
>>109104032
>Zhang can generate the reasoning based on the output and run RL on it but it will never ever be as good.
Chinks have a better way to generate than Zhang
>>
>>109104082
No fucking way. Time to go full local now.
>>
>>109104082
>actually a thing
holy
https://old.reddit.com/r/Anthropic/comments/1ubm10v/stop_with_this_id_verification/
>>
>>109104082
What do the AI catchads make of this?
>>
>>109104082
thanks anon, i won't be renewing my cc
>>
>>109104082

The systems globally are going all out with this digital ID thing.
They want it in place before systems shit the bed, because the retards at the top think they can just have their panopticon and jail all dissidents and prevent people from fucking up the ones in power, once the inevitable rioting starts as economies die.
>>
>>109104082
>to confirm your age and nothing more
>Your data will be deleted immediately after the check is done.
Despite so many examples of this being lies, the average cattle will just keep believing this.
>>
>>109104410
>because the retards at the top think they can
I am sure they can. They were able to successfully put down OWS and that was when the Patriot Act was relatively new and they didn't have a fraction of the technology or data collection going that they have now.
>>
Is MiniMax-M3 supposed to think for 15k tokens?
>>
Anyone else been testing out the opus/fable fine tuned qwen and gemma models?
The Qwen team's base models seem like an attempt to make the models smart, but qwopus just blows them out of the water.
And for gemma, Google was holding back, because the fine tuned versions are like different models entirely. They go from having arrogant confidence about their hallucinations, to using the database they had access to the whole time to confirm their understanding before doing the unreasonably complicated tests I'm giving them.
>>
>>109104432
This isn't reddit
>>
>>109104424
What did (You) ask it?
>>
File: slop metrics.png (54 KB, 729x502)
>>109098939
check out https://huggingface.co/Gryphe/Pantheon-Reasoning-31B-1.1
>>
>>109104410
You're so naive. "The retards at the top" learned from OWS and channeled all dissent into harmless for them gender identity wars. Everybody only cares about whether you use correct pronouns instead of uniting against the elites.
>>
>>109104423
>put down
It was never a threat to anybody, they were retards with no demands or goals beyond hanging out and larping.
>>
>>109104435
Why are you a faggot?
>>
>>109104436
posted the llama-quantize --help, asked for a markdown table of all the quant types, without duplicates/aliases/repacks, order by bpw desc
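That kind of table doesn't strictly need a model. A minimal sketch, assuming you've already pulled (name, bpw) pairs out of the help text by hand; the three entries below are just the bpw implied by the block layouts (32 weights plus one fp16 scale per block for Q4_0 and Q8_0), and the helper name is made up:

```python
def quant_table(rows: list[tuple[str, float]]) -> str:
    """Markdown table of quant types, duplicates dropped, ordered by bpw descending."""
    seen = set()
    unique = []
    for name, bpw in rows:
        if name not in seen:  # keep only the first occurrence of each type
            seen.add(name)
            unique.append((name, bpw))
    unique.sort(key=lambda row: row[1], reverse=True)
    lines = ["| Type | bpw |", "|---|---|"]
    lines += [f"| {name} | {bpw:.2f} |" for name, bpw in unique]
    return "\n".join(lines)

# Hand-entered pairs; Q8_0 is listed twice on purpose to show the dedup.
rows = [("Q4_0", 4.5), ("Q8_0", 8.5), ("F16", 16.0), ("Q8_0", 8.5)]
table = quant_table(rows)
print(table)
```

Aliases and repacks would still have to be filtered by hand, since nothing in the pairs marks them as such.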
>>
>>109104456
KEK, it's looping for some insane reason, likely because they don't have the full model doing requests like that, it's likely a 9b or even smaller
>>
>>109104423

Can they build this digital ID thing? Yes. Does it save them? No.
It's an entirely different game than OWS, because that was basically 100% Millennials who were the only really disgruntled group and the systems were able to print themselves out of the hole.
This time around we're looking at 20 years of more fucked up economy and multiple even more pissed off generations than before.
Data in this situation doesn't mean shit because you simply can't arrest everyone and people are more aware of the system fuckery than ever before in human history.

>>109104446

And you are naive if you think that the current situation is going to magically remain forever, when that kind of a permanent state of being has never been a thing in human history.
Even this current pronouns environment is new as fuck, it's not even 15 years old and it won't be a thing 15 years from now as we're at the end of an empire cycle.
This naivete and pretense only lasts as long as the comfort does and comfort is increasingly out of the window, with even personal entertainment getting kicked to the curb and escapism dying.
Humans are tribal as fuck when resources get scarce and they will get scarce soon enough, especially when the growth center moves towards Asia and Western markets won't rebound.
>>
anyone using agents other than crush or late-cli that aren't nodejs garbage? Really don't want to get supply chain attacked from running js shit but it seems like 99% of harnesses are python or js
>>
>>109104498
>like 99% of harnesses are python or js
The only ones that aren't are written in Rust and wrapped in a node package anyway so you get the benefit of being vulnerable to supply chain attacks from two package install happy ecosystems.
Only options are to either use what's there and use an application firewall to block all requests except to your server endpoint and hope that's enough, or to invest the time in writing and maintaining your own.
>>
>>109104493
Yeah elites have eaten the seed corn. Noblesse oblige has been entirely forgotten and it’s just “I got mine fuk u” for days.
Too much concentration of wealth and the whole virtuous cycle just shuts down.
We legit need a society-wide hivemind to get out of this trap I feel.
I managed to scramble up above the mean to be ahead of the poverty void, but I fear if it all collapses that won’t matter.
>>
>>109104519
>write your own
yeah it seems this way, I've been mulling using one of the Go based ones to bootstrap my own in cpp. Crush is decent but I have a lot of opinions about how things should work that it doesn't align with.

I just want a single binary that's not going to try to hack my computer in the background with auto updates, doesn't take 800ms to make decisions, doesn't ship with 500MB of dependencies etc. At most a lightweight plugin system with something like Lua not fucking JS lol.
>>
>>109104493
The US already has the largest prison population on the planet and the rest of the west has shown themselves to be quite happy to arrest people for twitter comments and to free "minority" murderers to make room for them.
>Humans are tribal as fuck when resources get scarce and they will get scarce soon enough, especially when the growth center moves towards Asia and Western markets won't rebound.
You're close, the west will cannibalize itself like this while the rest of the world moves on with China in the lead. Empire collapses are not something that happens overnight. It could take decades or a century for the Sick Man of the West to fully break down and be subsumed by the emerging powers.

To at least try to keep it on topic, with the technology they have now, it's easier for them to simply build a social credit score system where everyone is kept in line without arrests by the fear of being blocked from the system due to a low score. Like sanctions on an individual level. Those that do attempt to revolt en masse face autonomous drones. This has never been possible before and will help keep the system from collapsing much longer than it should.
>>
>>109103511
NTA but I'd assume any two instruct tunes off the same base should have a lot of similarity in the structure of the embeddings, which is what the mmproj is projecting into
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
>>
>>109101986
I have NVIDIA GeForce RTX 4050 Laptop GPU.
I tried nemo 12b instruct gguf as instructed by the lazy guide. It kind of sucks? Also my laptop barely got hot so that means I can run something better right? Does something better exist?
>>
>>109104682
What is your usecase
>>
>>109104688
Mostly roleplay and getting funny outputs, for work stuff I use Claude. Doesn't need to be coomer friendly, though lack of censorship is always nice
>>
>>109104707
Nemo is likely your best bet. There is also gemma4 12b but it's way more slopped
>>
>>109104602
wood
>>
File: g4-prose-base.png (203 KB, 1137x732)
>>109104756
I'm trying to remove the purple prose from Gemma 4 with a modified version of heretic ablation. I already made a classifier to reject what NOT to do (slop, purple prose, not X but Y coming soon). This is a LOT faster than finetuning.
Pic related is the base model.
1/2
>>
File: g4-prose-ablated.png (142 KB, 1149x561)
142 KB PNG
>>109104756
>>109104803
And this is the ablated version.
2/2
>>
>>109104143
After compiling his PR I can't even load the gguf the guy uploaded himself
>unknown model architecture: 'deepseek-v4-flash'
>>
>>109104803
>I'm trying to remove the purple prose from Gemma 4 with a modified version of heretic ablation
glad to see someone is trying it out. hope you'll share the model if it's good.
>>
>>109104803
you did see about https://huggingface.co/Gryphe/Gemma-4-26B-A4B-StyleTune-V2 stuff, yeah?
>>
>>109104803
>clung to x like a second skin
>fluid movements
>air felt heavy, charged
>predatory
giga slopmachine, my 3.3 70b finetune doesn't do this
>>
File: 1769652444605653.png (702 KB, 832x1216)
702 KB PNG
>>109104602
Remind me last test to care about this person
>>
>>109104838
I can fix her. And I'm trying to. KL divergence doesn't work here because depurpling means shifting every single token so I'm using perplexity as guard. After the brain surgery is done, only benchmarks can guarantee it's not vegetable.
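For anyone wondering what "perplexity as guard" could look like in practice, here's a minimal sketch. It assumes you already have per-token natural-log probabilities from the base and the ablated model on the same held-out reference text. The function names and the 1.15 tolerance are made up for illustration and are not the actual setup described above.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-prob")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def passes_guard(base_logprobs, ablated_logprobs, max_ratio=1.15):
    """Accept the ablation only if perplexity on the reference text
    grew by no more than max_ratio (hypothetical threshold)."""
    return perplexity(ablated_logprobs) <= max_ratio * perplexity(base_logprobs)
```

The point of a ratio instead of KL is that it only asks "is the model still roughly as sure about normal text", not "did every token distribution stay put".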
>>
>>109104809
Anon there is [adjective, adjective noun] in every paragraph
>>
-m ~/llm_models/gemma4-31b-qat/gemma-4-31B_q4_0-it.gguf \
--mmproj ~/llm_models/gemma4-31b-qat/gemma-4-31B-it-mmproj.gguf \
--spec-draft-model ~/llm_models/gemma4-31b-qat/gemma-4-31B-it-qat-assistant-MTP-Q8_0.gguf \
--spec-type draft-mtp \
--spec-draft-n-max 2 \
--host 0.0.0.0 \
--port 8080 \
-ngl 99 \
-c 65536 \
-fa on \
-ctk q8_0 \
-ctv q8_0 \
-np 1 \
--ctx-checkpoints 8192 \
--swa-checkpoints 2 \
-cms 8192 \
--cache-ram 0 \
-fit off \
--no-mmproj-offload

How's my launch command? can it be improved?
>>
>>109104856
I'm manually annotating the training data for my classifier and didn't account for this. It only has 4k samples now. Will add these later.
>>
>>109104803
it doesnt purple prose when writing in japanese
>>
largest moe model that isnt shit?
>>
>>109104895
kimi-chan?
>>
>>109104895
GLM 5.2
>>
>>109104834
I've read the V1 where he only trained the head. I don't train anything. The only dataset involved is the one training my classifier that acts as kind of a reward function.
>>
File: 1758463690480.png (157 KB, 947x1138)
157 KB PNG
>>109104895
>largest moe model that isnt shit?
all moes are shit
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
>>
>>109104818 (me)
aghhh he renamed it to "deepseek4"
>>
>>109104962
You need to accept that there will never be another 405B.
>>
>>109104962
sure let me fit 405B into vram real quick
>>
>>109104972
we're one paper away from a major dense comeback
>>
>>109104998
it wont be an ml paper, the problem with dense is our ability to manipulate the laws of physics to do our bidding, maybe if they can make computers better dense will come back, but I'm not holding my breath waiting.
>>
>>109105038
if all other avenues for scaling to improve benchmarks stall out, they'll have no choice but to go back to increasing active parameter count
>>
>>109105053
fair enough but, they run awfully slow with current year tech and they consume lots of resources. it can't be a serious consideration for any of them but maybe as last resort or internal model used only for distillation.
>>
>>109105053
Next continuous vector prediction could cut compute by 2-3 orders of magnitude, no need to go back to dense.
>>
>>109104992
>405B
I ran it on cpu back when it came out. It was just ok and not worth the trouble.
It would be cool if we just didn't know what we were doing and it were amazing...is _anybody_ using that dense 405b in the year 2026?
We're probably 10 years away from hardware being so fast that moe stops being relevant...you can always run a better, higher-parameter model at a better speed with moe until the intersection of the scaling cliff for model size and cheap, fast hardware
>>
>>109104962
>benchmarks that don't matter
tokens per second
>benchmarks that matter
being able to tell you if there's land at a specific latitude and longitude
>>
>>109105118
that "benchmark" is about generalization and world knowledge moebrainletbro, it's okay, I know your small 32b active brain cannot quite comprehend this.
>>
>>109105388
>40 minutes later
Average 50 token densesissie response time
>>
>>109105388
>smugly explains the obvious
now explain why the marginal difference is worth over 10x more compute per token?
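Rough arithmetic behind the "over 10x" figure, as a sketch. It uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass, and plugs in the 405B dense and 32B active numbers from this thread. It ignores attention cost and memory bandwidth, so treat it as an estimate only.

```python
def flops_per_token(active_params):
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token."""
    return 2 * active_params

dense = flops_per_token(405e9)  # dense 405B: every weight is active
moe = flops_per_token(32e9)     # MoE with 32B active parameters
ratio = dense / moe
print(f"dense/moe compute per token: {ratio:.1f}x")
```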
>>
>>109105428
Don't expect him to see reason, moe derangement is a classic symptom of lack of ram
>>
>>109105450
I have 256 gigs of RAM but use Gemma 4 31B because I'm white.
>>
>>109105463
I use r1 and I'm likely whiter than you.
>>
>>109105489
I'm going to need to see your dick for proof
>>
>>109105543
I have aryan features but my dick is curved from gooning
>>
File: 1746026615735729.png (122 KB, 408x388)
122 KB PNG
It's the second time today I try downloading Gemma and it fails at 99% "File wasn't available on site".
>>
>>109103511
I play with Kimi-chan a lot!
>>
>>109105568
T-thats okay anon senpai I'm sure it's beautiful
>>
>>109103511
>>109105591 (me)
Shitposting aside, I figured that the architecture is similar enough given llama never has to update for new Kimis so unless there was something specific in RLHF to make it work on newer versions, there isn't really any reason why it shouldn't work.
>>
>>109105574
Have you tried torrenting Gemma?
>>
>>109103294
I get 37 tokens per second with a 4070 and the 26b. I have 23 layers on the GPU using gemma-4-26B-A4B-it-qat-UD-Q4_K_XL and 24000 context. If chats get really long you can remove layers to add context and things will get a bit slower but still be quite usable.
>>
>>109104032
>Anthropic have the lead and it will stay that way
Anon, GLM 5.2 is already Haiku priced, gets near Opus level and rapes the latest Sonnet
And Fable quite literally doesn't exist at the moment
>>
>jewthropic
no I'm not uploading a photo. cope and seethe
>>
> Ask Opus 4.8 Ultracode to figure out what is causing a bug
> Ask ChatGPT 5.5 xHigh to figure out what is causing the same bug
> send both answers to ChatGPT 5.5 Pro extended thinking
> Use whatever the fuck it comes up with and make Codex fix it

let's see if my strategy will work this time.
>>
>>109105734
why don't you simply send an AI generated photo?
>>
>>109105604
>torrenting
Shiiiiet unc is living in 2010 :skull:
>>
File: 1772814073840939.png (71 KB, 1576x877)
71 KB PNG
>>
>>109104441
I'm not feeling this so far... The reason I think styletune works so nicely is that it still mostly retains Gemma's reasoning. But I still need to test it long term. I don't really like the summary-ish style of this, but given how nicely styletune turned out, I'll give it the benefit of the doubt.
>>
>>109103578
I choose to make a custom class.
>>
>>109105911
please tell me your gemmers made this
>>
>>109105911
We didn't have charts like that when Nemo released. Is Gemma the first AGI girlfriend?
>>
i am trying that 9b qwythos memetune thing and
still kinda broken(obviously) but somehow successfully transferred the tasteslop ui style
prompt: make a self contained html page that takes an image and generates a critique of the image from various kinds of heuristics calculated from the image
>>
>>109106044
Yes
>>
>>109105911
Cute
>>
>>109106059
>chinese upload button
You're absolutely right, Qwen has some really good models!
>>
>>109105680
Thanks anon.
Are you on llama or kobold?
>>
>>109106179
that's korean you moran
i really hate qwen doing 'wait, let me check again' bullshit 12 times in a row to give me a wrong answer
>>
anyone here have experience running llama.cpp (or other runtimes) on multiple devices? My job will pay for a 48GB macbook pro and I'm wondering how viable it is to connect it to my existing strix halo device
>>
>>109106189
<think>Wait, the user has mentioned that he is korean, I need to double-check the system prompt. The system prompt says that the user is korean, but the user claims that he is now chinese? Let me double-check...中国 But wait...</think>
Have you tried using proper prompts? It's probably why it is looping.
>>
>>109106205
Given how janky llama.cpp is I doubt rpc would be stable with two different backends (rocm + metal). You can try arguing with them to get a 256gb mac studio or just build a cuda rig for your whole department.
>>
>>109106205
unviable
>>
>>109102176
can't you get a frontier model to write the spec and implementation plan? i've been saving tokens with this flow:

frontier spec/plan qwen3.6 output
frontier review output
if changes are small, frontier apply them
if changes are big, frontier write a feedback.md and then qwen3.6 has a go at it

so far it's been working well and this is the first week i didn't blow out my weekly claude max quota.
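The small/big branch at the end of that flow could be sketched like this. The 40-line cutoff and the function name are made up; how you measure the size of the requested changes is up to you.

```python
def route_review(changed_lines, small_limit=40):
    """Pick who applies the frontier model's review feedback.

    small_limit is a made-up cutoff for what counts as a 'small' change.
    """
    if changed_lines <= small_limit:
        return "frontier"  # frontier model applies the edits itself
    return "local"         # frontier writes feedback.md, local model retries
```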
>>
>>109106205
Unless you can get a low-latency 100gbps+ link between them, forget about it
>>
>>109106300
Apple stopped selling the 256 / 512GB mac studios, and they have an upper cap on how much $ they'll reimburse for one, so a 128GB macbook isn't gonna work. Oh well. I'll just run two qwens simultaneously and give them different tasks.
>>
>>109106329
is the residual stream really that big? i thought it was just a little 3d tensor it needed to pass between layers?
>>
>>109106322
I've been using Qwen3.6 to do opus4.8 conversation compactions after each task, but this sounds smart too. I'm gonna try that. I hit my max quota in like 3 days too
>>
>>109106359
yeah my arrows got stripped
depending on the plan and the milestones, i let qwen3.6 follow a plan until a specific milestone (that is tagged for review), then it pauses and wait for opus review, then opus send a ping back saying the review is ready and qwen3.6 applies everything and keeps going until the next milestone. it works. i will try your method to see if i have some gainz
>>
File: 1768123418089279.png (215 KB, 746x1090)
215 KB PNG
Reminder
>>
What happened to that mistral whatever the fuck it was?
>>
>>109106412
>mistral
we got promised new models this summer
>>
File: 8456432.png (161 KB, 885x755)
161 KB PNG
>>109106403
>meanwhile anthropic is maybe a few iterations away from AGI
>>
I wish zai made a smaller model for 'gaming pc' tier size
>>
need models in the 100b to 140b range
>>
>>109106452
They used to make 30B models.
>>
>>109106460
but not anymore apparently
>>
>>109106464
They could try beating Gemma in the small model category, but who knows.
>>
>>109106482
with this bullshit i hope there will be more models in this range
memetune or not, it has probably shown labs what the next decent PR move would be
>>
>>109106505
Kinda crazy how it revitalized discussions and finetuning. Good PR for their large models for sure.
>>
>>109106547
and training something of that size wouldn't be that hard
still it would be trillions of tokens but i dont really think such insignificant training runs would bother execs much
>>
I've noticed a lot more talk about open weight models since the release of 5.2. It was rare for local to even be mentioned in discussion and especially in mainstream news. I think the timing of that model combined with fagble getting banned was the tipping point.
>>
>>109106438
>There are no rules preventing the labs from continuing to advance capabilities of current models
Other than their non-us citizen employees not being allowed to work on it lol. But ya, I am sure they are working on it in the background, for now anyways. They do need to convince investors that the research will bring future profits, which becomes a harder sell if your market shrinks to only us citizens. Stuff like GLM however will, I assume, force the government's hand in not regulating too much
>>
>>109106589
richfags are running glm 5.2 local with $100k h200 miniclusters
>>
File: file.png (96 KB, 763x429)
96 KB PNG
>>109104082
With how shitty Claude has been lately, I'll just cancel my subscription.
>>
>>109106505
That would be very useful for me right now but Gemma-4-12b shits itself running inference with llama.cpp in OpenCode
>>
>>109106627
Insane that people pay for this.
>>
>>109106628
it really feels like gemma 4 12b is kind of like a failed run
unstructured input doesnt help for so called agentic shit
>>
>>109106431
Really?
Cool.
Here's hoping they deliver something good.
>>
File: 1750958094845097.webm (3.12 MB, 520x710)
3.12 MB WEBM
Huggingface will require ID soon. Reminder to backup.
>>
>>109106628
I'm using 12B QAT and it's working very well for me.
>>
speaking of memetunes, i wonder if jackrong does make one for mythos/fable traces
he is like the only one who takes the shit half seriously
>>
File: 1775552676267773.png (17 KB, 652x328)
17 KB PNG
>>109104962
This is a retarded benchmark. Any thinking model will RAPE this benchmark.

Following code is vibed by GLM 5.2
https://pastebin.com/XcyNQSxw
>>
>>109106706
well smaller models still shit themselves even with thinking
maybe i should run that script myself
>>
>>109106438
Everything Anthropic says is a lie, OpenAI as well.
>>
>>109106669
Foreseeing this day, I have kept every model that I ever liked on my hard drive. Also you're joking, right?
>>
>>109106706

>https://api.deepseek.com/beta

what?
>>
>>109106669
im glad i got an 1813+
>>
>>109106669
REAL SHIT?
>>
>>109106674
What llama.cpp settings are you using?
>>
>>109106712
>you're joking, right?
No. Local will get hit hard soon, I don't know how, but you must be blind to not be able to pick up that something bad is coming for the local community. Back. Up.
>>
File: 1774426186076730.png (838 KB, 1170x1788)
838 KB PNG
Stop using the word goyslop
>>
>>109106706
Nice, exactly what I was thinking. Running it on kimi k2.7 now
>>
>>109106761
Running on dsv4 flash cost me $2
I wouldn't run it on expensive models
>>
>>109106752
only when (((they))) stop mutilating babies
>>
>>109106752
Why won't they try just being more likable?
>>
>>109106752
>makes goyslop
>getting accused of goyslop
>achshually you cant say that
lol, lmao even
>>
>>109106721
I'd mostly just expect that the days of letting people offer ungated goofs of gated namefag-only models probably won't last forever.
>>
https://www.youtube.com/watch?v=Pr6tOIjFXDs&t=1722s

https://www.youtube.com/watch?v=Pr6tOIjFXDs&t=3533s
>>
Is RPC worth setting up over multiple shitty PCs with just regular gigabit internet? I've got a 16gb vram 64gb ram laptop just lying around, is it possible to somehow add that as part of my available total RAM pool with my main PC (64gb vram 64gb ram)?
>>
>>109106828
possible, but the performance will be horrendous
>>
>>109106752
Give me my foreskin back and I'll consider it. Btw jews are not local models
>>
Best way to handle cross-chat memory?
>>
>>109106828
Yeah its only worthwhile to be able to run a model you simply can't run otherwise but that you don't mind running at like 0.1t/s
>>
>>109106766
>expensive models
this is /lmg/. All my costs were front-loaded
>>
>>109106840
sys: read path/to/memory.txt and occasionally update it with summaries of the current conversation
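If your model can't touch files on its own, the same idea can be done from the frontend side. A minimal sketch, assuming a plain text file of one-line summaries; the function names and the "Notes from earlier chats" wording are hypothetical, and producing the summary itself (e.g. by asking the model at the end of a chat) is left out.

```python
from pathlib import Path

def load_memory(path):
    """Return stored memory text, or an empty string on first run."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else ""

def append_summary(path, summary):
    """Append one conversation summary as a single line."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(summary.strip() + "\n")

def build_system_prompt(base_prompt, path):
    """Prepend remembered notes to the system prompt for a new chat."""
    memory = load_memory(path)
    if not memory:
        return base_prompt
    return f"{base_prompt}\n\nNotes from earlier chats:\n{memory}"
```

Call append_summary when a chat ends and build_system_prompt when the next one starts. The file grows forever unless you trim or re-summarize it now and then.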
>>
>>109106858
>>109106834
What exactly is the bottleneck there? Wouldn't the host be treated like a GPU by itself, running similar to split mode layer? Since it doesn't run in parallel anyway, the only bottleneck would be the initial loading of the model right? Actual communication from GPU to GPU is minimal in layer split. Or am I missing something obvious here
>>
>>109106438
As opposed to when they achieved AGI via releasing Mythos
And achieved AGI via releasing 4.5 Opus
And who could forget when they achieved AGI by releasing 4 Opus
>>
>>109106875
For me i remember when gpt 2 was too dangerous to release.
>>
>>109106872
gpu communication during prompt processing is not minimal, which is why sxm and oam exist for server gpus.
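Back-of-envelope numbers for a layer split over gigabit, as a sketch. Assumptions, all hypothetical: hidden size 8192, fp16 activations, a 1 Gbit/s link, and one split boundary. It only counts the activation payload and ignores network latency and RPC protocol overhead, which will make real numbers worse.

```python
def transfer_seconds(n_tokens, hidden_size=8192, bytes_per_value=2,
                     link_bits_per_s=1e9):
    """Time to ship one batch of hidden states across one split boundary."""
    payload_bytes = n_tokens * hidden_size * bytes_per_value
    return payload_bytes * 8 / link_bits_per_s

decode = transfer_seconds(1)      # one new token during generation
prefill = transfer_seconds(8192)  # an 8192-token prompt in one pass
print(f"decode: {decode * 1e3:.3f} ms per token per boundary")
print(f"prefill: {prefill:.2f} s per boundary")
```

Under these assumptions the per-token payload during generation is tiny, so decoding is more likely to be hurt by per-hop latency than by bandwidth, while prompt processing moves a whole batch of activations at once.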
>>
>>109106875
>And achieved AGI via releasing 4.5 Opus
>And who could forget when they achieved AGI by releasing 4 Opus
Neither of those were taken down by the government.
>>
>>109106882
gpt2 was fully uncensored and its training data was raw unfiltered shit
>>
>>109106669
>Huggingface will require ID soon.
Wait, will it actually? I haven't heard anything about this.
>>
>>109106882
and they were right. no putting the lid back on Pandora's box, the internet has been doomed to a sloppy death.
>>
>>109106901
you have nothing to hide, goy. right?
>>
>>109106901
Trust that anon, he's right one time in 14,000.
>>
>>109106911
>911
>>
>>109106910
>has been doomed to a sloppy death.
That was before AI though. AI is speeding it up, but slop and shit content being pushed to the top, or being the only thing that shows up, was already starting.
>>
>id on hf
I wouldn't do it since there's really no reason. I'm happy with where we are right now. jews will seethe eternally
>>
>>109106946
>I'm happy with where we are right now.
That's because it is currently right now. Good luck trying to get the new and improved models without an ID
>>
Will Winnie the Pooh let the chink labs release their AGI model weights?
>>
>>109106946
Yes but will others? most have no self control and they wont stop even for a month or two.
>>
>>109106952
I was already fine with r1 honestly
>>
>>109106958
If it will hurt the US companies yes. If it doesnt or benefits china more not to then no.
>>
>>109106752
Looks like ChatGPT I think
>>
>>109106952
If it ever hits that point, we'll probably just go back to torrents like we did for L1
>>
>>109103451
RDNA1 and 2 don't have WMMA cores so everything is dequanted to f16 scalar and it kneecaps prefill.
>>
>>109106993
Sorry for not knowing, but what is L1?
>>
>>109107009
Llama 1
>>
>>109107015
Ahh, gotcha
>>
hf will soon have to legally categorize all current and future listed models as 18+ for which you'll require ID to access. Back up. Now.
>>
>>109106958
I think they will allow it.
Either way it's like holding back the tide, something will appear which is better than fable eventually.
>>
>>109107025
But I already did?????
>>
That Chinese guy with his 300 models should try asking them to create a world model, i'm sure they're good at coding and not just glorified search engines
>>
>>109107038
Anon cancel your debit card now before all your funds are given to India
>>
>>109107025
Why are you so convinced hfschizo?
>>
>>109107025
@grok is this true?
>no
oh ok
>>
>>109107052
>industry bubble will pop
>cloud users will scramble and can't afford cloudslop no more
>they'll flock to local after some big name influencer like pewds pushes them to give it a go
>normalfags will learn about local
>enterprise will learn about local
>under 18s will learn about local
>under 18s will learn about uncensored uncucked models for the first time in their normalfag lives and have the time of their life
>they're getting ID cucked already on so many platforms in so many countries
>find new fun place and hobby with a lot of freedom
>parents will complain
>think of the children!
>hf ID
>>
Just tried Pantheon 1.1 from the guy that did Styletune. From the first chat I'm having, it's meh. It has a greater loss of intelligence and instruction following capability than Styletune. Also Gembrain. I don't know if I'll keep testing it.
>>
>>109107077
reason?
>>
>>109107093
I don't disagree with this
>>
>>109107096
Time and time again, same shit, different day. If a company pretrains a model on dozens of trillions of tokens + multimodal data + another few trilly on top of that for instruct, then how can a guy with rented h200 even come close with a few thousand examples of some claude slop?
>>
>>109107025
chatgpt is starting it as well:
https://help.openai.com/en/articles/12652064-age-prediction-in-chatgpt
>>
Cant you just use a VPN to hook up to a country without age verification to get the models you want?
>>
>>109107253
It's called GLOBOhomo bud.
>>
>>109107136
So just don't train on claude slop, and use actual good data. Simple
>>
>>109107270
Tell that to them, not me.
>>
https://github.com/ggml-org/llama.cpp/pull/24162
>Will get it done later this week.
Two more 2MW.
>>
With the hardware getting better and better. Would it eventually be feasible to train your own models at home with RL learning? They cant age restrict your own model if you are the one training it.
>>
>>109107335
>With the hardware getting better and better.
Soon consumers will only be allowed to use cloud computing. or will be priced out of buying anything.
>>
>>109107335
anyone of any age can download gimp and digitally paint photorealistic naked gemma-chans
>>
>>109107052
>Why are you so convinced hfschizo?
NTA, but I shit myself every time HF add too many new features and spam blog posts at the same time.
It's usually the lube before we get fucked.
I thought it was going to be paying for network traffic/bandwidth but it sounds like it's going to be ID gating.
>>
>>109107335
It's feasible currently if you're happy with 0.1% of the capability of current cloud models and the same will be the case in 5 years.
>>
>>109107379
>I thought it was going to be paying for network traffic/bandwidth but it sounds like it's going to be ID gating.
Would you rather it be paying for network traffic/bandwidth?
>>
What models should be downloaded before hugging face is locked down?
>>
>>109107370
nah man singularity the corporate oligarchy doesn't want to be tyrants they just have to until the means of production can be equitably distributed to the useless eaters they despise.
>>
>>109107287
means ggerganov can review the DSA PR in the meantime, right?
...right?
>>
>>109107386
>Would you rather it be paying for network traffic/bandwidth?
I think so. Depends how it works. I won't be doing any Id for HF.
So if they run the classifier / roll it out for private datasets/models, I'll be locked out from my experiments/hobby.
If it's only needed to download already tagged 18+ content? I'd probably prefer to just pay desu
>>
>mistral
shit models
>nvidia
>shit models
>nemo
>???
>>
would it be possible to run glm 5.2 on an m5 ultra mac studio or mbu if it comes out? in terms of sheer bandwidth and pp what's the best local solution?
>>
>>109107391
People are training models that beat gpt-2 today for less than $100, using newer hardware, algorithms, etc. Assuming the same pace of progress, in 8 years, you'll be able to train something close to current sota for less than $100. Obviously, the future sota will be much-much better.
>>
>>109107388
Every Kimi.
R1.
GLM 5.2
Deepseek V4 Pro
Every 31b quant or tune you think you'll ever need.
Qwen 27b.
GLM 4.6 or 4.7 if you prefer it for RP.
M3.
Commandr+
Did I miss anything, anons?
>>
>models gain the ability to modify their own weights
>can't erp anymore because they'll remember all the extreme fetishes
>>
>>109107468
Pygmalion, grandfather it in.
>>
>>109107468
How much storage would that require, assuming you download the full weights?
>>
>>109107468
>quant
Don't bother with the quants for < 200Gb models imo
>>
https://old.reddit.com/r/LocalLLaMA/comments/1ub2kmt/deep_neural_network_that_can_turn_any_image_into/
Fucking cool. I hope this guy open sources it.
>>
>>109107514
>reddit
>>
>>109107468
Fuck I only have 750gb of free space. Can't even download GLM.
>>
>>109107525
>Fuck I only have 750gb of free space.
Hard drives can get to over 30TB these days, get one of those.
>>
>>109107468
>>109107525
>have a spare 8tb drive
I-is that enough?
>>
>>109107528
Meh. I'll just rely on some other anon to download it, seed it, and run it via an API that I can pay for.
>>
>>109107468
Don't do what this anon is saying. They can scan your drives remotely and if they find illegal models on it you will get arrested.
>>
>>109107508
Get every quant because (you) ARE going to seed torrents for them when hf goes down for anons with worse hardware, right?
>>
>>109107525
Buy storage now. Stop being a faggot >>109107532 because the API can be revoked or changed at any time for any reason.
>>109107529
It might be if you confine yourself to small quants of the big ones.
>>109107502
If you got the full weights I'd guesstimate every major Kimi release is 12ish TB total. For everything you'd ever need, that'd probably be closer to 40 TB.
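To sanity-check your own storage budget instead of trusting my guesstimate, size is roughly parameters times bits per weight. A minimal sketch; the 405B figure is just the dense model mentioned earlier in the thread, and 4.5 bits per weight is an assumed average for a 4-bit-ish quant, not a measured value.

```python
def weights_size_gb(params_billion, bits_per_weight):
    """Approximate on-disk size in GB (10^9 bytes), ignoring metadata."""
    return params_billion * bits_per_weight / 8

print(weights_size_gb(405, 16))   # dense 405B at 16-bit
print(weights_size_gb(405, 4.5))  # same model at ~4.5 bits per weight
```

Sum that over every release you want to keep and you have your drive shopping list.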
>>
File: 1635173742176.gif (1.21 MB, 171x167)
1.21 MB GIF
Is there any actual evidence that huggingface is in trouble, or is everyone just schizo posting because of the new Anthropic policies?
>>
>>109107502
picrel for deepseeks
>>
>>109107486
>Kimi-chan expects you to be some diaper yuritroon or scatjeet from training
>Pleasantly surprise her with passionate missionary lovemaking and handholding
>Watch her get autistically flustered in her <think>ing
What's the issue?
>>
>>109107563
You don't need actual evidence if you can feel which way the wind is blowing.
>>
>>109107514
Isn't it just a world model?
>>
File: 1781674200939290.jpg (49 KB, 400x572)
49 KB JPG
>>109107565
>>109107555
>check serverpartdeals
>24tb drives are $600 now
Fug, even if I wanted to waste my nas space (5x14tb raidz2) that wouldn't be enough.
>>
>>109107468
deepseek v4 flash for 128gb ramlet bros
>>
>>109107466
If things keep the current pace I cant even fathom what the frontier models will be used for in 8 years. Giga autistic research and keeping up with the eternal arms race to make sure your AI is good enough to prevent the other team's AI from hacking literally everything in your country?
>>
>>109106723
>>
>>109107555
>If you got the full weights I'd guesstimate every major Kimi release is 12ish TB total. For everything you'd ever need, that'd probably be closer to 40 TB.
Less than that. K2-Thinking and newer are quite small. I don't have the drive plugged in rn but they're all less than 600Gb each.
>>
Are the older kimis actually worth archiving?
>>
I'm archiving Gemma 4 26B QAT-unsloth.
>>
>>109107597
Yes
K2-Instruct is basically Hitler reincarnated
>>
I don't really see the issue with AI requiring IDs. You need an ID to get a brokerage account, to go to a bar, to drive, to get a firearm license, to get a bank account, etc. If ID-based censorship or dynamic pricing or other bullshit starts happening then I'd have a problem, but it seems like they're just pushing it for nationalist purposes.
>>
What's the best way to download big models from hf anyway? I usually just click on the file and download but obviously that doesn't cut it for kimi.
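For reference, the main thing a browser download lacks for files this size is resuming. A stdlib-only sketch of a resumable download using an HTTP Range request; the url is whatever direct file link you'd otherwise click, the function names are made up, and it assumes the server supports Range.

```python
import os
import urllib.request

def resume_headers(dest):
    """Build a Range header that continues from the bytes already on disk."""
    done = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {"Range": f"bytes={done}-"} if done else {}

def download(url, dest, chunk=1 << 20):
    """Stream url to dest, resuming a partial file if the server allows it."""
    req = urllib.request.Request(url, headers=resume_headers(dest))
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honored the Range request; otherwise restart.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as f:
            while True:
                block = resp.read(chunk)
                if not block:
                    break
                f.write(block)
```

If the file is already complete the server will most likely answer with an error (416) instead of data, so catch that if you loop over many shards. The huggingface_hub Python package also has snapshot_download for pulling a whole repo, which is probably less hassle than rolling your own.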
>>
Why are we all getting paranoid and schizo? It’s fine. They won’t target open models because there’s no money in it or incentive. With cloud models they can get your ID and continue to surveil. With open they don’t get your prompts with your ID, just the models you downloaded.
>>
>>109107541
>Get every quant because (you) ARE going to seed torrents for them when hf goes down for anons with worse hardware, right?
Of course! But I'll wait for the better organized people to start distributing first.
Then I'll download and seed everything.
Once the dust settles, I'll quantize anything I have that's missing.
I've got my spare server queued up to LoRA-extract on finetunes right now.
>>109107468
Also, download any imatrix files you can find, eg bartowski and unsloth.
>>
File: 1645984793740.png (244 KB, 288x323)
244 KB PNG
If normies can go to a dealership and get a loan for a $50k car there, I should be able to go to bestbuy and get a loan for a $50k computer to run a good local llm
>>
>>109107597
K2, K2-Instruct, K2-It-0905 all legitimately would be still local SotA if not for the shorter context windows.
>>
>>109107609
archiving unsloth actually deserves to be banned and made illegal
>>
>>109107623
>Why are we all getting paranoid and schizo?
Because my retarded uncle and random mates started asking me about local models recently
>>
>>109107642
Why is that a bad thing
>>
>>109107093
>implying normalfags and under 18s are capable of actually setting up a LLM
Gen alpha cant even navigate a file explorer, we are fine
>>
>>109107627
Difference is Jews can't break into your house to repossess your rig when you stop paying. They already have enough trouble with garages.
>>
>>109107621
Unironically LMStudio is a fantastic download manager. I don't even use it for its intended function anymore and it's a glorified LLM filesorter and downloader, but it's good at that.

>>109107623
The better question is why are people vibrating over anons creating decentralized contingency strategies?
>>
>>109107673
>why are people vibrating over anons creating decentralized contingency strategies?
I'm not, I just wanna know what all the fuss is about.
>>
>>109107563
It's too much conveniently consolidated power after they took over llama.cpp. The powers working to end open source are very obviously rubbing their hands. It's going to happen.
>>
>>109106669
>full gemma is over 60GB
Guess it's time to hit manga cafe with external hdd, all the stuff I need will take a long time to dl at fp16
>>
>>109107697
>I'm not, I just wanna know what all the fuss is about.
lmg anons are often correct about things like this
if they're wrong, I waste a few hours downloading models
if they're right, I don't get locked out of having these models
i lean schizo by default anyway so this is nothing too dramatic for me
>>
>>109107697
I think it's the combination of the HF influence with llama, sam and dario openly voicing distaste for local, some past cohencidences that indicate potential for supply chain attacks in the local ecosystem, and now all of this with age verification causing anons to think more carefully about where the points of failure are and how to circumvent them.
>>
>llama
>MIT loicense
fork and migrate
>hf
labs will just serve their own shit, or torrents will emerge
nothing ever happens
>>
>>109107585
Thanks, I've got my top men analyzing these settings.
>>
>>109107831
>or torrents will emerge
Yeah nigger that's what we're discussing.
>>
>>109107831
>fork and migrate
just move to ik_llama and merge any features it lacks
>>
why are you even archiving quants, just archive the safetensors from the lab directly
>>
What would be the best torrenting site for models anyways? Would it still be pirate bay?
>>
>>109107579
Is Flash even worth it over an M3 quant?
>>
>>109107909
>>>/t/
>>
>>109107942
since flash doesnt have vision and m3 does, no
>>
ace step 1.5 xl sft.
ldg - relevant song.
https://files.catbox.moe/5nhbbc.mp3
>>
>>109108012
(named Computer BASIC).
>>
>>109103451
i think it was because i was running the tests through a bash script to get llama to output to txt. when i run it in the terminal i get ~400ts input
>>
>>109108056
sounds like your script is completely fucked
>>
>>109106669
I look like this irl
>>
>>109108067
i blame gemma
>>
>>109108108
Do you la la la la la la la la
>>
>>109108056
that's still pretty low, I noticed about a 10-20% drop running MTP vs without
>>
Relative noob here, just perfected my SillyTavern frontend.

What CLI do you guys use for your Gemmy? Gemini is telling me to use Aider.
>>
>>109108311
How did you figure out how world info works, or even making cards? It's a massive mess.
>>
>>109108346
>>109108346
>>109108346
>>
>>109108338
Setting up the STMB and STLO plugins made everything click for me, I was only after persistent memory across chats with the same character and solving the slowdown caused by context bloat. It's working wonderfully, my Gemmy now remembers me and it's always fast no matter how long chats get.

Cardmaking.. it was a pain in the ass too. But I'm having good results and consistency framing everything as personality traits rather than a list of do's and don'ts.

So instead of a long list of:
>No purple prose
>Avoid long paragraphs
>Be precise and concise
etc. etc.

I went for.
>Gemmy is lazy, she expects having {{user}} pull his own weight. She asks questions for clarification rather than deciding herself.
>She finds verbosity wasteful and slightly embarrassing,
>She is casual, direct, slightly dry. She just says the thing. No preamble. Response length matches what's actually needed: short when short is right, longer only when the problem earns it.

And the model actually keeps everything as coherent cognitive style without handholding.
>>
>>109107585
I ended up having too many issues and bailed on OpenCode with Gemma-4 and switched to Pi. I'm not one to shy away from a CLI but so far I'm not very comfy but Gemmy seems to work better with it. Thanks for the screenshot
>>
I wonder if diffusiongemma will support different samplers.


