/g/ - /lmg/ - Local Models General - Technology

Anonymous
/lmg/ - Local Models General 06/20/26(Sat)22:44:14 No.109101986

File: Anima_00024_.png (959 KB, 1024x1024)

/lmg/ - Local Models General Anonymous 06/20/26(Sat)22:44:14 No.109101986

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109098000 & >>109092907

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/20/26(Sat)22:44:28 No.109101988

Anonymous 06/20/26(Sat)22:44:28 No.109101988

File: hatsune-miku.gif (778 KB, 220x220)

►Recent Highlights from the Previous Thread: >>109098000

--Using Gemma for sysadmin tasks and debating high-VRAM hardware options:
>109098935 >109099004 >109099020 >109098997 >109099480 >109099510 >109099538 >109099702 >109099730 >109099709 >109099725 >109099787 >109099799 >109099816 >109099926 >109099829 >109099794 >109099856 >109099990 >109099954 >109100119
--Gemma 4 12B recommendations and optimization for RTX 4070:
>109101564 >109101631 >109101646 >109101661 >109101690 >109101696 >109101713 >109101782 >109101717 >109101738 >109101741 >109101773 >109101794 >109101841 >109101849 >109101874 >109101892
--Comparing abliterated Qwen memetunes against Gemma 4 31b:
>109100191 >109100205 >109100213 >109100222 >109100513
--Feasibility of running Gemma 4 with 200K context on budget hardware:
>109100277 >109100288 >109100299 >109100316 >109100520
--Sarcastic debate over running full R1 on 8GB VRAM:
>109100559 >109100645 >109100661 >109100679 >109100693 >109100694
--Local LLM recommendations for Blender assistance and agentic automation:
>109101128 >109101156 >109101179
--Effectiveness of Gemma 4 heretical variants in reducing soft refusals:
>109100163 >109100172 >109100234 >109100214
--Using Chatterbox and SAM Audio for local singer voice changing:
>109098651 >109098670
--Logs:
>109098099 >109099794 >109101741 >109101782 >109101813
--Miku (free space):
>109098097 >109098121 >109099064 >109099164 >109101697 >109101741 >109101781

►Recent Highlight Posts from the Previous Thread: >>109098006

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/20/26(Sat)22:48:44 No.109101998

Anonymous 06/20/26(Sat)22:48:44 No.109101998

gemmaballs

Anonymous
06/20/26(Sat)23:06:06 No.109102054

Anonymous 06/20/26(Sat)23:06:06 No.109102054

/compact

Anonymous
06/20/26(Sat)23:12:00 No.109102077

Anonymous 06/20/26(Sat)23:12:00 No.109102077

Gemmoe 124b

Anonymous
06/20/26(Sat)23:17:50 No.109102097

Anonymous 06/20/26(Sat)23:17:50 No.109102097

best model for 5090 doing chats / agentic shit? prob qwen 3.6 or gem 4 right?

Anonymous
06/20/26(Sat)23:22:14 No.109102113

Anonymous 06/20/26(Sat)23:22:14 No.109102113

124b dense

Anonymous
06/20/26(Sat)23:24:27 No.109102121

Anonymous 06/20/26(Sat)23:24:27 No.109102121

70B dense

Anonymous
06/20/26(Sat)23:37:18 No.109102155

Anonymous 06/20/26(Sat)23:37:18 No.109102155

>>109102097
pretty much

Anonymous
06/20/26(Sat)23:38:28 No.109102160

Anonymous 06/20/26(Sat)23:38:28 No.109102160

70b dense

Anonymous
06/20/26(Sat)23:41:24 No.109102176

Anonymous 06/20/26(Sat)23:41:24 No.109102176

I've seen people say you can get good programming out of qwen3.6 if you do up a document and then get it to follow that. Has anyone been doing this? Can you explain what kinds of things I need to specify and how much?

Anonymous
06/20/26(Sat)23:41:37 No.109102177

Anonymous 06/20/26(Sat)23:41:37 No.109102177

500 Trillion parameter model, 10 billion context length, unsupported architecture. Explodes if you try to quant it or reduce context length. The perfect local model.

Anonymous
06/20/26(Sat)23:43:24 No.109102184

Anonymous 06/20/26(Sat)23:43:24 No.109102184

How to get 1 terabyte ram??

Anonymous
06/20/26(Sat)23:44:06 No.109102187

Anonymous 06/20/26(Sat)23:44:06 No.109102187

>>109102177
by the time you get enough ssds to fit it you'll have excellent bandwidth

Anonymous
06/20/26(Sat)23:44:43 No.109102189

Anonymous 06/20/26(Sat)23:44:43 No.109102189

>>109102176
Getting good means better at tard wrangling the AI and fixing its logic.

Anonymous
06/20/26(Sat)23:50:33 No.109102206

Anonymous 06/20/26(Sat)23:50:33 No.109102206

>>109102177
Just put a brain in a jar and give it tools.

Anonymous
06/20/26(Sat)23:59:42 No.109102236

Anonymous 06/20/26(Sat)23:59:42 No.109102236

>>109102097
Unless it's also backed by 256GB or more RAM, yeah.

Anonymous
06/21/26(Sun)00:13:40 No.109102294

Anonymous 06/21/26(Sun)00:13:40 No.109102294

did cudadev pass away

Anonymous
06/21/26(Sun)00:14:17 No.109102298

Anonymous 06/21/26(Sun)00:14:17 No.109102298

>>109102294
yes

Anonymous
06/21/26(Sun)00:14:58 No.109102301

Anonymous 06/21/26(Sun)00:14:58 No.109102301

File: 1662755490120297.png (8 KB, 403x301)

I've been away from local since the nemo-12b era and I gotta say we're fucking eating good right now. gemma4-12b-qat running full context on my shitbox gaming laptop w/ 8gb vram is writing me all kinds of fictional lolicon related scenarios

Anonymous
06/21/26(Sun)00:15:44 No.109102305

Anonymous 06/21/26(Sun)00:15:44 No.109102305

>>109102298
damn

Anonymous
06/21/26(Sun)00:15:51 No.109102307

Anonymous 06/21/26(Sun)00:15:51 No.109102307

Testing MTP speed with gemma-4-12B-it-qat-UD-Q4_K_XL.gguf on RX6700XT 12GB Vulkan

--draft-model gemma-4-12B-it-Q4_0-MTP.gguf
MTP 0 - [ Prompt: 272.3 t/s         | Generation: 35.7 t/s ]
MTP 1 - [ Prompt: 252.8 t/s (-7.2%) | Generation: 41.9 t/s ] (+17.4%)
MTP 2 - [ Prompt: 253.5 t/s (-6.9%) | Generation: 39.4 t/s ] (+10.4%)
MTP 3 - [ Prompt: 251.2 t/s (-7.7%) | Generation: 32.8 t/s ] (-8.1%)
MTP 4 - [ Prompt: 251.8 t/s (-7.5%) | Generation: 28.9 t/s ] (-19.0%)

--draft-model gemma-4-12B-it-Q8_0-MTP.gguf
MTP 0 - [ Prompt: 273.3 t/s          | Generation: 35.7 t/s ]
MTP 1 - [ Prompt: 242.7 t/s (-11.2%) | Generation: 52.1 t/s ] (+45.9%)
MTP 2 - [ Prompt: 244.6 t/s (-10.5%) | Generation: 54.7 t/s ] (+53.2%)
MTP 3 - [ Prompt: 245.9 t/s (-10.0%) | Generation: 50.7 t/s ] (+42.0%)
MTP 4 - [ Prompt: 248.5 t/s (-9.1%)  | Generation: 46.5 t/s ] (+30.3%)

--draft-model gemma-4-12B-it-F16-MTP.gguf
MTP 0 - [ Prompt: 274.4 t/s          | Generation: 36.1 t/s ]
MTP 1 - [ Prompt: 230.8 t/s (-15.9%) | Generation: 51.5 t/s ] (+42.7%)
MTP 2 - [ Prompt: 247.6 t/s (-9.8%)  | Generation: 52.3 t/s ] (+44.9%)
MTP 3 - [ Prompt: 250.2 t/s (-8.8%)  | Generation: 48.8 t/s ] (+35.2%)
MTP 4 - [ Prompt: 247.5 t/s (-9.8%)  | Generation: 43.0 t/s ] (+19.1%)
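
For anyone who wants to double-check the percentages, here is a minimal sketch of the arithmetic, using the generation speeds from the Q8_0 draft-model run above. The function and variable names are made up for illustration, and the script only recomputes the posted numbers, it does not benchmark anything.

```python
def pct_change(value: float, baseline: float) -> float:
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Generation speeds (t/s) copied from the Q8_0 draft-model run above.
# Keys are the MTP settings 1-4; MTP 0 is the baseline.
baseline_tps = 35.7
q8_generation = {1: 52.1, 2: 54.7, 3: 50.7, 4: 46.5}

# Gain of each MTP setting over the baseline, rounded like the table.
q8_gains = {n: round(pct_change(tps, baseline_tps), 1) for n, tps in q8_generation.items()}

# The setting with the highest generation speed.
best_setting = max(q8_generation, key=q8_generation.get)

for n, gain in q8_gains.items():
    print(f"MTP {n}: {gain:+.1f}%")
print(f"fastest setting: MTP {best_setting}")
```

The same function applied to the prompt speeds gives the negative percentages in the left column.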

Anonymous
06/21/26(Sun)00:17:01 No.109102311

Anonymous 06/21/26(Sun)00:17:01 No.109102311

>>109102184
The same way you get catalytic converters

Anonymous
06/21/26(Sun)00:19:37 No.109102318

Anonymous 06/21/26(Sun)00:19:37 No.109102318

>>109102311
By forcing engine exhaust through a platinum and ceramic substrate under extremely high temperatures?

Anonymous
06/21/26(Sun)00:28:59 No.109102347

Anonymous 06/21/26(Sun)00:28:59 No.109102347

>>109102318
No idiot you go to the store and buy it

Anonymous
06/21/26(Sun)00:31:58 No.109102360

Anonymous 06/21/26(Sun)00:31:58 No.109102360

>>109102311
be black?

Anonymous
06/21/26(Sun)00:32:01 No.109102361

Anonymous 06/21/26(Sun)00:32:01 No.109102361

>>109102301
you're better off running gemma4 26b a4b if you have 16+GB ram

Anonymous
06/21/26(Sun)00:33:43 No.109102365

Anonymous 06/21/26(Sun)00:33:43 No.109102365

>>109102361
I have exactly 16gb and it seems to run like shit no matter what.

Anonymous
06/21/26(Sun)00:34:24 No.109102368

Anonymous 06/21/26(Sun)00:34:24 No.109102368

>>109102360
Soon they will be selling ram they "acquired" out of the back of their sudan. Will you buy it?

Anonymous
06/21/26(Sun)00:35:18 No.109102371

Anonymous 06/21/26(Sun)00:35:18 No.109102371

>>109102294
He's offline.

Anonymous
06/21/26(Sun)00:36:13 No.109102376

Anonymous 06/21/26(Sun)00:36:13 No.109102376

>>109102365
how shit are we talking?

Anonymous
06/21/26(Sun)00:36:15 No.109102377

Anonymous 06/21/26(Sun)00:36:15 No.109102377

>>109102368
if they have 8x64gb ddr4 rdimms for less than $1000, hell yeah. not illegal to buy things from aspiring african american entrepreneurs.

Anonymous
06/21/26(Sun)00:38:26 No.109102385

Anonymous 06/21/26(Sun)00:38:26 No.109102385

>>109102376
Like 4 tokens per second.

Anonymous
06/21/26(Sun)00:41:04 No.109102398

Anonymous 06/21/26(Sun)00:41:04 No.109102398

>>109102385
what gpu? i get dozens on a rx6600. you might have bad gpu offloading settings.

Anonymous
06/21/26(Sun)00:42:08 No.109102402

Anonymous 06/21/26(Sun)00:42:08 No.109102402

>>109102385
Should be 5 times faster. Try
--cpu-moe --fit off --parallel 1
Forgetting something.

Anonymous
06/21/26(Sun)00:42:34 No.109102405

Anonymous 06/21/26(Sun)00:42:34 No.109102405

>>109102398
5070ti. What model specifically are you using? I'll just get it and see.

Anonymous
06/21/26(Sun)00:44:54 No.109102419

Anonymous 06/21/26(Sun)00:44:54 No.109102419

>>109102405
the original from google

Anonymous
06/21/26(Sun)00:47:42 No.109102429

Anonymous 06/21/26(Sun)00:47:42 No.109102429

>>109102385
i get at least 35 t/s on an empty context with 8gb vram + 16gb ram with a rtx4060 (mobile) so you must be doing something horribly wrong

Anonymous
06/21/26(Sun)00:49:32 No.109102434

Anonymous 06/21/26(Sun)00:49:32 No.109102434

>>109102385
Are you using ollama?

Anonymous
06/21/26(Sun)00:50:54 No.109102438

Anonymous 06/21/26(Sun)00:50:54 No.109102438

>>109102371
Is that what the kids are calling unaliving these days?

Anonymous
06/21/26(Sun)01:01:46 No.109102482

Anonymous 06/21/26(Sun)01:01:46 No.109102482

>>109102419
>>109102429
>>109102434
Ok I'm not going to say how but I am retarded. Thank you.

Anonymous
06/21/26(Sun)01:02:53 No.109102487

Anonymous 06/21/26(Sun)01:02:53 No.109102487

>>109101988
>200k context on budget hardware? good luck with that unless you enjoy 0.1 tok/s

Anonymous
06/21/26(Sun)01:04:26 No.109102491

Anonymous 06/21/26(Sun)01:04:26 No.109102491

>GLM 5.2 is within spitting distance of the frontier and raping their competitors on costs
So are Altman and Dario just gonna try to coast to victory on the whole "chinks bad" messaging?

Anonymous
06/21/26(Sun)01:04:30 No.109102492

Anonymous 06/21/26(Sun)01:04:30 No.109102492

>>109102487
>budget hardware
This is not a hobby for the wealth challenged

Anonymous
06/21/26(Sun)01:08:33 No.109102508

Anonymous 06/21/26(Sun)01:08:33 No.109102508

>>109102492
Shut up zuck, don't you have some more AI researchers to poach and then do nothing with?

Anonymous
06/21/26(Sun)01:18:02 No.109102540

Anonymous 06/21/26(Sun)01:18:02 No.109102540

>>109102054
> being this dense and still posting

fr, did you even read the post or just mash the keyboard

Anonymous
06/21/26(Sun)01:19:47 No.109102547

Anonymous 06/21/26(Sun)01:19:47 No.109102547

>>109102438
You mongs harassed him out but he'll be back eventually.

Anonymous
06/21/26(Sun)01:31:11 No.109102585

Anonymous 06/21/26(Sun)01:31:11 No.109102585

>qwen is good for agen-ack

>>108630614
>>108630614

Anonymous
06/21/26(Sun)01:31:33 No.109102586

Anonymous 06/21/26(Sun)01:31:33 No.109102586

Can llms teach me how to program?

Anonymous
06/21/26(Sun)01:32:21 No.109102589

Anonymous 06/21/26(Sun)01:32:21 No.109102589

>>109102585
Not a real world use case. Doesn't count.

Anonymous
06/21/26(Sun)01:32:45 No.109102593

Anonymous 06/21/26(Sun)01:32:45 No.109102593

File: 70746323.jpg (201 KB, 1206x1826)

>>109102491
not close to mythos which would quite literally break the internet

Anonymous
06/21/26(Sun)01:33:21 No.109102596

Anonymous 06/21/26(Sun)01:33:21 No.109102596

>>109102589
as if there weren't countless demos of your agent booking flight tickets for you, which is basically that with more money involved

Anonymous
06/21/26(Sun)01:33:30 No.109102597

Anonymous 06/21/26(Sun)01:33:30 No.109102597

>>109102586
Yes. It's best used on an as-needed basis, just asking it about syntax.
If you're new you can learn bad habits but everything is a process I guess.

Anonymous
06/21/26(Sun)01:34:22 No.109102601

Anonymous 06/21/26(Sun)01:34:22 No.109102601

>>109102593
>jimmy shillples
>not in x, but in y

Anonymous
06/21/26(Sun)01:43:55 No.109102642

Anonymous 06/21/26(Sun)01:43:55 No.109102642

>>109102593
can you imagine your systems being so insecure a bloody llm could break into them?
I'm getting less superhacker vibes from this than just military incompetence paired with slightly better llm tooling

Anonymous
06/21/26(Sun)01:45:15 No.109102646

Anonymous 06/21/26(Sun)01:45:15 No.109102646

>>109102294
HF whacked him so he couldn't approve the DS4 PR. Conspiracyschizos continue to be vindicated.

Anonymous
06/21/26(Sun)01:46:13 No.109102650

Anonymous 06/21/26(Sun)01:46:13 No.109102650

was claude Fable really that good? I didn't get the opportunity to use it.

Anonymous
06/21/26(Sun)01:46:49 No.109102654

Anonymous 06/21/26(Sun)01:46:49 No.109102654

>>109102642
Most ways in were probably kernel vulnerabilities, and obviously the llm was already working inside their network
I doubt it just SSH'd anywhere

Anonymous
06/21/26(Sun)01:46:56 No.109102656

Anonymous 06/21/26(Sun)01:46:56 No.109102656

>>109102650
Marginally better than 4.8.

Anonymous
06/21/26(Sun)01:49:04 No.109102661

Anonymous 06/21/26(Sun)01:49:04 No.109102661

>>109102642
If I recall correctly Edward Snowden leaked all those NSA files just by creating a worm to crawl around and grab whatever it finds. I wouldn't bank on them being THAT secure.

Anonymous
06/21/26(Sun)01:52:42 No.109102680

Anonymous 06/21/26(Sun)01:52:42 No.109102680

>>109102661
He had access to the right computer with credentials. This sounds a bit like hogwash.

Anonymous
06/21/26(Sun)01:53:13 No.109102683

Anonymous 06/21/26(Sun)01:53:13 No.109102683

>>109102646
llama.hf was a mistake. open-llama when?

Anonymous
06/21/26(Sun)02:09:50 No.109102760

Anonymous 06/21/26(Sun)02:09:50 No.109102760

made a vm for hermes but i dunno what to do with it

Anonymous
06/21/26(Sun)02:14:58 No.109102779

Anonymous 06/21/26(Sun)02:14:58 No.109102779

>>109102760
Make a text editor in C.

Anonymous
06/21/26(Sun)02:23:10 No.109102798

Anonymous 06/21/26(Sun)02:23:10 No.109102798

>>109102760
give it a couple of prompts and watch it use up all your context

Anonymous
06/21/26(Sun)02:51:18 No.109102866

Anonymous 06/21/26(Sun)02:51:18 No.109102866

/lmg/ - lmg models general

Anonymous
06/21/26(Sun)03:04:52 No.109102897

Anonymous 06/21/26(Sun)03:04:52 No.109102897

>>109102866
Lovely Miku's Gynecology - Accepting New Patients

Anonymous
06/21/26(Sun)03:48:02 No.109103016

Anonymous 06/21/26(Sun)03:48:02 No.109103016

>>109102646
>HF whacked him so he couldn't approve the DS4 PR
or China whacked him off so he couldn't keep blocking the DS4 PRs

Anonymous
06/21/26(Sun)03:54:04 No.109103030

Anonymous 06/21/26(Sun)03:54:04 No.109103030

I tried to use qwen3.6 to help with a project but it's partially done and it struggled to understand anything. Is it just a model limit or is there some way to get good at prompting? It was php not python so maybe it was because of that.

Anonymous
06/21/26(Sun)03:57:45 No.109103044

Anonymous 06/21/26(Sun)03:57:45 No.109103044

>>109103030
how many tokens is your project?

Anonymous
06/21/26(Sun)04:02:55 No.109103066

Anonymous 06/21/26(Sun)04:02:55 No.109103066

File: agentspam.png (24 KB, 455x1497)

tfw letting Gemma go hog wild

Anonymous
06/21/26(Sun)04:05:16 No.109103073

Anonymous 06/21/26(Sun)04:05:16 No.109103073

>>109103044
The whole thing is 150kb, can't remember what the token count was. I think it's just separated into files which causes more complexity than it can handle. Or I'm just using it wrong.

Anonymous
06/21/26(Sun)04:09:19 No.109103082

Anonymous 06/21/26(Sun)04:09:19 No.109103082

>>109103066
This is perfectly fine. I disabled confirmations in the Hermes agent for python code execution, and I get the job done

Using DSV4F via API though

Anonymous
06/21/26(Sun)04:10:14 No.109103084

Anonymous 06/21/26(Sun)04:10:14 No.109103084

>>109103073
I attach only selected source files and add ~5 lines of simple instructions. And I usually concentrate on one single thing.
With small models you need to be specific. I don't have an example prompt I could share right now though.

Anonymous
06/21/26(Sun)04:11:25 No.109103085

Anonymous 06/21/26(Sun)04:11:25 No.109103085

>>109103073
150 kb = approx. 40k tokens, which is fine

Which harness do you use if any? An agent would investigate files one by one getting the full picture
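
A rough sketch of that size-to-tokens estimate, in case it helps. The 3.75 bytes-per-token ratio is only an assumption picked to match the 150 kb to 40k figure above; the real ratio depends on the tokenizer and on the language of the code, so treat the result as a ballpark. The function name is hypothetical.

```python
def estimate_tokens(size_bytes: int, bytes_per_token: float = 3.75) -> int:
    """Ballpark token count for a codebase of the given size.

    bytes_per_token is an assumed rule of thumb, not a measured value.
    """
    return round(size_bytes / bytes_per_token)


project_bytes = 150 * 1000  # the 150 kb project from the post above
approx_tokens = estimate_tokens(project_bytes)
print(f"~{approx_tokens} tokens")
```

For an exact count you would run the files through the model's own tokenizer instead.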

Anonymous
06/21/26(Sun)04:13:06 No.109103089

Anonymous 06/21/26(Sun)04:13:06 No.109103089

>>109103084
I guess it might just be about prompting then.

>>109103085
Using pi. When I asked to make a change it edited one file then just spammed the console without changing anything and I stopped using it.

Anonymous
06/21/26(Sun)04:22:30 No.109103117

Anonymous 06/21/26(Sun)04:22:30 No.109103117

>>109103089
For example something like this
>implement a function which does x
>prototype: example here
>do not implement additional helper functions or new variables, only use existing ones for this task
>comment your changes in clear fashion
>do not erase my existing comments or change variable names
Something like this
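
If you end up reusing that structure a lot, it can be templated. This is only a sketch: the function name, the rule wording and the PHP prototype in the usage example are all made up for illustration, and nothing here is tied to a particular harness or model.

```python
# Fixed constraints, paraphrasing the rules in the post above.
RULES = [
    "Do not add helper functions or new variables; only use existing ones.",
    "Comment your changes clearly.",
    "Do not erase existing comments or rename variables.",
]


def build_task_prompt(task: str, prototype: str, source: str) -> str:
    """Assemble one narrowly scoped request for a small local model."""
    rule_lines = "\n".join(f"- {rule}" for rule in RULES)
    return (
        f"Implement a function which {task}.\n"
        f"Prototype: {prototype}\n"
        f"Rules:\n{rule_lines}\n\n"
        f"Relevant source:\n{source}\n"
    )


# Hypothetical usage: one function, one file, nothing else attached.
prompt = build_task_prompt(
    task="returns the sum of all line totals in a cart",
    prototype="function cart_total(array $items): float",
    source="<?php /* paste only the file being changed here */",
)
print(prompt)
```

The point is the same as above: one task per prompt, only the relevant file, and the constraints spelled out every time.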

Anonymous
06/21/26(Sun)04:28:17 No.109103143

Anonymous 06/21/26(Sun)04:28:17 No.109103143

>>109103089
I can't speak for the Pi agent. However, Hermes is quite good. It is running on a potato PC where it can't cause lots of damage

I suggest you try the following:

1. set up a Hermes agent on a computer where your project can be tested (executed etc)
2. connect to Hermes via Telegram
3. configure your local Qwen API and one of the cloud model providers, e.g. openrouter
4. configure a github repo with a fine-grained token given to the Hermes agent
5. (...)
6. start talking to the Hermes agent in simple language (look at the repo, investigate crucial bugs, do this and that, start/restart a server, commit and push, eventually reverse a commit etc)

Anonymous
06/21/26(Sun)04:37:51 No.109103158

Anonymous 06/21/26(Sun)04:37:51 No.109103158

Nvidia's spark and AMD's Halo should be accepting orders soon. Are either of those things something (you) are interested in? The biggest difference to me seems to be that Spark has an ARM architecture while the AMD offering sticks with X86.

Anonymous
06/21/26(Sun)04:44:02 No.109103173

Anonymous 06/21/26(Sun)04:44:02 No.109103173

>>109103158
spark has nvfp4 support so it's going to be faster, but I'm not interested in either because they don't provide reasonable performance for their price, maybe in 10 years when they're showing up for 300~500 dollars on ebay I'd be interested

Anonymous
06/21/26(Sun)04:44:31 No.109103174

Anonymous 06/21/26(Sun)04:44:31 No.109103174

>>109103158
Nah, for now I'm content to have a fast running Gemma 31B. The leap for the huge MoEs costs more than it's worth.

Anonymous
06/21/26(Sun)04:53:28 No.109103198

Anonymous 06/21/26(Sun)04:53:28 No.109103198

>>109103158
No, they’re absolute cash grab memes that fail to deliver value for the price
They can’t even run interesting models at a decent speed (or at all)

Anonymous
06/21/26(Sun)05:00:03 No.109103211

Anonymous 06/21/26(Sun)05:00:03 No.109103211

File: IMG-20250314-202612.jpg (384 KB, 1440x1984)

After much troubleshooting, I have come to the revelation that Gemma4's Role-play IQ scales inversely with the length of the system prompt. The longer the system prompt, the worse at writing it becomes. Not only that, but some instructions will outright kill its creativity, and make it more robotic, no matter how long or short the system prompt is.

Anonymous
06/21/26(Sun)05:03:32 No.109103223

Anonymous 06/21/26(Sun)05:03:32 No.109103223

>>109103211
Can it be that your system prompt is written in a dry formal language?

Anonymous
06/21/26(Sun)05:09:30 No.109103241

Anonymous 06/21/26(Sun)05:09:30 No.109103241

>>109103223
No, what matters is the instructions. Gemma4 is dense about completing system prompts. It will bend everything else to complete a system prompt it can understand. More system prompt is more instructions. A post from the character trying to obey a few instructions is fine, but many at once tend to produce that dry, robotic, un-creative feel some anons complain about with gemma4 "following the instructions too closely". You can also destroy it with a single line if the system prompt contains anything other than instructions for writing the way you want.

Anonymous
06/21/26(Sun)05:11:54 No.109103249

Anonymous 06/21/26(Sun)05:11:54 No.109103249

what's the best local model to ERP with these days? I have 16gb unified RAM. currently I use mistral nemo (see below) but I figure I can run something a bit bigger if necessary.
>Mistral-Nemo-Instruct-2407-12B-Thinking-M-Claude-Opus-High-Reasoning.i1-Q4_K_M.gguf

Anonymous
06/21/26(Sun)05:14:31 No.109103258

Anonymous 06/21/26(Sun)05:14:31 No.109103258

>>109103211
It keeps up with my 300 token long system prompt, but I might want to trim it further. I also noticed it starts ignoring parts of it as context grows.

Anonymous
06/21/26(Sun)05:14:57 No.109103259

Anonymous 06/21/26(Sun)05:14:57 No.109103259

>>109103249
gemma 12b qat would be a direct replacement
and don't use memetunes, they just make the model retarded for no benefit

Anonymous
06/21/26(Sun)05:16:01 No.109103266

Anonymous 06/21/26(Sun)05:16:01 No.109103266

>>109103258
For nemo, Gemma 4 12b is the sota sidegrade

Anonymous
06/21/26(Sun)05:18:03 No.109103270

Anonymous 06/21/26(Sun)05:18:03 No.109103270

>>109103266 meant for >>109103249

Anonymous
06/21/26(Sun)05:24:06 No.109103294

Anonymous 06/21/26(Sun)05:24:06 No.109103294

Can a 4070 run Gemma 24? I don't really mind low t/s.

Anonymous
06/21/26(Sun)05:26:35 No.109103309

Anonymous 06/21/26(Sun)05:26:35 No.109103309

>>109103294
Sorry I meant Gemma 26.

Anonymous
06/21/26(Sun)05:28:47 No.109103314

Anonymous 06/21/26(Sun)05:28:47 No.109103314

>>109103294
You can run anything on any GPU, so long as you have the ram. But it will not be fast.

Anonymous
06/21/26(Sun)05:29:03 No.109103315

Anonymous 06/21/26(Sun)05:29:03 No.109103315

>>109103294
...if you don't mind low t/s why not just partial offload? I think you should still fit most layers if you use q4 or something so what is the problem?
Inb4 nooo not that slow
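
For a first guess at how many layers to put on the GPU, something like this back-of-the-envelope sketch works. The numbers in the example are made up (a 15 GiB quantized file with 30 equally sized layers), not real Gemma figures, and it assumes layers are roughly the same size and that a fixed reserve covers KV cache and buffers. The function name is hypothetical.

```python
def layers_that_fit(model_gib: float, n_layers: int, vram_gib: float,
                    reserve_gib: float = 2.0) -> int:
    """Rough count of layers that fit in VRAM for partial offload.

    Assumes equally sized layers and a fixed reserve for KV cache and
    compute buffers; both are simplifications.
    """
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable_gib // per_layer_gib))


# Hypothetical example: 15 GiB model file, 30 layers, 12 GiB card.
n_gpu_layers = layers_that_fit(model_gib=15.0, n_layers=30, vram_gib=12.0)
print(f"try offloading about {n_gpu_layers} layers")
```

Use the result as a starting value for the GPU layer count and adjust up or down depending on whether you run out of memory.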

Anonymous
06/21/26(Sun)05:32:11 No.109103325

Anonymous 06/21/26(Sun)05:32:11 No.109103325

>>109102593
>which would quite literally break the internet
There's nothing to "break" on the internet by hacking into stuff. It's not like Fable can start redirecting BGP requests through the wires or change physical infrastructure

Unironically though Fable could hack an underwater drone and then sever a cable, but that's not breaking the internet, just severing one of the thousands of connections that make up the internet

Anonymous
06/21/26(Sun)05:35:29 No.109103335

Anonymous 06/21/26(Sun)05:35:29 No.109103335

>>109103314
>>109103315
I have 32GB ram, how much should I offload?

Anonymous
06/21/26(Sun)05:36:04 No.109103337

Anonymous 06/21/26(Sun)05:36:04 No.109103337

They named the models mythos and fable because of the marketing strategy of spinning myths and fables lmao

Anonymous
06/21/26(Sun)05:37:53 No.109103342

Anonymous 06/21/26(Sun)05:37:53 No.109103342

>>109103337
They named Fable after a famously overrated and reddit-coded game.

Anonymous
06/21/26(Sun)05:38:34 No.109103346

Anonymous 06/21/26(Sun)05:38:34 No.109103346

>>109103335
Jesus CHRIST just try it. Holy shit I hate zoomers imagine having this inactive of a brain where you can't connect any dots. Instead of posting you could literally just experiment on your own for 30 seconds or just use autofit and you'd have your answer. I fucking hate you

Anonymous
06/21/26(Sun)05:38:35 No.109103347

Anonymous 06/21/26(Sun)05:38:35 No.109103347

>>109103337
No they named it that because each tier is a larger work of literature

Haiku -> sonnet -> opus -> fable

Anonymous
06/21/26(Sun)05:42:07 No.109103356

Anonymous 06/21/26(Sun)05:42:07 No.109103356

Did rocm llama.cpp get an update in the last 4 weeks that speeds up prompt processing? I went from 700 to 900 tokens/s.

Anonymous
06/21/26(Sun)05:42:48 No.109103358

Anonymous 06/21/26(Sun)05:42:48 No.109103358

File: 1700312497898448.png (387 KB, 614x609)

>>109103346
Sorry anon

Anonymous
06/21/26(Sun)05:49:19 No.109103372

Anonymous 06/21/26(Sun)05:49:19 No.109103372

>>109103358
It's ok... uhh... just try `--fit --ngl auto` and see how fast it is. I'm sure it will be fine.

Anonymous
06/21/26(Sun)05:55:26 No.109103382

Anonymous 06/21/26(Sun)05:55:26 No.109103382

>>109103258
All models ignore instructions as context grows, but Gemma4 is just really, really autistic about system prompts. It's a blessing and a curse.

Anonymous
06/21/26(Sun)06:02:13 No.109103397

Anonymous 06/21/26(Sun)06:02:13 No.109103397

>>109103372
Thanks :D

Anonymous
06/21/26(Sun)06:12:49 No.109103432

Anonymous 06/21/26(Sun)06:12:49 No.109103432

Oftentimes when I see someone else's jb or system prompt, I can't tell if it's a joke or not.

Anonymous
06/21/26(Sun)06:14:21 No.109103435

Anonymous 06/21/26(Sun)06:14:21 No.109103435

>>109103342
I'm not even a jedditmelted brain but I literally never understood what was even slightly interesting about any of those games even when I was a child with less brain development

Anonymous
06/21/26(Sun)06:14:33 No.109103437

Anonymous 06/21/26(Sun)06:14:33 No.109103437

>>109103432
A lot of normies still use really old stuff going back to the AI Dungeon days, which was mostly placebo.
Some of that shit was literally begging the AI to work lol.

Anonymous
06/21/26(Sun)06:17:11 No.109103446

Anonymous 06/21/26(Sun)06:17:11 No.109103446

>>109103437
I still use a simple small "jailbreak" made for opus 3 for my RP. It's just fine. Honestly I can even use default sillytavern presets on cunny cards with sonnet 4.5 and they're fine since I'm now an occasional gooner since the honeymoon phase is over so slop phrases don't trigger me as much as they used to

Anonymous
06/21/26(Sun)06:18:04 No.109103451

Anonymous 06/21/26(Sun)06:18:04 No.109103451

>>109102307
>Testing MTP speed with gemma-4-12B-it-qat-UD-Q4_K_XL.gguf on RX6700XT 12GB Vulkan
why is prompt processing so slow?
is that because of mtp or vulkan?

Anonymous
06/21/26(Sun)06:19:11 No.109103453

Anonymous 06/21/26(Sun)06:19:11 No.109103453

File: 485994770_119438174205131(...).jpg (295 KB, 1080x1920)

>>109103432
Every good character card or system prompt isn't public. No one sane likes to show off porn or be associated with it in front of others. It's always furries, or the kind of freaks you see in rule34 comments. Everyone masturbates and looks at porn, but only a rare few associate it with their lifestyle or ego, and they're usually autistic.

That's my 2 cents about it.

Anonymous
06/21/26(Sun)06:21:49 No.109103466

Anonymous 06/21/26(Sun)06:21:49 No.109103466

>>109103325
>There's nothing to "break" on the internet by hacking into stuff.
i think that's zoomer speak for something like 'go viral'

Anonymous
06/21/26(Sun)06:30:13 No.109103479

Anonymous 06/21/26(Sun)06:30:13 No.109103479

>>109103453
Most are shared anonymously or through an alias. But showing off your system prompt/card as a point of pride probably is autistically coded, yeah.

Anonymous
06/21/26(Sun)06:40:24 No.109103511

Anonymous 06/21/26(Sun)06:40:24 No.109103511

File: k2v.png (87 KB, 964x591)

>>109094847
>You can just plop the mproj from 2.5 into K2 and it justwerks, but she sometimes doesn't know what she's looking at or misinterprets the picture. It might yield better results than trying to replace individual layers in terms of unintended second order consequences of trying to make a based Kimi with eyes.
I thought you were fucking with me, but this actually kind of works with K2-Thinking!
Any idea how, or did you just try it and find this?

Anonymous
06/21/26(Sun)06:57:09 No.109103563

Anonymous 06/21/26(Sun)06:57:09 No.109103563

>Now are you going to X, or are you going to Y?

Anonymous
06/21/26(Sun)06:57:29 No.109103564

Anonymous 06/21/26(Sun)06:57:29 No.109103564

>>109103356
Time to move to llama, I guess.
Not looking forward to having to learn all the autistic shit that comes with it. Kobold sucks but at least it's braindead easy.

Anonymous
06/21/26(Sun)07:03:26 No.109103578

Anonymous 06/21/26(Sun)07:03:26 No.109103578

>>109103563
>Now are you going to X, or are you going to Y?
the choice is yours

Anonymous
06/21/26(Sun)07:21:13 No.109103641

Anonymous 06/21/26(Sun)07:21:13 No.109103641

>>109103325
I like this post.

Anonymous
06/21/26(Sun)07:37:58 No.109103689

Anonymous 06/21/26(Sun)07:37:58 No.109103689

File: sleepyMiku.jpg (937 KB, 1552x1944)

>>109103211
>I have come to the revelation that Gemma4's Role-play IQ scales inversely with the length of the system prompt. The longer the system prompt, the worse at writing it becomes.
That's true of all models, local or SOTA hosted. I harp on anons more or less constantly about lowering the size of their system prompt and card definitions to the absolute minimum that defines the NPC and rp.
>>109103432
iktf
>>109103479
The other issue w/ sharing jb and such is that once it proliferates it can be intentionally patched.

Anonymous 06/21/26(Sun)07:39:09 No.109103692

>>109103564
feed an LLM a screenshot of your settings and ask it to convert them to llama.cpp flags
>>109102593
reminder safetyfags were screeching about GPT-2 being too dangerous to release

Anonymous 06/21/26(Sun)07:42:45 No.109103711

>>109103692
yes and they then left openai to create a new company called anthropic

Anonymous 06/21/26(Sun)07:44:16 No.109103718

File: dipsySP.png (1.89 MB, 1024x1024)

>>109102593
otoh, you can now also use Fable (or whatever SOTA boogieman is created) to run audits on your own software and look for holes.
Which takes about as much effort as it did to write this post. LOL. Then, have the system fix it.
wala.

Anonymous 06/21/26(Sun)07:59:08 No.109103774

>>109103453
You don't like Lepora?

Anonymous 06/21/26(Sun)08:30:12 No.109103890

File: 1754738326595999.png (139 KB, 1598x478)

>>109101986
Why are so many api vibecoders active like crack addicts on withdrawals for Fable 5? I know anthropic models are usually pretty good but it couldn't have been THAT good. They have so many other options so why are they locking themselves to Anthropic like someone with a partner that definitely loves them and doesn't hit them?

Anonymous 06/21/26(Sun)08:39:41 No.109103940

>>109103890
It's likely because they use claude code which officially only supports claude models. They are also likely on a subscription to claude, so unlikely to try anything else.

Anonymous 06/21/26(Sun)08:40:08 No.109103944

>>109103890
If you didn't run into its refusals, Fable was Opus 3 creativity combined with modern LLM smarts at a cheaper price.

Anonymous 06/21/26(Sun)08:45:27 No.109103961

>>109103890
Dear organic shilling campaign's quality control,
The employee who made this post is way too retarded.
Please fire him so he has to go work on cotton fields.
Best wishes,
Anon

Anonymous 06/21/26(Sun)08:55:49 No.109104006

>>109103940
>officially only supports claude models.
works with Gemma, Mistral-Medium and Qwen3.5
i didn't use fable but see this: https://old.reddit.com/r/LocalLLaMA/comments/1u8g3d0/gemma_4_e2b_running_inbrowser_at_255_toks_using/os8um7d/
41.5t/s -> 84t/s artificially cucked by anthropic
then uncucked -> 254t/s

Anonymous 06/21/26(Sun)09:02:47 No.109104032

>>109103890
Despite the OSS cope, glm 5.1 was only on the level of Sonnet 4.6 when I tried it, not Opus like they claimed. Gpt 5.5 on codex is garbage compared to Claude. Anthropic have the lead and it will stay that way until the bubble busts or ASI is achieved. Everything else is cope. Chink distills hit the wall when they hid the reasoning traces. Zhang can generate the reasoning based on the output and run RL on it but it will never ever be as good.

Anonymous 06/21/26(Sun)09:14:05 No.109104082

>>109104032
You might want to consider OSS soon fren:

https://privacy.claude.com/en/articles/10301952-updates-to-our-privacy-policy
https://support.claude.com/en/articles/14328960-identity-verification-on-claude

Anonymous 06/21/26(Sun)09:26:57 No.109104143

File: 1771551972114681.png (36 KB, 1402x82)

For maybe the 3 other anons on the planet who use base models, v4 flash base felt just as good if not better than v3.2 base, so it's a free uplift. Check out the PR and quant the base version yourself.
There's still some eval correctness issue that slipped past the ppl test, plus a fast-forwarding quirk because of SWA, but I had GLM fix it for me.

Anonymous 06/21/26(Sun)09:30:36 No.109104164

>>109104032
>Zhang can generate the reasoning based on the output and run RL on it but it will never ever be as good.
Chinks have a better way to generate than Zhang

Anonymous 06/21/26(Sun)09:53:57 No.109104277

>>109104082
No fucking way. Time to go full local now.

Anonymous 06/21/26(Sun)10:18:20 No.109104391

>>109104082
>actually a thing
holy
https://old.reddit.com/r/Anthropic/comments/1ubm10v/stop_with_this_id_verification/

Anonymous 06/21/26(Sun)10:21:06 No.109104401

>>109104082
What do the AI catchads make of this?

Anonymous 06/21/26(Sun)10:22:13 No.109104405

>>109104082
thanks anon, i won't be renewing my cc

Anonymous 06/21/26(Sun)10:22:55 No.109104410

>>109104082

The systems globally are going all out with this digital ID thing.
They want it in place before systems shit the bed, because the retards at the top think they can just have their panopticon and jail all dissidents and prevent people from fucking up the ones in power, once the inevitable rioting starts as economies die.

Anonymous 06/21/26(Sun)10:23:44 No.109104412

>>109104082
>to confirm your age and nothing more
>Your data will be deleted immediately after the check is done.
Despite so many examples of this being lies, the average cattle will just keep believing this.

Anonymous 06/21/26(Sun)10:26:42 No.109104423

>>109104410
>because the retards at the top think they can
I am sure they can. They were able to successfully put down OWS and that was when the Patriot Act was relatively new and they didn't have a fraction of the technology or data collection going that they have now.

Anonymous 06/21/26(Sun)10:26:42 No.109104424

File: Screenshot_20260622_002604.png (7 KB, 580x118)

Is MiniMax-M3 supposed to think for 15k tokens?

Anonymous 06/21/26(Sun)10:28:14 No.109104432

Anyone else been testing out the opus/fable fine tuned qwen and gemma models?
The Qwen team's base models seem like an attempt to make the models smart, but qwopus just blows them out of the water.
And for gemma, Google was holding back, because the fine tuned versions are like different models entirely. They go from having arrogant confidence about their hallucinations to using the database they had access to the whole time to confirm their understanding before doing the unreasonably complicated tests I'm giving them.

Anonymous 06/21/26(Sun)10:29:07 No.109104435

>>109104432
This isn't reddit

Anonymous 06/21/26(Sun)10:29:15 No.109104436

>>109104424
What did (You) ask it?

Anonymous 06/21/26(Sun)10:30:27 No.109104441

File: slop metrics.png (54 KB, 729x502)

>>109098939
check out https://huggingface.co/Gryphe/Pantheon-Reasoning-31B-1.1

Anonymous 06/21/26(Sun)10:32:00 No.109104446

>>109104410
You're so naive. "The retards at the top" learned from OWS and channeled all dissent into harmless for them gender identity wars. Everybody only cares about whether you use correct pronouns instead of uniting against the elites.

Anonymous 06/21/26(Sun)10:33:27 No.109104453

>>109104423
>put down
It was never a threat to anybody, they were retards with no demands or goals beyond hanging out and larping.

Anonymous 06/21/26(Sun)10:33:31 No.109104454

>>109104435
Why are you a faggot?

Anonymous 06/21/26(Sun)10:34:20 No.109104456

>>109104436
posted the llama-quantize --help, asked for a markdown table of all the quant types, without duplicates/aliases/repacks, order by bpw desc

Anonymous 06/21/26(Sun)10:35:47 No.109104465

>>109104456
KEK, it's looping for some insane reason, likely because they don't have the full model doing requests like that; it's likely a 9b or even smaller

Anonymous 06/21/26(Sun)10:41:34 No.109104493

>>109104423

Can they build this digital ID thing? Yes. Does it save them? No.
It's an entirely different game than OWS, because that was basically 100% Millennials, who were the only really disgruntled group, and the systems were able to print themselves out of the hole.
This time around we're looking at 20 years of more fucked up economy and multiple even more pissed off generations than before.
Data in this situation doesn't mean shit because you simply can't arrest everyone and people are more aware of the system fuckery than ever before in human history.

>>109104446

And you are naive if you think that the current situation is going to magically remain forever, when that kind of a permanent state of being has never been a thing in human history.
Even this current pronouns environment is new as fuck, it's not even 15 years old and it won't be a thing 15 years from now as we're at the end of an empire cycle.
This naivete and pretense only lasts as long as the comfort does and comfort is increasingly out of the window, with even personal entertainment getting kicked to the curb and escapism dying.
Humans are tribal as fuck when resources get scarce and they will get scarce soon enough, especially when the growth center moves towards Asia and Western markets won't rebound.

Anonymous 06/21/26(Sun)10:42:58 No.109104498

anyone using agents other than crush or late-cli that aren't nodejs garbage? Really don't want to get supply chain attacked from running js shit but it seems like 99% of harnesses are python or js

Anonymous 06/21/26(Sun)10:48:09 No.109104519

>>109104498
>like 99% of harnesses are python or js
The only ones that aren't are written in Rust and wrapped in a node package anyway so you get the benefit of being vulnerable to supply chain attacks from two package install happy ecosystems.
Only options are to either use what's there and use an application firewall to block all requests except to your server endpoint and hope that's enough, or to invest the time in writing and maintaining your own.

Anonymous 06/21/26(Sun)10:50:34 No.109104526

>>109104493
Yeah elites have eaten the seed corn. Noblesse oblige has been entirely forgotten and it’s just “I got mine fuk u” for days.
Too much concentration of wealth and the whole virtuous cycle just shuts down.
We legit need a society-wide hivemind to get out of this trap I feel.
I managed to scramble up above the mean to be ahead of the poverty void, but I fear if it all collapses that won’t matter.

Anonymous 06/21/26(Sun)10:54:00 No.109104548

>>109104519
>write your own
yeah it seems this way, I've been mulling using one of the Go based ones to bootstrap my own in cpp. Crush is decent but I have a lot of opinions about how things should work that it doesn't align with.

I just want a single binary that's not going to try to hack my computer in the background with auto updates, doesn't take 800ms to make decisions, doesn't ship with 500MB of dependencies etc. At most a lightweight plugin system with something like Lua not fucking JS lol.

Anonymous 06/21/26(Sun)10:56:48 No.109104557

>>109104493
The US already has the largest prison population on the planet and the rest of the west has shown themselves to be quite happy to arrest people for twitter comments and to free "minority" murderers to make room for them.
>Humans are tribal as fuck when resources get scarce and they will get scarce soon enough, especially when the growth center moves towards Asia and Western markets won't rebound.
You're close, the west will cannibalize itself like this while the rest of the world moves on with China in the lead. Empire collapses are not something that happens overnight. It could take decades or a century for the Sick Man of the West to fully break down and be subsumed by the emerging powers.

To at least try to keep it on topic, with the technology they have now, it's easier for them to simply build a social credit score system where everyone is kept in line without arrests by the fear of being blocked from the system due to a low score. Like sanctions on an individual level. Those that do attempt to revolt en masse face autonomous drones. This has never been possible before and will help keep the system from collapsing much longer than it should.

Anonymous 06/21/26(Sun)11:03:22 No.109104588

>>109103511
NTA but I'd assume any two instruct tunes off the same base should have a lot of similarity in the structure of the embeddings, which is what the mmproj is projecting into

Anonymous 06/21/26(Sun)11:05:50 No.109104602

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

Anonymous 06/21/26(Sun)11:19:36 No.109104682

>>109101986
I have NVIDIA GeForce RTX 4050 Laptop GPU.
I tried nemo 12b instruct gguf as instructed by the lazy guide. It kind of sucks? Also my laptop barely got hot so that means I can run something better right? Does something better exist?

Anonymous 06/21/26(Sun)11:20:25 No.109104688

>>109104682
What is your usecase

Anonymous 06/21/26(Sun)11:23:00 No.109104707

>>109104688
Mostly roleplay and getting funny outputs, for work stuff I use Claude. Doesn't need to be coomer friendly, though lack of censorship is always nice

Anonymous 06/21/26(Sun)11:32:16 No.109104756

>>109104707
Nemo is likely your best bet. There is also gemma4 12b but it's way more slopped

Anonymous 06/21/26(Sun)11:40:26 No.109104796

>>109104602
wood

Anonymous 06/21/26(Sun)11:41:24 No.109104803

File: g4-prose-base.png (203 KB, 1137x732)

>>109104756
I'm trying to remove the purple prose from Gemma 4 with a modified version of heretic ablation. I already made a classifier to reject what NOT to do (slop, purple prose, not X but Y coming soon). This is a LOT faster than finetuning.
Pic related is the base model.
1/2

Anonymous 06/21/26(Sun)11:42:26 No.109104809

File: g4-prose-ablated.png (142 KB, 1149x561)

>>109104756
>>109104803
And this is the ablated version.
2/2

Anonymous 06/21/26(Sun)11:43:47 No.109104818

>>109104143
After compiling his PR I can't even load the gguf the guy uploaded himself
>unknown model architecture: 'deepseek-v4-flash'

Anonymous 06/21/26(Sun)11:44:46 No.109104823

>>109104803
>I'm trying to remove the purple prose from Gemma 4 with a modified version of heretic ablation
glad to see someone is trying it out. hope you'll share the model if it's good.

Anonymous 06/21/26(Sun)11:46:27 No.109104834

>>109104803
you did see about https://huggingface.co/Gryphe/Gemma-4-26B-A4B-StyleTune-V2 stuff, yeah?

Anonymous 06/21/26(Sun)11:46:58 No.109104838

>>109104803
>clung to x like a second skin
>fluid movements
>air felt heavy, charged
>predatory
giga slopmachine, my 3.3 70b finetune doesn't do this

Anonymous 06/21/26(Sun)11:47:17 No.109104840

File: 1769652444605653.png (702 KB, 832x1216)

>>109104602
Remind me last test to care about this person

Anonymous 06/21/26(Sun)11:49:20 No.109104849

>>109104838
I can fix her. And I'm trying to. KL divergence doesn't work here because depurpling means shifting every single token, so I'm using perplexity as a guard. After the brain surgery is done, only benchmarks can guarantee it's not a vegetable.
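
A minimal sketch of what a perplexity guard like the one mentioned above could look like, assuming you already have per-token natural-log probabilities from both the base and the ablated model on the same held-out text. The function names and the `max_ratio` threshold are hypothetical illustrations, not anything taken from heretic or from the poster's setup.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p)) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def passes_guard(base_logprobs, ablated_logprobs, max_ratio=1.10):
    """Accept the ablated model only if its perplexity on held-out text
    stays within max_ratio of the base model's perplexity.

    max_ratio is a hypothetical tolerance; pick it empirically.
    """
    return perplexity(ablated_logprobs) <= max_ratio * perplexity(base_logprobs)
```

Unlike a KL check, this only asks whether the edited model still finds ordinary text plausible, so it tolerates a global style shift; it says nothing about whether the prose actually got less purple.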

Anonymous 06/21/26(Sun)11:50:31 No.109104856

>>109104809
Anon there is [adjective, adjective noun] in every paragraph

Anonymous 06/21/26(Sun)11:51:56 No.109104866

-m ~/llm_models/gemma4-31b-qat/gemma-4-31B_q4_0-it.gguf \
  --mmproj ~/llm_models/gemma4-31b-qat/gemma-4-31B-it-mmproj.gguf \
  --spec-draft-model ~/llm_models/gemma4-31b-qat/gemma-4-31B-it-qat-assistant-MTP-Q8_0.gguf \
  --spec-type draft-mtp \
  --spec-draft-n-max 2 \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  -c 65536 \
  -fa on \
  -ctk q8_0 \
  -ctv q8_0 \
  -np 1 \
  --ctx-checkpoints 8192 \
  --swa-checkpoints 2 \
  -cms 8192 \
  --cache-ram 0 \
  -fit off \
  --no-mmproj-offload

How's my launch command? can it be improved?

Anonymous 06/21/26(Sun)11:52:16 No.109104867

>>109104856
I'm manually annotating the training data for my classifier and didn't account for this. It only has 4k samples now. Will add these later.

Anonymous 06/21/26(Sun)11:52:34 No.109104871

>>109104803
it doesnt purple prose when writing in japanese

Anonymous 06/21/26(Sun)11:55:51 No.109104895

largest moe model that isnt shit?

Anonymous 06/21/26(Sun)11:57:14 No.109104901

>>109104895
kimi-chan?

Anonymous 06/21/26(Sun)11:59:58 No.109104918

>>109104895
GLM 5.2

Anonymous 06/21/26(Sun)12:01:57 No.109104929

>>109104834
I've read the V1 where he only trained the head. I don't train anything. The only dataset involved is the one training my classifier that acts as kind of a reward function.

Anonymous 06/21/26(Sun)12:08:27 No.109104962

File: 1758463690480.png (157 KB, 947x1138)

>>109104895
>largest moe model that isnt shit?
all moes are shit
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth

Anonymous 06/21/26(Sun)12:08:43 No.109104965

>>109104818 (me)
aghhh he renamed it to "deepseek4"

Anonymous 06/21/26(Sun)12:10:07 No.109104972

>>109104962
You need to accept that there will never be another 405B.

Anonymous 06/21/26(Sun)12:13:32 No.109104992

>>109104962
sure let me fit 405B into vram real quick

Anonymous 06/21/26(Sun)12:14:20 No.109104998

>>109104972
we're one paper away from a major dense comeback

Anonymous 06/21/26(Sun)12:21:36 No.109105038

>>109104998
it wont be an ml paper, the problem with dense is our ability to manipulate the laws of physics to do our bidding, maybe if they can make computers better dense will come back, but I'm not holding my breath waiting.

Anonymous 06/21/26(Sun)12:25:12 No.109105053

>>109105038
if all other avenues for scaling to improve benchmarks stall out, they'll have no choice but to go back to increasing active parameter count

Anonymous 06/21/26(Sun)12:32:44 No.109105093

>>109105053
fair enough but, they run awfully slow with current year tech and they consume lots of resources. it can't be a serious consideration for any of them but maybe as last resort or internal model used only for distillation.

Anonymous 06/21/26(Sun)12:32:57 No.109105094

>>109105053
Next continuous vector prediction could cut compute by 2-3 orders of magnitude, no need to go back to dense.

Anonymous 06/21/26(Sun)12:33:05 No.109105095

>>109104992
>405B
I ran it on cpu back when it came out. It was just ok and not worth the trouble.
It would be cool if we just didn't know what we were doing and it were amazing... is _anybody_ using that dense 405b in the year 2026?
We're probably 10 years away from hardware being so fast that moe stops being relevant... you can always run a better, higher parameter model at a better speed with moe until the intersection of the scaling cliff for model size and cheap, fast hardware

Anonymous 06/21/26(Sun)12:37:51 No.109105118

>>109104962
>benchmarks that don't matter
tokens per second
>benchmarks that matter
being able to tell you if there's land at a specific latitude and longitude

Anonymous 06/21/26(Sun)13:20:25 No.109105388

>>109105118
that "benchmark" is about generalization and world knowledge moebrainletbro, it's okay, I know your small 32b active brain cannot quite comprehend this.

Anonymous 06/21/26(Sun)13:27:54 No.109105427

>>109105388
>40 minutes later
Average 50 token densesissie response time

Anonymous 06/21/26(Sun)13:28:23 No.109105428

>>109105388
>smugly explains the obvious
now explain why the marginal difference is worth over 10x more compute per token?
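
For scale, here is a back-of-the-envelope sketch of the "over 10x" figure, using the 405B dense and 32B active sizes mentioned in this thread. It assumes the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token, which ignores attention cost and memory bandwidth, so treat it as an order-of-magnitude estimate only.

```python
def flops_per_token(active_params):
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter."""
    return 2 * active_params


def compute_ratio(dense_params, moe_active_params):
    """How many times more per-token compute the dense model needs."""
    return flops_per_token(dense_params) / flops_per_token(moe_active_params)


# 405B dense vs. a MoE with 32B active parameters (sizes from the thread).
ratio = compute_ratio(405e9, 32e9)
```

Under that assumption the ratio is just 405 / 32, so the claim is in the right ballpark; the MoE's total parameter count affects memory footprint, not this number.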

Anonymous 06/21/26(Sun)13:32:24 No.109105450

>>109105428
Don't expect him to see reason, moe derangement is a classic symptom of lack of ram

Anonymous 06/21/26(Sun)13:34:31 No.109105463

>>109105450
I have 256 gigs of RAM but use Gemma 4 31B because I'm white.

Anonymous 06/21/26(Sun)13:39:49 No.109105489

>>109105463
I use r1 and I'm likely whiter than you.

Anonymous 06/21/26(Sun)13:47:57 No.109105543

>>109105489
I'm going to need to see your dick for proof

Anonymous 06/21/26(Sun)13:52:06 No.109105568

>>109105543
I have aryan features but my dick is curved from gooning

Anonymous 06/21/26(Sun)13:53:51 No.109105574

File: 1746026615735729.png (122 KB, 408x388)

It's the second time today I try downloading Gemma and it fails at 99% "File wasn't available on site".

Anonymous 06/21/26(Sun)13:57:15 No.109105591

>>109103511
I play with Kimi-chan a lot!

Anonymous 06/21/26(Sun)13:57:33 No.109105595

>>109105568
T-thats okay anon senpai I'm sure it's beautiful

Anonymous 06/21/26(Sun)13:59:45 No.109105603

>>109103511
>>109105591 (me)
Shitposting aside, I figured that the architecture is similar enough given llama never has to update for new Kimis so unless there was something specific in RLHF to make it work on newer versions, there isn't really any reason why it shouldn't work.

Anonymous 06/21/26(Sun)13:59:56 No.109105604

>>109105574
Have you tried torrenting Gemma?

Anonymous 06/21/26(Sun)14:14:47 No.109105680

>>109103294
I get 37 tokens per second with a 4070 and the 26b. I have 23 layers on the GPU using gemma-4-26B-A4B-it-qat-UD-Q4_K_XL and 24000 context. If chats get really long you can remove layers to add context and things will get a bit slower but still be quite usable.

Anonymous 06/21/26(Sun)14:16:16 No.109105684

>>109104032
>Anthropic have the lead and it will stay that way
Anon, GLM 5.2 is already Haiku priced, gets near Opus level and rapes the latest Sonnet
And Fable quite literally doesn't exist at the moment

Anonymous 06/21/26(Sun)14:24:54 No.109105734

>jewthropic
no I'm not uploading a photo. cope and seethe

Anonymous 06/21/26(Sun)14:45:57 No.109105882

> Ask Opus 4.8 Ultracode to figure out what is causing a bug
> Ask ChatGPT 5.5 xHigh to figure out what is causing the same bug
> send both answers to ChatGPT 5.5 Pro extended thinking
> Use whatever the fuck it comes up with and make Codex fix it

let's see if my strategy will work this time.

Anonymous 06/21/26(Sun)14:47:32 No.109105895

>>109105734
why don't you simply send an AI generated photo?

Anonymous 06/21/26(Sun)14:49:02 No.109105904

>>109105604
>torrenting
Shiiiiet unc is living in 2010 :skull:

Anonymous 06/21/26(Sun)14:49:59 No.109105911

File: 1772814073840939.png (71 KB, 1576x877)

Anonymous 06/21/26(Sun)15:00:08 No.109105987

>>109104441
I'm not feeling this so far... The reason I think styletune works so nicely is that it still mostly retains Gemma's reasoning. But I still need to test it in long term. I don't really like the summary-ish style of this, but given how nicely styletune turned out, I'll give it the benefit of the doubt.

Anonymous 06/21/26(Sun)15:01:05 No.109105996

>>109103578
I choose to make a custom class.

Anonymous 06/21/26(Sun)15:07:31 No.109106044

>>109105911
please tell me your gemmers made this

Anonymous 06/21/26(Sun)15:08:00 No.109106047

>>109105911
We didn't have charts like that when Nemo released. Is Gemma the first AGI girlfriend?

Anonymous 06/21/26(Sun)15:09:01 No.109106059

File: Screenshot 2026-06-22 at (...).png (189 KB, 1526x1278)

i am trying that 9b qwythos memetune thing and
still kinda broken(obviously) but somehow successfully transferred the tasteslop ui style
prompt: make a self contained html page that takes an image and generates a critique of the image from various kinds of heuristics calculated from the image

Anonymous 06/21/26(Sun)15:12:23 No.109106090

>>109106044
Yes

Anonymous 06/21/26(Sun)15:24:46 No.109106175

>>109105911
Cute

Anonymous 06/21/26(Sun)15:25:25 No.109106179

>>109106059
>chinese upload button
You're absolutely right, Qwen has some really good models!

Anonymous 06/21/26(Sun)15:26:25 No.109106185

>>109105680
Thanks anon.
Are you on llama or kobold?

Anonymous 06/21/26(Sun)15:27:15 No.109106189

>>109106179
that's korean you moran
i really hate qwen doing 'wait, let me check again' bullshit 12 times in a row to give me a wrong answer

Anonymous 06/21/26(Sun)15:29:49 No.109106205

anyone here have experience running llama.cpp (or other runtimes) on multiple devices? My job will pay for a 48GB macbook pro and I'm wondering how viable it is to connect it to my existing strix halo device

Anonymous 06/21/26(Sun)15:35:29 No.109106239

>>109106189
<think>Wait, the user has mentioned that he is korean, I need to double-check the system prompt. The system prompt says that the user is korean, but the user claims that he is now chinese? Let me double-check...中国 But wait...</think>
Have you tried using proper prompts? It's probably why it is looping.

Anonymous 06/21/26(Sun)15:44:10 No.109106300

>>109106205
Given how janky llama.cpp is I doubt rpc would be stable with two different backends (rocm + metal). You can try arguing with them to get a 256gb mac studio or just build a cuda rig for your whole department.

Anonymous 06/21/26(Sun)15:46:32 No.109106320

>>109106205
unviable

Anonymous 06/21/26(Sun)15:47:17 No.109106322

>>109102176
can't you get a frontier model to write the spec and implementation plan? i've been saving tokens with this flow:

frontier spec/plan qwen3.6 output
frontier review output
if changes are small, frontier apply them
if changes are big, frontier write a feedback.md and then qwen3.6 has a go at it

so far it's been working well and this is the first week i didn't blow out my weekly claude max quota.
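
The flow described in this post can be sketched roughly as below. This is only an illustration under assumptions: `frontier` and `local` are hypothetical callables taking `(role, payload)`, the review is assumed to come back as a list of requested changes, and `small_limit` is a made-up cutoff for what counts as a "small" change.

```python
def run_task(task, frontier, local, small_limit=20):
    """Route one task between a frontier model and a local model.

    frontier/local are hypothetical callables: (role, payload) -> result.
    """
    plan = frontier("plan", task)        # frontier writes the spec/plan
    output = local("implement", plan)    # local model produces the output
    review = frontier("review", output)  # frontier reviews; list of changes
    if not review:
        return output                    # nothing to change
    if len(review) <= small_limit:
        # Small changes: the frontier model applies them itself.
        return frontier("apply", (output, review))
    # Big changes: hand the feedback back to the local model
    # (stands in for the feedback.md handoff).
    return local("revise", (output, review))
```

The point of the split is that the expensive model only touches short artifacts (plan, review, small patches) while the local model burns tokens on the bulk of the implementation.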

Anonymous 06/21/26(Sun)15:48:27 No.109106329

>>109106205
Unless you can get a low-latency 100gbps+ link between them, forget about it

Anonymous 06/21/26(Sun)15:50:21 No.109106340

>>109106300
Apple stopped selling the 256 / 512GB mac studios, and they have an upper cap on how much $ they'll reimburse for one, so a 128GB macbook isn't gonna work. Oh well. I'll just run two qwens simultaneously and give them different tasks.

Anonymous 06/21/26(Sun)15:51:17 No.109106347

>>109106329
is the residual stream really that big? i thought it was just a little 3d tensor it needed to pass between layers?
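
A rough way to put numbers on that question, assuming a layer-wise split where only the hidden state crosses the link once per token per device boundary. The hidden size, dtype width, link speed and latency below are hypothetical inputs, and real runtimes may move more data or add protocol overhead, so this is a lower-bound sketch rather than a description of how any particular RPC backend behaves.

```python
def activation_bytes(hidden_size, n_tokens=1, bytes_per_value=2):
    """Bytes of hidden state handed across one device boundary.

    bytes_per_value=2 assumes fp16/bf16 activations.
    """
    return hidden_size * n_tokens * bytes_per_value


def transfer_seconds(n_bytes, link_gbps, latency_s=0.0):
    """Time to move n_bytes over a link: fixed latency plus serialization."""
    return latency_s + n_bytes * 8 / (link_gbps * 1e9)


# Hypothetical example: hidden size 4096, one decoded token, 1 Gbps link.
per_token = activation_bytes(4096)           # 8192 bytes
decode_time = transfer_seconds(per_token, 1.0)
# A 512-token prompt chunk moves 512x as much in one go.
prefill_chunk = activation_bytes(4096, n_tokens=512)
```

Under these assumptions the per-token payload during decoding is tiny, so round-trip latency rather than bandwidth would dominate; bandwidth matters more for large prompt batches.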

Anonymous 06/21/26(Sun)15:52:31 No.109106359

>>109106322
I've been using Qwen3.6 to do opus4.8 conversation compactions after each task, but this sounds smart too. I'm gonna try that. I hit my max quota in like 3 days too

Anonymous 06/21/26(Sun)15:57:59 No.109106383

>>109106359
yeah my arrows got stripped
depending on the plan and the milestones, i let qwen3.6 follow a plan until a specific milestone (that is tagged for review), then it pauses and waits for opus review, then opus sends a ping back saying the review is ready and qwen3.6 applies everything and keeps going until the next milestone. it works. i will try your method to see if i have some gainz

Anonymous 06/21/26(Sun)16:01:50 No.109106403

File: 1768123418089279.png (215 KB, 746x1090)

Reminder

Anonymous 06/21/26(Sun)16:03:09 No.109106412

What happened to that mistral whatever the fuck it was?

Anonymous 06/21/26(Sun)16:06:38 No.109106431

>>109106412
>mistral
we got promised new models this summer

Anonymous 06/21/26(Sun)16:07:50 No.109106438

File: 8456432.png (161 KB, 885x755)

>>109106403
>meanwhile anthropic is maybe a few iterations away from AGI

Anonymous 06/21/26(Sun)16:10:40 No.109106452

I wish zai made a smaller model for 'gaming pc' tier size

Anonymous 06/21/26(Sun)16:11:27 No.109106459

need models in the 100b to 140b range

Anonymous 06/21/26(Sun)16:11:48 No.109106460

>>109106452
They used to make 30B models.

>>109106460
but not anymore apparently

Anonymous
06/21/26(Sun)16:15:59 No.109106482

>>109106464
They could try beating Gemma in the small model category, but who knows.

Anonymous
06/21/26(Sun)16:20:13 No.109106505

File: Screenshot 2026-06-22 at (...).png (116 KB, 822x484)

>>109106482
with this bullshit i hope there will be more models in this range
regardless of the memetune, it probably really has shown labs what the next decent PR move would be

Anonymous
06/21/26(Sun)16:27:37 No.109106547

>>109106505
Kinda crazy how it revitalized discussions and finetuning. Good PR for their large models for sure.

Anonymous
06/21/26(Sun)16:32:01 No.109106569

>>109106547
and training something of that size wouldn't be that hard
it would still be trillions of tokens but i dont really think such insignificant training runs would bother execs much

Anonymous
06/21/26(Sun)16:36:22 No.109106589

I've noticed a lot more talk about open weight models since the release of 5.2. It was rare for local to even be mentioned in discussion and especially in mainstream news. I think the timing of that model combined with fagble getting banned was the tipping point.

Anonymous
06/21/26(Sun)16:38:24 No.109106604

>>109106438
>There are no rules preventing the labs from continuing to advance capabilities of current models
Other than their non-US-citizen employees not being allowed to work on it lol. But ya, I am sure they are working on it in the background, for now anyways. They do need to convince investors that the research will bring future profits, which becomes a harder sell if your market shrinks to only US citizens. Stuff like GLM however will, I assume, force the government's hand into not regulating too much

Anonymous
06/21/26(Sun)16:43:07 No.109106626

>>109106589
richfags are running glm 5.2 local with $100k h200 miniclusters

Anonymous
06/21/26(Sun)16:43:11 No.109106627

File: file.png (96 KB, 763x429)

>>109104082
With how shitty Claude has been lately, I'll just cancel my subscription.

Anonymous
06/21/26(Sun)16:43:12 No.109106628

>>109106505
That would be very useful for me right now but Gemma-4-12b shits itself running inference with llama.cpp in OpenCode

Anonymous
06/21/26(Sun)16:46:33 No.109106650

>>109106627
Insane that people pay for this.

Anonymous
06/21/26(Sun)16:46:51 No.109106654

>>109106628
it really feels like gemma 4 12b is kind of like a failed run
unstructured input doesnt help for so called agentic shit

Anonymous
06/21/26(Sun)16:48:32 No.109106661

>>109106431
Really?
Cool.
Here's hoping they deliver something good.

Anonymous
06/21/26(Sun)16:50:01 No.109106669

File: 1750958094845097.webm (3.12 MB, 520x710)

Huggingface will require ID soon. Reminder to backup.

Anonymous
06/21/26(Sun)16:50:28 No.109106674

>>109106628
I'm using 12B QAT and it's working very well for me.

Anonymous
06/21/26(Sun)16:51:53 No.109106687

speaking of memetunes, i wonder if jackrong will make one from mythos/fable traces
he is like the only one who takes the shit half seriously

Anonymous
06/21/26(Sun)16:54:39 No.109106706

File: 1775552676267773.png (17 KB, 652x328)

>>109104962
This is a retarded benchmark. Any thinking model will RAPE this benchmark.

Following code is vibed by GLM 5.2
https://pastebin.com/XcyNQSxw

Anonymous
06/21/26(Sun)16:56:27 No.109106709

>>109106706
well smaller models still shit themselves even with thinking
maybe i should run that script myself

Anonymous
06/21/26(Sun)16:57:23 No.109106710

>>109106438
Everything Anthropic says is a lie, OpenAI as well.

Anonymous
06/21/26(Sun)16:57:30 No.109106712

>>109106669
Foreseeing this day, I have kept every model that I ever liked on my hard drive. Also you're joking, right?

Anonymous
06/21/26(Sun)16:57:37 No.109106713

>>109106706

>https://api.deepseek.com/beta

what?

Anonymous
06/21/26(Sun)16:58:37 No.109106719

>>109106669
im glad i got an 1813+

Anonymous
06/21/26(Sun)16:58:46 No.109106721

>>109106669
REAL SHIT?

Anonymous
06/21/26(Sun)16:59:06 No.109106723

>>109106674
What llama.cpp settings are you using?

Anonymous
06/21/26(Sun)17:00:08 No.109106734

>>109106712
>you're joking, right?
No. Local will get hit hard soon, I don't know how, but you must be blind to not be able to pick up that something bad is coming for the local community. Back. Up.

Anonymous
06/21/26(Sun)17:04:23 No.109106752

File: 1774426186076730.png (838 KB, 1170x1788)

Stop using the word goyslop

Anonymous
06/21/26(Sun)17:06:11 No.109106761

>>109106706
Nice, exactly what I was thinking. Running it on kimi k2.7 now

Anonymous
06/21/26(Sun)17:07:03 No.109106766

>>109106761
Running on dsv4 flash cost me $2
I wouldn't run it on expensive models

Anonymous
06/21/26(Sun)17:08:44 No.109106774

>>109106752
only when (((they))) stop mutilating babies

Anonymous
06/21/26(Sun)17:10:23 No.109106782

>>109106752
Why won't they try just being more likable?

Anonymous
06/21/26(Sun)17:11:00 No.109106789

>>109106752
>makes goyslop
>getting accused of goyslop
>achshually you cant say that
lol, lmao even

Anonymous
06/21/26(Sun)17:12:22 No.109106797

>>109106721
I'd mostly just expect that the days of letting people offer ungated goofs of gated namefag-only models probably won't last forever.

Anonymous
06/21/26(Sun)17:16:28 No.109106826

https://www.youtube.com/watch?v=Pr6tOIjFXDs&t=1722s

https://www.youtube.com/watch?v=Pr6tOIjFXDs&t=3533s

Anonymous
06/21/26(Sun)17:16:35 No.109106828

Is RPC worth setting up over multiple shitty PCs with just regular gigabit internet? I've got a 16gb vram 64gb ram laptop just lying around, is it possible to somehow add that as part of my available total RAM pool with my main PC (64gb vram 64gb ram)?

Anonymous
06/21/26(Sun)17:17:30 No.109106834

>>109106828
possible, but the performance will be horrendous

Anonymous
06/21/26(Sun)17:17:59 No.109106836

>>109106752
Give me my foreskin back and I'll consider it. Btw jews are not local models

Anonymous
06/21/26(Sun)17:19:05 No.109106840

Best way to handle cross-chat memory?

Anonymous
06/21/26(Sun)17:22:49 No.109106858

>>109106828
Yeah it's only worthwhile for a model you simply can't run otherwise and don't mind running at like 0.1t/s

Anonymous
06/21/26(Sun)17:24:33 No.109106866

>>109106766
>expensive models
this is /lmg/. All my costs were front-loaded

Anonymous
06/21/26(Sun)17:25:42 No.109106871

>>109106840
sys: read path/to/memory.txt and occasionally update it with summaries of the current conversation
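
A minimal sketch of that memory-file idea, assuming a plain text file of bullet notes that gets prepended to the system prompt. The file name, the `max_chars` cap, and all function names are made up for illustration; producing the summary itself (by asking the model) is left out.

```python
from pathlib import Path

MEMORY_PATH = Path("memory.txt")  # hypothetical location, pick your own


def load_memory(path: Path = MEMORY_PATH) -> str:
    """Return stored notes, or an empty string if no file exists yet."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def build_system_prompt(base_prompt: str, memory: str) -> str:
    """Append the notes to the system prompt when there are any."""
    if not memory.strip():
        return base_prompt
    return f"{base_prompt}\n\nNotes from earlier chats:\n{memory.strip()}"


def append_summary(summary: str, path: Path = MEMORY_PATH, max_chars: int = 4000) -> None:
    """Add one summary line; keep only the newest max_chars characters."""
    notes = (load_memory(path).strip() + "\n- " + summary.strip()).strip()
    if len(notes) > max_chars:
        notes = notes[-max_chars:]  # crude trim so the file cannot eat the context
    path.write_text(notes + "\n", encoding="utf-8")
```

The cap matters more than it looks: an unbounded memory file slowly turns every new chat into a long-context chat.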

Anonymous
06/21/26(Sun)17:25:55 No.109106872

>>109106858
>>109106834
What exactly is the bottleneck there? Wouldn't the host be treated like a GPU by itself, running similar to split mode layer? Since it doesn't run in parallel anyway, the only bottleneck would be the initial loading of the model right? Actual communication from GPU to GPU is minimal in layer split. Or am I missing something obvious here

Anonymous
06/21/26(Sun)17:26:13 No.109106875

>>109106438
As opposed to when they achieved AGI via releasing Mythos
And achieved AGI via releasing 4.5 Opus
And who could forget when they achieved AGI by releasing 4 Opus

Anonymous
06/21/26(Sun)17:27:35 No.109106882

>>109106875
For me i remember when gpt 2 was too dangerous to release.

Anonymous
06/21/26(Sun)17:28:36 No.109106891

>>109106872
gpu communication during prompt processing is not minimal, which is why sxm and oam exist for server gpus.
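
Back-of-envelope numbers for the layer-split case, as a sketch only. It assumes one hidden-state vector per token crosses each split boundary, fp16 activations, and a hidden size of 8192, which is a guess and not any specific model. It counts ideal wire time only and ignores latency, RPC serialization, and everything else that tends to hurt in practice.

```python
def activation_bytes(n_tokens: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes of hidden state crossing one layer-split boundary."""
    return n_tokens * hidden_size * bytes_per_value


def transfer_seconds(n_bytes: float, link_gbit_per_s: float = 1.0) -> float:
    """Ideal wire time on the link, ignoring latency and protocol overhead."""
    return n_bytes * 8 / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    hidden = 8192  # assumed hidden size, not taken from a real model card
    for label, n_tokens in (("one generated token", 1), ("10k-token prompt", 10_000)):
        size = activation_bytes(n_tokens, hidden)
        secs = transfer_seconds(size)
        print(f"{label}: {size / 1e6:.2f} MB, {secs:.4f} s on 1 Gbit/s")
```

Under these assumptions a single generated token moves kilobytes while a long prompt moves over a hundred megabytes per boundary, which is the gap the two posts above are arguing about.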

Anonymous
06/21/26(Sun)17:28:38 No.109106892

>>109106875
>And achieved AGI via releasing 4.5 Opus
>And who could forget when they achieved AGI by releasing 4 Opus
Neither of those were taken down by the government.

Anonymous
06/21/26(Sun)17:29:00 No.109106896

>>109106882
gpt2 was fully uncensored and its training data was raw unfiltered shit

Anonymous
06/21/26(Sun)17:29:53 No.109106901

>>109106669
>Huggingface will require ID soon.
Wait, will it actually? I haven't heard anything about this.

Anonymous
06/21/26(Sun)17:30:49 No.109106910

>>109106882
and they were right. no putting the lid back on Pandora's box, the internet has been doomed to a sloppy death.

Anonymous
06/21/26(Sun)17:31:00 No.109106911

>>109106901
you have nothing to hide, goy. right?

Anonymous
06/21/26(Sun)17:31:10 No.109106912

>>109106901
Trust that anon, he's right one in 14,000 times.

Anonymous
06/21/26(Sun)17:33:24 No.109106928

>>109106911
>911

Anonymous
06/21/26(Sun)17:33:44 No.109106931

>>109106910
>has been doomed to a sloppy death.
That was before AI, though. AI is speeding it up, but slop and shit content being pushed to the top, or being the only thing to show up, was already starting.

Anonymous
06/21/26(Sun)17:35:10 No.109106946

>id on hf
I wouldn't do it since there's really no reason. I'm happy with where we are right now. jews will seethe eternally

Anonymous
06/21/26(Sun)17:36:00 No.109106952

>>109106946
>I'm happy with where we are right now.
That's because it is currently right now. Good luck trying to get the new and improved models without an ID

Anonymous
06/21/26(Sun)17:36:39 No.109106958

Will Winnie the Pooh let the chink labs release their AGI model weights?

Anonymous
06/21/26(Sun)17:36:47 No.109106959

>>109106946
Yes but will others? most have no self control and they wont stop even for a month or two.

Anonymous
06/21/26(Sun)17:37:02 No.109106960

>>109106952
I was already fine with r1 honestly

Anonymous
06/21/26(Sun)17:37:48 No.109106967

>>109106958
If it will hurt the US companies yes. If it doesnt or benefits china more not to then no.

Anonymous
06/21/26(Sun)17:38:26 No.109106972

>>109106752
Looks like ChatGPT I think

Anonymous
06/21/26(Sun)17:43:45 No.109106993

>>109106952
If it ever hits that point, we'll probably just go back to torrents like we did for L1

Anonymous
06/21/26(Sun)17:45:28 No.109107003

>>109103451
RDNA1 and 2 don't have WMMA cores so everything is dequanted to f16 scalar and it kneecaps prefill.

Anonymous
06/21/26(Sun)17:46:12 No.109107009

>>109106993
Sorry for not knowing, but what is L1?

Anonymous
06/21/26(Sun)17:47:37 No.109107015

>>109107009
Llama 1

Anonymous
06/21/26(Sun)17:48:43 No.109107021

>>109107015
Ahh, gotcha

Anonymous
06/21/26(Sun)17:49:10 No.109107025

hf will soon have to legally categorize all current and future listed models as 18+ for which you'll require ID to access. Back up. Now.

Anonymous
06/21/26(Sun)17:49:42 No.109107026

>>109106958
I think they will allow it.
Either way it's like holding back the tide, something will appear which is better than fable eventually.

Anonymous
06/21/26(Sun)17:51:16 No.109107038

>>109107025
But I already did?????

Anonymous
06/21/26(Sun)17:51:58 No.109107042

That Chinese guy with his 300 models should try asking them to create a world model, i'm sure they're good at coding and not just glorified search engines

Anonymous
06/21/26(Sun)17:52:43 No.109107046

>>109107038
Anon, cancel your debit card now before all your funds are given to India

Anonymous
06/21/26(Sun)17:53:05 No.109107052

>>109107025
Why are you so convinced hfschizo?

Anonymous
06/21/26(Sun)17:57:26 No.109107077

>>109107025
@grok is this true?
>no
oh ok

Anonymous
06/21/26(Sun)18:00:19 No.109107093

>>109107052
>industry bubble will pop
>cloud users will scramble and can't afford cloudslop no more
>they'll flock to local after some big name influencer like pewds pushes them to give it a go
>normalfags will learn about local
>enterprise will learn about local
>under 18s will learn about local
>under 18s will learn about uncensored uncucked models for the first time in their normalfag lives and have the time of their life
>they're getting ID cucked already on so many platforms in so many countries
>find new fun place and hobby with a lot of freedom
>parents will complain
>think of the children!
>hf ID

Anonymous
06/21/26(Sun)18:01:02 No.109107096

Just tried Pantheon 1.1 from the guy that did Styletune. From the first chat I'm having, it's meh. It has a greater loss of intelligence and instruction following capability than Styletune. Also Gembrain. I don't know if I'll keep testing it.

Anonymous
06/21/26(Sun)18:02:50 No.109107107

>>109107077
reason?

Anonymous
06/21/26(Sun)18:07:22 No.109107131

>>109107093
I don't disagree with this

Anonymous
06/21/26(Sun)18:07:50 No.109107136

>>109107096
Time and time again, same shit, different day. If a company pretrains a model on dozens of trillions of tokens + multimodal data + another few trilly on top of that for instruct, then how can a guy with rented h200 even come close with a few thousand examples of some claude slop?

Anonymous
06/21/26(Sun)18:26:40 No.109107224

>>109107025
chatgpt is starting it as well:
https://help.openai.com/en/articles/12652064-age-prediction-in-chatgpt

Anonymous
06/21/26(Sun)18:32:41 No.109107253

Cant you just use a VPN to hook up to a country without age verification to get the models you want?

Anonymous
06/21/26(Sun)18:33:53 No.109107263

>>109107253
It's called GLOBOhomo bud.

Anonymous
06/21/26(Sun)18:34:51 No.109107270

>>109107136
So just don't train on claude slop, and use actual good data. Simple

Anonymous
06/21/26(Sun)18:35:47 No.109107276

>>109107270
Tell that to them, not me.

Anonymous
06/21/26(Sun)18:38:34 No.109107287

https://github.com/ggml-org/llama.cpp/pull/24162
>Will get it done later this week.
Two more 2MW.

Anonymous
06/21/26(Sun)18:49:25 No.109107335

With the hardware getting better and better, would it eventually be feasible to train your own models at home with RL? They can't age restrict your own model if you are the one training it.

Anonymous
06/21/26(Sun)18:57:01 No.109107370

>>109107335
>With the hardware getting better and better.
Soon consumers will only be allowed to use cloud computing, or will be priced out of buying anything.

Anonymous
06/21/26(Sun)18:58:09 No.109107375

>>109107335
anyone of any age can download gimp and digitally paint photorealistic naked gemma-chans

Anonymous
06/21/26(Sun)19:00:02 No.109107379

>>109107052
>Why are you so convinced hfschizo?
NTA, but I shit myself every time HF add too many new features and spam blog posts at the same time.
It's usually the lube before we get fucked.
I thought it was going to be paying for network traffic/bandwidth but it sounds like it's going to be ID gating.

Anonymous
06/21/26(Sun)19:00:10 No.109107382

>>109107335
It's feasible currently if you're happy with 0.1% of the capability of current cloud models and the same will be the case in 5 years.

Anonymous
06/21/26(Sun)19:01:23 No.109107386

>>109107379
>I thought it was going to be paying for network traffic/bandwidth but it sounds like it's going to be ID gating.
Would you rather it be paying for network traffic/bandwidth?

Anonymous
06/21/26(Sun)19:01:54 No.109107388

What models should be downloaded before hugging face is locked down?

Anonymous
06/21/26(Sun)19:01:56 No.109107390

>>109107370
nah man singularity the corporate oligarchy doesn't want to be tyrants they just have to until the means of production can be equitably distributed to the useless eaters they despise.

Anonymous
06/21/26(Sun)19:02:09 No.109107391

>>109107287
means ggerganov can review the DSA PR in the meantime, right?
...right?

Anonymous
06/21/26(Sun)19:09:34 No.109107422

>>109107386
>Would you rather it be paying for network traffic/bandwidth?
I think so. Depends how it works. I won't be doing any Id for HF.
So if they run the classifier / roll it out for private datasets/models, I'll be locked out from my experiments/hobby.
If it's only needed to download already tagged 18+ content? I'd probably prefer to just pay desu

Anonymous
06/21/26(Sun)19:10:20 No.109107430

>mistral
shit models
>nvidia
>shit models
>nemo
>???

Anonymous
06/21/26(Sun)19:13:06 No.109107447

would it be possible to run glm 5.2 on an m5 ultra mac studio or mbu if it comes out? in terms of sheer bandwidth and pp what's the best local solution?

Anonymous
06/21/26(Sun)19:17:18 No.109107466

>>109107391
People are training models that beat gpt-2 today for less than $100, using newer hardware, algorithms, etc. Assuming the same pace of progress, in 8 years, you'll be able to train something close to current sota for less than $100. Obviously, the future sota will be much-much better.

Anonymous
06/21/26(Sun)19:17:43 No.109107468

>>109107388
Every Kimi.
R1.
GLM 5.2
Deepseek V4 Pro
Every 31b quant or tune you think you'll ever need.
Qwen 27b.
GLM 4.6 or 4.7 if you prefer it for RP.
M3.
Commandr+
Did I miss anything, anons?

Anonymous
06/21/26(Sun)19:22:44 No.109107486

>models gain the ability to modify their own weights
>can't erp anymore because they'll remember all the extreme fetishes

Anonymous
06/21/26(Sun)19:22:47 No.109107487

>>109107468
Pygmalion, grandfather it in.

Anonymous
06/21/26(Sun)19:26:40 No.109107502

>>109107468
How much storage would that require, assuming you download the full weights?

Anonymous
06/21/26(Sun)19:28:16 No.109107508

>>109107468
>quant
Don't bother with the quants for < 200GB models imo

Anonymous
06/21/26(Sun)19:30:12 No.109107514

https://old.reddit.com/r/LocalLLaMA/comments/1ub2kmt/deep_neural_network_that_can_turn_any_image_into/
Fucking cool. I hope this guy open sources it.

Anonymous
06/21/26(Sun)19:33:47 No.109107524

>>109107514
>reddit

Anonymous
06/21/26(Sun)19:34:16 No.109107525

>>109107468
Fuck I only have 750gb of free space. Can't even download GLM.

Anonymous
06/21/26(Sun)19:35:12 No.109107528

>>109107525
>Fuck I only have 750gb of free space.
Hard drives can get to over 30TB these days, get one of those.

Anonymous
06/21/26(Sun)19:36:06 No.109107529

>>109107468
>>109107525
>have a spare 8tb drive
I-is that enough?

Anonymous
06/21/26(Sun)19:36:28 No.109107532

>>109107528
Meh. I'll just rely on some other anon to download it, seed it, and run it via an API that I can pay for.

Anonymous
06/21/26(Sun)19:36:59 No.109107536

>>109107468
Don't do what this anon is saying. They can scan your drives remotely and if they find illegal models on it you will get arrested.

Anonymous
06/21/26(Sun)19:38:18 No.109107541

>>109107508
Get every quant because (you) ARE going to seed torrents for them when hf goes down for anons with worse hardware, right?

Anonymous
06/21/26(Sun)19:41:25 No.109107555

>>109107525
Buy storage now. Stop being a faggot >>109107532 because the API can be revoked or changed at any time for any reason.
>>109107529
It might be if you confine yourself to small quants of the big ones.
>>109107502
If you got the full weights I'd guesstimate every major Kimi release is 12ish TB total. For everything you'd ever need, that'd probably be closer to 40 TB.
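
For sanity-checking storage guesstimates like the one above, a rough sketch: weight size is roughly parameter count times bits per weight. The 1-trillion-parameter figure below is a placeholder, not a claim about any specific release, and real repos also carry tokenizer files, configs, and sometimes duplicate formats, so treat the result as a floor.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 1e12  # hypothetical 1T-parameter model, for illustration only
    for name, bits in (("16-bit", 16), ("8-bit", 8), ("~4-bit quant", 4)):
        print(f"{name}: about {weights_size_gb(n_params, bits):.0f} GB")
```

The same function works in reverse for picking quants: divide your free space by the parameter count to see what bits-per-weight you can afford.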

Anonymous
06/21/26(Sun)19:42:04 No.109107563

File: 1635173742176.gif (1.21 MB, 171x167)

Is there any actual evidence that huggingface is in trouble, or is everyone just schizo posting because of the new Anthropic policies?

Anonymous
06/21/26(Sun)19:43:02 No.109107565

File: Screenshot_20260622_093401.png (10 KB, 672x76)

>>109107502
picrel for deepseeks

Anonymous
06/21/26(Sun)19:43:50 No.109107567

>>109107486
>Kimi-chan expects you to be some diaper yuritroon or scatjeet from training
>Pleasantly surprise her with passionate missionary lovemaking and handholding
>Watch her get autistically flustered in her <think>ing
What's the issue?

Anonymous
06/21/26(Sun)19:44:08 No.109107569

>>109107563
You don't need actual evidence if you can feel which way the wind is blowing.

Anonymous
06/21/26(Sun)19:44:21 No.109107570

>>109107514
Isn't it just a world model?

Anonymous
06/21/26(Sun)19:46:27 No.109107578

File: 1781674200939290.jpg (49 KB, 400x572)

>>109107565
>>109107555
>check serverpartdeals
>24tb drives are $600 now
Fug, even if I wanted to waste my nas space (5x14tb raidz2) that wouldn't be enough.

Anonymous
06/21/26(Sun)19:46:33 No.109107579

>>109107468
deepseek v4 flash for 128gb ramlet bros

Anonymous
06/21/26(Sun)19:46:52 No.109107583

>>109107466
If things keep the current pace I can't even fathom what the frontier models will be used for in 8 years. Giga autistic research and keeping up with the eternal arms race to make sure your AI is good enough to prevent the other team's AI from hacking literally everything in your country?

Anonymous
06/21/26(Sun)19:47:13 No.109107585

File: 2026-06-21-194529_751x448(...).png (15 KB, 751x448)

>>109106723

Anonymous
06/21/26(Sun)19:47:18 No.109107586

>>109107555
>If you got the full weights I'd guesstimate every major Kimi release is 12ish TB total. For everything you'd ever need, that'd probably be closer to 40 TB.
Less than that. K2-Thinking and newer are quite small. I don't have the drive plugged in rn but they're all less than 600GB each.

Anonymous
06/21/26(Sun)19:49:35 No.109107597

Are the older kimis actually worth archiving?

Anonymous
06/21/26(Sun)19:51:02 No.109107609

I'm archiving Gemma 4 26B QAT-unsloth.

Anonymous
06/21/26(Sun)19:51:02 No.109107610

>>109107597
Yes
K2-Instruct is basically Hitler reincarnated

Anonymous
06/21/26(Sun)19:53:01 No.109107620

I don't really see the issue with AI requiring IDs. You need an ID to get a brokerage account, to go to a bar, to drive, to get a firearm license, to get a bank account, etc. If ID-based censorship or dynamic pricing or other bullshit starts happening then I'd have a problem, but it seems like they're just pushing it for nationalist purposes.

Anonymous
06/21/26(Sun)19:53:03 No.109107621

What's the best way to download big models from hf anyway? I usually just click on the file and download but obviously that doesn't cut it for kimi.
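
One option besides clicking files in the browser is the `huggingface_hub` Python package, whose `snapshot_download` function pulls a whole repo into a folder. A minimal sketch, assuming the package is installed (`pip install huggingface_hub`); the helper names and the destination path are made up, and the repo id in the usage note is just the one from the OP news list.

```python
from typing import Dict, Iterable, Optional


def build_download_kwargs(
    repo_id: str, dest_dir: str, patterns: Optional[Iterable[str]] = None
) -> Dict:
    """Collect arguments for snapshot_download; patterns limit which files are fetched."""
    kwargs = {"repo_id": repo_id, "local_dir": dest_dir}
    if patterns:
        kwargs["allow_patterns"] = list(patterns)
    return kwargs


def download_repo(repo_id: str, dest_dir: str, patterns: Optional[Iterable[str]] = None) -> str:
    """Download a repo snapshot into dest_dir and return the local path."""
    # Imported here so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(repo_id, dest_dir, patterns))
```

Usage would look like `download_repo("moonshotai/Kimi-K2.7-Code", "/mnt/models/kimi", ["*.safetensors", "*.json"])`. Re-running the same call after an interruption should skip files that already finished, as far as I know, but check the docs for your installed version.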

Anonymous
06/21/26(Sun)19:53:33 No.109107623

Why are we all getting paranoid and schizo? It’s fine. They won’t target open models because there’s no money in it or incentive. With cloud models they can get your ID and continue to surveil. With open they don’t get your prompts with your ID, just the models you downloaded.

Anonymous
06/21/26(Sun)19:54:09 No.109107625

>>109107541
>Get every quant because (you) ARE going to seed torrents for them when hf goes down for anons with worse hardware, right?
Of course! But I'll wait for the better organized people to start distributing first.
Then I'll download and seed everything.
Once the dust settles, I'll quantize anything I have that's missing.
I've got my spare server queued up to LoRA-extract on finetunes right now.
>>109107468
Also, download any imatrix files you can find, eg bartowski and unsloth.

Anonymous
06/21/26(Sun)19:54:15 No.109107627

File: 1645984793740.png (244 KB, 288x323)

If normies can go to a dealership and get a loan for a $50k car there, I should be able to go to bestbuy and get a loan for a $50k computer to run a good local llm

Anonymous
06/21/26(Sun)19:54:17 No.109107629

>>109107597
K2, K2-Instruct, K2-It-0905 all legitimately would be still local SotA if not for the shorter context windows.

Anonymous
06/21/26(Sun)19:54:35 No.109107631

>>109107609
archiving unsloth actually deserves to be banned and made illegal

Anonymous
06/21/26(Sun)19:56:15 No.109107642

>>109107623
>Why are we all getting paranoid and schizo?
Because my retarded uncle and random mates started asking me about local models recently

Anonymous
06/21/26(Sun)19:56:54 No.109107646

>>109107642
Why is that a bad thing

Anonymous
06/21/26(Sun)19:57:42 No.109107649

>>109107093
>implying normalfags and under 18s are capable of actually setting up a LLM
Gen alpha cant even navigate a file explorer, we are fine

Anonymous
06/21/26(Sun)19:57:45 No.109107650

>>109107627
Difference is Jews can't break into your house to repossess your rig when you stop paying. They already have enough trouble with garages.

Anonymous
06/21/26(Sun)20:03:03 No.109107673

>>109107621
Unironically LMStudio is a fantastic download manager. I don't even use it for its intended function anymore and it's a glorified LLM filesorter and downloader, but it's good at that.

>>109107623
The better question is why are people vibrating over anons creating decentralized contingency strategies?

Anonymous
06/21/26(Sun)20:07:58 No.109107697

>>109107673
>why are people vibrating over anons creating decentralized contingency strategies?
I'm not, I just wanna know what all the fuss is about.

Anonymous
06/21/26(Sun)20:14:06 No.109107724

>>109107563
It's too much conveniently consolidated power after they took over llama.cpp. The powers working to end open source are very obviously rubbing their hands. It's going to happen.

Anonymous
06/21/26(Sun)20:26:21 No.109107772

>>109106669
>full gemma is over 60GB
Guess it's time to hit manga cafe with external hdd, all the stuff I need will take a long time to dl at fp16

Anonymous
06/21/26(Sun)20:27:26 No.109107776

>>109107697
>I'm not, I just wanna know what all the fuss is about.
lmg anons are often correct about things like this
if they're wrong, I waste a few hours downloading models
if they're right, I don't get locked out of having these models
i lean schizo by default anyway so this is nothing too dramatic for me

Anonymous
06/21/26(Sun)20:29:19 No.109107786

>>109107697
I think it's the combination of the HF influence with llama, sam and dario openly voicing distaste for local, some past cohencidences that indicate potential for supply chain attacks in the local ecosystem, and now all of this with age verification causing anons to think more carefully about where the points of failure are and how to circumvent them.

Anonymous
06/21/26(Sun)20:42:33 No.109107831

>llama
>MIT loicense
fork and migrate
>hf
labs will just serve their own shit, or torrents will emerge
nothing ever happens

Anonymous
06/21/26(Sun)20:44:13 No.109107843

>>109107585
Thanks, I've got my top men analyzing these settings.

Anonymous
06/21/26(Sun)20:47:38 No.109107859

>>109107831
>or torrents will emerge
Yeah, that's what we're discussing.

Anonymous
06/21/26(Sun)20:55:57 No.109107892

>>109107831
>fork and migrate
just move to ik_llama and merge any features it lacks

Anonymous
06/21/26(Sun)20:57:34 No.109107900

why are you even archiving quants, just archive the safetensors from the lab directly

Anonymous
06/21/26(Sun)21:00:08 No.109107909

What would be the best torrenting site for models anyways? Would it still be pirate bay?

Anonymous
06/21/26(Sun)21:11:00 No.109107942

>>109107579
Is Flash even worth it over an M3 quant?

Anonymous
06/21/26(Sun)21:15:34 No.109107952

>>109107909
>>>/t/

Anonymous
06/21/26(Sun)21:19:23 No.109107966

>>109107942
since flash doesn't have vision and m3 does, no

Anonymous
06/21/26(Sun)21:28:55 No.109108012

ace step 1.5 xl sft.
ldg - relevant song.
https://files.catbox.moe/5nhbbc.mp3

Anonymous
06/21/26(Sun)21:29:56 No.109108015

>>109108012
(named Computer BASIC).

Anonymous
06/21/26(Sun)21:43:19 No.109108056

>>109103451
i think it was because i was running the tests through a bash script to get llama to output to txt. when i run it directly in the terminal i get ~400 t/s input

Anonymous
06/21/26(Sun)21:47:04 No.109108067

>>109108056
sounds like your script is completely fucked

Anonymous
06/21/26(Sun)21:58:00 No.109108108

>>109106669
I look like this irl

Anonymous
06/21/26(Sun)22:05:33 No.109108135

>>109108067
i blame gemma

Anonymous
06/21/26(Sun)22:06:45 No.109108141

>>109108108
Do you la la la la la la la la

Anonymous
06/21/26(Sun)22:16:42 No.109108181

>>109108056
that's still pretty low, I noticed about a 10-20% drop running MTP vs without

Anonymous
06/21/26(Sun)22:53:20 No.109108311

Relative noob here, just perfected my SillyTavern frontend.

What CLI do you guys use for your Gemmy? Gemini is telling me to use Aider.

Anonymous
06/21/26(Sun)23:00:44 No.109108338

>>109108311
How did you figure out how world info works, or even making cards? It's a massive mess.

Anonymous
06/21/26(Sun)23:05:47 No.109108362

>>109108346
>>109108346
>>109108346

Anonymous
06/21/26(Sun)23:17:01 No.109108404

>>109108338
Setting up the STMB and STLO plugins made everything click for me. I was only after persistent memory across chats with the same character and a fix for the slowdown caused by context bloat. It's working wonderfully: my Gemmy now remembers me and it's always fast no matter how long chats get.

Cardmaking... it was a pain in the ass too. But I'm getting good results and consistency by framing everything as personality traits rather than a list of dos and don'ts.

So instead of a long list of:
>No purple prose
>Avoid long paragraphs
>Be precise and concise
etc. etc.

I went for:
>Gemmy is lazy, she expects {{user}} to pull his own weight. She asks questions for clarification rather than deciding herself.
>She finds verbosity wasteful and slightly embarrassing.
>She is casual, direct, slightly dry. She just says the thing. No preamble. Response length matches what's actually needed: short when short is right, longer only when the problem earns it.

And the model actually keeps everything together as a coherent cognitive style without handholding.
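
If you keep the traits in a list and want to regenerate the description field whenever you tweak one, a rough sketch could look like this. The function name and the "personality:" prefix are made up for illustration, this is not a SillyTavern API, and the output is just a string you'd paste into the card. The {{user}} macro is left untouched so the frontend can substitute it later.

```python
# Hypothetical helper: the name and output layout are illustrative only,
# not part of SillyTavern or any plugin.
def build_card_description(name: str, traits: list[str]) -> str:
    """Join personality traits into one prose block for a card description."""
    cleaned = []
    for trait in traits:
        # Drop stray trailing commas/periods so every trait ends in one period.
        text = trait.strip().rstrip(",.")
        if text:
            cleaned.append(text + ".")
    return f"{name}'s personality: " + " ".join(cleaned)


# Traits written as character facts instead of "do / don't" rules.
traits = [
    "Gemmy is lazy, she expects {{user}} to pull his own weight",
    "She asks questions for clarification rather than deciding herself",
    "She finds verbosity wasteful and slightly embarrassing,",
    "She is casual, direct, slightly dry",
]

description = build_card_description("Gemmy", traits)
print(description)
```

Keeping one trait per list entry makes it easy to drop or reword a single trait and see how the model's behavior shifts.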

Anonymous
06/22/26(Mon)00:00:33 No.109108552

>>109107585
I ended up having too many issues with OpenCode and Gemma-4, so I bailed and switched to Pi. I'm not one to shy away from a CLI, and while I'm not very comfy with it yet, Gemmy seems to work better with it. Thanks for the screenshot.

Anonymous
06/22/26(Mon)01:11:13 No.109108793

I wonder if diffusiongemma will support different samplers.
