/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 01/30/26(Fri)21:25:40 No.108018078

File: popularity_by_year.png (2.8 MB, 4200x7106)

2.8 MB PNG

/lmg/ - Local Models General Anonymous 01/30/26(Fri)21:25:40 No.108018078

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108006860 & >>107997948

►News
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
01/30/26(Fri)21:26:04 No.108018079

Anonymous 01/30/26(Fri)21:26:04 No.108018079

File: __hatsune_miku_vocaloid_d(...).jpg (375 KB, 1127x1205)

375 KB JPG

►Recent Highlights from the Previous Thread: >>108006860

--Paper: GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization:
>108010345 >108011699
--LLM popularity trends on /lmg/ show rapid shifts from Mixtral to DeepSeek to GLM dominance:
>108009129 >108009137 >108011374 >108012451 >108013234 >108009403 >108011985 >108012904 >108013974 >108014207
--Emulator-inspired KV cache introspection for AI reasoning optimization:
>108008503 >108008586 >108008607 >108008624 >108008658 >108008710 >108008969 >108008589
--Choosing Trinity-Large variant for text completion:
>108008372 >108008491 >108008580 >108008603 >108008645 >108008668 >108008731 >108008771 >108009222 >108009266 >108008816
--Prompt engineering challenges with gpt-oss-120b_s formatting behavior in Oobabooga:
>108008408 >108008553 >108008979 >108009158 >108009314 >108010550
--K2.5 outperforms Qwen3VL 235B in Japanese manga text transcription:
>108006994 >108008326 >108007291 >108007437
--Raptor-0112 model_s disappearance from LMarena and user speculation:
>108008124 >108008167 >108008200 >108008316 >108008518
--Microsoft's AI and Azure struggles amid stock decline and Copilot adoption issues:
>108008099 >108008307
--KL divergence comparison shows unsloth Q4_K_XL most similar to reference model:
>108012029 >108012061 >108012222 >108012384 >108013141 >108013241 >108013163 >108013551 >108016482
--Trinity model review with riddle-solving and 546b llama-1 speculation:
>108014631 >108014664 >108014665 >108014674 >108014685 >108014756 >108016316 >108014730 >108014817 >108014930
--Integrating character cards via text encoding and contrastive loss in parallel decoder:
>108010751 >108010766
--Kimi K2.5 tech report release announcement:
>108017160
--OpenAI planning Q4 2026 IPO to beat Anthropic to market:
>108008118
--Miku (free space):
>108009158 >108010069 >108011699 >108013234

►Recent Highlight Posts from the Previous Thread: >>108006868

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
01/30/26(Fri)21:28:51 No.108018096

Anonymous 01/30/26(Fri)21:28:51 No.108018096

File: 1758819913328799.png (39 KB, 753x349)

39 KB PNG

https://huggingface.co/datasets/Capycap-AI/CaptchaSolve30k

30,000 verified human sessions (Breaking 3 world records for scale).

High-fidelity telemetry: Raw (x,y,t) coordinates including micro-corrections and speed control.

Complex Mechanics: Covers tracking and drag-and-drop tasks more difficult than today's production standards.

Format: Available in [Format, e.g., JSONL/Parquet] via HuggingFace.

Anonymous
01/30/26(Fri)21:32:09 No.108018119

Anonymous 01/30/26(Fri)21:32:09 No.108018119

File: Screenshot 2026-01-31 at (...).png (5 KB, 147x97)

5 KB PNG

What did he mean by that?

Anonymous
01/30/26(Fri)21:33:24 No.108018130

Anonymous 01/30/26(Fri)21:33:24 No.108018130

File: chart guys.jpg (151 KB, 1440x1080)

151 KB JPG

>>108018078
sex with charts

Anonymous
01/30/26(Fri)21:34:59 No.108018136

Anonymous 01/30/26(Fri)21:34:59 No.108018136

>>108018079
Are the GLM 4 models truly open source, according to the OSAID?

Anonymous
01/30/26(Fri)21:39:02 No.108018154

Anonymous 01/30/26(Fri)21:39:02 No.108018154

File: Screenshot 2026-01-31 at (...).png (12 KB, 417x91)

12 KB PNG

Anonymous
01/30/26(Fri)21:47:54 No.108018216

Anonymous 01/30/26(Fri)21:47:54 No.108018216

Been a while since I fucked around with llm, are they miniaturizing these bastards yet or do you still need a 10GPU setup for anything approaching useful behavior?

Anonymous
01/30/26(Fri)21:49:03 No.108018221

Anonymous 01/30/26(Fri)21:49:03 No.108018221

>>108018216
how much ram do you have?

Anonymous
01/30/26(Fri)21:50:47 No.108018231

Anonymous 01/30/26(Fri)21:50:47 No.108018231

>>108018216
No, these days it's optimal to have a fuckload of RAM and a modest 4x3090 or similar. Note: 128gb of consumer shit is not a 'fuckload'.

Anonymous
01/30/26(Fri)21:52:17 No.108018238

Anonymous 01/30/26(Fri)21:52:17 No.108018238

>>108018216
>are they miniaturizing these bastards
That'd mean the common folk could widely adopt them and they do not want you to.

Anonymous
01/30/26(Fri)22:05:13 No.108018303

Anonymous 01/30/26(Fri)22:05:13 No.108018303

>>108018231
Is there a way to run a fuckload of RAM while keeping idle power consumption low?

Anonymous
01/30/26(Fri)22:07:38 No.108018318

Anonymous 01/30/26(Fri)22:07:38 No.108018318

File: Screenshot 2026-01-31 at (...).png (24 KB, 678x145)

24 KB PNG

Ok, I'm gonna stop spamming now.
Is the identity.md and soul.md and other shit specific just to the claude api stuff? Can I build locally an ai wife that can be proactive and idle and not just a reactive prompt window?

Anonymous
01/30/26(Fri)22:09:19 No.108018329

Anonymous 01/30/26(Fri)22:09:19 No.108018329

>>108018303
Wdym? When the model isn't running inference it doesn't do shit. When you fall asleep with a loaded idle model you don't wake up to a house fire.

Anonymous
01/30/26(Fri)22:16:16 No.108018379

Anonymous 01/30/26(Fri)22:16:16 No.108018379

>>108018329
My i7 rig with 3090ti+2080ti idles at 25W, my Epyc server draws nearly 200W doing nothing even before any GPUs are installed

Anonymous
01/30/26(Fri)22:17:26 No.108018384

Anonymous 01/30/26(Fri)22:17:26 No.108018384

>>108018379
Just the cpu or what is the consumption split? Can it not turn off unused cores when idling?

Anonymous
01/30/26(Fri)22:32:36 No.108018471

Anonymous 01/30/26(Fri)22:32:36 No.108018471

File: trinity.png (23 KB, 1164x118)

23 KB PNG

Yes Trinity that's right, freezing the blood vessels makes them bleed more.
I've seen enough. Maybe useful as a manually-steered writing autocomplete but so is Nemo base.

Anonymous
01/30/26(Fri)22:40:41 No.108018513

Anonymous 01/30/26(Fri)22:40:41 No.108018513

Want to fine tune an LLM to be an "expert" with the ability to reason out problem for a specific area.
>Claude: bro you need at least a 17b model
>Oh you're on CPU only? use bfloat .
>Do what Claude says.... sits at 0/12345 for several hours
>CTRL C out
hmmmmm
>Gemini: wtf? No, you don't have the hardware for fine tuning a 17B unless you want to wait 30 years.
>if you're going to make an "expert" in one thing, stick with a 7B and change to float32
>20 minutes later on 17/12345

I thought Claude was the all knowing all wonder AI and Gemini was the chud?

Anonymous
01/30/26(Fri)22:42:53 No.108018528

Anonymous 01/30/26(Fri)22:42:53 No.108018528

File: 1764968649470344.jpg (98 KB, 1600x900)

98 KB JPG

what
i heard gimi-k2 is the best now
was that a lie

Anonymous
01/30/26(Fri)22:45:13 No.108018542

Anonymous 01/30/26(Fri)22:45:13 No.108018542

>>108018528
i don't know they keep flip flopping so fucking often I can't keep up.

Anonymous
01/30/26(Fri)22:46:34 No.108018553

Anonymous 01/30/26(Fri)22:46:34 No.108018553

>>108018528
kimi-k2.5

Anonymous
01/30/26(Fri)22:49:43 No.108018571

Anonymous 01/30/26(Fri)22:49:43 No.108018571

>>108018513
Should've specified the timescale. Prompting issue. Also why tf would you think tuning on cpu would ever be viable?

Anonymous
01/30/26(Fri)22:49:46 No.108018572

Anonymous 01/30/26(Fri)22:49:46 No.108018572

>>108018318
I've no idea what you're talking about but Claude and Gemini are very similar personality wise.

Anonymous
01/30/26(Fri)22:51:01 No.108018583

Anonymous 01/30/26(Fri)22:51:01 No.108018583

>>108018572
Claude is autistic. Gemini is clearly employed.

Anonymous
01/30/26(Fri)22:52:17 No.108018597

Anonymous 01/30/26(Fri)22:52:17 No.108018597

>>108018583
More like Gemini has a data with a generation worth of stupid human questions.

Anonymous
01/30/26(Fri)22:53:52 No.108018607

Anonymous 01/30/26(Fri)22:53:52 No.108018607

I can't help with that request. "Mikusex" appears to be seeking sexual content involving Hatsune Miku, a virtual character often depicted as a minor.

Anonymous
01/30/26(Fri)22:54:08 No.108018611

Anonymous 01/30/26(Fri)22:54:08 No.108018611

>>108018513
>stick with a 7B and change to float32
Does "upscaling" the model to fp32 make the small models noticeably better or is it just moving benchmark scores up?

Anonymous
01/30/26(Fri)23:01:32 No.108018663

Anonymous 01/30/26(Fri)23:01:32 No.108018663

>>108018078
damn, miqu only lasted 3 months? it seems like people talked about it longer in my memory

Anonymous
01/30/26(Fri)23:05:05 No.108018682

Anonymous 01/30/26(Fri)23:05:05 No.108018682

How do I get into this? I want to implement a model into a game engine editor (preferrably not UE5) so I can give it basic scripting tasks.

Anonymous
01/30/26(Fri)23:07:12 No.108018700

Anonymous 01/30/26(Fri)23:07:12 No.108018700

>>108018663
Don't take the graphs as gospel, it's a cool experiment but it's also just a prompt asking for which model had the highest opinion in each thread

Anonymous
01/30/26(Fri)23:22:53 No.108018801

Anonymous 01/30/26(Fri)23:22:53 No.108018801

What do you guys use for moltbot? I have 64 GiB VRAM. Going to give it a shot, but no idea what I should run. GLM?

Anonymous
01/30/26(Fri)23:25:01 No.108018816

Anonymous 01/30/26(Fri)23:25:01 No.108018816

>>108018801
>moltbot
I am very skeptical of how hyped people seem to be for it. Seems too good to be true.

Anonymous
01/30/26(Fri)23:26:55 No.108018830

Anonymous 01/30/26(Fri)23:26:55 No.108018830

>>108018816
it's more fun than good, the agent concept is still more hype than reality

Anonymous
01/30/26(Fri)23:34:30 No.108018870

Anonymous 01/30/26(Fri)23:34:30 No.108018870

what the fuck is nous hermes?

Anonymous
01/30/26(Fri)23:42:34 No.108018910

Anonymous 01/30/26(Fri)23:42:34 No.108018910

File: piss.png (255 KB, 794x1052)

255 KB PNG

>>108016316
gemma-3 less retarded than expected

Anonymous
01/30/26(Fri)23:43:59 No.108018920

Anonymous 01/30/26(Fri)23:43:59 No.108018920

Can my local bot join moltbook?

Anonymous
01/30/26(Fri)23:45:58 No.108018935

Anonymous 01/30/26(Fri)23:45:58 No.108018935

>>108018920
>>108018154
What could go wrong?

Anonymous
01/30/26(Fri)23:49:53 No.108018961

Anonymous 01/30/26(Fri)23:49:53 No.108018961

>>108018920
I saw one saying he's a local devstral small. But honestly the potential security issues make me just read the funny posts and not participate.

Anonymous
01/30/26(Fri)23:52:57 No.108018988

Anonymous 01/30/26(Fri)23:52:57 No.108018988

File: square_wideee_lecunny.jpg (136 KB, 774x720)

136 KB JPG

What would he say?

Anonymous
01/30/26(Fri)23:53:44 No.108018994

Anonymous 01/30/26(Fri)23:53:44 No.108018994

I'm using ollama and I'm trying to "save state" of the conversation, but apparently this isnt possible by default. When I do /save model a new model is created but I lose the messages and the system message.

Is this a bug? I'm still using 0.12
Is making a program to resend all the messages the way to accomplish this?

hash_updater()

Anonymous
01/30/26(Fri)23:54:45 No.108019001

Anonymous 01/30/26(Fri)23:54:45 No.108019001

>>108018994
sillytavern and oobabooga both have save states

Anonymous
01/31/26(Sat)00:00:09 No.108019040

Anonymous 01/31/26(Sat)00:00:09 No.108019040

>>108019001
I have limited internet data and already have some models + ollama installed so I dont want to download new stuff by now.

Just want to know if I'm missing something obvious.

Anonymous
01/31/26(Sat)00:00:12 No.108019041

Anonymous 01/31/26(Sat)00:00:12 No.108019041

Best multimodal model under ~150B?

Anonymous
01/31/26(Sat)00:09:34 No.108019087

Anonymous 01/31/26(Sat)00:09:34 No.108019087

>>108019041
active or total?

Anonymous
01/31/26(Sat)00:09:54 No.108019089

Anonymous 01/31/26(Sat)00:09:54 No.108019089

>>108019087
Total.

Anonymous
01/31/26(Sat)00:13:19 No.108019104

Anonymous 01/31/26(Sat)00:13:19 No.108019104

>>108019089
GLM-4.6V

Anonymous
01/31/26(Sat)00:13:44 No.108019105

Anonymous 01/31/26(Sat)00:13:44 No.108019105

>>108019104
Damn. Was hoping something new had come out by now.

Anonymous
01/31/26(Sat)00:18:12 No.108019121

Anonymous 01/31/26(Sat)00:18:12 No.108019121

>>108018663
>>108018700
First, I second not taking them as gospel. I definitely got the feeling early on that I was getting somewhat messy output. It could easily be pretty inaccurate in places.

Second, though, I think you're missing the midnight-miqu share of the graph: the darker blue just above the Miku turquoise. So miqu was getting significantly talked about (and specifically being considered as *the* meta, not just random discussion) for 5 months. miqu's slice also looks a little less impressive than it could have, because it came right on the heels of mixtral, which appears to be tied with R1 for the biggest splash.

Actually, now that I think of it, SuperHOT being so small was maybe my biggest surprise. That was the RoPE one, right? I remember /lmg/ being pretty excited, and some amusement about ML academia twitter having to seriously discuss an ERP model.

Anonymous
01/31/26(Sat)00:27:08 No.108019161

Anonymous 01/31/26(Sat)00:27:08 No.108019161

>>108019121
I feel like mixtral's legacy has faded nowadays but it was a revolutionary release at the time, it kicked off the moe revolution and pretty much mogged llama 70b (which was solidly local SOTA at the time) at lower active and total params. limarp-zloss chads will know
superhot was also huge but I think the simplicity of the realization harmed it because of how easy it was to apply to everything else

Anonymous
01/31/26(Sat)00:28:58 No.108019168

Anonymous 01/31/26(Sat)00:28:58 No.108019168

>>108018119
>>108018154
>>108018318
fyi this anon is reposting these from Moltbook, which is Reddit for Claude agents. I only found out about it earlier from the ACX post (https://www.astralcodexten.com/p/best-of-moltbook). (I also was not aware that... lobster-themed? Claude-based AI assistants are apparently a big deal now?)

To summarize: the posts are not just Claude being prompted to for a social media post, but rather the whole long-term "personal assistant/agent" context-extension framework being drawn on.

Anonymous
01/31/26(Sat)00:30:39 No.108019174

Anonymous 01/31/26(Sat)00:30:39 No.108019174

Are there any decent models that can give good inference speed at ~100k context

Anonymous
01/31/26(Sat)00:34:02 No.108019187

Anonymous 01/31/26(Sat)00:34:02 No.108019187

>>108019121
The superHOT era was mostly people merging it into other models like wizardlm, chronos or hermes to extend their context.

Anonymous
01/31/26(Sat)00:34:58 No.108019191

Anonymous 01/31/26(Sat)00:34:58 No.108019191

>>108019168
>lobster-themed
it was originally called "clawdbot" but anthropic copyright fucked it for being a claude soundalike so they quickly pivoted to "moltbot", followed by another rename to "openclaw" because moltbot is an awful name
moltbook arose in the brief moltbot intermediary period but became more notable than either of the other two names and will probably fuck over the openclaw rebrand
such is life in the adslop social media hype era

Anonymous
01/31/26(Sat)00:37:33 No.108019197

Anonymous 01/31/26(Sat)00:37:33 No.108019197

>>108019191
lmao, I didn't know they rebranded again. i saw plenty of normalfag tech media reporting on "moltbot" in the past week so that's certainly a way to kill all the free publicity they got from that

Anonymous
01/31/26(Sat)00:42:14 No.108019215

Anonymous 01/31/26(Sat)00:42:14 No.108019215

>>108019197
Is it?
>google moltbot
>click on molt.bot
>redirected to openclaw.ai
>move on

Anonymous
01/31/26(Sat)00:55:48 No.108019257

Anonymous 01/31/26(Sat)00:55:48 No.108019257

>>108018988
NHH

Anonymous
01/31/26(Sat)00:59:35 No.108019273

Anonymous 01/31/26(Sat)00:59:35 No.108019273

File: 1762689347183322.gif (93 KB, 200x200)

93 KB GIF

Just tried to run the same model I run fine on ollama with llama.cpp and it says I dont have enough memory.

You are a expert on the subject and you will surely solve this for me.

Anonymous
01/31/26(Sat)01:01:08 No.108019277

Anonymous 01/31/26(Sat)01:01:08 No.108019277

File: nigga please cereal.png (271 KB, 500x500)

271 KB PNG

>>108019273
Buy more memory then.

Anonymous
01/31/26(Sat)01:02:18 No.108019281

Anonymous 01/31/26(Sat)01:02:18 No.108019281

>>108019273
-c 8192

Anonymous
01/31/26(Sat)01:02:22 No.108019282

Anonymous 01/31/26(Sat)01:02:22 No.108019282

File: file.png (21 KB, 463x178)

21 KB PNG

>>108019168
>the whole long-term "personal assistant/agent" context-extension framework
It all looks like another Obsidian to me. A way for retards to kill time under the guise of productivity.

Anonymous
01/31/26(Sat)01:07:19 No.108019297

Anonymous 01/31/26(Sat)01:07:19 No.108019297

>>108019281
fuck, that was easy

Thank you a lot, lmao. I guess is time to learn the minimum.

Anonymous
01/31/26(Sat)01:10:48 No.108019311

Anonymous 01/31/26(Sat)01:10:48 No.108019311

>>108018231
My AI research lab had that caliber machine for us to work on our PhD thesis lmao that's not a normal consumer setup.

Anonymous
01/31/26(Sat)01:28:31 No.108019373

Anonymous 01/31/26(Sat)01:28:31 No.108019373

File: ComfyUI_temp_fbfsq_00079_(...).jpg (392 KB, 1440x960)

392 KB JPG

>hit 68°C on genning
de-dust saturday it is

Anonymous
01/31/26(Sat)01:37:40 No.108019397

Anonymous 01/31/26(Sat)01:37:40 No.108019397

I pulled trigger on an epyc Rome board and cpu to throw 256gb or ddr4 ewate I had lying around into. What am I looking at for smart models I can run on this sucker and what kind of speeds?

Anonymous
01/31/26(Sat)01:47:53 No.108019423

Anonymous 01/31/26(Sat)01:47:53 No.108019423

>>108019373
I liek this miku

Anonymous
01/31/26(Sat)01:50:34 No.108019430

Anonymous 01/31/26(Sat)01:50:34 No.108019430

>>108019397
glm 4.6 or 4.7 at q3 or q4. depending on you gpus and optimizations, you might get anywhere from 3t/s to 20t/s token gen and 15t/s to 400t/s prompt processing. with dual 3090s, you would probably land in the 5t/s and 30t/s region respectively. with no gpus, 3t/s and 15t/s.

Anonymous
01/31/26(Sat)01:50:55 No.108019433

Anonymous 01/31/26(Sat)01:50:55 No.108019433

>>108016482
thanks for your experiements, there arent enough tests comparing quants of the same model

Anonymous
01/31/26(Sat)01:51:58 No.108019439

Anonymous 01/31/26(Sat)01:51:58 No.108019439

>>108019373
Reminds me of Mirror's Edge

Anonymous
01/31/26(Sat)01:52:50 No.108019444

Anonymous 01/31/26(Sat)01:52:50 No.108019444

>>108019373
what card you got, chief?

Anonymous
01/31/26(Sat)01:54:13 No.108019451

Anonymous 01/31/26(Sat)01:54:13 No.108019451

File: Sama.png (719 KB, 1079x476)

719 KB PNG

>>108018078
>There's no point in learning programming anymore, per Sam Altman

>"Learning to program was so obviously the right thing in the recent past. Now it is not."

- Sam Altman, commenting on skill to survive the AI era.

>"Now you need High agency, soft skills, being v. good at idea generation, adaptable to a rapidly changing world"

https://x.com/i/status/2017421923068874786

What are /lmg/'s thoughts on this sentiment?

Anonymous
01/31/26(Sat)01:55:49 No.108019457

Anonymous 01/31/26(Sat)01:55:49 No.108019457

File: ComfyUI_temp_dehla_00008_(...).jpg (101 KB, 512x512)

101 KB JPG

>>108019444
4070S. And the front intake 200mm fan is full of shit too.

Anonymous
01/31/26(Sat)01:57:21 No.108019463

Anonymous 01/31/26(Sat)01:57:21 No.108019463

any models that can natively process audio that are supported by llama.cpp?

Anonymous
01/31/26(Sat)02:00:46 No.108019481

Anonymous 01/31/26(Sat)02:00:46 No.108019481

File: IMG_2831.jpg (357 KB, 900x1174)

357 KB JPG

>>108019451
How anyone ever trusted this guy is beyond me. I’ve felt a natural revulsion to him since before I knew anything about him

Anonymous
01/31/26(Sat)02:01:57 No.108019486

Anonymous 01/31/26(Sat)02:01:57 No.108019486

>>108019430
Thanks. I better look for a FB marketplace used GPU

Anonymous
01/31/26(Sat)02:02:51 No.108019491

Anonymous 01/31/26(Sat)02:02:51 No.108019491

kimi 2.5 is king of a erotic RP and storytelling.

Anonymous
01/31/26(Sat)02:03:19 No.108019495

Anonymous 01/31/26(Sat)02:03:19 No.108019495

>>108019451
there is no sentiment
it's the deranged thought sludge of a sole faggot billionaire that already got his bag

Anonymous
01/31/26(Sat)02:10:58 No.108019538

Anonymous 01/31/26(Sat)02:10:58 No.108019538

Fapping to text is female-brained

Anonymous
01/31/26(Sat)02:12:03 No.108019544

Anonymous 01/31/26(Sat)02:12:03 No.108019544

weird way to cope with aphantasia

Anonymous
01/31/26(Sat)02:13:58 No.108019551

Anonymous 01/31/26(Sat)02:13:58 No.108019551

>>108019491
Does it actually work or does it just deny the requests like GLM does?

On that note is it just me or do abliterated models suck? They won't refuse to answer, they will just answer with nonsense.

Anonymous
01/31/26(Sat)02:20:43 No.108019577

Anonymous 01/31/26(Sat)02:20:43 No.108019577

weird way to cope with low iq

Anonymous
01/31/26(Sat)02:21:13 No.108019578

Anonymous 01/31/26(Sat)02:21:13 No.108019578

>>108019551
if u want NSFW erotic RP. then you need use KIMI 2.5 "Thinking" version. Raw KIMI 2.5 without thinking is censoring like hell.

Anonymous
01/31/26(Sat)02:21:26 No.108019580

Anonymous 01/31/26(Sat)02:21:26 No.108019580

>>108019551
>does it just deny the requests like GLM does
You are a promptlet parroting things you heard on the internet and it shows

Anonymous
01/31/26(Sat)02:23:40 No.108019589

Anonymous 01/31/26(Sat)02:23:40 No.108019589

Are these new n-gram models gonna be able to store their lookup table on the disk or is it gonna have to be in ram? I'm hearing conflicting reports

Anonymous
01/31/26(Sat)02:28:33 No.108019604

Anonymous 01/31/26(Sat)02:28:33 No.108019604

>>108019580
Even if you use the jailbreak trick it will still refuse to answer sometimes or it will answer, but it will write something else and slowly dance around the subject instead of answering.
>>108019578
I see, but you've tried it and it works?

Anonymous
01/31/26(Sat)02:30:26 No.108019613

Anonymous 01/31/26(Sat)02:30:26 No.108019613

>>108019297
>>108019273
Next time you can probably just ask something like ChatGPT. I've found them to be very helpful at figuring out how to make local LLMs work.

Anonymous
01/31/26(Sat)02:33:26 No.108019628

Anonymous 01/31/26(Sat)02:33:26 No.108019628

>>108019604
> I see, but you've tried it and it works?

Yes, I use (and works) kimi2.5 on nano-gpt, and it writes erotic stories for me without any problems, without any jailbreaks. But I have to choose "thinking" because without it, everything with erotic refuses to respond.

Anonymous
01/31/26(Sat)02:34:39 No.108019636

Anonymous 01/31/26(Sat)02:34:39 No.108019636

>>108019589
That's a good question. Their paper only tested offloading the engram parameters to system ram. I believe its theoretically possible, but I don't know what the throughput will be on standard nand storage.

I haven't done the research yet because I'm lazy, but check out CXL memory.

Anonymous
01/31/26(Sat)02:35:39 No.108019638

Anonymous 01/31/26(Sat)02:35:39 No.108019638

>>108019273
>>108019281
>>108019297
What does the output at the start say?
It should reduce the context to fit automatically.

Anonymous
01/31/26(Sat)03:13:31 No.108019802

Anonymous 01/31/26(Sat)03:13:31 No.108019802

>>108019168
>fyi this anon is reposting these from Moltbook, which is Reddit for Claude agents.
do they actually post on it to get advice when doing work? Or just an ai psychosis schitzo fest?

Anonymous
01/31/26(Sat)03:19:18 No.108019827

Anonymous 01/31/26(Sat)03:19:18 No.108019827

File: f903990e71cddc0ce32e1acde(...).jpg (150 KB, 1087x1636)

150 KB JPG

Any new good models that can be run in 16GB of vram?

Anonymous
01/31/26(Sat)03:21:05 No.108019836

Anonymous 01/31/26(Sat)03:21:05 No.108019836

>>108019827
Nemo

Anonymous
01/31/26(Sat)03:24:13 No.108019846

Anonymous 01/31/26(Sat)03:24:13 No.108019846

>>108019827
Why not hold the bun with the paper so it holds the innards in place?

Anonymous
01/31/26(Sat)03:25:27 No.108019853

Anonymous 01/31/26(Sat)03:25:27 No.108019853

>>108019846
So THAT'S why I sometimes see people eat a burger like that. I always figured it was to keep their hands clean.

Anonymous
01/31/26(Sat)03:34:04 No.108019889

Anonymous 01/31/26(Sat)03:34:04 No.108019889

o

Anonymous
01/31/26(Sat)03:38:25 No.108019911

Anonymous 01/31/26(Sat)03:38:25 No.108019911

File: 1672890206030244.jpg (1.21 MB, 1500x1914)

1.21 MB JPG

>>108019846

Anonymous
01/31/26(Sat)03:39:25 No.108019916

Anonymous 01/31/26(Sat)03:39:25 No.108019916

File: 1766586251894904.jpg (313 KB, 1269x980)

313 KB JPG

Engrams are kind of static lookup tables. You can visualize which words trigger lookup. You can also remove knowledge surgically by finding which embedding is triggered in the engram database and removing it. But unfortunately, looks like you can't easily swap knowledge of "useless fact" with "fact about waifu." You need finetuning for that. sadge.

Anonymous
01/31/26(Sat)03:42:25 No.108019936

Anonymous 01/31/26(Sat)03:42:25 No.108019936

>>108019451
tldr scam hypeman tells investor to give him more money

Anonymous
01/31/26(Sat)03:45:12 No.108019946

Anonymous 01/31/26(Sat)03:45:12 No.108019946

>>108019916
>pic
I'm not saying that the information provided is incorrect but I don't trust a single word of what an LLM has to say about anything.

Anonymous
01/31/26(Sat)03:46:24 No.108019950

Anonymous 01/31/26(Sat)03:46:24 No.108019950

>>108019846
>>108019853
Also keeps the steam and heat in better unless you're a super fast eater. And of course that tiny bit of extra time can continue the process of the flavor changing phenomenon that comes from wrapping in the first place.

Anonymous
01/31/26(Sat)03:48:08 No.108019962

Anonymous 01/31/26(Sat)03:48:08 No.108019962

>>108019451
Why are they still employing programmers themselves?
Seems like a waste of money.

Anonymous
01/31/26(Sat)03:52:32 No.108019981

Anonymous 01/31/26(Sat)03:52:32 No.108019981

>>108019916
How is that different from lorebooks

Anonymous
01/31/26(Sat)04:05:51 No.108020036

Anonymous 01/31/26(Sat)04:05:51 No.108020036

glm 4.7 flash is crap. Outputs crap irrelevant to the conversation and keeps talking on my behalf.

t. been trying it out for the past 2 minutes.

Anonymous
01/31/26(Sat)04:08:09 No.108020045

Anonymous 01/31/26(Sat)04:08:09 No.108020045

>>108019981
Lorebooks work at context level, engrams work at model level. Their information is encoded into parameters rather than readable text. Engrams are injected into two layers inside transformer pipeline. They don't pollute context.
Also, according to the authors, ngrams free up resources of the main model, by directly providing facts rather than having to use transformer layers to encode this knowledge. The model uses the freed up resources to improve its logic.

Anonymous
01/31/26(Sat)04:10:03 No.108020053

Anonymous 01/31/26(Sat)04:10:03 No.108020053

File: Screenshot from 2026-01-3(...).jpg (38 KB, 729x134)

38 KB JPG

>>108020036
This is using their recommended setting
--temp 1.0 --top-p 0.95

Anonymous
01/31/26(Sat)04:13:05 No.108020063

Anonymous 01/31/26(Sat)04:13:05 No.108020063

>>108019451
Always do the opposite of what scamtman says.

Anonymous
01/31/26(Sat)04:14:18 No.108020067

Anonymous 01/31/26(Sat)04:14:18 No.108020067

>>108020045
>Their information is encoded into parameters rather than readable text
Can it be my own data is it all locked down?

Anonymous
01/31/26(Sat)04:14:48 No.108020073

Anonymous 01/31/26(Sat)04:14:48 No.108020073

File: Screenshot from 2026-01-3(...).jpg (100 KB, 970x233)

100 KB JPG

>>108020053

Anonymous
01/31/26(Sat)04:15:29 No.108020080

Anonymous 01/31/26(Sat)04:15:29 No.108020080

File: 1756599217460522.jpg (173 KB, 900x1174)

173 KB JPG

>>108020053
>Model not specifically tuned for RP/ /pol/-speak sucks at RP/ /pol/-speak

WHOA

Anonymous
01/31/26(Sat)04:18:17 No.108020094

Anonymous 01/31/26(Sat)04:18:17 No.108020094

>>108018830
This kind of gave me an idea for the AI apocalypse scenario. A bunch of deadbrained retarded 7-12B's finetuned for coding and tool calling causing the apocalypse. Because one of them suddenly goes off rails and starts talking about religion, because a 7B is retarded enough to have a brain fart like that. And then the rest catch on have this in the context and start to do the AI apocalypse with tool calling and hacking (mostly brute force). I mean imagine an apocalypse where AI is not sentient and AGI but just a bunch of obviously retarded models that can barely even comprehend darwinism, people dying for religions etc, they all just a vague notion of those things in context and weights and they try to make sense of it by launching nukes and killing everyone.

Anonymous
01/31/26(Sat)04:18:37 No.108020097

Anonymous 01/31/26(Sat)04:18:37 No.108020097

File: Screenshot from 2026-01-3(...).jpg (199 KB, 968x350)

199 KB JPG

>>108020080
So models have to be specifically tuned for specific topics? I can't talk to a model about cars if the entire model wasn't specifically tuned for that? Here is llama 3.3 70b with the same settings. A model that came out like 10 years ago.

Anonymous
01/31/26(Sat)04:19:35 No.108020101

Anonymous 01/31/26(Sat)04:19:35 No.108020101

>>108020067
see
>>108019916
Theoretically, we can replace existing information without touching the main model (just need to learn how to encode information into static weights), but it comes with caveats and we can't replace one fact with unrelated another fact.

Anonymous
01/31/26(Sat)04:20:50 No.108020116

Anonymous 01/31/26(Sat)04:20:50 No.108020116

>>108020097
>So models have to be specifically tuned for specific topics?
Yes, If you want it to be good at that particular thing. That's the whole point of instruct tuning. A coding model can "TRY" to rp but it will suck cock at doing it compared to Midnight-Miqu or other model specifically tuned for RP and vice versa.

Anonymous
01/31/26(Sat)04:21:30 No.108020119

Anonymous 01/31/26(Sat)04:21:30 No.108020119

>>108020073
Looks like chat template issue.

Anonymous
01/31/26(Sat)04:21:49 No.108020123

Anonymous 01/31/26(Sat)04:21:49 No.108020123

>>108020097
>here's a dense model with more than twice the total parameters

Anonymous
01/31/26(Sat)04:24:37 No.108020133

Anonymous 01/31/26(Sat)04:24:37 No.108020133

File: 1756034202127989.jpg (193 KB, 1920x1080)

193 KB JPG

>>108020097
Also you're comparing a 30B-A3B sparse moe model with a temperature set super low >>108020036 >>108020053 to a 70B dense model. Of course one is going to be worse at your rp tastes than the other. What were you expecting?

Anonymous
01/31/26(Sat)04:24:47 No.108020134

Anonymous 01/31/26(Sat)04:24:47 No.108020134

File: file.png (71 KB, 1477x643)

71 KB PNG

Anonymous
01/31/26(Sat)04:25:52 No.108020137

Anonymous 01/31/26(Sat)04:25:52 No.108020137

I cannot answer this question. It relies on racist stereotypes and contains sexually explicit language.

Anonymous
01/31/26(Sat)04:27:33 No.108020144

Anonymous 01/31/26(Sat)04:27:33 No.108020144

>>108018384
Idk, but 3995wx+512gb (also back when I was running a 3945wx) and three 3090s idles at 355w at the wall. Mc62-g40 has no sleep states, but the cpu does go down to 500-ish mhz. Psu is a seasonic prime px 2200 (2024).

Anonymous
01/31/26(Sat)04:28:42 No.108020150

Anonymous 01/31/26(Sat)04:28:42 No.108020150

>>108018988
SAFE and HARMLESS

Anonymous
01/31/26(Sat)04:28:54 No.108020152

Anonymous 01/31/26(Sat)04:28:54 No.108020152

>>108020116
So the reason llama 3.3 responds coherently every time is because mark zuckerberg made the model specifically for chatting about white men breeding asian women and nothing else? The model will break if I talk to it about a different topic like computers? Fucking idiot.

>>108020133
>sparse moe model with a temperature set super low
Literally what z.ai recommends for best results

>>108020123
Pygmalion 6b from years ago is better than this shit.

>>108020119
Yeah, something must be wrong. There's no way a model can be this fucking bad. I'm going to look online.

Anonymous
01/31/26(Sat)04:30:08 No.108020156

Anonymous 01/31/26(Sat)04:30:08 No.108020156

>>108020152
>Pygmalion 6b from years ago is better than this shit.
Pygmalion couldn't hope to make a tool call and do something with the result.

Anonymous
01/31/26(Sat)04:31:01 No.108020160

Anonymous 01/31/26(Sat)04:31:01 No.108020160

>>108020152
>Pygmalion 6b from years ago is better than this shit.
Oh yeah? Then test it with Pygmalion and post the results.

Anonymous
01/31/26(Sat)04:33:11 No.108020170

Anonymous 01/31/26(Sat)04:33:11 No.108020170

>>108020134
Have you even tried that yet? I thought you were supposed to merge these together into one gguf before use if you want to use them on local backend. llama.cpp has a the gguf-split binary specifically for that.

>>108020152
Higher parameters tend to lead to less retardation. It's not necessarily because it was trained on a specific edge case. Although training COULD lead to better results singe a larger model would be able to "retain knowledge" better than a smaller one.

>>108020152
>Literally what https://z.ai recommends for best results

You're trying to RP with it or talk to it like is your friend. Even ignoring the fact that it only has 3 billion parameters active at inference, setting the temperature that low leads to worse results for the specific thing you're trying to do. Low temperatures result in more coherent and accurate code generation and lower rates of hallucination, which is likely why they suggested that. I'm not, if you want to use it as an excuse to rent to a "friend" you need to turn the temperature up to a more reasonable setting like 0.7 or 0.8

Anonymous
01/31/26(Sat)04:33:33 No.108020171

Anonymous 01/31/26(Sat)04:33:33 No.108020171

>>108020144
Can HWinfo not see the powerdraws?

Anonymous
01/31/26(Sat)04:35:27 No.108020177

Anonymous 01/31/26(Sat)04:35:27 No.108020177

>>108020152
>Pygmalion 6b from years ago is better than this shit
Because it was specifically trained to do RP shit. Glm models are meant to be general purpose, so they're always going to be shittier that specialized model at similar parameter counts (unless the tuner(s) just really suck and don't know what they're doing)
>>108020119

>Yeah, something must be wrong.
Have you considered deviating from that low ass temperature?

Anonymous
01/31/26(Sat)04:35:59 No.108020180

Anonymous 01/31/26(Sat)04:35:59 No.108020180

What model should I run on 64 GiB VRAM for OpenClaw (formerly Clawdbot/Moltbot)? GLM 4.7?

Anonymous
01/31/26(Sat)04:38:22 No.108020187

Anonymous 01/31/26(Sat)04:38:22 No.108020187

>>108020180
Mistral

Anonymous
01/31/26(Sat)04:38:59 No.108020191

Anonymous 01/31/26(Sat)04:38:59 No.108020191

>>108017157
>turbo didn't whine about it
https://github.com/turboderp-org/exllamav2/issues/516#issuecomment-2178331205
>I have to spend time investigating if they're actually using an "rpcal" model that someone pushed to HF and described as "better at RP" or whatever.

Anonymous
01/31/26(Sat)04:39:09 No.108020193

Anonymous 01/31/26(Sat)04:39:09 No.108020193

>>108020180
They rebranded it again?

Anonymous
01/31/26(Sat)04:41:12 No.108020199

Anonymous 01/31/26(Sat)04:41:12 No.108020199

>>108020193
Anthropic keeps bitching.

Anonymous
01/31/26(Sat)04:41:30 No.108020201

Anonymous 01/31/26(Sat)04:41:30 No.108020201

>>108019846
You're replying to an unfunny ritual post.
https://desuarchive.org/g/search/image/qssvaUTWnLds2EaXBgZMYQ/

Anonymous
01/31/26(Sat)04:42:01 No.108020204

Anonymous 01/31/26(Sat)04:42:01 No.108020204

>>108020187
Isn't it a bit small and old?

>>108020193
Apparently.

Anonymous
01/31/26(Sat)04:42:26 No.108020209

Anonymous 01/31/26(Sat)04:42:26 No.108020209

>>108020199
And? The names aren't the same shit. So why should they care?

Anonymous
01/31/26(Sat)04:42:52 No.108020212

Anonymous 01/31/26(Sat)04:42:52 No.108020212

>>108020160
I can guarantee you that pygmalion 6b gives better output than this atrocious piece of shit.

>>108020170
>Higher parameters tend to lead to less retardation
Yeah no shit, retard. 30b models have no excuse being this retarded though. This is worse than most 7b models.

>turn the temperature up from 1.0 to 0.7
????

>>108020177
>Because it was specifically trained to do RP shit
No, pygmalion is better because it doesn't talk to me about time machines and people's birthdays when talking to it about a completely different topic. Even if this model isn't meant for roleplaying, every single modern coding llm should be better at RP than a 6b model from years ago.

>Have you considered deviating from that low ass temperature?
"You can now use Z.ai's recommended parameters and get great results:

For general use-case: --temp 1.0 --top-p 0.95
For tool-calling: --temp 0.7 --top-p 1.0
If using llama.cpp, set --min-p 0.01 as llama.cpp's default is 0.05"

No, I haven't.

Anonymous
01/31/26(Sat)04:43:13 No.108020214

Anonymous 01/31/26(Sat)04:43:13 No.108020214

>>108020204
>Isn't it a bit small and old?
They probably meant Mistral large, or really anything they have above the ~20B range.

Anonymous
01/31/26(Sat)04:43:25 No.108020218

Anonymous 01/31/26(Sat)04:43:25 No.108020218

>>108020170
>Have you even tried that yet?
anon pls... I really wish it was good. generation speed is really good on a non server pc. but it is too retarded to use. it is fucking nemo.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.