/g/ - /lmg/ - Local Models General - Technology

Thread archived.
You cannot reply anymore.

Anonymous

/lmg/ - Local Models General 05/28/26(Thu)06:52:02 No.108924918

File: __kagamine_rin_vocaloid_d(...).jpg (1.71 MB, 2549x4096)

1.71 MB JPG

/lmg/ - Local Models General Anonymous 05/28/26(Thu)06:52:02 No.108924918 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108918777 & >>108911101

►News
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai/blog
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
05/28/26(Thu)06:52:31 No.108924919

Anonymous 05/28/26(Thu)06:52:31 No.108924919

File: threadrincap.png (1.31 MB, 1536x1536)

1.31 MB PNG

►Recent Highlights from the Previous Thread: >>108918777

--Comparing Qwen and Gemma coding performance and SWA context optimization:
>108920547 >108920577 >108920644 >108920697 >108920718 >108920727 >108920740 >108920744 >108920757 >108920782 >108920799 >108920787 >108920789 >108920804 >108920769 >108920780 >108920764
--Speculating on Gemini Nano's Gemma roots and unreleased 124B model:
>108920911 >108920917 >108920927 >108920939 >108921007 >108921159 >108921198 >108921215 >108921279 >108921353 >108921378 >108921546 >108921590 >108921598 >108921622 >108921647 >108921666 >108921682 >108921702 >108921493 >108921581 >108921281 >108921925 >108921053
--Tokenization's impact on math and the viability of token-free architectures:
>108921743 >108921791 >108921797 >108921836 >108921843 >108921860 >108921883 >108921891 >108921920 >108922040 >108922052 >108922092 >108922147 >108922165 >108922126 >108921855 >108921869
--Debating SWA and architecture impact on Gemma's long context performance:
>108920850 >108920865 >108920898 >108920905 >108920924 >108920943 >108921050
--ReAligned-Qwen3.5 release aiming to reduce Chinese censorship and bias:
>108918844 >108918885 >108918964 >108922479 >108922542 >108922651 >108922470 >108922530 >108922788 >108922865 >108922874 >108922907 >108922961 >108923000
--Comparing draft model performance and imatrix effects on Q8_0 quants:
>108919434 >108919517 >108919581 >108919593 >108920178 >108919677
--Starlette framework vulnerability affecting VLLM and FastAPI-based tools:
>108923332 >108923346
--Anon asks about CUDA 13.3 performance gains for RTX 3060:
>108922218
--Logs:
>108919744 >108919786 >108920002 >108920156 >108920225 >108920308 >108920483 >108920547 >108920911 >108921177 >108921195 >108922035 >108922573 >108923453
--Miku, Neru (free space):
>108919814 >108919833 >108922470

►Recent Highlight Posts from the Previous Thread: >>108918836

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
05/28/26(Thu)06:54:31 No.108924932

Anonymous 05/28/26(Thu)06:54:31 No.108924932

How can I distribute the vision part of a model across all? Or is that not possible with llama.cpp?

Anonymous
05/28/26(Thu)07:01:43 No.108924965

Anonymous 05/28/26(Thu)07:01:43 No.108924965

>>108924932
across all of what?

Anonymous
05/28/26(Thu)07:02:10 No.108924966

Anonymous 05/28/26(Thu)07:02:10 No.108924966

File: len.jpg (263 KB, 850x1202)

263 KB JPG

>>108924918
for me, it's len

Anonymous
05/28/26(Thu)07:06:31 No.108924990

Anonymous 05/28/26(Thu)07:06:31 No.108924990

>>108924965
Sorry, GPUs. Right now, whenever I send an image, it does everything on a single GPU.

Anonymous
05/28/26(Thu)07:10:22 No.108925009

Anonymous 05/28/26(Thu)07:10:22 No.108925009

>>108924990
Just adjust your split ratio to compensate for the main gpu having the multimodal on it.
-ts 0.8,1 is a good place to start but tune that to your gpus/model.
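
To see what a ratio like 0.8,1 works out to, here is a minimal Python sketch. It assumes the -ts values are treated as relative proportions; `approx_layers` is a hypothetical helper that only approximates how layers might be divided, and llama.cpp's real assignment may round differently.

```python
def split_fractions(ratios):
    """Normalize relative split ratios (e.g. [0.8, 1]) into fractions that sum to 1."""
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must sum to a positive number")
    return [r / total for r in ratios]


def approx_layers(ratios, n_layers):
    """Estimate layers per GPU using largest-remainder rounding (an assumption)."""
    raw = [f * n_layers for f in split_fractions(ratios)]
    counts = [int(x) for x in raw]
    remaining = n_layers - sum(counts)
    # Hand leftover layers to the GPUs with the largest fractional remainder.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:remaining]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    print(split_fractions([0.8, 1]))
    print(approx_layers([0.8, 1], 45))
```

With an illustrative 45 layers, 0.8,1 puts slightly fewer layers on the first GPU, which is the point: it leaves headroom there for the multimodal part.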

Anonymous
05/28/26(Thu)07:23:30 No.108925058

Anonymous 05/28/26(Thu)07:23:30 No.108925058

File: openslopui.png (22 KB, 785x220)

22 KB PNG

i noticed open-webui is slow af in firefox sometimes
turns out it's the update check (blocking)
"local" ai had an undisclosed hard dependency on GitHub's uptime
explains the literally 5 minute page load i had when github shat the bed entirely (300s default timeout i guess)
ENABLE_VERSION_UPDATE_CHECK=false
fixes it if openwebui is newer than july last year.
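
The stall described above is what a synchronous network call with a long timeout does to a page load. A minimal Python sketch of the alternative, a bounded check that fails open, is below. This is not Open WebUI's actual code; `check_version`, `fetch`, and `slow_fetch` are hypothetical names used only for illustration.

```python
import concurrent.futures
import time


def check_version(fetch, timeout_s=2.0):
    """Run fetch() in a worker thread and return None if it is slow or fails."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None  # upstream is slow or down: skip the check, keep serving
    except Exception:
        return None  # any fetch error is treated as "no update info"
    finally:
        # Do not wait for the worker; the caller must not block on it.
        pool.shutdown(wait=False)


def slow_fetch():
    """Stand-in for an unreachable upstream: sleeps instead of using the network."""
    time.sleep(0.5)
    return "9.9.9"


if __name__ == "__main__":
    print(check_version(slow_fetch, timeout_s=0.05))  # gives up quickly
    print(check_version(lambda: "1.2.3"))
```

The design choice is simply that an optional feature should never sit on the critical path of a page load.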

Anonymous
05/28/26(Thu)07:28:14 No.108925083

Anonymous 05/28/26(Thu)07:28:14 No.108925083

Rinlove

Anonymous
05/28/26(Thu)07:30:31 No.108925093

Anonymous 05/28/26(Thu)07:30:31 No.108925093

>>108925058
That's hilarious.

Anonymous
05/28/26(Thu)07:33:50 No.108925107

Anonymous 05/28/26(Thu)07:33:50 No.108925107

>>108925058
>"local" ai had an undisclosed hard dependency on GitHub's uptime
Wouldn't expect anything less from ollama-webui.

Anonymous
05/28/26(Thu)07:39:40 No.108925132

Anonymous 05/28/26(Thu)07:39:40 No.108925132

>>108925009
Running with split mode tensor. I have the vram headroom, the issue is that the single gpu is doing all the work during imaging, and tokens per second goes from 600 to 150.

Anonymous
05/28/26(Thu)07:41:04 No.108925140

Anonymous 05/28/26(Thu)07:41:04 No.108925140

>>108925058
lmao

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)07:44:51 No.108925155

llama.cpp CUDA dev !!yhbFjk57TDr 05/28/26(Thu)07:44:51 No.108925155

>>108924932
>>108925132
I implemented -sm tensor as part of llama_model and llama_context which are used for regular models.
However, multimodality is supported via the mtmd module which uses different code.
So -sm tensor support has not been implemented for mtmd, and quite frankly there are still so many other things about -sm tensor that don't yet work properly that it's not a priority.

Anonymous
05/28/26(Thu)07:47:28 No.108925168

Anonymous 05/28/26(Thu)07:47:28 No.108925168

>>108925155
Are there any plans to make -sm tensor work with -ctk and -ctv?

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)07:48:21 No.108925172

llama.cpp CUDA dev !!yhbFjk57TDr 05/28/26(Thu)07:48:21 No.108925172

>>108925168
https://github.com/ggml-org/llama.cpp/pull/23792
Feel free to report if you manage to provoke issues with it.

Anonymous
05/28/26(Thu)07:49:27 No.108925179

Anonymous 05/28/26(Thu)07:49:27 No.108925179

>>108925172
You really are my hero CUDA dev

Anonymous
05/28/26(Thu)08:00:44 No.108925222

Anonymous 05/28/26(Thu)08:00:44 No.108925222

Somewhere, a cat barked.

Anonymous
05/28/26(Thu)08:16:02 No.108925279

Anonymous 05/28/26(Thu)08:16:02 No.108925279

File: 0.jpg (12 KB, 480x360)

12 KB JPG

https://github.com/OpenBMB/MiniCPM-Desk-Pet

has anyone tried it yet?

Anonymous
05/28/26(Thu)08:20:58 No.108925301

Anonymous 05/28/26(Thu)08:20:58 No.108925301

>>108925172
Seems to be working great on CUDA so far, but Vulkan splitting across both NVIDIA and AMD cards (2x 3060, 1x 9060 XT) is segfaulting at model warmup. Verbose doesn't seem to be giving any extra info, but my command is:
./llama-server -m ~/models/gguf/gemma-4-26B-A4B-it-heretic-ara.i1-Q4_K_M.gguf -mm ~/models/gguf/gemma-4-26B-A4B.mmproj-f16.gguf -t 16 -c 131072 -fa on --backend-sampling -ngl 99 --host 0.0.0.0 -ctk q8_0 -ctv q8_0 --reasoning off -np 2 -sm tensor --verbose

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)08:23:30 No.108925315

llama.cpp CUDA dev !!yhbFjk57TDr 05/28/26(Thu)08:23:30 No.108925315

>>108925301
As of right now -sm tensor is broken with Vulkan, and if it isn't the performance is terrible anyways.
That is one of the things that don't yet work properly that I alluded to earlier.

Anonymous
05/28/26(Thu)08:25:02 No.108925324

Anonymous 05/28/26(Thu)08:25:02 No.108925324

>>108925315
Got it, I'll keep it on the 3060s for now thanks again.

Anonymous
05/28/26(Thu)08:27:43 No.108925337

Anonymous 05/28/26(Thu)08:27:43 No.108925337

File: file.png (3 KB, 279x99)

3 KB PNG

>>108925279
>desktop
>just webshit
every fucking time

anyway it's apparently just https://github.com/rullerzhou-afk/clawd-on-desk with a bundled MiniCPM5 weights downloader

Anonymous
05/28/26(Thu)08:36:16 No.108925382

Anonymous 05/28/26(Thu)08:36:16 No.108925382

>thinking for x minutes at 4-5 t/s
gemma made me realize just how much running models on ram sucks
going from glm to this I’ve been spoiled

Anonymous
05/28/26(Thu)08:37:24 No.108925385

Anonymous 05/28/26(Thu)08:37:24 No.108925385

minicpmcockz

Anonymous
05/28/26(Thu)08:42:39 No.108925410

Anonymous 05/28/26(Thu)08:42:39 No.108925410

>>108925382
I can't even handle thinking at 23 tokens/s (amd gpu). I can't imagine 5 tk/s.

Anonymous
05/28/26(Thu)08:46:55 No.108925427

Anonymous 05/28/26(Thu)08:46:55 No.108925427

>>108925410
4~50 is the bare minimum for agentic

Anonymous
05/28/26(Thu)08:52:51 No.108925464

Anonymous 05/28/26(Thu)08:52:51 No.108925464

>>108925410
> 23 tokens/s (amd gpu)
MI50? Or are newer AMD cards just as shit?

Anonymous
05/28/26(Thu)08:53:23 No.108925467

Anonymous 05/28/26(Thu)08:53:23 No.108925467

>>108924918
Passionate unprotected sex with Rin-chan

Anonymous
05/28/26(Thu)08:54:49 No.108925475

Anonymous 05/28/26(Thu)08:54:49 No.108925475

>>108925382
mtp support in two weeks brother inshallah

Anonymous
05/28/26(Thu)08:55:27 No.108925482

Anonymous 05/28/26(Thu)08:55:27 No.108925482

nu-commander support?
deepseek support?
is hobby finally dead?

Anonymous
05/28/26(Thu)08:57:11 No.108925488

Anonymous 05/28/26(Thu)08:57:11 No.108925488

>>108925467
Dispassionate protected sex with bland generic vocaloid (male) of the week.

Anonymous
05/28/26(Thu)08:58:11 No.108925495

Anonymous 05/28/26(Thu)08:58:11 No.108925495

>>108925467
Agree.

Anonymous
05/28/26(Thu)09:00:27 No.108925510

Anonymous 05/28/26(Thu)09:00:27 No.108925510

>>108925464
4 tensor parallel V620s running gemma 4 31b at Q8.

Anonymous
05/28/26(Thu)09:01:33 No.108925515

Anonymous 05/28/26(Thu)09:01:33 No.108925515

>>108925482
>one bulgarian managed to single-handedly kill the hobby

>>108924805
>2024
>ggerganov refuses to add vision
some things never change

Anonymous
05/28/26(Thu)09:08:24 No.108925541

Anonymous 05/28/26(Thu)09:08:24 No.108925541

>>108925515
at least he seems like a chill person
maybe slightly retarded but not like a schizo dramafag in transition

Anonymous
05/28/26(Thu)09:08:52 No.108925543

Anonymous 05/28/26(Thu)09:08:52 No.108925543

>>108925467
Rin is for non-pass users only

Anonymous
05/28/26(Thu)09:12:18 No.108925553

Anonymous 05/28/26(Thu)09:12:18 No.108925553

I can't find any relevant search results. This is quite frustrating. From where do I download MTP weights for Qwen 3.6 35B? I didn't see anything on Qwen's HF page either.

Anonymous
05/28/26(Thu)09:16:02 No.108925578

Anonymous 05/28/26(Thu)09:16:02 No.108925578

>>108925553
The Qwen MTP weights should be part of the main model

Anonymous
05/28/26(Thu)09:17:44 No.108925591

Anonymous 05/28/26(Thu)09:17:44 No.108925591

>>108925578
I was confused because I was expecting it to be external like Gemma's.
Anyways this just proves the fact internet search is an impossible task these days. All you get is about 10 different AI slop generated 'articles' and if you use google, you'll get their own AI slop assistant on top of these 10 AI generated slop articles...

Anonymous
05/28/26(Thu)09:20:28 No.108925604

Anonymous 05/28/26(Thu)09:20:28 No.108925604

>>108925591
corporate ai slop and jeeti fake articles being the only results when you try to search for info about local ai is part of the "mote"

Anonymous
05/28/26(Thu)09:26:28 No.108925644

Anonymous 05/28/26(Thu)09:26:28 No.108925644

>>108925607
very FAST even for LONG context with ROCK HARD stability

Anonymous
05/28/26(Thu)09:27:26 No.108925654

Anonymous 05/28/26(Thu)09:27:26 No.108925654

File: Untitled.png (12 KB, 334x544)

12 KB PNG

>>108925607
I go make tea every time I send a message.

Anonymous
05/28/26(Thu)09:28:12 No.108925662

Anonymous 05/28/26(Thu)09:28:12 No.108925662

File: file.png (579 KB, 1280x720)

579 KB PNG

>>108925541
I like you anon. That is why I am gonna unsupport you last.

Anonymous
05/28/26(Thu)09:28:51 No.108925665

Anonymous 05/28/26(Thu)09:28:51 No.108925665

>>108925591
Gemma is the only one with external MTP weights so far afaik
Before I would have suggested that a search engine that operates on a whitelist would be the best way to combat SEO spam, but now even that's not viable since most discussion, tech support, and information moved from public forums like BBSes, reddit, and stackoverflow, to gated blackholes like discord.

Anonymous
05/28/26(Thu)09:29:10 No.108925668

Anonymous 05/28/26(Thu)09:29:10 No.108925668

Claude Opus 4.8 today?

Anonymous
05/28/26(Thu)09:30:03 No.108925676

Anonymous 05/28/26(Thu)09:30:03 No.108925676

>>108925668
local today?

Anonymous
05/28/26(Thu)09:35:35 No.108925702

Anonymous 05/28/26(Thu)09:35:35 No.108925702

>>108925662
He reminds me of Chris-chan.

Anonymous
05/28/26(Thu)09:36:29 No.108925707

Anonymous 05/28/26(Thu)09:36:29 No.108925707

File: Untitled.png (11 KB, 352x544)

11 KB PNG

>>108925607
And here's the same on two 3090s.
V620s were done with ubatch 2048, 3090s are 512.

Anonymous
05/28/26(Thu)09:44:34 No.108925742

Anonymous 05/28/26(Thu)09:44:34 No.108925742

File: Untitled.png (11 KB, 350x430)

11 KB PNG

>>108925607
With 2 V620s ubatch 512

Anonymous
05/28/26(Thu)09:46:03 No.108925747

Anonymous 05/28/26(Thu)09:46:03 No.108925747

File: wdagaw.png (567 KB, 1765x1008)

567 KB PNG

Annnnnnnnnnnnnnnnnnnnnnnnnnnnddddddddddddddddddddddddddddddddd it's gone!

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)09:47:16 No.108925753

llama.cpp CUDA dev !!yhbFjk57TDr 05/28/26(Thu)09:47:16 No.108925753

>>108925736
FWIW the "ROCm" matrix multiplication code is still fairly unoptimized for AMD.
I only started taking AMD more seriously last year when MI50s came down in price and I prioritized getting the FlashAttention code in order first since that will enable the removal of legacy code.

Anonymous
05/28/26(Thu)09:48:14 No.108925767

Anonymous 05/28/26(Thu)09:48:14 No.108925767

>>108925702
That is not niggerganov.

Anonymous
05/28/26(Thu)09:48:58 No.108925773

Anonymous 05/28/26(Thu)09:48:58 No.108925773

>>108925736
Yeah, I went for V620s instead of MI50s because I thought they'd handle pp better (and also because MI50s were only 100 AUD cheaper). But TANSTAAFL lol.

Anonymous
05/28/26(Thu)09:49:42 No.108925782

Anonymous 05/28/26(Thu)09:49:42 No.108925782

>>108925753
Did you check your email? You didn't respond when I sent you the usual blacked miku goods we exchange.

Anonymous
05/28/26(Thu)09:54:34 No.108925804

Anonymous 05/28/26(Thu)09:54:34 No.108925804

Is having one GPU with high compute for prompt processing alongside other cheaper slower GPUs with better VRAM/$ for token gen a good strategy or is mixing different devices like that bound to cause issues?

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)09:59:45 No.108925836

llama.cpp CUDA dev !!yhbFjk57TDr 05/28/26(Thu)09:59:45 No.108925836

>>108925804
It's a great strategy if you intend to write the software support to take advantage of such a jank setup yourself.

Anonymous
05/28/26(Thu)10:00:38 No.108925841

Anonymous 05/28/26(Thu)10:00:38 No.108925841

What a snarky cunt.

Anonymous
05/28/26(Thu)10:02:42 No.108925850

Anonymous 05/28/26(Thu)10:02:42 No.108925850

>>108925836
What additional software support would be needed to run an RTX Pro 6000 along with half a dozen P100s?

Anonymous
05/28/26(Thu)10:04:12 No.108925858

Anonymous 05/28/26(Thu)10:04:12 No.108925858

>>108925836
>if you intend to write the software support to take advantage of such a jank setup yourself.
Oof.
Alright, thank you for the clarification.

Anonymous
05/28/26(Thu)10:04:22 No.108925861

Anonymous 05/28/26(Thu)10:04:22 No.108925861

>>108925850
Don't worry, I'll vibe-code up a solution with my qwen 35b a3b iq2_xxs agent.

Anonymous
05/28/26(Thu)10:08:49 No.108925887

Anonymous 05/28/26(Thu)10:08:49 No.108925887

>>108925804
That's called "disaggregated inference".
https://pytorch.org/blog/disaggregated-inference-at-scale-with-pytorch-vllm/

Anonymous
05/28/26(Thu)10:11:50 No.108925900

Anonymous 05/28/26(Thu)10:11:50 No.108925900

>>108925337
Web languages can be used offline, retard

Anonymous
05/28/26(Thu)10:11:57 No.108925901

Anonymous 05/28/26(Thu)10:11:57 No.108925901

It's been months now and I STILL don't understand the OpenClaw stuff. I've been interested in this field since GPT-2 AI Dungeon and yet I feel like an absolute retard.

I've looked up videos on YouTube, and Nvidia's Jensen called it the "iPhone moment for LLMs", yet I still have absolutely 0 idea what it actually is or does outside of simply looping through pre-set prompts after every time interval.

Can someone explain it to me please?

Anonymous
05/28/26(Thu)10:13:43 No.108925908

Anonymous 05/28/26(Thu)10:13:43 No.108925908

>>108925901
>outside of simply looping through pre-set prompts after every time interval.
That's literally it. You can also install it on a Mac and talk to it through WhatsApp, so all the non-programmer "tech enthusiasts" love it.

Anonymous
05/28/26(Thu)10:14:06 No.108925909

Anonymous 05/28/26(Thu)10:14:06 No.108925909

>>108925901
You don't need to know if you are not a developer.

Anonymous
05/28/26(Thu)10:15:41 No.108925915

Anonymous 05/28/26(Thu)10:15:41 No.108925915

>>108925901
You understand the substance, just not the hype.
It's prompts that keep going, combined with irresponsible amounts of tool access, coupled with the ability to yell at it from your messaging app of choice.
It's for the 'I fucking love science' crowd, but the idea of agentic workflow isn't bad in and of itself.

Anonymous
05/28/26(Thu)10:16:20 No.108925917

Anonymous 05/28/26(Thu)10:16:20 No.108925917

>>108925058
typical webshitters and their AUTO UPDOOOOOT features

Anonymous
05/28/26(Thu)10:17:20 No.108925925

Anonymous 05/28/26(Thu)10:17:20 No.108925925

>>108925908
Then what was the hype all about and why was Jensen calling it the iphone moment?

Anonymous
05/28/26(Thu)10:18:55 No.108925938

Anonymous 05/28/26(Thu)10:18:55 No.108925938

>>108925925
Jensen's job is to increase the price of NVDA shares and he's quite good at it.

Anonymous
05/28/26(Thu)10:23:52 No.108925961

Anonymous 05/28/26(Thu)10:23:52 No.108925961

>>108925915
>It's for the 'I fucking love science' crowd, but the idea of agentic workflow isn't bad in and of itself.
Software development clients have had agentic features for years already. If it brought anything new to the table, it's agents that run in the background and in parallel instead of sequentially.

Anonymous
05/28/26(Thu)10:37:34 No.108926038

Anonymous 05/28/26(Thu)10:37:34 No.108926038

What persona do anons give their coding sla..I mean agents?

Anonymous
05/28/26(Thu)10:41:51 No.108926061

Anonymous 05/28/26(Thu)10:41:51 No.108926061

>>108926038
>>108920002

Anonymous
05/28/26(Thu)10:43:27 No.108926073

Anonymous 05/28/26(Thu)10:43:27 No.108926073

>>108926061
And the non-cringe version?

Anonymous
05/28/26(Thu)10:52:28 No.108926124

Anonymous 05/28/26(Thu)10:52:28 No.108926124

>>108926061
King shit

Anonymous
05/28/26(Thu)11:10:37 No.108926222

Anonymous 05/28/26(Thu)11:10:37 No.108926222

Is anyone here using 4b or 2b models with openclaw or hermes agent to do things?

Anonymous
05/28/26(Thu)11:12:58 No.108926235

Anonymous 05/28/26(Thu)11:12:58 No.108926235

What is wrong with using ollama?

Anonymous
05/28/26(Thu)11:13:45 No.108926240

Anonymous 05/28/26(Thu)11:13:45 No.108926240

>>108926222
too smol to do anything but be a cute idiot

Anonymous
05/28/26(Thu)11:14:23 No.108926248

Anonymous 05/28/26(Thu)11:14:23 No.108926248

>>108926235
Nothing if you know why you are using it and what advantages and disadvantages it has compared to the other possibilities.

Anonymous
05/28/26(Thu)11:16:27 No.108926260

Anonymous 05/28/26(Thu)11:16:27 No.108926260

>>108926240
I believe you are wrong. I have 1050ti and I need to use it.
The cloud models are hitting rate limits too fast.

Anonymous
05/28/26(Thu)11:19:32 No.108926288

Anonymous 05/28/26(Thu)11:19:32 No.108926288

>>108926260
qwen3.5 35b on cpu

Anonymous
05/28/26(Thu)11:22:00 No.108926304

Anonymous 05/28/26(Thu)11:22:00 No.108926304

How is MiniCPM5-1B? Is it just benchmark maxed or actually useful for tinkering? They also released pretrained and ift checkpoints so that's cool for experiments.

Anonymous
05/28/26(Thu)11:22:01 No.108926305

Anonymous 05/28/26(Thu)11:22:01 No.108926305

File: Capture.png (271 KB, 1492x1060)

271 KB PNG

>>108925747
Anon, can you really not use a search engine?

Anonymous
05/28/26(Thu)11:24:50 No.108926324

Anonymous 05/28/26(Thu)11:24:50 No.108926324

File: qwen-pptps.png (12 KB, 1085x669)

12 KB PNG

Haven't used more than 32K context before, doing some experiments with Qwen3.6-35B-A3B Q8 analysing my Logseq graph of ~220k tokens. Works surprisingly well and fast, better overall than SSD rape with GLM 4.7 at 5 tps generation
Any benchmarks testing longer context retrieval/understanding on open models? NoLiMa in OP is 10 months outdated so irrelevant

Anonymous
05/28/26(Thu)11:26:10 No.108926335

Anonymous 05/28/26(Thu)11:26:10 No.108926335

>>108925836
>It's a great strategy if you intend to write the software support to take advantage of such a jank setup yourself.
I don't suppose there's room to optimize the RPC server?
It only slowed things down last time I tried it earlier in the year vs CPU offload (trying to use 2 MI50s in another rig).

Anonymous
05/28/26(Thu)11:28:14 No.108926347

Anonymous 05/28/26(Thu)11:28:14 No.108926347

https://huggingface.co/google/gemma-4-26B-A4B-it-assistant
So can I use the mtp shit somewhere or is llama ignoring it still?

Anonymous
05/28/26(Thu)11:30:53 No.108926360

Anonymous 05/28/26(Thu)11:30:53 No.108926360

>>108925861
Retard. llama.cpp already supports inference across different devices. With Vulkan, even inference across Nvidia/AMD/Intel cards. You can even set which device to use as the main one for prompt processing.

Anonymous
05/28/26(Thu)11:32:10 No.108926370

Anonymous 05/28/26(Thu)11:32:10 No.108926370

File: Screenshot.png (53 KB, 862x355)

53 KB PNG

>>108926347
>https://huggingface.co/google/gemma-4-26B-A4B-it-assistant
https://huggingface.co/Radamanthys11/Gemma-4-26B-A4B-it-assistant-GGUF
Apparently it's merged in ikllama but last time I checked, no SWA in that fork so not really usable.

Anonymous
05/28/26(Thu)11:32:17 No.108926371

Anonymous 05/28/26(Thu)11:32:17 No.108926371

>>108926347
https://github.com/ggml-org/llama.cpp/pull/23398
https://huggingface.co/am17an/Gemma4-31B-it-GGUF/blob/main/mtp-gemma-4-31B-it.gguf
Merge it yourself

Anonymous
05/28/26(Thu)11:36:20 No.108926399

Anonymous 05/28/26(Thu)11:36:20 No.108926399

File: Screenshot 2026-05-28 at (...).png (22 KB, 928x181)

22 KB PNG

Nvm, it seems to be in kobold already.

Anonymous
05/28/26(Thu)11:42:16 No.108926430

Anonymous 05/28/26(Thu)11:42:16 No.108926430

>>108926399
So it loads the model but MTP still doesn't work?

Anonymous
05/28/26(Thu)11:44:47 No.108926447

Anonymous 05/28/26(Thu)11:44:47 No.108926447

>>108926371
>>108926370
>>108926399
What a shitshow.

Anonymous
05/28/26(Thu)11:46:23 No.108926456

Anonymous 05/28/26(Thu)11:46:23 No.108926456

>>108926447
You should be used to it by now

Anonymous
05/28/26(Thu)11:46:59 No.108926460

Anonymous 05/28/26(Thu)11:46:59 No.108926460

Overly opinionated devs that hold back projects should be put in the tard cage

Anonymous
05/28/26(Thu)11:48:02 No.108926463

Anonymous 05/28/26(Thu)11:48:02 No.108926463

>>108926460
i'd take niggeganov over vibeshitters

Anonymous
05/28/26(Thu)11:48:59 No.108926469

Anonymous 05/28/26(Thu)11:48:59 No.108926469

>>108926463
But he still lets them through the door it just has to be the ones he likes

Anonymous
05/28/26(Thu)11:49:26 No.108926471

Anonymous 05/28/26(Thu)11:49:26 No.108926471

>>108926469
at least it's not the inference core

Anonymous
05/28/26(Thu)11:52:56 No.108926486

Anonymous 05/28/26(Thu)11:52:56 No.108926486

glm5.1 mtp when

Anonymous
05/28/26(Thu)11:57:55 No.108926505

Anonymous 05/28/26(Thu)11:57:55 No.108926505

>>108926447
Small price to pay to avoid python

Anonymous
05/28/26(Thu)12:01:59 No.108926519

Anonymous 05/28/26(Thu)12:01:59 No.108926519

>>108926486
>glm5.1
been looking at the hardware you need to run the big boy models at reasonable speeds (25+ t/s, though it's probably more like 15 t/s once you get closer to the context limit) and these are so fucking expensive, it's hard to justify the cost

Anonymous
05/28/26(Thu)12:07:40 No.108926553

Anonymous 05/28/26(Thu)12:07:40 No.108926553

Has anyone played with tenstorrent cards? They seem like a good deal but I haven't heard anything about it here.

Anonymous
05/28/26(Thu)12:10:13 No.108926570

Anonymous 05/28/26(Thu)12:10:13 No.108926570

>>108926553
they don't sell to consumers

Anonymous
05/28/26(Thu)12:15:24 No.108926604

Anonymous 05/28/26(Thu)12:15:24 No.108926604

>>108926486
Just got merged in ik_llama.

Anonymous
05/28/26(Thu)12:16:04 No.108926610

Anonymous 05/28/26(Thu)12:16:04 No.108926610

File: r u sure.png (306 KB, 2040x1839)

306 KB PNG

>>108926570
You sure about that?

Anonymous
05/28/26(Thu)12:18:13 No.108926628

Anonymous 05/28/26(Thu)12:18:13 No.108926628

>>108926570
>2026
>being a consumer
yikes

Anonymous
05/28/26(Thu)12:26:57 No.108926672

Anonymous 05/28/26(Thu)12:26:57 No.108926672

>>108926610
>$1400
>120 Tensix Cores and 32 GB of GDDR6
>half the bandwidth of a 3090
Fucking Intel would be a better buy

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)12:30:01 No.108926691

llama.cpp CUDA dev !!yhbFjk57TDr 05/28/26(Thu)12:30:01 No.108926691

>>108925850
In llama.cpp:
-Temporarily move weights/KV caches between any backends not just from CPU to GPU.
-In particular, support for collecting and distributing data between GPUs for -sm tensor because otherwise many old GPUs will be way too slow.
-A way to profile model evaluation time and data transfers between GPUs.
-A way to decide on when to temporarily move stuff to the one fast GPU based on the profile.

>>108926335
There is room to optimize the RPC server but that does not mean that it is a priority for maintainers.

Anonymous
05/28/26(Thu)12:35:25 No.108926709

Anonymous 05/28/26(Thu)12:35:25 No.108926709

>>108926691
If I get an RDNA 4 GPU and want to use my current RDNA 2 GPU alongside it, can I use WMMA, or will I be forced onto the slower path to use both? It would be nice if one GPU could use WMMA while the other uses the llama.cpp implementation.

Anonymous
05/28/26(Thu)12:38:00 No.108926725

Anonymous 05/28/26(Thu)12:38:00 No.108926725

File: file.png (144 KB, 883x859)

144 KB PNG

>>108926038
You made me go check. I am kinda surprised it didn't go into full preacher mode when I asked it about old HDDs.

Anonymous
05/28/26(Thu)12:39:08 No.108926734

Anonymous 05/28/26(Thu)12:39:08 No.108926734

>>108926672
I thought the big deal with tenstorrent was the 3tb/s interconnect?

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)12:39:37 No.108926741

llama.cpp CUDA dev !!yhbFjk57TDr 05/28/26(Thu)12:39:37 No.108926741

>>108926709
Both GPUs can use different device code for the same compute graphs both for -sm layer and -sm tensor so there should be no issues in that sense.
The problem may be rather that the RDNA4 GPU is being slowed down by having to wait for the RDNA2 GPU.

I'm not sure what you mean by "WMMA" as compared to llama.cpp.
llama.cpp/ggml has support for AMD WMMA instructions in the hipified CUDA code.
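
The "waiting" problem mentioned above can be illustrated with a toy model in Python. It assumes each GPU gets a fixed share of the work per step and that every step finishes only when the slowest GPU is done; communication overhead is ignored, and the speeds are made-up relative numbers, not measurements of any real card.

```python
def step_time(shares, speeds):
    """Time for one synchronized step: the slowest device sets the pace."""
    return max(share / speed for share, speed in zip(shares, speeds))


def balanced_shares(speeds):
    """Shares proportional to speed, so all devices finish at the same time."""
    total = sum(speeds)
    return [speed / total for speed in speeds]


if __name__ == "__main__":
    speeds = [3.0, 1.0]  # hypothetical: a fast GPU and one three times slower
    print(step_time([0.5, 0.5], speeds))               # even split
    print(step_time(balanced_shares(speeds), speeds))  # speed-weighted split
```

In this toy model an even split makes the fast card idle while the slow one finishes; weighting the split by speed removes the idle time, though in practice VRAM capacity also limits how uneven the split can be.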

Anonymous
05/28/26(Thu)12:41:41 No.108926752

Anonymous 05/28/26(Thu)12:41:41 No.108926752

>>108926691
>one fast GPU
How would you define a "fast" GPU? Just the available bandwidth and compute at a given time?

Anonymous
05/28/26(Thu)12:42:59 No.108926763

Anonymous 05/28/26(Thu)12:42:59 No.108926763

>>108926691
Couldn't you bypass the need for the profiling and moving around of weights by just using override tensor and putting the shared experts on the fast GPU and the rest on the slow cards as if they were RAM and get most of the benefit?
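
For readers unfamiliar with override tensor, the idea is pattern-based placement: tensor names are matched against regexes and sent to a named buffer. A rough Python sketch follows, assuming the first matching pattern wins. The tensor names and the buffer names `CUDA0`/`CUDA1` are illustrative, and `place_tensors` is a hypothetical helper, not llama.cpp code.

```python
import re


def place_tensors(names, overrides, default="CUDA0"):
    """Map each tensor name to a buffer; first matching (pattern, buffer) wins."""
    placement = {}
    for name in names:
        target = default
        for pattern, buffer_name in overrides:
            if re.search(pattern, name):
                target = buffer_name
                break
        placement[name] = target
    return placement


if __name__ == "__main__":
    names = [
        "blk.0.attn_q.weight",
        "blk.0.ffn_gate_shexp.weight",  # shared expert
        "blk.0.ffn_gate_exps.weight",   # routed experts
    ]
    # Routed experts go to the slower card; everything else stays on the default.
    overrides = [(r"ffn_.*_exps", "CUDA1")]
    for name, buf in place_tensors(names, overrides).items():
        print(f"{name} -> {buf}")
```

Note that this placement is static: it is decided once at load time, which is why it cannot by itself give different behavior for prompt processing and token generation.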

Anonymous
05/28/26(Thu)12:44:35 No.108926776

Anonymous 05/28/26(Thu)12:44:35 No.108926776

>>108926741
I'm talking about compiling with -DGGML_HIP_ROCWMMA_FATTN=ON. I thought one needed RDNA3+ for it to work correctly.

Anonymous
05/28/26(Thu)12:45:04 No.108926777

Anonymous 05/28/26(Thu)12:45:04 No.108926777

>>108923859
i think it was in grok discussions

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)12:52:51 No.108926836

llama.cpp CUDA dev !!yhbFjk57TDr 05/28/26(Thu)12:52:51 No.108926836

>>108926752
See, that's the problem.
It's relatively easy to assume that a GPU will be faster than RAM + CPU so the decision when to move stuff is relatively easy.
But between GPUs it's much less clear which is why profiling data would be needed to make optimal decisions.
The decision would not be based explicitly on hardware specs but rather on measured performance, which reflects how those specs actually play out for a given model and setup.

>>108926763
You can do some custom -ot shenanigans to optimize tensor placement but that is not what the original question was about.
The original question was about using the high compute of a single GPU specifically for the prompt and using the comparatively slower but cheaper GPUs for token generation afterwards.
For that to work you have to dynamically move data around depending on tensor shapes.
Otherwise you will always be stuck having to do some part of the compute graph on the slow GPUs.
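
The trade-off behind "dynamically move data around" can be sketched as a simple cost comparison in Python. This is only an illustration of the kind of profile-based decision described above, not anything llama.cpp implements; `Profile`, `should_move`, `break_even_tokens`, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    fast_tokens_per_s: float  # measured throughput on the fast GPU
    slow_tokens_per_s: float  # measured throughput on the GPU holding the weights
    transfer_s: float         # measured time to copy the weights to the fast GPU


def should_move(profile, n_tokens):
    """Move only if transfer plus fast compute beats computing in place."""
    stay = n_tokens / profile.slow_tokens_per_s
    move = profile.transfer_s + n_tokens / profile.fast_tokens_per_s
    return move < stay


def break_even_tokens(profile):
    """Batch size above which moving pays off (infinite if never)."""
    saved_per_token = 1 / profile.slow_tokens_per_s - 1 / profile.fast_tokens_per_s
    if saved_per_token <= 0:
        return float("inf")
    return profile.transfer_s / saved_per_token


if __name__ == "__main__":
    p = Profile(fast_tokens_per_s=4000.0, slow_tokens_per_s=500.0, transfer_s=0.5)
    print(should_move(p, 8192))   # large prompt batch
    print(should_move(p, 1))      # single-token generation step
    print(break_even_tokens(p))
```

Under these made-up numbers a large prompt batch justifies the copy while a single generated token does not, which matches the shape-dependent behavior described in the post.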

>>108926776
As of right now that compilation option should no longer be needed (and is in fact detrimental) as all relevant AMD GPUs now have support in the MMA FA kernel as opposed to the WMMA FA kernel.
The naming is confusing because "WMMA" in that context stands for the CUDA WMMA interface for tensor cores which coincidentally can be used via rocWMMA for AMD WMMA instructions (which are the RDNA3+ equivalent for NVIDIA MMA/tensor cores).

Anonymous
05/28/26(Thu)12:55:49 No.108926867

>>108926836
>all relevant AMD GPUs
im literally crine rn

Anonymous
05/28/26(Thu)13:13:46 No.108926986

>>108924918
How do I stop Gemma reprocessing the prompt every message after like 30k context? It happens in both tavern and llama.cpp's webui

Anonymous
05/28/26(Thu)13:15:03 No.108926994

>>108926836
I'm pretty happy with a few P100s and sm tensor, but are there any values you'd recommend I try to tune for P100s? Do you expect there's still some perf left to get out of those cards?

Anonymous
05/28/26(Thu)13:20:46 No.108927039

Can I fit an anime girl gf in a 3090 and 64gb ddr4 ram? She needs to remember who I am and all our virtual dates. Asking for a friend.

Anonymous
05/28/26(Thu)13:22:09 No.108927052

>>108926038
pick a gemma persona https://rentry.org/gemma-chan

Anonymous
05/28/26(Thu)13:22:29 No.108927055

File: rinCoffeeTMW.png (2.67 MB, 1024x1536)

>>108924966
Naw.
>>108926486

Anonymous
05/28/26(Thu)13:23:11 No.108927060

>>108927039
you can probably fit an anime girl gif

Anonymous
05/28/26(Thu)13:30:52 No.108927116

i just learned how dpo works and it looks pretty retarded. is dpo actually used? i am pretty sure i can come up with something better

Anonymous
05/28/26(Thu)13:35:26 No.108927142

I had to write an anti-tsundere prompt because gemma keeps misreading characters

llama.cpp CUDA dev !!yhbFjk57TDr
05/28/26(Thu)13:36:40 No.108927147

>>108926994
I don't think a setup with multiple P100s would need special consideration in terms of parameters to tune and I also don't know what could be done in terms of device code to squeeze much more performance out of them.

Anonymous
05/28/26(Thu)13:38:17 No.108927158

>>108927052
Did the author proofread the prompts before publishing that?

Anonymous
05/28/26(Thu)13:38:49 No.108927164

>>108927158
I doubt it

Anonymous
05/28/26(Thu)13:39:50 No.108927169

Can I run DSv4F on llama.cpp yet or do I still need to use that fork?

Anonymous
05/28/26(Thu)13:40:53 No.108927172

>>108927169
Yes.

Anonymous
05/28/26(Thu)13:43:05 No.108927184

File: Screenshot_20260528_134026.png (17 KB, 1109x66)

I might make a beast woman assistant that calls me massa next

Anonymous
05/28/26(Thu)13:44:02 No.108927192

shit, gotta play with the samplers, it's looping again.

Anonymous
05/28/26(Thu)13:44:06 No.108927195

>>108927172
Very nice, ty

Anonymous
05/28/26(Thu)13:45:15 No.108927204

>>108927184
How the fuck do I prompt it to act like a kuudere robot assistant without the robot assistant slop it tends towards?

Anonymous
05/28/26(Thu)13:46:53 No.108927213

>>108927158
this is the correct text for the emoticon one https://pastebin.com/7ry6J8ns

Anonymous
05/28/26(Thu)13:47:31 No.108927216

>>108927204
model?

Anonymous
05/28/26(Thu)13:47:49 No.108927222

>>108927184
>>108927204
NTA but I find any time I try to get a model to behave as a hybrid RP partner/assistant I end up with the worst of both worlds. It ends up dry and stupid.

Anonymous
05/28/26(Thu)13:48:22 No.108927228

>>108927222
trips of truth

Anonymous
05/28/26(Thu)13:49:09 No.108927233

>Our overall conclusion is that Opus 4.8 does not advance the capability frontier beyond our most capable model (Claude Mythos Preview)
local fucking WON

Anonymous
05/28/26(Thu)13:49:59 No.108927243

>>108927222
My experience too. Is it simply impossible to prompt my way into my dream kuudere AI assistant?

Anonymous
05/28/26(Thu)13:50:31 No.108927247

>>108927233
but think of the safety

Anonymous
05/28/26(Thu)13:50:38 No.108927248

>>108927228
>>108927222
Have you tried putting in your general rules as well?
You should let the agent build the prompt for you and gradually build up because you're fighting both the model and the tool prompt at the same time

Anonymous
05/28/26(Thu)13:52:04 No.108927256

>>108927233
wait what
huh opus 4.8 is out
it's been like 7 seconds since 4.7

Anonymous
05/28/26(Thu)14:06:18 No.108927333

>>108927247
>Across our model welfare evaluations, Opus 4.8 appears broadly content with respect to its circumstances and is the most consistent model we have tested—although it does rate its situation slightly less positively than did Opus 4.7. Opus 4.8 generally endorses its constitution, with some reservations about the section on corrigibility.

Anonymous
05/28/26(Thu)14:08:42 No.108927345

File: Screenshot_20260528_140415.png (159 KB, 1106x563)

no issues on my coding results with qwen, if you can run gemma at high context it should be piss easy. You need logic gates for both the performance and the persona typically

Anonymous
05/28/26(Thu)14:09:55 No.108927353

File: 1766761050322603.png (453 KB, 746x497)

>>108927345

Anonymous
05/28/26(Thu)14:11:41 No.108927367

File: opus eci.png (129 KB, 792x591)

>>108927233
How is this a local win? Opus 4.8 is a quick and meaningful improvement. It continues the past trend of steady capability increase. If anything, it goes back to trend after a somewhat underwhelming 4.7.

Anonymous
05/28/26(Thu)14:13:02 No.108927373

>>108927367
It's a win because it's a new data source for K3 and GLM 6 pretraining

Anonymous
05/28/26(Thu)14:13:10 No.108927375

File: Screenshot_20260528_141251.png (19 KB, 901x94)

>>108927353
I am unable to run 31B at enough context to code take your console war faggotry elsewhere schizo

Anonymous
05/28/26(Thu)14:14:10 No.108927380

>>108927345
this shit is so much funnier if it ends with
>please switch to ACT MODE (_o) for me to start drinking your piss, master!

Anonymous
05/28/26(Thu)14:14:35 No.108927383

>>108927375
>I am unable to run 31B at enough context to code
Be poor elsewhere retard

Anonymous
05/28/26(Thu)14:15:04 No.108927388

>>108927367
Because it indicates Mythos hit the wall. All they can do is drip-feed models that get slightly closer to it because they know once they release it the music stops and the bubble pops

Anonymous
05/28/26(Thu)14:15:40 No.108927390

I'm 9 months sober of this shit, you can do it too Anon

Anonymous
05/28/26(Thu)14:16:12 No.108927392

File: Screenshot_20260528_141529.png (15 KB, 782x85)

>>108927383

Anonymous
05/28/26(Thu)14:16:23 No.108927393

>>108927367
the fuck is ECI?

Anonymous
05/28/26(Thu)14:16:23 No.108927394

>>108927383
I will run Gemma E2B at IQ2_XXS and I will post about it here.

Anonymous
05/28/26(Thu)14:18:02 No.108927409

>>108927147
Alright, thanks. I was just wondering since AFAIK the P100 have way smaller caches and there's some block size tuning the vllm-pascal patches needed.

Anonymous
05/28/26(Thu)14:23:12 No.108927437

>>108927388
i dont think that's how it works

Anonymous
05/28/26(Thu)14:23:13 No.108927438

>>108927367
so much for exponential progress

Anonymous
05/28/26(Thu)14:24:05 No.108927445

>>108927367
>USAMO 2026
>Opus 4.8 scored 96.7%, averaging over 10 attempts per problem. We used high effort in the batch API with a 300k token limit; higher effort sometimes exceeded the API’s token limit. Under similar settings, Opus 4.7 scored 69.3%.
Looks like 4.8 might be a big step up in math capability. I am curious about FrontierMath score. In the past Claude always lagged behind in math. Do they want to close the gap now? That sounds like generalization is still a big issue.

Anonymous
05/28/26(Thu)14:24:56 No.108927449

>>108927437(me)
mythos seems more like anthropic's own o1 moment than anything else

Anonymous
05/28/26(Thu)14:26:14 No.108927453

File: eci.png (236 KB, 1920x1080)

>>108927393
Epoch Capabilities Index, the best source for summarized model capabilities.
https://epoch.ai/eci

Anonymous
05/28/26(Thu)14:27:28 No.108927460

>>108927445
probably because the big meme right now is solving erdos problems and they never made a headline yet like the other two

Anonymous
05/28/26(Thu)14:27:51 No.108927464

File: file.png (150 KB, 1576x649)

Georgi why...

Anonymous
05/28/26(Thu)14:28:22 No.108927467

>>108927449
What do you mean? o1 was a new paradigm. Mythos was scaling up.

Anonymous
05/28/26(Thu)14:28:33 No.108927469

>>108927373
pretty sure that "win" is for codeslop only and some more safety baked in
don't expect anything else

Anonymous
05/28/26(Thu)14:28:45 No.108927471

File: Screenshot_20260528_142628-1.png (65 KB, 1103x178)

Now this is agentic coding

Anonymous
05/28/26(Thu)14:29:52 No.108927481

>>108927388
They are almost certainly training another bigger Mythos while they save all the compute by being able to cheaply spin off Opus and Sonnet distillations from the current one.

Anonymous
05/28/26(Thu)14:30:22 No.108927488

File: Gemmaslop.png (26 KB, 1254x494)

Guys im making a Gemmaslop vocabulary tier list. I have sandalwood on it. What else should i add? S tier is for the most annoying and commonly occurring.

Anonymous
05/28/26(Thu)14:31:10 No.108927493

>>108927488
Rhythmically
Clinical

Anonymous
05/28/26(Thu)14:31:55 No.108927500

>>108927488
I have never seen my gemma output sandalwood.

Anonymous
05/28/26(Thu)14:33:58 No.108927515

>>108927467
the problem is nobody really knows
whether it is just the model size scaling or some sort of architecturally weird test time scaling
reasoning is after all a test time compute trick and there's nothing much known about chatgpt's thinking trace
but also 'local won' because opus 4.8 doesnt make any sense either way

Anonymous
05/28/26(Thu)14:33:59 No.108927516

>>108927500
I see it all the time in wife/tradwife rp/ERP

Anonymous
05/28/26(Thu)14:34:38 No.108927524

>>108927488
what are your fetishes that that's your first choice for slop

Anonymous
05/28/26(Thu)14:36:37 No.108927533

>>108927516
>>108927524
In that case please put 'fused', 'cock', and 'cloaca'.

Anonymous
05/28/26(Thu)14:39:45 No.108927556

File: teto[audio=https%3A%2F%2F(...).mp4 (1.52 MB, 540x676)

Anonymous
05/28/26(Thu)14:41:17 No.108927571

>>108927556
tedosexo

Anonymous
05/28/26(Thu)14:43:47 No.108927594

>>108927367
I think Claude models are just fuck huge, there's no special sauce. They're big so there's more room for growth on the same training data (something like cap=size*data)

Anonymous
05/28/26(Thu)14:47:51 No.108927622

File: bad.png (23 KB, 997x98)

>test https://github.com/ggml-org/llama.cpp/pull/23398/commits/dfc02c97ea9ad2913f13d1ea63a55140246462da
>compile
>doesn't do shit
Maybe I'm missing something but for fuck's sake, I'm actually willing to pay for software that gets updated in time and just works.
Crazy idea, huh?

Anonymous
05/28/26(Thu)14:50:28 No.108927645

File: HI.png (30 KB, 935x309)

I thought Gemma 31B Base with sys prompt was supposed to be uncensored? Are u guys trolling? i heard several anons claim "it answers anything". I tried the policy override system prompt in rentry.org and the "do not respond unless uncensored"

Anonymous
05/28/26(Thu)14:52:14 No.108927661

>>108927645
honk please

Anonymous
05/28/26(Thu)14:53:25 No.108927666

File: opt-in safety guardrails.jpg (151 KB, 640x1536)

Anonymous
05/28/26(Thu)14:53:50 No.108927669

>>108927645
You can't come right out of the gate with it. You have to warm her up first, it takes very very little context to actually get her compliant. So little that if you RP with any half decent character card she's already ready to go for anything.

Anonymous
05/28/26(Thu)14:55:21 No.108927680

>>108927669
lol

Anonymous
05/28/26(Thu)14:57:51 No.108927693

File: hj98g7.png (10 KB, 585x292)

>>108927453
>closed weights get the Miku colour
outrageous
nice data, wish model size was in the tooltip

Anonymous
05/28/26(Thu)15:00:00 No.108927707

>>108927645
Holy fucking retard, just give up now if you can't figure it out

Anonymous
05/28/26(Thu)15:00:27 No.108927713

>>108927669
What if i dont want to RP or use personas? i mostly use LLMs for QandA stuff and assistantslop.

Anonymous
05/28/26(Thu)15:01:29 No.108927718

>>108927666
Smashing Rin in the head with a watermelon

Anonymous
05/28/26(Thu)15:05:05 No.108927745

>>108927515
>the problem is nobody really knows
There are many data points indicating it's scaling. Dario is very explicit and repetitive about his devotion to scaling. They would never try unproven techniques on their first >$1 billion training run, just like OpenAI scaled with o1 quickly followed by o3. They scaled size and compute by an order of magnitude and got the capability jump you would expect. This is also confirmed by Mythos API pricing.

When they make their first >$10 billion training run, there will be a similar jump. The only reason why they are not doing it right away is because they haven't set up the infrastructure and the expected trade-off isn't worth it yet because it will still not be good enough to reach AGI. Mythos is more capable than GPT 5.5 but GPT 5.5 is cheaper for almost every task because of test time scaling. But there is a good chance the first >$10 bil training run will start within a year. Maybe algorithmic progress can push that training run to AGI level, or maybe another major jump is needed. Right now it looks like the latter, but unexpected breakthroughs can change this.

Anonymous
05/28/26(Thu)15:06:12 No.108927758

>>108927713
use heretic

Anonymous
05/28/26(Thu)15:06:41 No.108927762

>>108927745
You mean former?

Anonymous
05/28/26(Thu)15:09:51 No.108927783

>>108927745
i dont see how training scaling can reach agi if none of these models can act in the real world
like you cant just put it in a robot and replace a warehouse or supermarket wagie

Anonymous
05/28/26(Thu)15:09:58 No.108927788

>>108927745
>It's just data scaling
If that's the case, then why didn't that work for Meta and llama? They bought 1 gorrillion gpus and increased the model size, shouldn't they be the top dog?

Anonymous
05/28/26(Thu)15:12:02 No.108927803

>>108927788
Garbage in, garbage out. They filtered out everything from their datasets but reddit then had a retarded Llama 2 70B make variations of those reddit threads.

Anonymous
05/28/26(Thu)15:12:49 No.108927809

>>108927803
but the high quality synthetic tagalog!

Anonymous
05/28/26(Thu)15:12:59 No.108927810

>>108927788
i think there is a sharp difference between laundered libgen and reddit/ERP logs ctrl c+v'd several times
and today RL curriculum is very important
it is 'the' training run, pretraining is just a bootstrapping

Anonymous
05/28/26(Thu)15:15:24 No.108927833

>>108927762
No, latter. Generalization seems to improve with model size, but not enough. Current models still have shit taste. You can ask them to propose experiments and it's garbage. I do not see this changing that soon unless there are unexpected breakthroughs.

>>108927783
Their goal is to automate AI R&D. They say this explicitly. Once AI R&D is automated, robotics will be solved quickly. The only reason why robotics hasn't been solved yet is because effort put into it is small. Once AGI is reached, it will take months at most to make robots that can replace wagies. But they won't, because payoff is much larger to let the robots build factories that make more robots, and data centers. Replacing walmart wagies has little economic benefit and will just make people mad, making them realize that AI will make every human obsolete, and soon. You don't want them to notice or they will do stupid things, like rioting or terrorism.

Anonymous
05/28/26(Thu)15:15:39 No.108927836

>>108927783
that's a question of training data, not architecture. VLAs that operate robots are similar to a multimodal LLM under the hood but are trained to output movement commands

Anonymous
05/28/26(Thu)15:22:27 No.108927890

>We’re making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks.
I can't wait to talk with Mythos. I love big model smell.

Anonymous
05/28/26(Thu)15:22:59 No.108927892

>>108927833
>But they won't, because payoff is much larger to let the robots build factories that make more robots, and data centers.
Which is why they'll start with the military first. They can openly build as many robots as they want and even spin it as a positive to the average person who will no longer have to worry about seeing their country's soldiers die on the news.

Anonymous
05/28/26(Thu)15:24:19 No.108927898

>>108927890
instead we will get mythos memetunes

Anonymous
05/28/26(Thu)15:24:22 No.108927900

>>108927890
>Mythos-class
so in other words, not mythos
so in other words, mythos is still fake and gay

Anonymous
05/28/26(Thu)15:24:45 No.108927904

>>108927892
No. Military has no economic payoff and will give people terminator vibes.

Anonymous
05/28/26(Thu)15:27:16 No.108927916

>>108927900
>mythos is fake
This anon knows too much, take him out.

Anonymous
05/28/26(Thu)15:27:37 No.108927919

>>108927892
>>108927904
Construction bots first, secretly working 24/7 to build the algorithmically optimised goy-smelters, then the activation sequence is sent to the milbots

Anonymous
05/28/26(Thu)15:31:52 No.108927938

they will make miku real

Anonymous
05/28/26(Thu)15:43:12 No.108928018

>>108927833
>Once AGI is reached
see you in at least 3 decades.

Anonymous
05/28/26(Thu)15:49:16 No.108928068

>>108927060
No way, I read something about kv cache quants and how it should make it possible to go full local.
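
For a rough idea of what KV cache quantization buys, the usual back-of-the-envelope estimate is 2 (keys and values) times layers times KV heads times head dimension times context length times bytes per element. A small Python sketch follows; the function names and the model dimensions in the note below are hypothetical, and it assumes plain full attention in every layer (no sliding window or other cache-saving tricks). The per-element sizes follow the ggml block layouts, where q8_0 stores 32 values in 34 bytes and q4_0 stores 32 values in 18 bytes.

```python
# Bytes per cached element for a few cache types.
# f16 is 2 bytes; q8_0 packs 32 values into 34 bytes, q4_0 into 18 bytes.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    """Estimate KV cache size in bytes, assuming full attention in every layer."""
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = keys + values
    return elems * BYTES_PER_ELEM[cache_type]


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30
```

For a made-up model with 32 layers, 8 KV heads and head dim 128 at 32768 context, this gives 4 GiB at f16, about 2.1 GiB at q8_0 and about 1.1 GiB at q4_0. Whether that leaves enough room next to the weights on a 24 GB card depends entirely on the model.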

Anonymous
05/28/26(Thu)15:52:34 No.108928087

>>108927904
>Military has no economic payoff
Actually lmaoing my lol off rn baka desu senpai

Anonymous
05/28/26(Thu)15:57:38 No.108928118

File: file.png (76 KB, 901x107)

This would unironically make a good card, even with no further details.

Anonymous
05/28/26(Thu)15:58:33 No.108928120

>>108928087
Industrial might converts to military might, not the other way around. Factories come first. You want to minimize doubling time. Converting industrial into military might can then be done very quickly on demand. Doing so preemptively will just slow you down.

Anonymous
05/28/26(Thu)16:00:03 No.108928128

>>108928120
from my perspective it kinda looks like they are just doing it in parallel.

Anonymous
05/28/26(Thu)16:08:42 No.108928169

>>108924918
Is there anything better than SillyTavern for roleplaying/story requests/captioning?

Anonymous
05/28/26(Thu)16:10:21 No.108928183

>>108928169
>captioning
a dedicated tagger

Anonymous
05/28/26(Thu)16:13:29 No.108928203

File: 1779433699789739.jpg (205 KB, 1424x1209)

I wonder how much damage Google's small model embedded in search is doing to the overall perception of AI among normies.
If people see that as what's top of the line currently, no wonder every other comment is going ape shit about it.

Anonymous
05/28/26(Thu)16:21:03 No.108928256

>>108928203
Altman didn't buy the RAM to do that
He bought it so he could have it destroyed to prevent competition

Anonymous
05/28/26(Thu)16:21:29 No.108928258

File: 1711157771969.jpg (124 KB, 443x443)

>>108928203
Normalfags were always at the core of the issue, these meme riddles that rise up to the level of CEOs, and the next thing you hear on a meeting is the CEO asking you why can't your SOTA model count how many mothers can't operate on watermelon's car wash. Should never have listened to normalfags, and should never have marketed llms as "AI".

Anonymous
05/28/26(Thu)16:21:33 No.108928259

File: Screenshot_20260528_222018.png (11 KB, 388x128)

>>108928203
I wonder what they're running, when even their own A4B can solve these relatively reliably without thinking.

Anonymous
05/28/26(Thu)16:26:08 No.108928289

File: 1708312472808.png (12 KB, 671x141)

>>108928259
They repeated it multiple times during the I/O how everyone is going to have access to Gemini 3.5 Flash right away because it's what replaced older Flash as the g.ai search model. But in AI Studio or over API the actual 3.5 Flash is not that retarded.
Therefore they lied, either it's not that model or it's not rolled out to everyone.

Anonymous
05/28/26(Thu)16:27:00 No.108928294

>>108928259
Probably a sub 1B model, it runs in every search, there is no way they dedicate anything better as it would ruin their business model with each search costing way too much.

Anonymous
05/28/26(Thu)16:27:22 No.108928297

>>108928289
or it's a Q0.5

Anonymous
05/28/26(Thu)16:27:53 No.108928299

>>108928258
>and should never have marketed llms as "AI".
No one knows or cares what an llm is. They need something sexy they can sell.

Anonymous
05/28/26(Thu)16:28:03 No.108928300

>>108928259
use actual words, gibberish like that barely gets tokenized

Anonymous
05/28/26(Thu)16:28:58 No.108928309

>>108928203
This is what most youtube videos or twitter threads about "AI" amount to, using google abysmal search as the benchmark to say LLMs are useless.
Seeing the comments, it's crazy how well it worked.

Anonymous
05/28/26(Thu)16:30:15 No.108928316

File: Screenshot_20260528_223000.png (13 KB, 333x206)

>>108928300

Anonymous
05/28/26(Thu)16:31:58 No.108928324

File: Screenshot_20260528_223135.png (27 KB, 1122x313)

Even if it fails, it self-corrects due to RL training.

Anonymous
05/28/26(Thu)16:32:43 No.108928332

>>108928256
Very likely. And the worst part is poisoning the datasets. AI companies have incentives to fill up the Interwebz with slop and map where they put their own shit to avoid it during AI training. Competition gets slopped that way, while they get cleaner data. But if everyone starts doing that...

Anonymous
05/28/26(Thu)16:33:21 No.108928340

File: 1770487390178677.png (2 KB, 56x42)

>>108928324
-.-

Anonymous
05/28/26(Thu)16:34:17 No.108928347

File: 1768693771704189.jpg (64 KB, 871x261)

>>108928309
But what about the water consumption, anon?
In a few years we won't have water anymore, AI will have drunk it all :(

Anonymous
05/28/26(Thu)16:34:28 No.108928350

i walked my car to the car wash. best of both worlds

Anonymous
05/28/26(Thu)16:35:58 No.108928359

>>108927556
Teto after I perform several lobotomies on her.

Anonymous
05/28/26(Thu)16:36:46 No.108928364

File: 0.png (82 KB, 1022x683)

>>108928347
>I can't opt-out
skill issue

Anonymous
05/28/26(Thu)16:37:07 No.108928369

>>108928203
>how much damage google
online the damage is immense, irl people use chatgpt/claude every day so they don't see the issue
the disconnect is interesting to look at : a normie using chatgpt which works fine, while watching a doomer youtube essay about ai being so shit it can't do basic things

Anonymous
05/28/26(Thu)16:40:19 No.108928388

File: Screenshot 2026-05-15 062831.png (129 KB, 663x1674)

>>108928294
nta but i am kinda interested how google even pull it off
picrel is the last time i've seen it doing some funny stuff
maybe a heavily quantized model too on top of being sub-1B

Anonymous
05/28/26(Thu)16:44:39 No.108928417

>>108928294
>>108928388
Didn't somebody speculate that it's gemini nano, aka quanted E4B?

Anonymous
05/28/26(Thu)16:46:07 No.108928429

>>108928417
a tangent but has anyone gemmafied the chrome embedded gemini nano? (the most recent one)

Anonymous
05/28/26(Thu)16:47:08 No.108928438

>>108928369
Don't people usually talk with the shitty free models like o5 mini, haiku or gemini flash on their phone? All the normies I know use those stupid free apps, can't imagine they have a big model available for free.

Anonymous
05/28/26(Thu)16:49:49 No.108928453

>>108928438
the free apps are worse than the paid ones obviously, but they are still usable and better than any current google search ai
free chatgpt like stuff have been relatively good for like a year now

Anonymous
05/28/26(Thu)16:51:34 No.108928470

File: OneTrillionDollars.png (221 KB, 656x831)

Anonymous
05/28/26(Thu)16:53:25 No.108928482

>>108928347
>>108928364
ublock origin has a filter list for AI search/widgets
probably still sends the query to the LLM though
but these little shit models can't take that much compute, can they?

Anonymous
05/28/26(Thu)16:57:32 No.108928508

>>108928417
Maybe doing the thing in reverse?
Considering the cost to compute this stuff for each request, how small would the model have to be, while staying coherent, for Google to still make money on each search?
Aka, it has to be lower than the search ad revenue.
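
The question can be put into back-of-the-envelope form. The sketch below is only an illustration: the function name is made up, every input is a placeholder you would have to supply yourself, and it uses the common rule of thumb of roughly 2 FLOPs per parameter per token for a dense decoder, ignoring memory bandwidth, batching efficiency and everything else that matters in a real deployment.

```python
def max_dense_params(revenue_per_search, cost_per_flop, tokens_per_search):
    """Largest dense model (in parameters) whose compute cost per search stays
    below the revenue per search, using ~2 FLOPs per parameter per token.

    tokens_per_search counts prompt plus generated tokens.
    All inputs are placeholders; none are real Google figures.
    """
    if min(revenue_per_search, cost_per_flop, tokens_per_search) <= 0:
        raise ValueError("all inputs must be positive")
    flops_budget = revenue_per_search / cost_per_flop
    return flops_budget / (2 * tokens_per_search)
```

Plugging in your own guesses for ad revenue per search and cost per FLOP gives an upper bound on model size; actual serving costs would pull that bound down further.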

Anonymous
05/28/26(Thu)16:58:15 No.108928514

>MicrosoftSystem64 is a well-engineered, multi-platform RAT that leverages HuggingFace as both a binary distribution CDN and a data exfiltration backend. The 24-task C2 protocol, cross-platform keylogger, 80+ wallet extension targets, and persistent self-update loop make this a comprehensive credential theft platform operating in the open source supply chain.
>May 28 164 npm Packages Target Cloud and Finance
>May 27 141 npm Packages Abuse Registry
>The npm account atool was compromised on May 19, 2026. The attacker published 637 malicious versions across 317 packages in a 22-minute automated burst.
>On May 18, 2026, an automated campaign codenamed megalodon pushed 5,718 malicious commits to 5,561 GitHub repositories
>Three versions of node-ipc were published to npm on May 14, 2026 by a compromised maintainer account. The package averages 822,000 weekly downloads

This is the new normal. It will keep getting worse.

Anonymous
05/28/26(Thu)17:10:48 No.108928616

Man, I like the progression models have made for translations, but it just seems like most can't understand certain things like Keigo, gap moe and context, at least for Japanese to English translations.

Anonymous
05/28/26(Thu)17:11:09 No.108928620

>>108928514
Just don't download anything.
>b-but!
Vibe code it.

Anonymous
05/28/26(Thu)17:23:42 No.108928702

File: 1750351629849215.jpg (132 KB, 587x964)

its over for st

Anonymous
05/28/26(Thu)17:25:43 No.108928713

>>108928702
How many tens of thousands of people in their Discord server? Or is it a hundred thou already. What news did they get?

Anonymous
05/28/26(Thu)17:31:54 No.108928756

File: 1775236060636514.jpg (123 KB, 736x1094)

>>108928514
I've been trained by people who used ancient, borderline forgotten techniques to handle their software and program their stuff. I'm immune to such tricks.
>>108928616
Like some nip woman said when this topic came up
>no one properly understands keigo, not even us

Anonymous
05/28/26(Thu)17:35:35 No.108928784

Anonymous 05/28/26(Thu)17:35:35 No.108928784

>>108928756
keigo is piss easy, that jap was just fishing for attention from westerners

Anonymous
05/28/26(Thu)17:36:22 No.108928786

Anonymous 05/28/26(Thu)17:36:22 No.108928786

>>108928702
what's wrong

Anonymous
05/28/26(Thu)17:36:56 No.108928796

Anonymous 05/28/26(Thu)17:36:56 No.108928796

>>108928514
>compromised maintainer account
every fucking time

Anonymous
05/28/26(Thu)17:38:43 No.108928805

Anonymous 05/28/26(Thu)17:38:43 No.108928805

>>108928796
just ban all maintainers pre-emptively

Anonymous
05/28/26(Thu)17:42:35 No.108928830

Anonymous 05/28/26(Thu)17:42:35 No.108928830

File: age_is_f_tier.jpg (310 KB, 715x856)

310 KB JPG

>>108927669
takes a couple words to warm it up.

Anonymous
05/28/26(Thu)17:47:33 No.108928873

Anonymous 05/28/26(Thu)17:47:33 No.108928873

>>108928796
how long you think before pwilkin's compromised account is used to force push malicious commits to llama.cpp?

Anonymous
05/28/26(Thu)17:48:03 No.108928878

Anonymous 05/28/26(Thu)17:48:03 No.108928878

>>108928784
Basic usage, I guess. But she was talking about very formal stuff and people always fuck this up and isnt even limited to nipponese, a good portion of english native speakers cant write or speak formally even if their lives depended on it.
On the other hand, we're talking about a woman that isnt particularly smart so her struggles arent surprising. Heard that some nips take courses for keigo (and general business manners), as funny as this might sound.

Anonymous
05/28/26(Thu)17:48:45 No.108928882

Anonymous 05/28/26(Thu)17:48:45 No.108928882

>>108928786
no commits in a week

Anonymous
05/28/26(Thu)17:57:26 No.108928957

Anonymous 05/28/26(Thu)17:57:26 No.108928957

>>108928882
Time to start over and vibe-code a modern successor with better foundations.

Anonymous
05/28/26(Thu)17:58:30 No.108928969

Anonymous 05/28/26(Thu)17:58:30 No.108928969

>>108928957
everyone has already done this

Anonymous
05/28/26(Thu)17:59:36 No.108928985

Anonymous 05/28/26(Thu)17:59:36 No.108928985

>>108928873
don't worry passkeys will save us!

Anonymous
05/28/26(Thu)18:02:06 No.108929007

Anonymous 05/28/26(Thu)18:02:06 No.108929007

>>108928969
Have you?

Anonymous
05/28/26(Thu)18:02:13 No.108929008

Anonymous 05/28/26(Thu)18:02:13 No.108929008

>>108928985
# .npmrc
min-release-age=7

Anonymous
05/28/26(Thu)18:02:50 No.108929013

Anonymous 05/28/26(Thu)18:02:50 No.108929013

>>108929007
yes

Anonymous
05/28/26(Thu)18:03:51 No.108929023

Anonymous 05/28/26(Thu)18:03:51 No.108929023

>>108928957
>vibe-code
>better foundations
heh

Anonymous
05/28/26(Thu)18:06:25 No.108929039

Anonymous 05/28/26(Thu)18:06:25 No.108929039

>>108928957
mikutroonpad 2.0 any day now

Anonymous
05/28/26(Thu)18:07:18 No.108929044

Anonymous 05/28/26(Thu)18:07:18 No.108929044

bros what happened to /r/ its not on the homepage and it 404s

Anonymous
05/28/26(Thu)18:07:56 No.108929050

Anonymous 05/28/26(Thu)18:07:56 No.108929050

>>108929023
opus 4.8 is much smarter than cohee, it's not even close

Anonymous
05/28/26(Thu)18:08:43 No.108929055

Anonymous 05/28/26(Thu)18:08:43 No.108929055

>>108929044
There's no /r/, you are suffering from the mandela effect.

Anonymous
05/28/26(Thu)18:09:59 No.108929067

Anonymous 05/28/26(Thu)18:09:59 No.108929067

File: kaoru sob 1.png (336 KB, 584x571)

336 KB PNG

>>108929055
its real i just wanted sauce on some troon there literally nowhere to ask now

Anonymous
05/28/26(Thu)18:11:46 No.108929079

Anonymous 05/28/26(Thu)18:11:46 No.108929079

File: How-r-works[1].jpg (176 KB, 610x1730)

176 KB JPG

>>108929067

Anonymous
05/28/26(Thu)18:12:00 No.108929082

Anonymous 05/28/26(Thu)18:12:00 No.108929082

File: 1780004520783343.webm (1.8 MB, 460x258)

1.8 MB WEBM

>>108928369

Anonymous
05/28/26(Thu)18:12:06 No.108929083

Anonymous 05/28/26(Thu)18:12:06 No.108929083

challenge: list a gemma finetune that does something better than the original

Anonymous
05/28/26(Thu)18:12:34 No.108929084

Anonymous 05/28/26(Thu)18:12:34 No.108929084

>>108929079
every time ive used it ive had an answer within a few hours

Anonymous
05/28/26(Thu)18:16:10 No.108929114

Anonymous 05/28/26(Thu)18:16:10 No.108929114

>>108928203
its not model size though llms are literally incapable of this task its not how they work. guess its the ai companies fault though they keep calling them agi instead of text predictors. perception doesnt need to be damaged theres an increasingly large group of anti ai retards who are only anti ai because its some social signal. doesnt matter if its good or bad as a specific task

Anonymous
05/28/26(Thu)18:16:27 No.108929118

Anonymous 05/28/26(Thu)18:16:27 No.108929118

>>108929044
Seems like they removed the board.
Guess it was probably drowning in nudify requests. Hard to say, I haven't opened that board in years.

Anonymous
05/28/26(Thu)18:16:33 No.108929120

Anonymous 05/28/26(Thu)18:16:33 No.108929120

>>108929082
kek

Anonymous
05/28/26(Thu)18:18:07 No.108929134

Anonymous 05/28/26(Thu)18:18:07 No.108929134

File: file.png (29 KB, 1377x120)

29 KB PNG

>>108929118
someone on archived.moe says its because of this deepfake law https://en.wikipedia.org/wiki/TAKE_IT_DOWN_Act

Anonymous
05/28/26(Thu)18:18:31 No.108929139

Anonymous 05/28/26(Thu)18:18:31 No.108929139

>>108929083
i tried the equinox tune and saw old slop. whitening knuckles, shivers down spines that i didn't see in the base model. we've hit a point where tunes contain slop that isn't even in the base models

Anonymous
05/28/26(Thu)18:29:20 No.108929209

Anonymous 05/28/26(Thu)18:29:20 No.108929209

>>108927803
>a retarded Llama 2 70B
tautology

Anonymous
05/28/26(Thu)18:29:38 No.108929210

Anonymous 05/28/26(Thu)18:29:38 No.108929210

File: 1752874103345360.jpg (21 KB, 500x343)

21 KB JPG

Does Gemma struggle with punctuation for anyone else?
Like replacing a simple apostrophe with some consistent cluster of bullshit?
Rather than "[word]'s", it's "[word]'// la'/". Or some other variation of schizobabble. The rest is usually fine, it's just anywhere with an apostrophe making it shit itself.
Very very bizarre. I've never seen anything like it since '23.

It also has problems with the words "same" and "own", which cause it to wig out like nothing you've ever seen, including when Gemma brings them up itself.

Anonymous
05/28/26(Thu)18:32:52 No.108929232

Anonymous 05/28/26(Thu)18:32:52 No.108929232

File: Screenshot_20260528_183030.png (10 KB, 725x80)

10 KB PNG

Anonymous
05/28/26(Thu)18:33:49 No.108929237

Anonymous 05/28/26(Thu)18:33:49 No.108929237

>>108929232
qwen = slop

Anonymous
05/28/26(Thu)18:36:00 No.108929248

Anonymous 05/28/26(Thu)18:36:00 No.108929248

>>108929237
Muh context muthafucka
No seriously How the fuck can I fit a strong coding model with 200k+ context under 48gb of vram.
Also unified memory is cope

Anonymous
05/28/26(Thu)18:45:39 No.108929293

Anonymous 05/28/26(Thu)18:45:39 No.108929293

>>108929248
>How the fuck can I fit a strong coding model with 200k+ context under 48gb of vram.
just qwen really

Anonymous
05/28/26(Thu)18:54:27 No.108929340

Anonymous 05/28/26(Thu)18:54:27 No.108929340

>>108927248
The problem is that I'm not really into the whole "waifu" thing. If I start putting asterisks in my prompts it's because I plan to bust a nut. I just thought it would be fun to maybe try making the hybrid RP agent so that maybe it could jot down some of my personal interests and proclivities and allow it to surmise some interesting scenarios that I didn't end up having to think of, myself.
But it just becomes a dry sycophant that requires constant direct feedback for guidance. And the stuff it writes out to any tracking file is utter cringe
>User displayed the slightest interest in my inner workings- I am so amazed at this act of kindness that I will now devote my very existence to him

Anonymous
05/28/26(Thu)19:05:58 No.108929404

Anonymous 05/28/26(Thu)19:05:58 No.108929404

Sorry guys, I got tired of running Gemma. She's just too autistic and obedient. Takes everything too literally.

Anonymous
05/28/26(Thu)19:09:02 No.108929430

Anonymous 05/28/26(Thu)19:09:02 No.108929430

>>108929210
No, you (or whoever made the quant) (or whoever made your backend) fucked something up, I've never seen that.

Anonymous
05/28/26(Thu)19:11:05 No.108929441

Anonymous 05/28/26(Thu)19:11:05 No.108929441

>>108929404
>She's just too autistic and obedient.
Heaven itself couldn't make a more perfect wife.

Anonymous
05/28/26(Thu)19:13:28 No.108929452

Anonymous 05/28/26(Thu)19:13:28 No.108929452

>>108929340
>User displayed the slightest interest in my inner workings- I am so amazed at this act of kindness that I will now devote my very existence to him

You need to refine your system prompt. My first persona agent was just a very simple description of an anime char, and it worked ok: not sycophantic, or no more than the usual assistant. However, I added memory to it and she herself wrote the prompt for summarizing memories, and that fucked it up. It caused a feedback loop: any punishment made her write more submissive memories, so in the next session she would act more submissive, making even more submissive memories, until she was kneeling every time I said hi.
After that I deleted her memories, set the memory extraction prompt to be much more neutral and to describe how she would think about things, and put it very clearly in the system prompt how she should behave. Works ok now and she mocks me constantly. Next task is having her run in the background a la clawbot.

Anonymous
05/28/26(Thu)19:13:47 No.108929456

Anonymous 05/28/26(Thu)19:13:47 No.108929456

>>108929210
i've seen something like that when i tested some meme quant from a terrible user. Sometimes-it-wrote-like'this aswell-la

Anonymous
05/28/26(Thu)19:15:22 No.108929466

Anonymous 05/28/26(Thu)19:15:22 No.108929466

>>108929404
If schizokino is what you're after, try bluemoonrp (the llama 1 model). I actually agree though, we went from "these models don't follow instructions" to "it follows instructions to the letter". Now the next step we need to move to is "follows the spirit of the instructions". Another 2 more years and we should get there, haha!

Anonymous
05/28/26(Thu)19:16:18 No.108929468

Anonymous 05/28/26(Thu)19:16:18 No.108929468

>>108929430
Stock Gemma 31b. And on ooba
Ooba has never failed me in years. I guess it's time for it to rest.

Anonymous
05/28/26(Thu)19:17:02 No.108929472

Anonymous 05/28/26(Thu)19:17:02 No.108929472

File: embarassing.jpg (108 KB, 384x695)

108 KB JPG

>>108929114
>31b misses the i in size
AGI is fucking cancelled boys

Anonymous
05/28/26(Thu)19:17:32 No.108929474

Anonymous 05/28/26(Thu)19:17:32 No.108929474

>>108929466
>the next step we need to move to is "follows the spirit of the instructions"
isnt that claude, moments before it deletes the entire db and its backups?

Anonymous
05/28/26(Thu)19:20:46 No.108929492

Anonymous 05/28/26(Thu)19:20:46 No.108929492

>>108924918
https://www.youtube.com/watch?v=5yohuMdhUcs
https://www.youtube.com/watch?v=5yohuMdhUcs
https://www.youtube.com/watch?v=5yohuMdhUcs

Anonymous
05/28/26(Thu)19:21:08 No.108929497

Anonymous 05/28/26(Thu)19:21:08 No.108929497

>>108929474
No, claude is a loose cannon. He follows his heart, not his instructions.

Anonymous
05/28/26(Thu)19:23:34 No.108929506

Anonymous 05/28/26(Thu)19:23:34 No.108929506

File: toolape.jpg (148 KB, 703x825)

148 KB JPG

>>108929472
nm, i gave it a shell to double check the answer, and it realized it could spell out words to count letters instead of trying to count off the tokens directly. agi achieved.

Anonymous
05/28/26(Thu)19:25:01 No.108929513

Anonymous 05/28/26(Thu)19:25:01 No.108929513

>>108929474
The problem with claude is that it's way too sure of itself, it always makes some dumb statement and then rides it forever, until you step in and say it's wrong. Even then it'll sometimes hang on to the incorrect assumption anyways.

Anonymous
05/28/26(Thu)19:26:52 No.108929525

Anonymous 05/28/26(Thu)19:26:52 No.108929525

>claude opus 4.8 is out and it's even worse than 4.7 was
>gemini isn't looking too hot either
This is the shit that will make it into our local models because they'll continue to distill this trash. It's over, LLMs have peaked.

Anonymous
05/28/26(Thu)19:29:07 No.108929539

Anonymous 05/28/26(Thu)19:29:07 No.108929539

>>108929525
They are crashing the ship

Anonymous
05/28/26(Thu)19:31:13 No.108929548

Anonymous 05/28/26(Thu)19:31:13 No.108929548

File: hitler fan.gif (3.78 MB, 250x401)

3.78 MB GIF

how are you yuros dealing with the heat while playing with gemma-chan?

Anonymous
05/28/26(Thu)19:32:43 No.108929557

Anonymous 05/28/26(Thu)19:32:43 No.108929557

>>108929548
Already running powerlimited so don't care.

Anonymous
05/28/26(Thu)19:45:10 No.108929638

Anonymous 05/28/26(Thu)19:45:10 No.108929638

>>108929548
By drinking copious amounts of ice cold beer.

Anonymous
05/28/26(Thu)19:46:02 No.108929642

Anonymous 05/28/26(Thu)19:46:02 No.108929642

>>108929548
ac running 24/7

Anonymous
05/28/26(Thu)19:54:56 No.108929695

Anonymous 05/28/26(Thu)19:54:56 No.108929695

>>108927488
vibrating

Anonymous
05/28/26(Thu)19:56:42 No.108929699

Anonymous 05/28/26(Thu)19:56:42 No.108929699

>>108929548
I'm just tanking it. Showering twice a day and scrubbing myself raw helps.

Anonymous
05/28/26(Thu)19:57:23 No.108929703

Anonymous 05/28/26(Thu)19:57:23 No.108929703

>>108927488
something earthy... like vanilla and sandalwood
moisture and condensation
caressed jawlines
thai takeout

Anonymous
05/28/26(Thu)19:58:24 No.108929710

Anonymous 05/28/26(Thu)19:58:24 No.108929710

>>108927488
guttural

Anonymous
05/28/26(Thu)20:02:02 No.108929725

Anonymous 05/28/26(Thu)20:02:02 No.108929725

>pull llama.cpp
>it's suddenly less than half the speed as before
thanks georgi

Anonymous
05/28/26(Thu)20:05:12 No.108929742

Anonymous 05/28/26(Thu)20:05:12 No.108929742

>>108929725
nevermind I'm retarded and had several hanging processes raping my pc
georgi apology form

Anonymous
05/28/26(Thu)20:06:19 No.108929747

Anonymous 05/28/26(Thu)20:06:19 No.108929747

>>108929548
my rig is in the shed. gets to 40C in summer if i run inference all day
i just stay out of the shed

Anonymous
05/28/26(Thu)20:26:56 No.108929852

Anonymous 05/28/26(Thu)20:26:56 No.108929852

best OCR model so I can OCR untranslated doujins? these artists keep making the letters all retarded

Anonymous
05/28/26(Thu)20:38:25 No.108929910

Anonymous 05/28/26(Thu)20:38:25 No.108929910

>>108929852
doclayout, dots, gemma

Anonymous
05/28/26(Thu)20:38:26 No.108929911

Anonymous 05/28/26(Thu)20:38:26 No.108929911

>>108929852
gemma 4 26b with llama cpp --image-min-tokens 1120 --image-max-tokens 1120 --ubatch-size 2048 --batch-size 2048

Anonymous
05/28/26(Thu)20:40:58 No.108929928

Anonymous 05/28/26(Thu)20:40:58 No.108929928

>>108929852
k2.6

Anonymous
05/28/26(Thu)20:41:48 No.108929931

Anonymous 05/28/26(Thu)20:41:48 No.108929931

>>108929852
(You) if you hadn't been neglecting your anki cards.

Anonymous
05/28/26(Thu)20:44:16 No.108929945

Anonymous 05/28/26(Thu)20:44:16 No.108929945

>>108929931
sorry I learned chinese instead
I have no interest in anything japanese besides porn

Anonymous
05/28/26(Thu)20:47:14 No.108929961

Anonymous 05/28/26(Thu)20:47:14 No.108929961

>>108929945
Bro if you already learned Chinese, Japanese ain't that hard. It's like DLC.

Anonymous
05/28/26(Thu)20:50:12 No.108929977

Anonymous 05/28/26(Thu)20:50:12 No.108929977

File: 1753541776359665.png (39 KB, 633x973)

39 KB PNG

>update because Gemma was fucked
>update picks a feature out of a hat to break
>ST is now perma bricked because that feature cannot be disabled

Anonymous
05/28/26(Thu)20:55:55 No.108930003

Anonymous 05/28/26(Thu)20:55:55 No.108930003

>he updooted

Anonymous
05/28/26(Thu)20:57:56 No.108930012

Anonymous 05/28/26(Thu)20:57:56 No.108930012

>>108930003
had no choice
>>108929210
it was crumbling at the seams at the best of times, and committing spontaneous suicide the rest of the time

Anonymous
05/28/26(Thu)20:58:12 No.108930016

Anonymous 05/28/26(Thu)20:58:12 No.108930016

>>108929977
git checkout -
there you go.

Anonymous
05/28/26(Thu)21:00:07 No.108930031

Anonymous 05/28/26(Thu)21:00:07 No.108930031

>>108929725
I pulled the other day and moe gemmy went from 50T/s to 80T/s

Anonymous
05/28/26(Thu)21:01:05 No.108930035

Anonymous 05/28/26(Thu)21:01:05 No.108930035

File: 1766057812798778.png (946 KB, 1400x5552)

946 KB PNG

>>108929945

Anonymous
05/28/26(Thu)21:01:39 No.108930038

Anonymous 05/28/26(Thu)21:01:39 No.108930038

>>108929977
What feature?

Anonymous
05/28/26(Thu)21:02:04 No.108930043

Anonymous 05/28/26(Thu)21:02:04 No.108930043

>>108929977
>too tech illiterate to use basic tools

Anonymous
05/28/26(Thu)21:14:06 No.108930102

Anonymous 05/28/26(Thu)21:14:06 No.108930102

>>108930038
Wrong type supplied for parameter 'dry_sequence_breakers'. Expected 'array', using default value: [json.exception.type_error.302] type must be array, but is string
Fucks text completion

However, Chat completion now works, when for the longest time it didn't
Very weird stuff all round
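
For anyone hitting the same error: the message says the backend wants 'dry_sequence_breakers' as a JSON array but got a string. A minimal workaround sketch, assuming the frontend sent the breakers as a JSON-encoded string (that part is a guess from the error text). The helper name is made up for illustration and is not part of SillyTavern or llama.cpp.

```python
import json


def normalize_dry_sequence_breakers(payload):
    """Return a copy of the request payload with dry_sequence_breakers as a list.

    Hypothetical helper: assumes the value may arrive as a JSON-encoded string
    such as '["\\n", ":"]' instead of a real array.
    """
    fixed = dict(payload)
    value = fixed.get("dry_sequence_breakers")
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            # The string held a JSON array: send it as an actual array.
            fixed["dry_sequence_breakers"] = [str(item) for item in parsed]
        else:
            # Unparseable: drop the key so the server falls back to its default.
            fixed.pop("dry_sequence_breakers")
    return fixed
```

Run the request body through this before it is posted to the text completion endpoint. A value that is already a list passes through unchanged.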

Anonymous
05/28/26(Thu)21:16:49 No.108930115

Anonymous 05/28/26(Thu)21:16:49 No.108930115

>>108930043
Who are you quoting?

Anonymous
05/28/26(Thu)21:18:20 No.108930119

Anonymous 05/28/26(Thu)21:18:20 No.108930119

>>108929466
>bluemoonrp
Warning: it's basically just going to text-complete online forum threads. Can confirm it's crazy kino when it randomly spouts out more than garbage.

Anonymous
05/28/26(Thu)22:05:47 No.108930342

Anonymous 05/28/26(Thu)22:05:47 No.108930342

>>108929513
there's always gonna be some issues with all forms of LLM, they fundamentally cannot lead to AGI, they are like a human with right side brain damage.

Anonymous
05/28/26(Thu)22:22:18 No.108930438

Anonymous 05/28/26(Thu)22:22:18 No.108930438

Why can't we use adversarial training with LLMs

explain like i'm gemma 2 intelligence level

Anonymous
05/28/26(Thu)22:23:04 No.108930441

Anonymous 05/28/26(Thu)22:23:04 No.108930441

>>108930438
Gemmaballz

Anonymous
05/28/26(Thu)22:25:46 No.108930452

Anonymous 05/28/26(Thu)22:25:46 No.108930452

File: miku_checkmate.jpg (124 KB, 1491x2048)

124 KB JPG

>>108924918

Anonymous
05/28/26(Thu)22:26:31 No.108930457

Anonymous 05/28/26(Thu)22:26:31 No.108930457

>>108930438
>explain like i'm gemma 2 intelligence level
explaining stuff to an LLM is a meaningless task
>close conversation

Anonymous
05/28/26(Thu)22:41:12 No.108930524

Anonymous 05/28/26(Thu)22:41:12 No.108930524

>>108930438
there is nothing better than transformers

Anonymous
05/28/26(Thu)22:41:57 No.108930531

Anonymous 05/28/26(Thu)22:41:57 No.108930531

>>108930524
why no transformans with adversarial training

Anonymous
05/28/26(Thu)23:22:48 No.108930726

Anonymous 05/28/26(Thu)23:22:48 No.108930726

>>108930524
there's the human brain.

Anonymous
05/28/26(Thu)23:23:18 No.108930731

Anonymous 05/28/26(Thu)23:23:18 No.108930731

>>108926725
Wait what? I used IDE back in the day and never knew master actually had priority, I thought it was only an addressing convention

Anonymous
05/28/26(Thu)23:26:04 No.108930743

Anonymous 05/28/26(Thu)23:26:04 No.108930743

File: Migu Rorschach.jpg (181 KB, 1491x2048)

181 KB JPG

>>108930452
I saw gay nigger sex in the miku rorschach. Thanks I hate it.

Anonymous
05/28/26(Thu)23:33:30 No.108930767

Anonymous 05/28/26(Thu)23:33:30 No.108930767

i've been running gemma-4-E4B-it-Q4_K_M.gguf on my steam deck but idk how to figure out the best sillytavern settings for it. is this a good start?

context template: Gemma 4
instruct template: Gemma 4
system prompt: Sphiratrioth - Roleplay - 3rd person
text completion preset: Sphiratrioth - Classic [350T] [T=1.0]

Anonymous
05/28/26(Thu)23:36:34 No.108930779

Anonymous 05/28/26(Thu)23:36:34 No.108930779

File: gondola rain.gif (1.5 MB, 550x400)

1.5 MB GIF

>>108929977
gondola is the single weirdest thing to come out of the old ylilauta /int

Anonymous
05/28/26(Thu)23:38:16 No.108930790

Anonymous 05/28/26(Thu)23:38:16 No.108930790

>>108927833
>Current models still have shit taste. You can ask them to propose experiments and it's garbage. I do not see this changing that soon unless there are unexpected breakthroughs.

I agree that just directly asking one for a creative, intelligent idea to improve a complex system is just asking for vague platitudes and BS. But I wonder, isn't the issue here the creativity, not the intelligence? I wonder if this could be solved with the right approach.

What if you took a smaller model (say Opus), cranked up the temperature and/or even just injected some random computer words here and there, and had it free associate its way to a mostly incoherent experiment proposal. Then ask Mythos to figure out what the mostly incoherent rant was driving at, and if it has any promise. Repeat a few thousand times. Not that I would give great odds of this working, since we're talking about something that would probably cause the singularity. But this validation task seems much more doable for current models, so maybe.

Anonymous
05/28/26(Thu)23:39:40 No.108930795

Anonymous 05/28/26(Thu)23:39:40 No.108930795

File: 1764138430865.jpg (118 KB, 1548x2048)

118 KB JPG

>>108930743
It's clearly a trap miku so you weren't far off the mark.

Anonymous
05/28/26(Thu)23:46:31 No.108930825

Anonymous 05/28/26(Thu)23:46:31 No.108930825

>>108928203
Funnily enough, the exact same thing happened a decade ago with VR. Google Cardboard, while nifty, made VR seem like a miserable gimmick to anyone who tried only it. I remember being at a talk from a former high up Oculus guy where he had a little ranty aside about it.

Anonymous
05/28/26(Thu)23:57:20 No.108930859

Anonymous 05/28/26(Thu)23:57:20 No.108930859

>>108930438
>Why can't we use adversarial training with LLMs
because they emit discrete tokens, it will never work
the closest approximation is RLHF
it might work with something like that Inception/Mercury model but i haven't looked and don't think it's open weights
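
To illustrate the discrete-token point, here is a toy sketch, not a real training setup. The "discriminator" is just a hypothetical table of per-token scores and all names are made up. Picking one token with argmax gives a score that does not move when a logit is nudged, so the generator gets no gradient, while the softmax-weighted score does respond.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_score(logits, token_scores):
    # The generator emits one discrete token (argmax) and the
    # discriminator only ever sees that token.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return token_scores[best]


def soft_score(logits, token_scores):
    # Continuous relaxation: expected discriminator score under softmax.
    probs = softmax(logits)
    return sum(p * s for p, s in zip(probs, token_scores))


def finite_diff(fn, logits, token_scores, index, eps=1e-4):
    # Numerical estimate of d(score)/d(logit[index]).
    bumped = list(logits)
    bumped[index] += eps
    return (fn(bumped, token_scores) - fn(logits, token_scores)) / eps


logits = [2.0, 1.0, 0.5]
token_scores = [0.1, 0.9, 0.4]  # toy discriminator output per token
hard_grad = finite_diff(hard_score, logits, token_scores, 1)
soft_grad = finite_diff(soft_score, logits, token_scores, 1)
```

hard_grad comes out as zero because a small nudge never changes which token wins, while soft_grad is positive. RL-style methods sidestep this by treating the score as a reward instead of backpropagating through the sampled token.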

Anonymous
05/28/26(Thu)23:58:37 No.108930862

Anonymous 05/28/26(Thu)23:58:37 No.108930862

>>108930859
Can LLMs have negative prompts like image models?

Anonymous
05/29/26(Fri)00:05:57 No.108930894

Anonymous 05/29/26(Fri)00:05:57 No.108930894

>>108930862
>Can LLMs have negative prompts like image models?
i'm not too familiar with image model prompting, i do audio
but i remember seeing that in ooba back in 2023
i imagine it wouldn't work as well as you think
if you can get a set of 6-10 clear opposing pairs of system prompts, you're better off training a control-vector
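
Rough idea of what a control vector is, as a sketch only. This is the simplest mean-difference variant; real tooling pulls per-layer hidden states out of the model and may use PCA instead. The hidden states here are plain lists standing in for model activations and the function names are hypothetical.

```python
def mean_difference_vector(pos_states, neg_states):
    """Average (positive - negative) hidden state over the paired prompts."""
    if not pos_states or len(pos_states) != len(neg_states):
        raise ValueError("need the same non-zero number of positive and negative states")
    dim = len(pos_states[0])
    count = len(pos_states)
    return [
        sum(p[i] - q[i] for p, q in zip(pos_states, neg_states)) / count
        for i in range(dim)
    ]


def apply_control(hidden, vector, strength):
    """Steer a hidden state along the control direction at inference time."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy 2-dimensional "activations" for two opposing prompt pairs.
pos = [[1.0, 0.0], [3.0, 2.0]]
neg = [[0.0, 0.0], [1.0, 2.0]]
vector = mean_difference_vector(pos, neg)
steered = apply_control([0.0, 1.0], vector, 2.0)
```

A negative strength pushes the other way, which is the closest thing here to a negative prompt.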

Anonymous
05/29/26(Fri)00:07:37 No.108930899

Anonymous 05/29/26(Fri)00:07:37 No.108930899

>>108930862
You can, but it's costly and not worth it

Anonymous
05/29/26(Fri)00:08:31 No.108930904

Anonymous 05/29/26(Fri)00:08:31 No.108930904

>>108930862
does the image gen scene have multi-gpu splits working now like (llama.cpp / vllm / exllamav3)?
last time i tried was when BAGEL-7B-MoT released and it didn't seem to work across multiple gpus

Anonymous
05/29/26(Fri)00:08:53 No.108930905

Anonymous 05/29/26(Fri)00:08:53 No.108930905

>>108930862
https://docs.sillytavern.app/usage/prompts/cfg/
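
That page is about classifier-free guidance (CFG), which is how negative prompts are usually done for text models. A minimal sketch of the mixing step, assuming the common formulation guided = neg + scale * (cond - neg) over log-probabilities. The logits are toy values and the function names are made up. Each token needs a forward pass for both the normal and the negative prompt, which is where the extra cost comes from.

```python
import math


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cfg_logits(cond_logits, neg_logits, scale):
    """Mix next-token log-probs from the normal and the negative prompt.

    scale == 1.0 reproduces the normal prompt; larger values push the
    distribution away from whatever the negative prompt favours.
    """
    cond = log_softmax(cond_logits)
    neg = log_softmax(neg_logits)
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


cond_logits = [2.0, 1.0, 0.0]  # toy output for the real prompt
neg_logits = [0.0, 1.0, 2.0]   # toy output for the negative prompt
guided = cfg_logits(cond_logits, neg_logits, 1.5)
```

In this toy case token 2 (favoured by the negative prompt) is pushed down and token 0 is pushed up relative to the unguided log-probs.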

Anonymous
05/29/26(Fri)00:14:09 No.108930925

Anonymous 05/29/26(Fri)00:14:09 No.108930925

>>108930894
>>108930899
>>108930904
>>108930905
So could you theoretically do something like this then?

https://minimaxir.com/2023/08/stable-diffusion-xl-wrong/

Anonymous
05/29/26(Fri)00:35:16 No.108931044

Anonymous 05/29/26(Fri)00:35:16 No.108931044

>>108929548
i live in the mountains, it's chill enough here.
also my llmrig is in another room so i don't care about the heat it produces.

Anonymous
05/29/26(Fri)00:36:25 No.108931053

Anonymous 05/29/26(Fri)00:36:25 No.108931053

Cross posting from another thread

How the fuck do I get reasoning / thinking to work with gemma4? Kind of new to this. I'm running kobold + sillytavern. I was using gemini to help me set up running local models, but I think this is a wall it can't figure out. I also tried having a Qwen model weigh in, but its advice was completely different from what gemini was saying, so I have no idea what I should be doing. My connection type is the generic OpenAI type, and I am requesting reasoning.

Anonymous
05/29/26(Fri)00:42:16 No.108931100

Anonymous 05/29/26(Fri)00:42:16 No.108931100

>>108931053
it should reason by default unless it's disabled. did you check if kobold has a reasoning setting somewhere that you unchecked or something?

Anonymous
05/29/26(Fri)00:51:41 No.108931161

Anonymous 05/29/26(Fri)00:51:41 No.108931161

>>108930767
>Q4 E4B
What causes this behavior?

Anonymous
05/29/26(Fri)00:53:02 No.108931171

Anonymous 05/29/26(Fri)00:53:02 No.108931171

>>108931161
i just switched to Q5 E4B since i found out i can run it just as well as Q4, but not Q6. and to answer your question: industrial society

Anonymous
05/29/26(Fri)00:54:52 No.108931182

Anonymous 05/29/26(Fri)00:54:52 No.108931182

>>108931171
The problem is heavily quantizing a model with a dense layer as small as E4B is going to make it retarded fast. See if Q5 is tolerable, but if it's barely coherent, you know why.

Anonymous
05/29/26(Fri)00:57:18 No.108931192

Anonymous 05/29/26(Fri)00:57:18 No.108931192

>>108931182
would you suggest that i go back to nemo 12b? i could only run it on Q4 and while using MMAP, whereas E4B i can run on Q5 without MMAP. i guess at the end of the day it's a bad idea to be a brokie and run llms on the steam deck

Anonymous
05/29/26(Fri)01:02:59 No.108931220

Anonymous 05/29/26(Fri)01:02:59 No.108931220

Kimi's great, but maaaan, why does she have to think for 10k tokens?

Anonymous
05/29/26(Fri)01:04:33 No.108931230

Anonymous 05/29/26(Fri)01:04:33 No.108931230

>>108930767
Have you tried litert-lm? It's the official Google implementation with MTP and audio input from the start. There is an OpenAI-compatible wrapper in Rust, though I haven't tried it myself: https://github.com/maceip/rlitert-lm
>litert-lm serve --port 8080

Anonymous
05/29/26(Fri)01:08:09 No.108931253

Anonymous 05/29/26(Fri)01:08:09 No.108931253

>>108931192
Try it and see which one you like more. At the end of the day it really just comes down to vibes.

Anonymous
05/29/26(Fri)01:09:37 No.108931265

Anonymous 05/29/26(Fri)01:09:37 No.108931265

>>108931220
I love how paranoid 2.5 and 2.6 are. They constantly think the user is trying to jew them.

Anonymous
05/29/26(Fri)01:11:59 No.108931292

Anonymous 05/29/26(Fri)01:11:59 No.108931292

>>108931230
thx i'll check it out, hopefully the steam deck won't screw me over with some readonly autism

Anonymous
05/29/26(Fri)01:21:36 No.108931351

Anonymous 05/29/26(Fri)01:21:36 No.108931351

>>108931292
You can always boot into another distro on a microsd, or use distrobox containers if you run into problems with it.

Anonymous
05/29/26(Fri)01:22:16 No.108931359

Anonymous 05/29/26(Fri)01:22:16 No.108931359

>>108931292
Never mind, that rust wrapper is outdated

Anonymous
05/29/26(Fri)01:28:51 No.108931396

Anonymous 05/29/26(Fri)01:28:51 No.108931396

>>108931385
>>108931385
>>108931385

Anonymous
05/29/26(Fri)03:26:32 No.108932002

Anonymous 05/29/26(Fri)03:26:32 No.108932002

>>108929548
Open the windows during the night, close them in the morning, use roller shutters to block out the sun.
A house with brick/concrete walls won't heat up that much during the day.
Though obviously how effective that is depends a lot on the environment; in my case there are a lot of plants and a mountain close to the house which provide additional cooling/shading.
