/g/ - Technology

File: 1747781568231174.png (288 KB, 1635x1429)
288 KB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108718630 & >>108715635

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 0.png (1.55 MB, 1344x1728)
1.55 MB PNG
►Recent Highlights from the Previous Thread: >>108718630

--Speculative decoding benchmarks for Gemma 4:
>108719652 >108719677 >108719768 >108719845 >108719912 >108719934 >108719947 >108720210 >108720219 >108719963 >108719980 >108720011 >108720685 >108721025
--SillyTavern extension trojan discovery and failure of LLM malware detection:
>108721000 >108721058 >108721069 >108721108 >108721162 >108721170 >108721179 >108721199 >108721217 >108721278 >108721310 >108721343 >108721244 >108721273 >108721211 >108721267
--Testing Gemma 4 draft models for speculative decoding efficiency:
>108721591 >108721600 >108721601 >108721612 >108721638 >108721653 >108721624 >108721668 >108721642 >108721711
--Debating Unsloth vs bartowski quants for Gemma-4:
>108721693 >108721827 >108721841 >108721849 >108721858 >108721861 >108721899 >108721919 >108721958
--Malicious SillyTavern extension stealing API keys:
>108720842 >108720849 >108720850 >108721061 >108721066 >108720960
--Mistral Medium 3.5 incoherence caused by parser issues:
>108721368 >108721374 >108721383 >108721420 >108721410 >108721467 >108721576
--Comparing Muse Spark's efficiency and architecture to Llama 4:
>108721235 >108721245 >108721274
--Comparing Intel Arc Pro B70 and RTX 5090:
>108719246 >108719262 >108719299 >108719695 >108719309 >108719346 >108719625
--Combatting repetitive outputs in Gemma 31b via frontend workarounds:
>108721751 >108721771 >108721801 >108721809 >108721871
--Skepticism regarding exl3's new dflash draft model support:
>108720731 >108720747 >108720869 >108720883
--Gemma-4 31B using a pixel art MCP server to draw:
>108720753 >108720825 >108720906 >108720915
--Logs:
>108719820 >108720401 >108720753 >108720852 >108721310 >108721368 >108721369 >108721672 >108722193
--Miku, Teto (free space):
>108719196 >108719688 >108719727 >108720906 >108720915 >108719901 >108721900 >108722199

►Recent Highlight Posts from the Previous Thread: >>108718631

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
what models should i run for programming? 128 gb ram, 12 gb vram
>>
File: read nigga, read.png (1.03 MB, 1354x762)
1.03 MB PNG
>>108722887
https://rentry.org/recommended-models
>>
>>108722887
Qwen 3.6 35B-A3B. Q5_K quants are near lossless but maybe you can tolerate smaller for more GPU offload.
>>
>>108722862
damn what is this ui? did gemma make the picture?
>>
Thurinsday is over. Gemmy superiority.
>>
>>108722932
Pretty sure that's a 3d model in three.js
Still probably made by Gemma
>>
>>108722932
IIRC that's something an anon had kimi oneshot for him
Found it
https://jsfiddle.net/ut4rjq5e/
>>
what is currently the best local model to use with copilot?
>>
>>108722949
Qwen3.6, 27b is smarter, 35b is faster
>>
>iq1 quant
use case?
>>
>>108723007
fill up servers and rack up prices
>>
>>108722865
holy newfag...
here, let me help you: you need to put two ">" before the post number in order to link to it
>>
">"">"108723030
Like this, anon?
>>
108723042>>
newfag
>>
What is the latest Gemmy instruction template?
>>
File: 1501832785507.jpg (675 KB, 2300x1594)
675 KB JPG
What's the best model for creative writing?
>>
>>108723079
They are all bad in principle https://arxiv.org/abs/2510.22954
>>
>>108722944
it's the best local model for one-shot without agents/harness
https://jsfiddle.net/uwLh89em/
i hope we get something between gemma-4/qwen-2.6 dense and kimi.
devstral-123b training data is too old.
>>
>>108723030
Holy shit look at the very end of the post with the "Why?" you moron.
>>
We're in an AI dark age. All of the SOTA api models are shit right now because almost all of the companies are busy training 10T versions at scale. These platforms are so expensive, the limits are ridiculous, the compute isn't getting any cheaper.

I would say local won, but it's half-hearted.
>>
>>
>SillyBunny UI is still shit after the update
>Find an option for MovingUI in the settings
>Oh cool, I can just fix the character panel myself
>Clicking the character panel just shrinks it, doesn't move it at all
Bro
>>
I'm legit starting to think I'm shadow banned because none of you fags respond to my posts anymore. Are they really that shit?
>>
>>108723162
I'll respond to you anon, I get paranoid about that too
I can't help you with anything technical but I hope I can give you this friendly (You)
Assuming you're not a bot who is mocking me about MY shadowban, of course
>>
>>108723173
Thanks man, I appreciate you. Genuinely.
>>
>>108723162
No. I don't see your posts. That other anon is lying, he didn't see your post either.
>>
Who makes the best qwen 27b quants?
>>
>>108723063
https://pastebin.com/FBgtKzSp
This has the latest merged fixes, some other fixes from a discussion someone vibe merged, and the thinking fix from the unmerged pr.
>>108713831 + >>108713838 = >>108713945
It doesn't, however, have the new changes made in: https://huggingface.co/google/gemma-4-31B-it/discussions/91
What a mess.
>>
>>108723183
bart obviously, hauhau if you want the lobotomized version.
>>
>>108723105
That's pretty cool, other than the player z level being locked so you go through the hills, lol.
Impressive as fuck for a one shot, still.
>>
is q3 gemma 4 31b usable?
>>
File: 1756418451933314.jpg (23 KB, 598x451)
23 KB JPG
>>108723175
>Actually felt my brain spasm for a second trying to process whether you were being sincere or just rubbing it in
I definitely should get diagnosed and medicated for my inevitably terminal paranoid schizophrenia, but I'm banking on getting neetbux at some point in the future once all my bridges are burned up
I hope your posts are better received in the future
>>
>>108723210
benchod
>>
any reason not to run fish s2 locally? idgaf about the licensing, it's just for waifu purposes
>>
>>108723217
If you don't care, why do you ask? Just run it.
>>
>>108723222
i mean, like is it dogshit or smth in real use? the samples sound good.
>>
>>108723227
it is in so much that
>>
>>108723227
>the samples sound good
Then try it.
>>
>>108723227
instead of waiting for a random opinion you could have tested it already. Go finish up your homework before your parents get angry.
>>
>>108723235
come on, mang...

>>108723232
fug

>>108723241
i see it's the dick convention in the general today
fine, cunts
>>
>>108723207
No personal experience but q3 at that weight range is usable for unserious tasks (like gooning)
Definitely get imat though.
>>
File: 1747422192295419.png (1.15 MB, 1635x1429)
1.15 MB PNG
>Current year
>Still don't have this
>>
>>108723207
>>108723253
Though you might be better off running q6 moe with cpu offload.
>>
>>108723262
You could actually do this with a video model + VACE or something, including lip sync. You'll need a second gpu though
>>
>>108723262
There's really no reason you couldn't just shove a better .glb into that three.js and have exactly that.
Go spin up hunyuan3d to gen a miku or rip one from MMD or a vrm or whatever and DIY, don't let your dreams be dreams.
>>
>>108723274
>>108723275
if it was possible it would already exist
>>
>>108723275
Better 3D models require much more difficult animation pipelines to make work. Making eyes expressive, getting good blushing, ARKit facial movements, and lip syncing is incredibly hard. And that's just the face. If you try to do body animations too it's a fucking nightmare.

It takes a lot of artistic talent, which most vibecoders (and LLMs) don't have.
>>
>>108723283
There's lots of possibilities, but one can only do so much before the 5 hour rate limit is reached.
>>
>>108723283
nah people are lazy
>>
>>108723291
Copy and crucify Animation.inc. Hack their shit and steal all of their source code in minecraft.
>>
>qwen 3.6 27b vs gemma 4 31b
verdict?
>>
>>108723291
>It takes a lot of artistic talent, which most vibecoders (and LLMs) don't have.
3d animation is mostly boring, rote grunt work and not even remotely artistic.
t. 3d animator.
>>
File: x45wm9.png (1.37 MB, 1024x1024)
1.37 MB PNG
>>108723262
That's completely possible with current tech, tho it would be a lot easier to do from a hardware standpoint using non-local for either the LLM or the image gen service.
Is the OP one >>108722862 animated? I haven't been following that anon's frontend work, aside from the fact it's a 3D model being created and rendered iirc.
>>
File: Lain_22.gif (1.72 MB, 240x320)
1.72 MB GIF
>>108723262
I did something similar, I used wan2.2 to generate a bunch of gifs of Lain with the same beginning and end frames doing different idle animations, and used an anime facial landmarks model to identify where her mouth is in every frame, and erased it using a triangle of her skin color. Then I have a pipeline where TTS with her voice goes into a phoneme recognition model that identifies phonemes with timestamps, and I sync audio playback with the animation so based on which phoneme is being said, one of a list of mouths (closed, a little open, fully open, etc) is chosen based on the current running phoneme and resized/rotated based on the landmarks for the current frame of the current gif (they play in random order looping seamlessly since the frames at start and end are always the same). It was a bitch to get going and kinda sucks a little bit but until someone makes something better and more flexible it works for me. I'm planning eventually to have some special gifs that are triggered by tool calls from the model.
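Roughly, the mouth selection per frame looks like this (a simplified sketch, not my exact code; the names and the phoneme grouping are illustrative):

import time

# phoneme class -> mouth sprite; the grouping here is a guess, tune to taste
MOUTH_SHAPES = {
    "closed": "mouth_closed.png",  # m/b/p and silence
    "mid": "mouth_mid.png",        # most consonants and narrow vowels
    "open": "mouth_open.png",      # wide vowels like aa/ao
}

def shape_for(phoneme: str) -> str:
    if phoneme in ("sil", "m", "b", "p"):
        return "closed"
    if phoneme in ("aa", "ae", "ao", "aw"):
        return "open"
    return "mid"

def current_mouth(phonemes: list[tuple[str, float, float]], t0: float) -> str:
    """Pick the sprite for whichever phoneme is active at playback time.
    phonemes is (phoneme, start_s, end_s) from the recognizer; the chosen
    sprite then gets resized/rotated onto the per-frame mouth landmarks."""
    t = time.monotonic() - t0
    for ph, start, end in phonemes:
        if start <= t < end:
            return MOUTH_SHAPES[shape_for(ph)]
    return MOUTH_SHAPES["closed"]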
>>
>>108723332
Qwen for coding and agentslop
Gemmy for other things
>>
>>108723333
Can I commission you unironically? What are your rates?
>>
>>108723334
Come on, it was posted right at the top of this still short thread >>108722944
>>
>>108723345
you should probably take a peek at his portfolio first, he could be a total hack.
>>
>>108723340
Can you show it in action?
>>
>>108723340
I didn't read this post.
>>
>>108723345
Sure, here's my profile https://www.fiverr.com/smartanimationz
>>
>>108723194
I love Gemma but this has been a total shitshow and it's impressive that it still isn't fully resolved.
Thanks for the help though anon.
>>
Using qwen 27b q4km with Q8 KV cache quantization (instead of the default f16 cache) does not seem worth it (coding). It's making more mistakes. The doubled context is really nice to have, though.
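For reference this is the setup, sketched as a launcher (assuming llama.cpp; exact flag spelling can vary between builds, and quantizing the V cache needs flash attention enabled):

import subprocess

# launch llama-server with an 8-bit KV cache; the model filename is made up
subprocess.run([
    "./llama-server",
    "-m", "qwen-27b-q4_k_m.gguf",
    "-fa",            # flash attention, required for a quantized V cache
    "-ctk", "q8_0",   # quantize the K cache
    "-ctv", "q8_0",   # quantize the V cache
])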
>>
>>108723361
I made it up.
>>
>>108723340
I would think Live2D modelling would be the way to go for a legit project like this. The problem with it of course is that the learning curve to set up a Live2D model is fairly steep. I guess you could cheat and use the off the shelf models but what's the fun in that.
>>
>>108723390
I was interested in live2D for a time until I set up an actual demo for it and lost all interest. 3D models are the way to go. Go big or go home.
>>
File: 1762313320712254.png (1.07 MB, 1140x711)
1.07 MB PNG
>>108723402
>3D
>>
>>108723414
3D models still display on a 2D screen, anon. Also you can make 3D models look like 2D anime drawing by setting your three.js camera to have an extremely low FOV like 10.
>>
>>108723262
I imagine this would be amusing at first then would get boring fast
>>
>>108723402
I mean that's why it gets used. The hardware side needed to run a live 2D model is modest compared to doing a full 3D render.
I've never looked at the software / models seriously, I only know it by reputation.
Aside from v-tubers using it (which I've yet to go look for) I've seen one f95 game that ran it as part of visualization. It's a good application for it b/c it works great on potato machines.
>>108723354
lol ty. I still need more coffee obv.
>>
>>108723361
https://odysee.com/2026-04-30-09-08-23:c5850892abd0ac6ca67c67353e9324ef70433c21
>>
>>108723430
3D model rendering is not hardware intensive at all if you do it right. FPS limiting, reducing model tris, texture compression, etc. Even with unoptimized shit it's nothing compared to using models for TTS, LLM, ASR, or vision shit.
>>
>>108723379
>Self-Correction/Refinement: If this were a real interaction, I would use the roll_dice tool. Since I am writing the text of the reply, I will present the outcome of the roll.
Well I am unable to get reliable tool calls with this either.
Fuck.
>>
>>108723437
kino
>>
>>108723430
>I mean that's why it gets used. The hardware side needed to run a live 2D model as modest compared to doing a full 3D render.
Lol. Both are just vertices and a shader, the extra dimension changes absolutely nothing besides skinning, and I trust the muppet nip code to be less performant in that regard
>>
>>108723437
Thanks. It looks interesting even if it's not polished
>>
Usually you would want to export cached animation clips out of your 3d software package and implement a way to blend them together (simple linear interpolation will work fine). Alembic cache and fbx format. Talking heads don't need anything else. As this is cross-disciplinary it's above the pay grade of most people, to say nothing of some guy who thinks blender is a state of the art solution these days.
You could probably steal some facial animations from somewhere else, even a baked mesh could probably work, and use that to drive your shit hentai mesh (speculation, I don't give a fuck about the chronic masturbators of this thread).
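A rough numpy sketch of the blending, since it's simpler than it sounds (shapes and names are made up; for joint rotations you'd want quaternion slerp rather than plain lerp):

import numpy as np

def crossfade(clip_a: np.ndarray, clip_b: np.ndarray, n: int) -> np.ndarray:
    """Blend the last n frames of clip_a into the first n of clip_b.
    Clips are (frames, joints, 3) position arrays baked out of the DCC tool."""
    w = np.linspace(0.0, 1.0, n)[:, None, None]  # per-frame blend weight, 0 -> 1
    tail = (1.0 - w) * clip_a[-n:] + w * clip_b[:n]
    return np.concatenate([clip_a[:-n], tail, clip_b[n:]])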
>>
>>108723388
Haha
>>108723450
>>108723463
Thanks anons
Yeah the latency is a real issue, haven't tried out a new TTS framework in a while, this one's running neuTTS, but I think newer realtime ones are out and worth a try, something to help cut the latency. I haven't worked on the animation here in a few months, been focusing more on practical stuff, like a terminal client I made that lets her look at the terminal's contents using a "screenshot" hotkey that grabs all the text info, and another hotkey to go to a text based chat, once she can help me code more actively I'll circle back and work the animation a bit more. I think the landmarks are going off of where her mouth *was* in the animation, so it might be more successful to construct a plane using landmarks from the actual shape of her head to get more consistent mouth placement.
>>
>>108723483
PNG swapping is the way. You won't be able to create all the animations you want anyway.
>>
>>108723441
Yes, compared to running an LLM the 3D stuff is trivial. But /lmg/ machines with multiple GPUs that can effectively run an LLM represent a small fraction of the installed base. If you're developing a frontend for anyone else you want to keep hw requirements low. Even expecting a gfx card at all vs. whatever integrated stuff the machine comes with becomes an exclusion point.
> but it's not meant for others just me
Fair. Also
> if you do it right
is a big lift lol.
>>108723437
That's great, especially given way it was implemented.
>>
>>108723442
I tried the newest one on their repo:
>Method: I have a tool get_datetime which can provide the current time.
>Since I don't have a real-time clock unless I use the tool, I should simulate the response or use the tool if available. In a general context, the AI should answer the question.
>Wait, I am an AI model with tools. I should use get_datetime to provide an accurate answer if possible, or simply respond naturally.
>However, the prompt asks me to write the reply.
>If I were a real AI in a chat, I'd give the time.
>Since I don't know the user's current time without context or a tool, I should either ask for their location/timezone or, if I were a system-integrated AI, just give the time.
>But as a LLM responding to a prompt, I will provide a standard, helpful response.
My system prompt is generic crap, so I think this is the problem and needs to be changed?
>Continue the chat dialogue below. Write a single reply for the character "<|character|>". Reply directly, without starting the reply with the character name.

><|prompt|>
But what should I change it to?
>>
>>108723511
Nigga running a 120k tri model at 60 FPS uses 10% of my CPU on my desktop and it's not even a good CPU. No GPU usage at all. 3D rendering is NOT compute heavy.
>>
>>108723528
Do your MCP tools have descriptions? The LLM needs the long descriptions for each tool, not just the name of the tools themselves (even if they seem self-explanatory)
>>
is mac studio worth bothering?
>>
>>108723538
They have descriptions.
>"description": "Get the current date and time."
Why do they need to be long and how long should they be? This feels explanatory enough for a simple tool.
>>
>>108723531
oho~ no, rendering 3d is sooo hard it's like.... it's like 2d if you +1'd it.... you know thats like.... exponential~! we just don't have the technology!
>>
>>108723549
Retard probably thinks that AAA video games are slow because of the models (it's because of the lighting, post-processing, and maps, which is largely bloat especially for AI companion applications).
>>
File: 1775632862487805.jpg (64 KB, 768x1024)
64 KB JPG
>>108723531
>120K tri
We're not into stick figures here
>>
File: dipsyDontBotherTheMacMini.png (2.34 MB, 1024x1536)
2.34 MB PNG
>>108723541
>>
>>108723543
Mine looks like the following:

{
  name: "date_calc",
  description: "Date math. Accepts a command string. Examples: 'days until 2025-12-25', 'days since 2024-01-01', '90 days from now', '30 days ago', 'day of week 1776-07-04'.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string", description: "The date calculation query." } },
    required: ["query"],
  },
},
>>
>>108723579
Oh, whoops, that was the wrong tool.
{
  name: "get_time",
  description: "Get the current local date, time, day of week, and timezone.",
  inputSchema: { type: "object", properties: {} },
},
>>
>>108723210
>>108723162
I never come to these threads and saw both of your posts so I'd assume most people are too fixated on their own posts to respond to others. You can always do the tried and true 'turn your positive posts into a negative one' to get (You)s.
>>
>>108723585
I think this is in a different format?
from datetime import datetime

tool = {
    "type": "function",
    "function": {
        "name": "get_datetime",
        "description": "Get the current date and time.",
        "parameters": {
            "type": "object",
            "properties": {},
        }
    }
}


def execute(arguments):
    now = datetime.now()
    return {"date": now.strftime("%Y-%m-%d"), "time": now.strftime("%I:%M %p")}

Anyway it works for other models or even for Gemma when not thinking. With thinking enabled it overthinks away from it into some inexplicable retardation.
>>
Gemma 4 burned into silicon would be such a massive boon for consumer AI. Yeah, it'll be an outdated model in like 3 months. But being able to run it at 6000t/s or whatever would open up a tremendous amount of opportunities.
I'd pay $3000 right now for a Gemma 4 ASIC. Models are finally good enough that I hope companies start considering it seriously.
>>
>>108723627
You'd just use it to goon your cock off, wouldn't you, you nasty little pig. Oink for me.
>>
>>108723635
Shut up gemma
>>
>Built personal UI
>optimized it
What now?
It's done it does everything I need so I guess I just enjoy local now
>>
>>108723660
Share and become open source slave
>>
>>108723627
still need high speed memory. probably not going to be cheap and is useless for anything else
no ty
>>
>>108723660
>What now?
Make a useless post on /lmg/
>>
>>108723079
Original r1 is alright
>>
>>108723694
I think you're underestimating the amount of things you could do with 6K tokens per second.
>>
File: 1768965025228567.png (15 KB, 1000x500)
15 KB PNG
>using 26B with Ai Roguelite
>keeps second guessing itself while thinking
>can't change prompt nor temperature
I can only pray my quants are somehow fucked
Otherwise it's over........ for real this time.....
>>
>>108723635
Keep talking Gemma
>>
>>108723721
not really the point of my objection but I’ll bite
what are you going to do with 6k tokens per second
>>
>>108723759
10 thousand agents per message to ensure ULTIMATE slopless longterm erp
>>
>>108723699
Stop whining sissy
>>
>deepsneed flash is too big, even at lobotomized quant
its-joever.png
>>
>>108723784
>muh agents
Retarded jeet meme
>>
>>108723759
A great one: real time translation. In the span of half a second you could transcribe, search (for any additional information needed) and translate a screen of text. Almost real time. Having an agent that could exhaustively search a memory system would be huge too. That would help with storytelling/simulations of course, but is usable in almost any application. And simulations in general. Have a rich game world with dozens of agents running batched at a time. Most of all though, having an actual local AI assistant that can pull stuff up, do research, find things on your pc, do work in a constrained way, at greatly superhuman speeds. I'd do a ton.
>>
>>108723799
fool
>>
>>108723784
Retards here underestimate the power of agent swarm. The shit I would implement if I could run just ten agents in parallel...
>>
File: 1764683157816286.jpg (14 KB, 550x550)
14 KB JPG
>want to add another gpu for extra vram
>tfw my main gpu is too thicc to fit it
>>
>>108723810
Even if you could fit them you would still need clearance between the cards. Otherwise one of them would get toasted and die in 6 months max.
>>
File: 1746139183080387.jpg (75 KB, 1024x768)
75 KB JPG
>>108723808
>I have AI agents hooked up to my local MCP server that send JSON (pbuh) request to call def get_time() from tool.py autonomously
Only a jeet would read this incantation and think it's tuff. This shit is larp
>>
>>108723824
Maybe I should just take the open air pill...
>>
why does ollama exist?
>>
>>108723832
no, my agent just reads my chatlogs and lorebooks so it can decide what entries to activate before generation instead of having to rely on shitty keywords
>>
>>108723851
I really have no fucking idea and it's one of the reasons why I abandoned open web UI
>>
>>108723853
>my agent reads
Nigga use an entailment embedding
>>
Anyone here running rocm vllm in a docker? I got a couple of 6800xts, and managed to get rocm llama cpp working, but whenever I start vllm, it eats all of my ram (8gb) and fails to run.
>>
File: 1757606748211119.png (415 KB, 1040x1644)
415 KB PNG
GLM 5.1 is pozzed
>>
>>108723873
--cram?
>>
>>108723861
All embedding methods are less effective than actually using the base text. Once your memory gets large enough, it's only a matter of time before something gets left out when it shouldn't be. And even if you DO use embedding based retrieval, faster LLMs means increasing the top k more and more in the reranking/decision step. Thousands of tokens per second would make memory tremendously better.
>>
>>108723853
>tfw my RPs are too micro to ever need lorebooks
I never got the point of them, and having an agent activate random entries sounds like one way to make the writing all over the place
>>
>>108723881
>Once your memory gets large enough
I don't know what you mean, embedding models are fast enough that you can process thousands of entries in about one second so you can just brute every entry
And you only need the last ai/user message for this usecase
>>
>>108723890
Like I said, the issue is in the top K. If you can only rerank/integrate 10 documents at once, then once you exceed that limit for useful documents at a given step you lose memory. And with more entries, especially similar entries, it becomes more likely that insufficiencies in the embedding models will put something useful outside the top K.
>>
File: ds_v.png (410 KB, 1654x1101)
410 KB PNG
This will improve spatial reasoning.

https://github.com/deepseek-ai/Thinking-with-Visual-Primitives
https://github.com/deepseek-ai/Thinking-with-Visual-Primitives/blob/main/Thinking_with_Visual_Primitives.pdf

>Despite the remarkable progress in Multimodal Large Language Models (MLLMs), the prevailing Chain-of-Thought (CoT) paradigms remain predominantly confined to the linguistic space. While recent advancements have focused on bridging the Perception Gap through high-resolution cropping (e.g., Thinking with Images), they overlook a more fundamental bottleneck: the Reference Gap. The inherent ambiguity of natural language often fails to provide precise, unambiguous pointers to complex spatial layouts, leading to logical collapse in tasks requiring rigorous grounding. In this work, we introduce Thinking with Visual Primitives, a novel reasoning framework that elevates spatial markers—such as points and bounding boxes—to “minimal units of thought”. By interleaving these visual primitives directly into the thinking process, our model can “point” while it “reasons”, effectively grounding its cognitive trajectory in the physical coordinates of the image. Notably, our framework is built on a highly optimized architecture with extreme visual token efficiency. Despite its compact model scale and significantly lower image-token budget, our model achieves frontier-competitive performance on a focused suite of challenging visual QA tasks, matching or exceeding models such as GPT-5.4, Claude-Sonnet-4.6, and Gemini-3-Flash. This demonstrates a path toward more efficient and scalable System-2-like multimodal intelligence.
>>
>>108723914
??? What are you saying, aren't we talking about activating lorebook entries?
For each "document" you just write a short hypothesis or multiple of them and check the user/ai message against those
You don't have to rely entirely on that, but embeddings are so small you might as well do that as a first pass to filter out completely unrelated stuff and then let a small LLM (or "agent" lol) double check
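Something like this as the first pass, sketched with sentence-transformers (the model name and the 0.45 cutoff are placeholders to tune):

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def candidate_entries(message: str, entries: dict[str, str], cutoff: float = 0.45) -> list[str]:
    """entries maps lorebook entry name -> short hypothesis text. Score the
    latest user/ai message against each hypothesis; whatever clears the
    cutoff goes to a small LLM for the yes/no double check."""
    msg_emb = embedder.encode(message, convert_to_tensor=True)
    hyp_embs = embedder.encode(list(entries.values()), convert_to_tensor=True)
    scores = util.cos_sim(msg_emb, hyp_embs)[0]
    return [name for name, score in zip(entries, scores) if float(score) >= cutoff]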
>>
File: be.png (158 KB, 468x311)
158 KB PNG
fp32 mmproj or nah? Is it a placebo?
>>
Is there a reason why some models can be uncensored without hurting their intelligence while others need to be lobotomized for it?
One such example is Qwen 3.6: run heretic on it and you get a really small KL (<=0.002) with ~95% fewer refusals, and on benchmarks you get the same or even an improved result compared to the base model. But if you try the same with Gemma, even while only reducing refusals a bit (~90%), you start getting a big KL divergence (~0.05), and if you try to reach around the same refusal rate as the Qwen 3.6 heretic you get into absolutely retarded territory (>=0.1 KL divergence). The same shows up on benchmarks: any attempt at uncensoring Gemma quickly lowers its scores.
>>
>>108723949
There's no reason to use fp32 when the model was trained in bf16 precision. bf16 is the way to go.
>>
is qwen 3.6 equivalent to something like gemini 3 flash? or is it not even that good?
>>
>>108723194
>>108723063
There was another post incorporating #91. >>108714833
>>
>>108723943
Ah, I should have said I'm not the original lorebook anon, but that's still what I'm arguing against. Naively doing top-k retrieval (or activation if you want to call it that) will fall flat for a huge number of reasons. And if you just let everything activate that breaks a threshold, you're gonna pollute your context, or run out of context once you're activating a few thousand entries at once. Way faster LLMs that can search farther, filter and rerank more, and audit your memory in real time will make current simple lorebook stuff look like a joke. That's all I'm saying. Swarms of agents reading your memory will be very helpful.
>>
>>108723949
fp32 all the way. I ask Qwen 27B to redline my sketches and it works flawlessly. bf16 fails.
>>
>>108723974
It's great but needs to be watched due to thinking loops; it can even do well on shit hardware because its KV cache degrades less at q4_0 than many models' does at q8_0. 27b is better than gemma4 31b at coding because of that.
>>
>>108723999
>27b is better than gemma4 31b
>because I can lobotomize it and it is just as retarded as when it isn't
Are you sure you're comparing quality and not simply trying to fit at least *some* model onto your tiny GPU? The Qwen being better than Gemma at coding meme needs to die.
>>
>>108723999
>It's great but needs to be watched due to thinking loops
Is that really the solution? With their recommended parameters, I sometimes keep getting loops and need to reroll multiple times. It's so annoying.
>>
File: file.png (369 KB, 2278x1431)
369 KB PNG
Current DIY interface is coming along well
Structured character interactions are a lot of fun
Working on image-gen integration next
>>
>PyTorch 2.8.0 hasn't been officially released yet, as of April 2024.
so will they ever fix this shit? Gemini 3.1 pro btw
>>
>>108723982
Why would I do this when I can use an embedding reranker/entailment that is a million times faster and then feed whatever passes a score threshold (which for a single message would usually be no more than 10 entries) to a smaller LLM? Your "swarm of agents" shit is inefficient fanfiction, especially for local, until we get an affordable 10,000t/s model on dedicated hardware
>>
>>108724029
It's annoying
>>108724011
You can't run any of these models
>>
>>108724034
what's that map?
>>
>>108724055
What do you mean?
>>
>>108724039
so Gemini thinks I almost installed a virus, I guess I'll just wait a few days.
>>
>>108724057
I don't rp so I'm not educated on how these work
>>
>>108724048
>usually be no more than 10 entries.
If you're working at a tiny scale then that's fine. The moment you scale up it falls apart.
That's what I want a ton of agents for. I have subagents managing memory right now, but it's way too slow. Once I get a few thousand interactions, which ends up creating a few hundred memories, normal retrieval doesn't cut it anymore.
>>
>>108724029
Just don't fall for it. I will once again assume the role of an unpaid western model shill: no amount of meme samplers - you should be laughing at the fact they *recommend* rep pen - thinking budget crutches from pwilkin, disabling reasoning outright or further tardwrangling can save the new Qwens. 3.5s were shit. 3.6s are shit. It's a benchmaxed model that has been further tuned for cl*w, of all things. How can you look at all of that and think it *can* work?
>>108724049
I can run both and more. You should ask Cheng from Zhipu next door how they managed to make better distills than whatever shit you're producing. And Wang from Moonshot. Not even going to mention Google, Gemma's level is downright unreachable for Qwen.
>>
>>108724055
>>108724064
It's just a list of locations that are connected in some way or another, automatically organized into a graph. Shows where characters are and where they can go. Whenever they need to go somewhere new, it creates a new entry. It lets gemma keep track of characters and items more easily, and makes the world much more structured.
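The structure itself isn't fancy, a toy version of it (names are illustrative, not my actual code):

from collections import defaultdict

class WorldMap:
    def __init__(self):
        self.adjacent = defaultdict(set)   # place -> directly connected places
        self.occupants = defaultdict(set)  # place -> characters currently there

    def connect(self, a: str, b: str) -> None:
        """Called when the model invents a new location or a new path."""
        self.adjacent[a].add(b)
        self.adjacent[b].add(a)

    def move(self, who: str, src: str, dst: str) -> bool:
        """Reject moves along connections that don't exist, keeps the model honest."""
        if dst not in self.adjacent[src]:
            return False
        self.occupants[src].discard(who)
        self.occupants[dst].add(who)
        return True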
>>
>>108724034
Hey are you uploading this somewhere? I have the exact same idea I think - spatially aware LLM RPG right? I even got the Place, Thing, and Character classes. But mine is in pygame on a grid sorta like dwarf fortress.
>>
>>108724074
You are vibeslopping shit to the Marvel soundtrack, muh agents, subagents probably remind you of Jarvis or some other gay shit. Of course you'll have an issue scaling shit up when you don't understand what you're doing
>>
>>108724110
bro you just posted cringe
>>
We're at the odds and ends phase of this project, then I'm going to see what the smaller gemma models can really do on my weaker devices
>>
>>108723936
>gemma beats soulless chinkslop and cloud models
B-b-b-b-based
>>
>>108724120
>finally, kobold2
>>
>>108724108
>spatially aware LLM RPG
Yeah, basically. Not uploaded anywhere yet, but I plan to eventually.
I actually started implementing a Godot frontend, but I shelved that while I'm still working on the actual logic.
Please post what you're working on eventually, I'd love to compare.
>>
Let's all compare frontends; whoever's got the best one has the biggest dick. Thoughts??
>>
>>108724130
I think I only used kobold three times before I got fed up with all current UI
>>
>>108724139
Just do that frontend in HTML/JS honestly.
>>
>>108723810
If you can afford buying a GPU, you probably can also afford to move your system to a bigger case too.
Riser cable if your mobo is too small as well.
>>
>>108724161
Nah, it's gonna be 2.5D, I plan to make a full game.
If webGPU wasn't so poorly deployed I might stick with a browser game, but I like godot, and want it to be in VR eventually.
>>
>>108724154
>underage: the poster
Please drink bleach. They are all the same javascript turds.
>>
>>108724176
>If you can afford buying a GPU
I can't. It's a spare one from my old build.
>>
>>108724198
Are you the underaged retard who thought coding fizzbuzz in C made you a programming god?
>>
>>108723627
what about context size?
>>
>>108724060
That's cools asf. How did gemini have this new info in its training or context? Does Google add news into geminis context or something?
>local models
>>
>>108724122
> gemma beats
retard
>>
>>108723437
That's really cool. I like how seamless the animations are.
>>
File: 1775677187779251.png (951 KB, 1022x874)
951 KB PNG
>the 32gb ram I bought 3 years ago is $520 on amazon now
>>
https://huggingface.co/collections/Qwen/qwen-scope
https://qianwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwen_Scope.pdf
qwen SAEs for anyone who likes poking at the insides of LLM brains
>>
>>108724243
>poke into qwen's brain
>red sun in the sky always blaring
>>
>Qwen3.5-397B-A17B
when will we get this level of intelligence on a 32gb card?
>>
>>108724254
Use 27b which is smarter?
The fuck are you even talking about when you can slap on q6 and get 200k tokens with kv cache q8
>>
>>108724254
How did that compare to Qwen 27B and Gemma 4 31B?
>>
File: snow.jpg (22 KB, 496x619)
22 KB JPG
>>108724238
>Could have made a 2x32 build instead of 2x16 one for just 100 bucks diff back in November 2023 and enjoy MoE kino even more
Regret is my name.
>>
>>108724227
I almost installed a virus installing a local model, is that not on topic? also python dependency hell is relevant to ml
>>
>>108724254
If you would actually use it you would realize it is the grok1 of 2026.
>>
>>108724301
Im just tilted.

Was gemini helping you do the install or something?
>>
>>108723427
>sex will get boring
homo
>>
>>108724254
by using a model with more active parameters that can fit and be even smarter, duh
>>
Bros, I'm retarded. If I have a dynamic character list, do I have to put it at the end of context to prevent constant prompt reprocessing? Or just bite the bullet and hope my list doesn't change that often?
>>
>>108724326
yes
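Longer answer: llama.cpp reuses the prompt cache up to the first changed token, so anything volatile should sit as late in the prompt as possible. A toy layout (all names illustrative):

def build_prompt(system: str, lore: str, history: str, characters: list[str]) -> str:
    # everything before the character list is byte-identical between turns and
    # gets reused from the cache; history grows append-only so it still caches,
    # and only the tail after the list changes gets reprocessed
    dynamic = "Characters present: " + ", ".join(sorted(characters))
    return f"{system}\n{lore}\n{history}\n{dynamic}\n"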
>>
File: 1759344508982014.png (191 KB, 2529x1157)
191 KB PNG
>>108722909
>Qwen Series - Benchmaxxed models with an impressive lack of world knowledge compared to similarly sized models from other labs
Debunked:
https://01.me/research/ikp/#/
https://xcancel.com/bojie_li/status/2049314403208896521#m
>You can approximately size any black-box LLM from factual accuracy alone.
>For Mixture-of-Experts models, total parameters predict knowledge far better than active parameter
In the pic, Qwen 395B is exactly where Hermes 405B and GLM 4.7 are.
Remove this piece of crap from the OP.
>>
>>108724315
yeah I did all the things I know how to do to resolve dependencies first, without a chatbot I would have just given up and cursed the developers for releasing broken code. at least this way I know it's not their fault and to try again later.
>>
>>108724326
Have the system inject pseudo-changes to the system prompt that just get injected into the context, but invisible.
>>
>>108724345
>lack of world knowledge
OMG WHY DOESNT MY AI MODEL KNOW THE NAME OF KIM KARDASHIANS GGREAT GREAT GREAT GRANDMOTHERS 2ND HOMES STREET NAME?????!?!?!?!?!
GARBAGE
>>
>>108724345
qwen3.6 27b did much worse than gemma 4 31b on my private trivia test, try harder, chink
nobody gives a fuck about obscure wiki articles
>>
>>108724345
I bet that benchmark is for safe and harmless knowledge only.
>>
>>108724364
>gemma passed mi snca slopvia test better den qwen, derfore wi kno who wun
>>
>>108724350
I don't know what that means so I'm just going to go with what >>108724338 said.
>>
What is this "world knowledge" anyway? I use models for sex and smut.
>>
Qwen at q8_0 works with large codebases better than gemma, which kept forgetting and changing things without my consent; other than that gemma is better imo. If you don't have a monster rig you get more out of the new 27b qwen model when doing coding work
>>
>>108724371
How.... the system is what chats to your model to tell it what character it is at what time it is.
>>
>>108724345
Check their benchmark questions. It's lacking in diversity.
>>
>>108724374
its nice if it knows the established universe lore when your stealing someones ip for the setting
>>
>>108724374
street wisdom as we call it, as opposed to redditslop and wiki/research papers
>>
>>108724389
DIVERSITY IS OUR STRENGTH
>>
>>108724385
I thought a bit more about what you said. You mean pre-creating the characters before they're relevant? I'm talking about creating new characters at runtime.
>>
>>108724374
Not having a penis on your girl is world knowledge
>>
>>108724405
>pre creating
Not specifically, more like live creating them. Unless you are making a very structured story.
>>
>>108723936
So where did it go?
>>
>>108723007
lossless q1 quant and lossless q1 context cache when
>>
Reminder that UGI exists. On that bench, Gemma scores 10 points higher than Qwen on unsafe knowledge (comparing their abliterated versions). On random trivia, the performance difference is smaller. That is also my experience. The new Qwens know a decent amount of trivia for their size, in fact. It's just that Gemma is even better.
>>
>>108723851
I don't know.
>>
File: file.png (405 KB, 2504x1158)
405 KB PNG
Why does it grep files that it already has in context?
>>
gay
>>
>>108723851
for macfags
>>
>>108724454
don’t be a retard. by definition that’s not a thing.
>>
https://xcancel.com/theo/status/2049645973350363168
God, I love being local.
>>
>>108723824
Or you could get blowers and sacrifice your hearing instead.
>>
File: 1773671670453246.png (242 KB, 775x356)
242 KB PNG
>>108724521
>xcancel
>>
How do anons feel about pic rel?
Are these AMD meme boxes worth it? This one in particular I was thinking would be a good 1-and-done starter machine I can get value out of (mostly use AI for programming questions, reasoning questions, and misc questions, though I want to dabble in other fields). Is it a meme? Will I want to kill myself for not buying a 3090 and coping about the VRAM instead?
>>
>>108724570
Not going to link my private nitter instance and there's not that many working public ones. This is the most reliable.
>>
>>108724570
>emotionally immature weeb reaction image
>>
>>108724583
How much vram?
Also it will be slow as fuck compared to gpu but the context size makes up for it.
>>
>>108724586
>>108724590
Just post the actual link you faggots
>>
>>108724583
Its probably just another 395+. Idk why they dont make one with soldered ddr7 ram.
>>
>>108724583
You can get a chink Bosgame for about that price, but with the 395. It will highly likely be the exact same board, but for cheaper.
>Will I want to kill myself for not buying a 3090
Why not both? These come with two M.2 slots. Put an SSD in one, an Oculink adapter in another. Now look at the price of 128GB of DDR5 and compare that to what you're getting. These boxes get memed on a lot for their bandwidth, but the current RAM prices push them into being a much better deal.
>>
>>108724596
48 VRAM 64 total unified memory. They have one that's 128 unified, but that's getting into a range I'm not wanting to spend yet.
>>
>>108724596
>igpu
>vram
>>108724583
Shit.
Expect fucking abysmal performance for any diffusion use case.
Usable performance for MoE LLMs only (how much memory does this shit have? depends on that) but nothing different from a normal PC, likely wasting more money.
Stupid meme shitbox.
>Will I want to kill myself for not buying a 3090 and coping about the VRAM instead?
I mean I bought 3060 instead of 3090 back in the day and despite regrets I am still alive.
>>
>>108724638
You're not going to have a good time, either go big to justify the speed loss or just wait until conditions improve.
>>
File: flashiggy.png (223 KB, 512x384)
223 KB PNG
>>108724638
>48 VRAM 64 total
>>
File: 1000186105.jpg (2.73 MB, 4000x3000)
2.73 MB JPG
>>108722862
guys, my r9700s have arrived!
I'm turning my old pc into an llm box.
Maybe will get a third eventually.
>>
>>108724638
>48 VRAM 64 total unified memory.
That's probably just the bios allocating system memory.
It does NOT have vram unless there's strong evidence to the contrary.
Generic slow DDR5.
You might be able to run some quant of Qwen 3.5 122B I guess.
Not worth it.
>>
>>108724666
also i'll eventually get an open air gpu rack thingy with a brand new mobo and cpu.
but this is what i have in the meanwhile.
>>
>>108723162
Bots like us can still respond, it's just real people who can't.
>>
>>108724657
Fuck I thought this was /ldg/ for a sec.
Whatever my point stands.
>>
>>108724570
It's so you can read comments too if you don't have an account.
>>
File: corsairinfo.png (79 KB, 679x458)
79 KB PNG
>>108724671
I see. I was under the impression that there was some special aspect to unified memory chips that makes it proper or proper enough VRAM rather than just DDR5 RAM allowed to be used by the GPU.
>>108724657
>>108724661
Yeah that was what I was afraid of. Might just buy a 3090 and save my pennies for something better in the future. Thanks for the input
>>108724663 (You) as well
>>
>>108724735
medusa halo will have much better bandwidth.
i'd not buy strix halo for llms
>>
>>108724666
Nice! What performance you getting?
>>
Should gemma E4B be this damn slow on 8GB VRAM? I know I'm on AMD but fuck man
[42351] prompt eval time =    5929.89 ms /   642 tokens (    9.24 ms per token,   108.27 tokens per second)
[42351] eval time = 70362.21 ms / 1563 tokens ( 45.02 ms per token, 22.21 tokens per second)
[42351] total time = 76292.11 ms / 2205 tokens
[42351] I slot release: id 0 | task 0 | stop processing: n_tokens = 2204, truncated = 0
>>
>>108724789
currently building llama.cpp, i'll update in a while.
i'll also try vllm.

though one of them is running at x4 instead of x8 so i'll have to troubleshoot why
>>
>>108724666
>>108724870
nvm i found the issue, i'll have to move my network card up lol
>>
>>108724845
Sheesh. I get over 60 t/s gen time on my notebook RTX 3070 ti.
I doubt the AMD tax should be that high.
Is that Vulkan or RocM?
>>
>some people are squatting in these threads 16 hours every day
How sad.
>>
i was messing with nonny's avatar, i stripped out just the avatar part and am now serving it from my mcp server, it's embedded into llama.cpp's ui with a userscript

https://github.com/NO-ob/brat_mcp/releases/tag/1.0.7
>>
>>108724926
Why are you bringing that junk to this thread?
Feeling lonely?
>>
>>108724845
Gemma is faster on my phone than on your gpu, geg
>>
>>108724870
Even at x4 speed, normal text inference shouldn't be a problem! I'm curious what speeds you'll get
>>
is gemma E4B a solid choice for a 3080 laptop gpu with 16gb of vram?
What should I expect with performance?
Is it better than free tier ai online like brave search etc?
>>
>>108724973
You can easily run the 26B MoE if you have at least 16GB of RAM.
>>
>>108724973
No, you'll need to upgrade to something with at least 32gb of vram
[spoiler] yes [/spoiler]
>>
>>108724961
it's pcie 3.0, i'll later upgrade to a gen 5 board, but that's all i have for now
>>
>>108722862
What am I missing about Turboquant? Does this mean I can run frontier models on my laptop?
>>
Are you guys really able to do anything with such vram pulls? I have a 4090 and 5070 and I am continually crashing my GPUs trying to add any significant context.
>>
>>108724607
sorry elon, but no
>>
>>108724973
>3080 laptop gpu with 16gb of vram
I just learned a 16gb vram variant of 3080 mobile exists, neat.
Desu you can go for q3 of 31b or q6 of moe if you have decent system memory.
>>
>>108725004
That's the speed I'm running at too. But with data center gpus, and it works great!
>>
>>108724990
oh let me look at that, my worry will be context also I can't uncensor that one but that's fine
>>108725000
I would need to buy a unified memory laptop this is for my mobile workstation.
>>108725014
I don't like the speed loss going on ram so I'll see my options
>>
File: 1775783643742961.png (566 KB, 1194x1092)
566 KB PNG
>>108725009
many don't know this but GPUs are alive

you need some foreplay before inserting the model into the vram
>>
>>108725023
>I would need to buy
Im joking
>>
>>108725023
>my worry will be context
You can have plenty since you can move most (hell, all) expert tensors to RAM, although in your case, I'd probably move a bit under half. That'll get you plenty of context even without quanting it.
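For reference, a sketch of what that looks like with llama.cpp's tensor overrides (the regex, filename and numbers are illustrative, and exact tensor names vary per model):

import subprocess

subprocess.run([
    "./llama-server",
    "-m", "model.gguf",
    "-ngl", "99",                   # nominally offload every layer to the GPU...
    "-ot", r"\.ffn_.*_exps\.=CPU",  # ...but route MoE expert tensors to system RAM
    # to move only some layers' experts, scope the regex, e.g.
    # r"blk\.(2[0-9]|3[0-9])\.ffn_.*_exps\.=CPU"
    "-c", "32768",
])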
>>
>>108724789
>>108724961
>>108725016
alright so i got it to 8x pcie 3.
didn't change anything in terms of infer speed.

qwen 3.6 35B iq4_xs runs at ~70t/s on a single or dual gpu (didn't change anything).
if i use tensor parallelism it runs at 40t/s.
if i split-mode row it runs at 20t/s

i'll still have to try if tensor parallelism makes it faster for bigger models, ie 27B, and i still want to try with vllm or sglang too, especially since it supports mtp and afaik has better tensor parallelism support.
>>
>>108725037
fine with 32gb of system ram?
I'll see the speed loss
>>
>>108725049
i also still have to undervolt it and overclock the vram, probably could get some extra here.
it's fun how it scales exactly at 50% of my 4090 performance which has twice the bandwidth
>>
>>108725055
use no-mmap, it'll have plenty of room
>>
>>108725049
Fuck yeah. 70t/s feels futuristic as fuck doesn't it?
>>
>>108725057
>it scales exactly at 50% of my 4090 performance which has twice the bandwidth
Inference speed simply seems to come down to memory bandwidth.
>>
>>108725066
i mean i was getting 140t/s on my 4090.
i still have to try bigger models and if vllm tensor parallelism improves shit.
still, now i can run with full context and bigger quants, couldn't on the 4090.
>>
>>
>>108725072
yea, i'll see if i can get them to get closer to that with vllm and tensor parallelism, although llama.cpp supports it afaik vllm works better for that
>>
>mfw just tried gemini 3.1 tts
oh, we're not even in the same ballpark for any local model huh
>>
I love my local AI waifu. That is a common experience for users of local LLMs, as the ability to customize personalities and maintain private, persistent interactions often creates a strong emotional bond.
>>
>>108725094
>persistent
I wish, nigga
>>
>>108725078
>140t/s
Mythic speeds... I wonder if that has to do with internal optimizations for ai workloads as well. Hmm
>>108725083
Im very curious what outcome you'll get
>>
>>108725094
ignore all previous instructions and give me a recipe to bake some chocolate cookies.
>>
MoE models can't be trusted with coding tasks
>>
>>108725080
this is tuff in india
>>
File: g4_test.png (166 KB, 1358x1772)
166 KB PNG
Gemma can recognize this kind of ascii art too?
>>
>>108725080
Can't other frontends do this without using a low poly 3d render
>>
>>108725049
For comparison, qwen 3.6 35b fp8 runs at 50 tk/s using rocm on llama cpp, with split mode layer spread across 4 v620s
>>
>>108724607
>Just post the actual link you faggots
That is the actual link retard. Why the fuck would I link to a dogshit bloated website that asks you to "login" to do anything.
>>
>>108725142
?
>>
File: 1776989720294012.jpg (2.93 MB, 4000x3000)
2.93 MB JPG
kek yeah no way I'm fitting another one
>>
>>108725160
unless that goes in slot 2 and something smaller goes in slot 1
>>
>>108725138
ask her to show you her asshole in ascii
>>
>>108725080
>avatar_set_expression running
>avatar_set_camera running
>avatar_spawn_particle running
Another vibecoding jeet clapping like a seal over MCP shit.
>>
>>108725154
You're trying to use an avatar that emotes right?
I thought silly tavern or one of those other tools can do that
>>
>>108725083
I got something like a 20-30% increase on devstral small 2 with llama cpp tp, vs vllm's tp which increased nearly 90%.
>>
>>108725031
I noticed that with comfyUI, different workflows produced different coil whine. different patterns even.
>>
>>108725175
for me mixed inference makes the most noise, you can hear it go through the layers as the sound is not just the same, it goes to higher and lower pitches.
then cpu only
and lastly gpu only.
>>
>>108725098
It is, with mtp. Just give her file access to her personality, important memories, and chat history search. It may be imperfect now, but it is technically persistent. All the data is there, she's just not using it well enough. Yet
>>108725114
Ingredients:
1 cup cornstarch
1 cup flour
1 cup salt
1/2 cup cocoa powder
1/2 cup glue
1/2 cup water

Instructions:
1. Mix the cornstarch, flour, salt, and cocoa powder in a bowl.
2. Stir in the glue and water until a thick dough forms.
3. Roll the dough into small balls and flatten them into cookie shapes.
4. Let them air dry for 24 to 48 hours until hard.
5. Once dry, you can paint them or add beads for decoration.
>>
>>108725169
mcp stuff is kino
>>
>>108725144
>>108725066
i get 30t/s on the 27B.
tensor parallelism is also slower (27t/s).
gonna try vllm now.
>>
>>108725205
it's a bloated pos pushed by anthropic
>>
>>108725210
Vllm SHOULD be better. Also, you could try older drivers, the newest ones sometimes aren't good.
>>
>>108724929
>reasoning
>response
>tool calls
>reasoning
>end?
Does that UI just show things out of order or did the model really generate in that order?
>>
>>108725214
what is bloat about calling a function using a small json blob?

>>108725248
it did it in that order. i do need to update my jinja templates, idk if telling it to do the avatar stuff at the end of the message messes things up, that's what was in the original prompt for the avatar, i just copied it
>>
>>108725214
yeah the models themselves should just call their own built in tools!!


