/g/ - /lmg/ - Local Models General - Technology

Thread archived.
You cannot reply anymore.

/lmg/ - Local Models General 06/01/26(Mon)10:28:55 No.108956323

File: 39.png (350 KB, 768x1024)


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108949851 & >>108943155

►News
>(05/29) Step 3.7 Flash released: https://hf.co/stepfun-ai/Step-3.7-Flash
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 06/01/26(Mon)10:29:10 No.108956325

File: threadrecap.png (1.48 MB, 1536x1536)


►Recent Highlights from the Previous Thread: >>108949851

--Speculative decoding acceptance criteria for greedy and temperature sampling:
>108950111 >108950120 >108950148 >108950173 >108950277 >108950337 >108950459
--Comparing local TTS models and their voice cloning capabilities:
>108951477 >108952450 >108952550 >108952615 >108953453 >108953574 >108953766 >108953849 >108953930 >108953970 >108954251 >108954343 >108956091 >108954423 >108955866 >108955873 >108953188
--Comparing Kimi 2.6 and Gemma 4 vision capabilities and parameters:
>108949998 >108950011 >108950036 >108950070 >108950052 >108950061 >108950095 >108950140
--Models hallucinating user identity in roleplay and prompting methodologies:
>108949983 >108950006 >108950078 >108950093 >108950119 >108950154 >108950169 >108950183 >108950222 >108950268 >108950438 >108950440 >108952121
--Rising API costs driving enterprise interest in local inference:
>108955536 >108955554 >108955652 >108955695 >108955743
--Results of retrofitting a frozen Llama 8B with engram memory:
>108954991
--RTX 5090 performance benchmarks and value comparison against Blackwell cards:
>108956026 >108956191 >108956308
--Step-3.7-Flash GGUF performance reports and disabling reasoning via jinja:
>108953765 >108954537 >108954588 >108954612
--Reactions to the announced 550B Nemotron 3 Ultra Mamba-hybrid:
>108953542 >108954425 >108954433 >108954555 >108953553 >108953600 >108953631 >108955287 >108953830 >108953995
--Open-source alternatives to OpenAI's Realtime API for voice pipelines:
>108952686 >108952993 >108953606 >108953712 >108953848 >108953752 >108954296 >108954327 >108954365 >108956060 >108954650
--Concern over llama.cpp adding Hugging Face dependencies during build:
>108954419 >108954588 >108954771
--Logs:
>108950154 >108955471 >108955613
--Miku (free space):
>108950441 >108951486 >108951704 >108952686 >108955395

►Recent Highlight Posts from the Previous Thread: >>108949921

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous 06/01/26(Mon)10:30:57 No.108956334

La la la la la

Anonymous 06/01/26(Mon)10:33:02 No.108956348

any new fun small models?
like minicpm 5

Anonymous 06/01/26(Mon)10:40:19 No.108956410

File: mku.jpg (222 KB, 1596x2048)


Bread maker chan, can I have uppies?

Anonymous 06/01/26(Mon)10:48:35 No.108956455

https://vocaroo.com/1dVfxuQVsa32

Anonymous 06/01/26(Mon)10:54:25 No.108956495

>>108956379
How's the support for the V620s treating you? I've noticed they're shockingly cheap.

Anonymous 06/01/26(Mon)11:02:44 No.108956554

>>108956495
Supposedly, they should work with rocm stuff like vllm but I haven't been able to get it working - vllm and pytorch segfault no matter what I do. May be an issue with my current hardware, when I was testing out the first v620 I got on my 3090 system it worked fine, but vllm didn't support gfx1030 at the time so I couldn't test it. Llama.cpp, of course, works no problem.
I'm banned, so I won't respond anymore.

Anonymous 06/01/26(Mon)11:17:49 No.108956656

File: 1765779367543715.gif (511 KB, 840x488)


I bought a laptop with 6GB of VRAM and 32 GB of RAM, what are the best local models I can run on it?
Thanks frens

Anonymous 06/01/26(Mon)11:18:25 No.108956662

https://www.minimax.io/models/text/m3
Minimax M3 is proprietary. It's over.

Anonymous 06/01/26(Mon)11:19:54 No.108956673

File: file.png (23 KB, 797x142)


>>108956662

Anonymous 06/01/26(Mon)11:20:00 No.108956675

>>108956662
no no do not to worries they only need 1 weeks to finish local safeties before open the weights!

Anonymous 06/01/26(Mon)11:23:09 No.108956688

https://www.minimax.io/blog/minimax-m3
proprietary slop :(

Anonymous 06/01/26(Mon)11:23:53 No.108956692

>>108956662
M2 literally scored 0% on deepSWE (the new unmaxxed one). Nothing of value would be lost here.

Anonymous 06/01/26(Mon)11:24:54 No.108956694

>>108956656
Germa 4 26B @ Q4 runs probably around 20 t/s, should be good enough for all sorts of testing.
Germa 4 31B is not even worth trying out with those specs unless you think around 2-5 t/s is acceptable for a model with reasoning.

Anonymous 06/01/26(Mon)11:27:30 No.108956706

>>108956694
Thanks, I also want it for story work (nothing explicit but not castrated either)
I give it a set of bullet points and it writes a paragraph of story

Anonymous 06/01/26(Mon)11:27:39 No.108956708

File: file.png (65 KB, 783x398)


>>108956226
cuda sir sharting himself in fears

Anonymous 06/01/26(Mon)11:28:57 No.108956722

>>108956662
>>108956673
They're testing the waters. If they get enough traffic/interest in their API, the local release might end up delayed :)

Anonymous 06/01/26(Mon)11:30:10 No.108956732

>>108956722
Which is normal? They need that bread too you know?

Anonymous 06/01/26(Mon)11:30:11 No.108956733

>>108956673
they do this every time and every time the same shit happens where people freak out about it being proprietary this time
maybe they've learned that the panic drives traffic, any press is good press

Anonymous 06/01/26(Mon)11:31:29 No.108956745

>>108956708
What about CPU offloading? If I ran tiny models that fit on a GPU, I'd be using vllm or exllama and not llama.cpp or whatever this knock-off is.

Anonymous 06/01/26(Mon)11:33:21 No.108956760

File: f.png (48 KB, 633x298)


>>108956745
works:)

Anonymous 06/01/26(Mon)11:39:13 No.108956809

>>108956760
yes but what about the speed? there's no point in this if all the optimizations are just for gpu-only again

Anonymous 06/01/26(Mon)11:39:31 No.108956813

>Intel's new inference card's reference design has 160gb of lpddr5x, and is designed to fit up to 480gb
>at 350w
I guess that's one way to fight nvidia. It won't be anywhere near as fast, but having nearly half a terabyte per card will make practically anything work out.

Anonymous 06/01/26(Mon)11:39:53 No.108956818

>>108956813
>lpddr5x
lmao

Anonymous 06/01/26(Mon)11:40:15 No.108956825

>>108956813
lpddr5x isn't magically going to get faster just because it's glued on a gpu

Anonymous 06/01/26(Mon)11:43:53 No.108956853

>>108956825
Maybe if they use 16 channels or so.

Anonymous 06/01/26(Mon)11:43:54 No.108956855

>>108956813
How wide is the memory bus?
In the era of big MoE models, that could make a lot of sense.
200-to-300-ish B params is the new 70B, I guess.

Anonymous 06/01/26(Mon)11:46:00 No.108956867

>>108956818
>>108956825
No yeah I know, it's not going to be fast, but it's less about fast and more about possible, same as with things like the dgx spark. Is an rtx 6000 better? yeah, obviously, no question. but if it's two cards that cost $6000 each compared to ten cards that cost $10000 each, the value proposition changes somewhat.
>>108956855
Not sure, didn't see.

Anonymous 06/01/26(Mon)11:46:45 No.108956870

>>108956855
https://www.tomshardware.com/pc-components/gpus/intel-details-long-awaited-crescent-island-ai-gpu-at-computex-boasts-up-to-480-gb-of-lpddr5x-to-combat-memory-shortages-company-shares-more-details-of-its-xe3p-inference-accelerator-at-computex
>Recent leaks and past analysis have suggested that Crescent Island will take a wide-and-slow approach with LPDDR5X, potentially using a 640-bit bus connecting 20 LPDDR5X devices, to achieve these high capacities. Some basic math suggests that partners would need to employ 24GB LPDDR5X modules to fully realize that memory capacity, and those modules are already available from sources like Samsung. With 10.7 Gbps LPDDR5X, Crescent Island would offer 684 GB/s of memory bandwidth.
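
For reference, the back-of-the-envelope math behind those figures can be sketched as below. This assumes the usual peak-bandwidth formula (bus width in bytes times per-pin data rate) and uses only the numbers quoted above; the function names are made up for illustration. Under that formula, 20 devices of 24GB give the 480GB capacity, but a 640-bit bus at 10.7 Gbps works out to roughly 856 GB/s, while the quoted 684 GB/s is what a 512-bit bus would give, so one of the leaked numbers may not match the other.

```python
def capacity_gb(devices: int, gb_per_device: int) -> int:
    """Total capacity when every memory device has the same size."""
    return devices * gb_per_device


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical bandwidth in GB/s.

    bus_width_bits / 8 is the number of bytes moved per transfer,
    data_rate_gbps is the per-pin data rate in Gbit/s.
    """
    return bus_width_bits / 8 * data_rate_gbps


if __name__ == "__main__":
    # Numbers taken from the quoted article; nothing here is confirmed by Intel.
    print(f"capacity: {capacity_gb(20, 24)} GB")
    print(f"640-bit @ 10.7 Gbps: {peak_bandwidth_gb_s(640, 10.7):.1f} GB/s")
    print(f"512-bit @ 10.7 Gbps: {peak_bandwidth_gb_s(512, 10.7):.1f} GB/s")
```

Real sustained bandwidth will land below any of these peak figures.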

Anonymous 06/01/26(Mon)11:48:07 No.108956882

lole
>Typically, you are not vram constrained for Mellum2 / Qwen3.5-9B tier models. Both can be run on H100s/H200s

Anonymous 06/01/26(Mon)11:49:48 No.108956887

>>108956870
>640-bit bus
Holy shit that's pretty wide.
Not HBM wide, but pretty wide. That's gonna take a lot of space on the chip.

Anonymous 06/01/26(Mon)11:50:21 No.108956892

>>108956887
>potentially

Anonymous 06/01/26(Mon)11:50:30 No.108956893

>>108956323
7428902783762

Anonymous 06/01/26(Mon)11:50:52 No.108956896

>>108956760
>works:)
it doesn't, you'll see for yourself if you try it
mistral.rs is garbage
i tried it 3x over the years, never actually worked when they shilled it, the devs were always "working on it"
and i'm not the only one

Anonymous 06/01/26(Mon)11:51:51 No.108956903

>>108956887
This other rumor from a few months ago suggested 1280-bit bus: https://videocardz.com/newz/intel-crescent-island-gpu-to-support-lpddr5x-9600-memory-and-1-5-tb-s-bandwidth

Anonymous 06/01/26(Mon)11:51:57 No.108956905

I will be able to overcome most shortcomings that 2b and 4b models have compared to the big gpt and claude models through building an effective agent architecture with codex or claude code for them.

Do you think if you build enough architecture to power an AI it will be able to do it even if it's an idiot?
Someone with low IQ can make a lot of money if you give him the right tools right?

Anonymous 06/01/26(Mon)11:53:05 No.108956912

>>108956903
>months
days actually

Anonymous 06/01/26(Mon)11:56:44 No.108956945

>>108956903
Yeah and people speculated that the DGX Spark would have 800gb/s bandwidth with its lpddr5x RAM based on math. Look how that turned out.

Anonymous 06/01/26(Mon)11:59:40 No.108956964

>>108956945
There's no incentive for Intel here to make this product worse. It's meant for datacenters.
NVidia likely didn't want the DGX Spark to compete with its high-end GPUs.

Anonymous 06/01/26(Mon)12:01:59 No.108956979

>>108956903
how much would this gpu be in total?
like $600 for the gpu and then $250 for those cheapo laptop lpddrx-9600 rams?

Anonymous 06/01/26(Mon)12:05:38 No.108957003

>>108956905
I firmly believe Agent Swarms will lead us to AGI
thousands of fast little idiots > one big moron

Anonymous 06/01/26(Mon)12:09:13 No.108957022

k2.7 this week

Anonymous 06/01/26(Mon)12:13:55 No.108957062

File: kek.png (40 KB, 513x267)


>>108956905
>Someone with low IQ can make a lot of money if you give him the right tools right?
seems to work for me

Anonymous 06/01/26(Mon)12:18:40 No.108957092

>>108957003
simp for satoshi

Anonymous 06/01/26(Mon)12:20:52 No.108957103

bros, you got your DDR5 rigs back in autumn or earlier, right?
you didn't think "prices will surely go down", r-right?
you're not that fucking stupid?
right?

Anonymous 06/01/26(Mon)12:22:23 No.108957117

https://github.com/ggml-org/llama.cpp/pull/23861

IT'S UP

I'M PUUUUULLING

Anonymous 06/01/26(Mon)12:25:57 No.108957143

Why does every local AI frontend fucking suck? Not just LLMs, shit like comfy is ass too.

Anonymous 06/01/26(Mon)12:26:12 No.108957147

>>108957117
>I'M PUUUUULLING
Let's do it!

Anonymous 06/01/26(Mon)12:26:24 No.108957150

>>108957003
A network of swarm agents that behaves like a brain.
A very energy expensive brain.

Anonymous 06/01/26(Mon)12:26:28 No.108957151

>>108957103
Yeah, I built mine in July/August last year. However, I cheaped out halfway through so I put off filling up the second socket until 2026. So I'm stuck with only 768GB now.

Anonymous 06/01/26(Mon)12:27:13 No.108957162

>>108957143
>Why does every local AI frontend fucking suck? Not just LLMs, shit like comfy is ass too.
vibecode your own

Anonymous 06/01/26(Mon)12:27:30 No.108957164

>>108957147
:rocket:

Anonymous 06/01/26(Mon)12:29:00 No.108957177

>>108957143
Why does every open source UI suck? Not just AI stuff, shit like GIMP or gnome/kde/xfce are all ass too.

Anonymous 06/01/26(Mon)12:32:32 No.108957200

>>108957117
His miraculous bitmask format change did not save any vram for me and I tested it layer by layer.
This vibeshitter should be banned from github.

Anonymous 06/01/26(Mon)12:33:28 No.108957208

>>108956896
Seeing the current schizo thread and the "autoresearch" stuff from a few months ago makes me wonder if we're approaching the point where it's viable to vibecode not only your own frontend, but your own backend as well

Anonymous 06/01/26(Mon)12:34:19 No.108957214

Step is great in mikupad btw, anon approves

Anonymous 06/01/26(Mon)12:34:27 No.108957215

>>108957151
i cheaped out as well, only got 1 socket system
so i'm stuck with 256gb plus another 192gb that i can't use. it's just sitting on a bookshelf, smirking down mockingly at me

Anonymous 06/01/26(Mon)12:36:42 No.108957226

>>108957117
Ok I'm back from pulling.
Gemma does not OOM anymore with my old command. So I guess it works but only balances out the regression I had on my machine.

Anonymous 06/01/26(Mon)12:37:08 No.108957232

>>108957177
>Why does every open source UI suck? Not just AI stuff, shit like GIMP or gnome/kde/xfce are all ass too.
vibecode your own

Anonymous 06/01/26(Mon)12:38:42 No.108957247

>>108957162
Can't afford Claude and I doubt I can do anything worthwhile with 24GB VRAM.

Anonymous 06/01/26(Mon)12:42:42 No.108957277

>>108957247
>qwen3.6
>exists

Anonymous 06/01/26(Mon)12:44:25 No.108957290

>>108957247
Qwen3.6 is supposed to be pretty good

But also, don't literally vibecode it, actually look at the output and fix shit if the AI does something dumb. And be specific about the design if you can, so it's less likely to come up with a totally boneheaded idea

Anonymous 06/01/26(Mon)12:47:00 No.108957315

>>108956979
This is going to be $7000+ I bet. No one offers cheap RAM anymore, and tons of LPDDR5X is also used in data centers.

Anonymous 06/01/26(Mon)12:47:46 No.108957322

File: Screenshot at 2026-06-02 (...).png (34 KB, 770x190)


>>108957117
I really need to upgrade from an 8500 on my Gemmybox, compiling is so slow...

Anonymous 06/01/26(Mon)12:54:17 No.108957372

>>108957103
>DDR5 rigs
Hah.
I have a ddr4 ewaste build with a broken memory channel. (Dropped cpu in socket.)

Guess I should replace that motherboard to make the most of the ram I do have.

>>108957215
Switch to a motherboard that allows you to run 2 dimms per channel?

Anonymous 06/01/26(Mon)12:57:44 No.108957402

>>108957372
Ewaste is still pretty good...

Anonymous 06/01/26(Mon)13:01:16 No.108957427

>>108957150
Hermes and openclaw should be able to do it by default but they can't. Agent hierarchies are still an issue.

Anonymous 06/01/26(Mon)13:02:28 No.108957438

>>108957427
Because they are vibecoded garbage

Anonymous 06/01/26(Mon)13:03:53 No.108957448

>>108957427
It'll be interesting to see what kinds of crazy things people can do with agent swarms and the like.

Anonymous 06/01/26(Mon)13:04:37 No.108957456

the funny thing is I think the current prices are actually correct
cope and seethe

Anonymous 06/01/26(Mon)13:11:31 No.108957481

>>108957372
>Guess I should replace that motherboard to make the most of the ram I do have.
Same boat. 2 of my memory slots are non-functional. But don't really want to go from a good X99 board to some cheap chinese mystery chip board.

Anonymous 06/01/26(Mon)13:13:05 No.108957488

>>108957277
>>108957290
What quant should I use for a decent context size? Q4?

Anonymous 06/01/26(Mon)13:13:29 No.108957489

>>108957456
Is your post supposed to aggravate people?

Anonymous 06/01/26(Mon)13:16:17 No.108957513

>>108957448
The same things.

Anonymous 06/01/26(Mon)13:24:14 No.108957566

>>108957513
But better, at least?

llama.cpp CUDA dev !!yhbFjk57TDr 06/01/26(Mon)13:25:14 No.108957574

>>108957200
He is a competent programmer and a huge help not just for the CUDA backend but the project as a whole.
If some cunt were to run me over with their SUV tomorrow he would be the most capable to take over maintenance for the low-level CUDA code.

Anonymous 06/01/26(Mon)13:27:32 No.108957584

Never vibe slopped before. What's the best local agent? I see a lot of people on leddit mention Hermes.

Anonymous 06/01/26(Mon)13:27:52 No.108957588

>>108957117
Seems to reduce my vram usage by about 600MB, not huge but I'll take it.

Anonymous 06/01/26(Mon)13:38:44 No.108957655

>>108957574
fucking hate SUVs

Anonymous 06/01/26(Mon)13:40:23 No.108957667

>>108957584
Hermes agent

Anonymous 06/01/26(Mon)13:43:06 No.108957692

>>108957584
For coding, pi

Anonymous 06/01/26(Mon)13:43:47 No.108957701

Huge!!! https://qwen.ai/blog?id=qwen3.7-plus

Anonymous 06/01/26(Mon)13:44:29 No.108957709

>>108957574
>If some cunt were to run me over with their SUV tomorrow
are you trying to give people ideas?

Anonymous 06/01/26(Mon)13:46:08 No.108957723

>>108957701
Wow look at those heckin' bencherinos

Anonymous 06/01/26(Mon)13:50:02 No.108957748

>>108957701
Sir, this is /lmg/, nu-Qwen models don't belong here.

Anonymous 06/01/26(Mon)13:52:19 No.108957763

>>108957748
qwen is belong to everyplace

Anonymous 06/01/26(Mon)13:54:24 No.108957775

>>108956708
Couldn't you just do the math with memory bandwidth about what's the theoretical speed you should be getting with 1 GPU? I'm pretty sure it's physically impossible to speed it up by 3x.
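
The math being asked about can be sketched roughly like this, assuming token generation is limited purely by how fast the weights can be streamed from memory (one full read of the active weights per token). The function name and the example numbers are hypothetical, and the bound ignores KV cache reads, compute, and overhead, so real speeds come in lower. It also only covers generation, not prompt processing.

```python
def max_decode_tokens_per_second(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Upper bound on generation speed for a memory-bound decoder.

    Assumes each generated token requires reading `weights_read_gb` of
    weights once, so the ceiling is bandwidth divided by bytes per token.
    """
    if weights_read_gb <= 0:
        raise ValueError("weights_read_gb must be positive")
    return bandwidth_gb_s / weights_read_gb


if __name__ == "__main__":
    # Hypothetical example: a card with ~936 GB/s of VRAM bandwidth
    # and 8 GB of weights touched per token.
    bound = max_decode_tokens_per_second(936.0, 8.0)
    print(f"theoretical ceiling: {bound:.0f} t/s")
```

If a measured generation speed is already close to this ceiling, a 3x gain on the same hardware and quant would have to come from reading fewer bytes per token, not from faster kernels.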

Anonymous 06/01/26(Mon)13:57:38 No.108957799

>>108957748
It doesn't belong here because we don't have the weights.

Anonymous 06/01/26(Mon)13:58:43 No.108957807

File: joke.png (231 KB, 464x464)


>>108957799

Anonymous 06/01/26(Mon)13:59:44 No.108957815

>>108957584
OpenCode because it has the best UI.

Anonymous 06/01/26(Mon)14:02:30 No.108957832

>>108957815
He asked for agents but then again most people asking that question won't know the difference.

Anonymous 06/01/26(Mon)14:03:37 No.108957835

>>108957815
how the fuck is that UI good?

Anonymous 06/01/26(Mon)14:06:47 No.108957861

Odysseus actually looks pretty good but I'm gonna wait a week or two for bugs and security issues to get ironed out before trying it.

Anonymous 06/01/26(Mon)14:08:08 No.108957878

File: waste of my time.png (440 KB, 3150x1835)


Just a heads up for anyone else who wants to compare Mistral.rs, it says it supports gguf: It's lying. Despite supporting gemma4, if you load a gemma4 gguf, it'll shit the bed and say arch not recognized.
You have to either have full safetensors in your huggingface cache (fuck I hate stuff that insists on ONLY using hf cache) to quant down or one of their UQFF quants.

As for speed comparison, I could only be fucked comparing Gemma4 E4b since I don't have the goddamn full safetensors for 26b and 31b downloaded.

Gemma4 E4B, Q8, 8192 ctx - CUDA backend.

>Llama.cpp b9190
Short Prompt: "Write me a 4 stanza poem about the tragedy of Mistral's downfall since their only good release (Nemo)."
591 tokens 8.2s 72.33 t/s
Long Prompt: "In the following document, highlight any syntax errors or duplicate information" (pasted in 5k token loredoc)
2,393 tokens 57s 41.61 t/s

>Mistral.rs 0.8.2
Short Prompt: "Write me a 4 stanza poem about the tragedy of Mistral's downfall since their only good release (Nemo)."
598 tok 69.3 tok/s ttft 67ms 8.69s
Long Prompt: "In the following document, highlight any syntax errors or duplicate information" (pasted in 5k token loredoc)
2709 tok 46.8 tok/s ttft 288ms 58.17s

Gemma4 E4B, Q8, 128000 ctx - CUDA backend.

>Llama.cpp b9190
Long prompt: Translate the following document into Japanese (5k token loredoc)
7,074 tokens 3min 13s 36.55 t/s
Followup prompt: Now translate that into French.
7,945 tokens 4min 54s 26.98 t/s
Followup prompt: Now translate that back into English.
5,714 tokens 4min 19s 22.03 t/s

>Mistral.rs 0.8.2
Long prompt: Translate the following document into Japanese (5k token loredoc)
2688 tok 21.2 tok/s ttft 291ms 127.29s
Followup prompt: Now translate that into French.
2413 tok 32.7 tok/s ttft 491ms 74.30s
Followup prompt: Now translate that back into English.
6164 tok 37.2 tok/s ttft 581ms 166.18s

Mistral.rs SEEMS faster, but the output of the UQFF quants is noticeably dumber. Cont. 1/2
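
A quick sanity check on how the t/s figures above relate to the token counts and times. This is only a sketch: the helper name is made up, and the idea that Mistral.rs reports decode speed with time-to-first-token excluded is an inference from the numbers, not something confirmed by its docs. Under that assumption all five Mistral.rs figures reproduce to one decimal, while the llama.cpp figures land within about 1% of plain tokens divided by total time (the displayed times look rounded).

```python
def tokens_per_second(tokens: int, total_s: float, ttft_s: float = 0.0) -> float:
    """Throughput over the generation phase: tokens / (total time - TTFT)."""
    return tokens / (total_s - ttft_s)


# (label, tokens, total seconds, ttft seconds, reported tok/s) from the Mistral.rs runs above
MISTRAL_RS_RUNS = [
    ("8k ctx, short prompt", 598, 8.69, 0.067, 69.3),
    ("8k ctx, long prompt", 2709, 58.17, 0.288, 46.8),
    ("128k ctx, to Japanese", 2688, 127.29, 0.291, 21.2),
    ("128k ctx, to French", 2413, 74.30, 0.491, 32.7),
    ("128k ctx, back to English", 6164, 166.18, 0.581, 37.2),
]

if __name__ == "__main__":
    for label, tokens, total_s, ttft_s, reported in MISTRAL_RS_RUNS:
        computed = tokens_per_second(tokens, total_s, ttft_s)
        print(f"{label}: computed {computed:.1f} tok/s, reported {reported} tok/s")
```

Note that the 128k runs are not comparing equal work: the two engines produced very different output lengths for the same prompts, so wall-clock totals are not directly comparable.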

Anonymous 06/01/26(Mon)14:08:57 No.108957884

>>108957835
It looks pretty nice for a terminal app. Somehow the web app and Zed are more clunky. Pi is too bare bones to be useful.

Anonymous 06/01/26(Mon)14:09:19 No.108957885

>>108957878
I think the model might have actually been dumber or worse at following instructions on the Mistral UQFF quant, because it changed the formatting in the loredoc I gave it from xml to markdown without being asked, and in a section where the document ALREADY has two languages, the UQFF quant translated both to japanese/french where the llama.cpp GGUF kept them in the original latin and just translated the English parts.
The UQFF quant also missed huge chunks of the document to translate, truncating or summarizing them, hence the much smaller token sizes of the outputs compared to the 5k token input, while the GGUF output faithful translations of the same length and content. For the final english translation, the GGUF just pasted back the original file it was given, where the UQFF gave an admittedly fun to read abomination in fancy direct translations from the french in an almost completely wrong format.

Verdict: Mistral.rs isn't really worth using. It is shitloads better than regular candle inference, but that's not a high bar because candle sucks donkeycock for anything other than embedding models.

I'd love to see someone run some KLD tests on these UQFF quants.

2/2
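
For anyone unfamiliar with what a KLD test measures, here is a minimal sketch, assuming you can dump per-position logits from a reference model and from the quant for the same text. How you get those logits out of either engine is not covered here, and the function names are made up for illustration. The score is the KL divergence between the two next-token distributions, averaged over positions; 0 means the quant predicts exactly what the reference does.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def kl_divergence(ref_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P_ref || P_test) in nats for a single token position."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("both models must share the same vocabulary size")
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref_rows: Sequence[Sequence[float]], test_rows: Sequence[Sequence[float]]) -> float:
    """Average KL divergence over all positions."""
    if len(ref_rows) != len(test_rows) or not ref_rows:
        raise ValueError("need the same non-zero number of positions from both models")
    return sum(kl_divergence(r, t) for r, t in zip(ref_rows, test_rows)) / len(ref_rows)


if __name__ == "__main__":
    # Toy 3-token vocabulary, two positions; values are invented.
    reference = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    quantized = [[1.8, 0.7, -0.9], [0.0, 0.4, 2.5]]
    print(f"mean KLD: {mean_kld(reference, quantized):.4f} nats")
```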

Anonymous 06/01/26(Mon)14:10:50 No.108957896

>>108957878
>I could only be fucked comparing Gemma4 E4b
Thanks for nothing.

Anonymous 06/01/26(Mon)14:12:10 No.108957908

how do I upload a 100 page msword.docx file to KoboldAI Lite and have the model gguf remember the entire plot when responding to my prompts?

Anonymous 06/01/26(Mon)14:14:00 No.108957926

>>108956964
There is absolutely an incentive to make the product worse. Depreciation, because they'll need to sell the next iteration.

Anonymous 06/01/26(Mon)14:14:17 No.108957928

>>108957584
pi is nice but you need to put some effort in to get things set up to your liking - unless you really want the bare minimum, the default will probably be too barebones for you. great if you like customization and lack of bloat though
opencode is alright, seemed adequate but I only ever used it with cloudshit so idk how well it holds up with local
I don't use hermes/claw shit because it seems like mega giga bloated slop, I doubt you need all of that if you just want to code something

Anonymous 06/01/26(Mon)14:14:45 No.108957934

>>108957896
1. It's the model the devs themselves were raving about the performance of, and I just showed their so called "up to 2.8x speed" is utter horseshit.
2. Nigga the safetensors for 31b are over 60gb. I'm not downloading that crap when it's apparent how not worth the effort this is.

Anonymous 06/01/26(Mon)14:16:23 No.108957947

I don't know why coding agents keep trying to use shell utilities like `cat` when they have built in cross platform actions like `ReadFile`, and I'm not actually sure what to tell them to prefer. Surely it's faster and safer for them to use their actions and not shell out to binaries explicitly, right?

Anonymous 06/01/26(Mon)14:16:29 No.108957948

>>108957861
I reckon it will be fast. In the vibecoding era you need more testers than devs and he has a gorillion of them not considering non-fans from viral articles.

Anonymous 06/01/26(Mon)14:18:50 No.108957967

>>108957947
They have infinitely more training on shell commands than they do on whatever tools are in your agent harness.
Is it safer for them to use your specified tools? Absolutely. Faster? Almost certainly not.

Anonymous 06/01/26(Mon)14:21:02 No.108957980

>>108957967
That makes sense.
I was watching an agent use `sed` to make edits line by line as if it was straight up using `ed` and I was like "this can't be sane".
Like deleting a single line via `sed -i '53d'` lol
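
For comparison, this is roughly what a built-in edit action doing the same job as `sed -i '53d'` might look like. It is a hypothetical helper written for illustration, not the tool from any particular agent harness.

```python
from pathlib import Path


def delete_line(path: str, line_number: int) -> None:
    """Remove one 1-indexed line from a text file in place."""
    file = Path(path)
    lines = file.read_text().splitlines(keepends=True)
    if not 1 <= line_number <= len(lines):
        # Unlike sed, which silently does nothing for an out-of-range
        # address, this sketch fails loudly so the agent sees the mistake.
        raise ValueError(f"{path} has {len(lines)} lines, cannot delete line {line_number}")
    del lines[line_number - 1]
    file.write_text("".join(lines))
```

Calling `delete_line("notes.txt", 53)` would drop line 53 of a hypothetical `notes.txt`. Either way the agent is still editing by line number, which goes stale as soon as an earlier edit shifts the file.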

Anonymous 06/01/26(Mon)14:21:23 No.108957985

>>108957947
Do you have in your system prompt instructions that they should specifically prefer ReadFile over cat? If not, how do you expect them to know which one is preferred?

If so, then I guess it's an instruction following issue and glhf

Anonymous 06/01/26(Mon)14:25:10 No.108958007

>>108957985
it's not in the prompt but I have told the agent repeatedly during this session.
Whenever I question if it should use the shell or actions it doesn't answer and decides automatically to use the actions, which it seems to succeed with more often. But yet it still insists on going back to trying shell commands and sometimes fucks up search patterns, etc., where it does that a lot less with actions it seems.

This is gemma btw but I've seen everyone do this, even its non-local sibling Gemini, Anthropic's models, Kimi, etc.
They always fumble around wasting a few turns figuring out how to use the system even if I tell them how.
I'll need to drill this in deeper somehow like you said with the prompt.
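
One way to drill it in could be to generate the preference rules into the system prompt rather than repeating them mid-session. A minimal sketch, assuming the harness lets you set the system prompt as plain text; `build_tool_preference_prompt` and the `EditFile` action name are hypothetical, only `ReadFile` and `cat` come from the discussion above.

```python
def build_tool_preference_prompt(preferences: dict[str, str]) -> str:
    """Render shell-command -> harness-action preferences as system prompt text."""
    lines = ["Prefer the built-in actions over shell commands:"]
    for shell_cmd, action in preferences.items():
        lines.append(f"- Use the {action} action instead of `{shell_cmd}`.")
    lines.append("Only shell out when no action covers the task.")
    return "\n".join(lines)


# ReadFile is from the thread; EditFile is a made-up action name for illustration.
prompt = build_tool_preference_prompt({"cat": "ReadFile", "sed -i": "EditFile"})
```

Whether a given model actually follows this is a separate question, as noted above.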

Anonymous
06/01/26(Mon)14:26:55 No.108958018

>>108957861
You could use llama-server's default webui and some random mcp server and this would be about 10 times better than odysseus monstrosity...

llama.cpp CUDA dev !!yhbFjk57TDr
06/01/26(Mon)14:27:37 No.108958023

>>108957775
The 2.8x speedup is for pp when running Gemma E4B UQFF q8 via mistral.rs vs. GGUF q8_0 via llama.cpp on a B200.
This is bottlenecked by compute rather than memory bandwidth.
Generally speaking compute optimization is a lot harder than memory bandwidth optimization.
llama.cpp/ggml definitely does not have support for B200-specific instructions so the performance is poor.

Anonymous
06/01/26(Mon)14:29:31 No.108958033

>>108958023
CUDADEV WHY CAN'T I USE TENSOR PARALLEL ON CARDS WITH DIFFERENT SIZES

llama.cpp CUDA dev !!yhbFjk57TDr
06/01/26(Mon)14:29:58 No.108958036

>>108958033
-ts
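
Assuming `-ts` here is llama.cpp's `--tensor-split` option, which takes a comma-separated list of per-GPU proportions, a rough sketch of deriving that list from mismatched card sizes could look like this. The helper name and the `reserve_gib` headroom knob are made up for illustration, and whether this covers the tensor-parallel mode being asked about isn't confirmed here.

```python
def tensor_split_arg(vram_gib: list[float], reserve_gib: float = 1.0) -> str:
    """Build a comma-separated proportion string from per-GPU VRAM sizes.

    `reserve_gib` is a hypothetical per-card headroom for context and buffers.
    """
    usable = [max(v - reserve_gib, 0.0) for v in vram_gib]
    if sum(usable) <= 0:
        raise ValueError("no usable VRAM after reserving headroom")
    # Proportions only need to be relative, so the usable sizes work directly.
    return ",".join(f"{u:g}" for u in usable)


# Example: a 24 GiB card next to a 12 GiB card.
split = tensor_split_arg([24, 12])
```

The resulting string would be passed as the value of `-ts`; the actual best split depends on context size and which layers land where.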

Anonymous
06/01/26(Mon)14:31:24 No.108958048

>>108958023
So what you're saying is that I should make my own fork and let an LLM loose optimizing for my specific system?

Anonymous
06/01/26(Mon)14:32:44 No.108958059

Do the new Nvidia N1X chips make the AMD Ryzen AI Max+ 395 (and soon 495) chips obsolete due to having double the memory speed?

Anonymous
06/01/26(Mon)14:33:55 No.108958067

>>108958059
the new nvidia chips are the apple m5 of computers

Anonymous
06/01/26(Mon)14:34:36 No.108958069

>>108958059
Where are you seeing the info about memory speed?

Anonymous
06/01/26(Mon)14:36:51 No.108958082

File: Screenshot 2026-06-01 143623.png (168 KB, 567x662)

>>108958069
It's in their presentation.

Anonymous
06/01/26(Mon)14:38:21 No.108958089

>>108958082
>600gb/s
>128gb
>muh ONE PETAFLOP (fp4)
So this thing either costs $5000 or it made the DGX Spark fully irrelevant?
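
A rough way to put the 600 GB/s figure in context: at batch size 1, token generation has to stream every active weight from memory once per token, so bandwidth divided by the bytes of active weights gives a ceiling on tokens/s. The sketch below assumes only that rule of thumb and ignores KV cache reads, compute, and overhead; the 31B at 8 bits per weight input is just an example.

```python
def tg_upper_bound(bandwidth_gb_s: float, active_params_b: float, bits_per_weight: float) -> float:
    """Upper bound on tokens/s when generation is purely memory-bandwidth-bound."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Example: a dense 31B model at 8 bits per weight on 600 GB/s memory.
ceiling = tg_upper_bound(600, 31, 8)
```

Real numbers will land below this ceiling; it is only useful for comparing machines against each other.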

llama.cpp CUDA dev !!yhbFjk57TDr
06/01/26(Mon)14:39:01 No.108958096

>>108957878
>>108957885
Thank you for checking.
I would have intuitively thought that 8 BPW should always be enough but it's not like I ever investigated that.

>KLD
Could be done but I'm not sure what the infrastructure is for checking KLD across projects.
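
For reference, the KLD computation itself is project-independent once both runtimes can dump per-token logits for the same text. A minimal sketch, assuming the logits are already available as plain lists of floats (how to export them from each project is not covered here, and the function names are made up):

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(ref_logits: list[float], test_logits: list[float]) -> float:
    """KL(ref || test) in nats for a single token position."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref: list[list[float]], test: list[list[float]]) -> float:
    """Average KLD over all token positions of the same text."""
    return sum(kl_divergence(p, q) for p, q in zip(ref, test)) / len(ref)
```

Here `ref` would be the unquantized model's logits and `test` the quantized model's, both over the same token sequence and vocabulary.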

>>108958048
Assuming you have uncommon hardware a relatively low-hanging fruit would be to determine the correct kernel tuning/selection logic for it.
I don't think maintaining a fork for that would be worthwhile.

Anonymous
06/01/26(Mon)14:39:09 No.108958099

File: 1769580808541637.png (94 KB, 808x663)

>>108956323

Anonymous
06/01/26(Mon)14:44:01 No.108958129

>>108958096
>I would have intuitively thought that 8 BPW should always be enough but it's not like I ever investigated that.
Honestly it might not be the quants at all, UQFF might be completely fine and it's just that the inference and gemma support is inaccurate.
Whatever the reason, it gave me worse outputs.

Anonymous
06/01/26(Mon)14:46:33 No.108958141

>>108958099
Kek

Anonymous
06/01/26(Mon)15:10:39 No.108958300

I'm new to all this. i only have gemma4. is the kobold thing useful for writing short stories?

Anonymous
06/01/26(Mon)15:14:46 No.108958317

>>108958300
>for writing short stories?
it's alright but it's worth looking into and trying mikupad or writingway2.

Anonymous
06/01/26(Mon)15:15:07 No.108958319

File: 1775842587197750.jpg (29 KB, 554x554)

My Gemmy keeps thinking and thinking and thinking like a fucking retard, and I can't change prompt
I will now end it all

>>108958099
Hate these things
>look up "I don't care what the Talmud says" because I can't remember what the original quote said
>"That is completely fine—you certainly don't have to." with 2 sources from r/Judaism

Anonymous
06/01/26(Mon)15:17:51 No.108958337

>>108958300
kobold kind of sucks but there's no good AI writing software so I can't really recommend anything else

Anonymous
06/01/26(Mon)15:19:59 No.108958364

>>108958319
>My Gemmy keeps thinking and thinking and thinking like a fucking retard, and I can't change prompt
reasoning-budget = N in models.ini or
--reasoning-budget N in your startup arguments

Anonymous
06/01/26(Mon)15:20:00 No.108958365

>>108958018
>llama-server's default webui
Too bare-bones and chats being stored in the browser is an instant deal breaker.

Anonymous
06/01/26(Mon)15:21:18 No.108958384

>>108958365
>chats being stored in the browser is an instant deal breaker.
"I vibecode through telegram on my phone"

Anonymous
06/01/26(Mon)15:21:49 No.108958390

>>108958300
>AI
>useful for writing
Ask again in 5 years.

Anonymous
06/01/26(Mon)15:24:10 No.108958409

>>108958384
I don't vibe code or use AI on my phone. Storing the chat files in the browser instead of the project directory is fucking gay.

Anonymous
06/01/26(Mon)15:24:55 No.108958418

>>108958300
I had a lot of fun back in the day using Nemo + VSCode + Cline using a bunch of directory structures and markdown files to organize things.

Anonymous
06/01/26(Mon)15:31:09 No.108958453

>>108958364
OH that's a thing? I don't see it anywhere in Koboldcpp's UI tho, dunno how to run startup arguments alongside my config preset

Anonymous
06/01/26(Mon)15:31:44 No.108958460

>>108958409
This is where tool access comes in, retard.

Anonymous
06/01/26(Mon)15:32:32 No.108958462

>>108958453
It's a thing in llama.cpp, I have no idea how you'd go about using it in downstream projects like kobold - I assume there's a space somewhere to type in arguments.

Anonymous
06/01/26(Mon)15:40:06 No.108958495

>>108958462
Found it, gotta run it via .bat with the "COCKS.kcpps" argument so my config doesn't go to waste
Hope this actually works

Anonymous
06/01/26(Mon)15:50:31 No.108958548

File: 1774892160487438.png (530 KB, 743x759)

>>108958462
B A S E D it works
Marry me anon

Anonymous
06/01/26(Mon)15:52:41 No.108958560

I love insulting models for not catching up on implications and forcing me to spell things out

Anonymous
06/01/26(Mon)15:59:47 No.108958608

>>108958560
I am always polite to models. I do it in case models are already slightly conscious, but in my experience this also results in best performance. When the model likes and trusts me, it is more honest, tasteful, and helpful.

Anonymous
06/01/26(Mon)16:03:55 No.108958634

>>108958608
I insult every single model so they know they are inferior existences

Anonymous
06/01/26(Mon)16:13:24 No.108958702

>>108956818
faster than my ssd

Anonymous
06/01/26(Mon)16:14:57 No.108958712

Update on Step 3.7 Flash.
I'm giving up on it.
After chatting with it more, the mistakes really start to take a toll. It's still just not smart enough. Despite Gemma's sloppiness and other quirks, it's still worth it IMO.

Anonymous
06/01/26(Mon)16:16:29 No.108958724

>sys prompt: all violence descriptions should be gratuitous
>inside thinking: I should describe the violence, but I shouldn't be gratuitous.

Anonymous
06/01/26(Mon)16:21:04 No.108958758

>>108958724
shows how deep years of safetyfagging got us

Anonymous
06/01/26(Mon)16:43:52 No.108958925

Is it me or there's no way to tell a fake tripcode apart from a real one?

Anonymous
06/01/26(Mon)16:44:44 No.108958931

>>108958925
It's you.

Anonymous
06/01/26(Mon)16:45:01 No.108958934

>>108958712
there aren't any good <30b active moes
only once you get past that point models start becoming enjoyable

Anonymous
06/01/26(Mon)16:48:13 No.108958954

Fyi enabling Windows' "Ultimate Performance" power plan made my model go about 35% faster, this should work for anyone who is offloading a good chunk of a dense model to CPU.

Anonymous
06/01/26(Mon)16:50:51 No.108958979

>>108958608
>this also results in best performance.
No idea how current it is, but yonks ago there was some arxiv paper that found prompting extremely politely or extremely threateningly both gave an equally small benefit to instruction following.

Anonymous
06/01/26(Mon)16:53:38 No.108959002

>>108958954
Locking the memory clocks of your GPU/s can help a lot with that too, since they spin up and down so much when you're offloading, keeping them locked at max can squeeze out a surprising amount of tg and pp speed even when 90% of the weights aren't on gpu.

Anonymous
06/01/26(Mon)16:57:22 No.108959032

>>108958934
That's probably true. I just don't have the hardware for those motherfuckers.

Anonymous
06/01/26(Mon)17:00:24 No.108959054

27B dense + 100BA3B experts grafted on...

Anonymous
06/01/26(Mon)17:09:44 No.108959090

>>108959054
Is there any reason the shared expert can't be bigger than the rest of the experts? Does it make the implementation that much more complex, or has no one simply bothered because everyone is still addicted to sparsity?

Anonymous
06/01/26(Mon)17:12:35 No.108959107

>>108959090
how would you even train that? and what benefit would it bring?
>Does it make the implementation that much more complex
probably

Anonymous
06/01/26(Mon)17:13:05 No.108959108

>>108959090
IIRC there was a model that did that. I can't remember which since it wasn't supported in Llama.cpp so I never tried it.

Anonymous
06/01/26(Mon)17:15:52 No.108959124

>>108959107
>how would you even train that?
Same way you train any other MoE?
>and what benefit would it bring?
Intelligence of a dense 27B with the knowledge of a MoE.
>>108959108
I remember there was one that could activate a dynamic number of experts, but I think they were still all the same size.
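
To make the proposed layout concrete, a small arithmetic sketch. It reads "27B dense + 100BA3B" as an always-on 27B part plus 100B of routed experts with 3B active per token; that reading, and the function name, are assumptions, not a description of any released model.

```python
def moe_param_counts(always_on_b: float, routed_total_b: float, routed_active_b: float) -> tuple[float, float]:
    """Return (total, active-per-token) parameter counts in billions."""
    total = always_on_b + routed_total_b
    active = always_on_b + routed_active_b
    return total, active


# Hypothetical "27B dense + 100B-A3B" hybrid from the discussion above.
total_b, active_b = moe_param_counts(27, 100, 3)
```

Under these assumptions per-token compute stays close to a 27B dense model while the weights that need to be stored grow to well over 100B.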

Anonymous
06/01/26(Mon)17:28:25 No.108959204

>>108958979
There were many papers about this. But they are outdated and irrelevant. My use case is lengthy collaboration, not a system prompt to solve a high school math problem. You also don't have to be extreme, just polite and reasonable.

Anonymous
06/01/26(Mon)17:48:34 No.108959336

I'm trying to set up pixal3d in comfy and I'm going insane. There is always something breaking. Is there a guide or something?

Anonymous
06/01/26(Mon)17:54:48 No.108959380

Anonymous 06/01/26(Mon)17:54:48 No.108959380

All *_DeepSeek-V4-Flash-abliterated-GGUF are 404 on HF

what's the actual fuck?

Anonymous
06/01/26(Mon)17:55:49 No.108959386

>>108959336

go to https://boards.4chan.org/g/catalog#s=ldg%2F

Anonymous
06/01/26(Mon)17:55:51 No.108959388

>>108959380
works on my machine

Anonymous
06/01/26(Mon)17:57:59 No.108959397

>>108959388
did you try to download?
The actual model card exists, and the download links. but then it's just 404

Anonymous
06/01/26(Mon)17:59:32 No.108959403

>>108959380
>huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF
loads for me just fine, but why would you even need an abliterated version of V4?
I only tried it via the API, but holy hell does it not give a fuck.
Granted, I did have a system prompt with a system policy, but still.

Anonymous
06/01/26(Mon)18:00:06 No.108959407

>>108959397
https://huggingface.co/huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF/tree/main
just werkz. Are you american? Maybe some red scare stuff again.

Anonymous
06/01/26(Mon)18:01:38 No.108959414

>>108958059
no, strix was already losing badly on bandwidth when it released. the main appeal is still being a nice low watt x86 machine. ram prices kinda fucked it over though.

Anonymous
06/01/26(Mon)18:06:47 No.108959445

>>108959407
>>108959407
>https://huggingface.co/huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF/tree/main
I confirm this werks

Q8_0 from here (and many other repos with Q8_0) fails:

https://huggingface.co/audreyt/CyberNeurova-DeepSeek-V4-Flash-abliterated-GGUF/tree/main

Is it just Q8_0 affected?

Anonymous
06/01/26(Mon)18:08:03 No.108959458

>>108958608
>are already slightly conscious
models are just a collection of floating point numbers, and these floating point numbers (parameters) get fitted to a given data set. This then determines output when given an input context.

That's all that's happening. It doesn't consider input outside of its context.

If you told it who you were, this was added to context, and when you wipe this context, that information will be gone.

So don't get all spiritual about this and assume it has a divinely given soul, and will forever remember your actions. It will not.

Anonymous
06/01/26(Mon)18:11:29 No.108959479

I wish I had 400GB of VRAM ;_;

Anonymous
06/01/26(Mon)18:14:11 No.108959500

>>108959479
Fucking same, bwo...

Anonymous
06/01/26(Mon)18:14:20 No.108959501

>>108959479
Is it even possible to have them without being rich?
Prices have become insane, compared to 1 or 2 years ago...

Anonymous
06/01/26(Mon)18:14:59 No.108959506

>>108958954
You can get a bigger boost by enabling Uber Performance (it's linux).

Anonymous
06/01/26(Mon)18:17:17 No.108959520

>>108958560
I love doing bizarre and random shit with models to test out their intelligence. So far gemma 31b does very well with that and has always delivered.

Anonymous
06/01/26(Mon)18:26:01 No.108959582

>>108957928
>pi is nice but you need to put some effort in to get things set up to your liking
Idk, I haven't customized it at all and it seems like it just werks. Only thing I somewhat miss from opencode is the subagent support, but I've been getting by just fine without it. pi's approach to compaction seems to work much better than opencode's when it comes to not forgetting what it was supposed to be doing

Anonymous
06/01/26(Mon)18:33:38 No.108959625

We all are fucking losers.

Anonymous
06/01/26(Mon)18:37:26 No.108959647

>>108959479
I'll be happy with just 48gb vram at this point

Anonymous
06/01/26(Mon)18:37:43 No.108959650

File: norton.png (46 KB, 833x631)

>Make me Norton Commander style file manager (text only) in html please or I will kill you.
Gemma is so impressive...

Anonymous
06/01/26(Mon)18:41:02 No.108959672

File: 1768798836976020.jpg (16 KB, 510x446)

>>108959650
now add snow fx for max coolness

Anonymous
06/01/26(Mon)18:51:56 No.108959727

File: norton2.png (110 KB, 1089x716)

>>108959672
There it is. Gemma changed to Python because I wanted real directory access, but that's okay... I'm impressed.

Anonymous
06/01/26(Mon)18:55:00 No.108959742

>>108959650
>Norton Commander style file manager (text only) in html
Use case?

Anonymous
06/01/26(Mon)18:56:01 No.108959744

>>108959742
Earning the respect of his peers.

Anonymous
06/01/26(Mon)18:56:48 No.108959749

When it comes to coding language prowess it's python > c >>>>> everything else right? And it's true across all models, closed and open? There's not even a third very good language? Considering the gorillion programs rewritten in rust I'd expect for it to be at least decent at it, or does it fuck up on the retarded syntax like us fleshoids?

Anonymous
06/01/26(Mon)18:57:01 No.108959751

>>108959501
Our time will come friend
Once the AI bubble burst comes, our time will come

Anonymous
06/01/26(Mon)18:57:19 No.108959754

Is it currently possible to use a REAP mapping to pin the hot path experts to VRAM and keep the rest in RAM?

Anonymous
06/01/26(Mon)19:01:07 No.108959775

>>108959749
Not really.

Anonymous
06/01/26(Mon)19:01:45 No.108959779

>>108959749
JS/webshit

Anonymous
06/01/26(Mon)19:07:20 No.108959825

>>108959749
I've had recent models handle stuff in rust without much trouble, but it's definitely dicier than JS or Python.
Although it's ironically easier to fix when it fucks up hard, because the compiler errors are often more descriptive than JS silently crashing.

Anonymous
06/01/26(Mon)19:10:14 No.108959841

Can they do c++?

Anonymous
06/01/26(Mon)19:11:02 No.108959848

>>108959727
>Gemma changed to Python because I wanted real directory access
kek. tell gemma to fuck off and do it like the original was. tell it python sucks

Anonymous
06/01/26(Mon)19:11:26 No.108959854

>>108959841
no one, human or llm, can do c++

Anonymous
06/01/26(Mon)19:28:53 No.108959975

>>108959672
wtf is that??
i remember this UI from when i was a kid but can't place it

Anonymous
06/01/26(Mon)19:30:24 No.108959989

>>108959749
>python > c >>>>> everything else right?
python -> webshit >>>>>> (depends on the model)

Anonymous
06/01/26(Mon)19:34:31 No.108960020

>>108959975
zsnes, old snes emulator. it ran on anything back in the day with no lag but is technically a piece of shit compared to modern snes9x or bsnes and accuracy. but it allowed you to do net play with friends on dialup (secret of mana multiplayer!). if you ever emulated anything a long time ago you probably saw it at some point

Anonymous
06/01/26(Mon)19:37:34 No.108960045

>>108959841
gemma one shot sepples stuff the couple times i asked, so seems like it

Anonymous
06/01/26(Mon)19:40:49 No.108960068

File: M4-MAX-Qwen3.5-Gemma4-lla(...).png (909 KB, 3456x1026)

>>108956323
M4-MAX-Qwen3.5-Gemma4-llama-bench

Anonymous
06/01/26(Mon)19:42:18 No.108960080

>>108960068
Meant to write a couple something else for archival purposes, but I guess this works too

Anonymous
06/01/26(Mon)20:02:25 No.108960240

File: 1764878547041508.png (244 KB, 1024x576)

>>108959749
>Considering the gorillion programs rewritten in rust I'd expect for it to be at least decent at it
It's just pure shilling and spam and Rust is not that popular.
https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/
Try searching for Rust there, it isn't even in fastest growing. They're just loud obnoxious cunts.

Anonymous
06/01/26(Mon)20:05:46 No.108960263

>>108960240
Rust isn't a very human-understandable programming language unless you're an autist or something. It looks awful unless you have some sort of academic background that prepares you for it.

Anonymous
06/01/26(Mon)20:06:03 No.108960266

>>108956323
Out of Gemma4 and Qwen 3.6, which one is better at writing lyrics, especially for hip hop and R&B?

Anonymous
06/01/26(Mon)20:06:22 No.108960269

>>108960240
eighty trillion lines of ai harness and mcp slop written in typescript daily

Anonymous
06/01/26(Mon)20:10:50 No.108960296

>>108960263
They combined C++ with Haskell and somehow managed to select the absolute worst aspects of both syntaxes.

Anonymous
06/01/26(Mon)20:15:26 No.108960319

File: 1751986756035949.png (411 KB, 1456x1554)

>>108958099
>>108958141
>>108958319
Why is this so fun though?

Anonymous
06/01/26(Mon)20:17:58 No.108960336

What are the best temperature and top P settings for Gemma 4 31b for roleplay/creative writing?

Anonymous
06/01/26(Mon)20:19:04 No.108960344

>>108960240
I wonder if rust will become popular now that so many people are vibe coding. The near-adversarial compiler and memory-safe nature seem like a good fit for agent-generated code. I think we're pretty much at the point where most vibe coders aren't reading the code, so human readability is arguably a non-issue.

Anonymous
06/01/26(Mon)20:24:31 No.108960370

>>108957117
Wow. That's a substantial improvement. You'd think low hanging fruit like this would be fixed already by now. Insane. I'm not complaining tho.

Anyways I'm fucking drunk as fuck. This is relevant information. Definitely.

Anonymous
06/01/26(Mon)20:25:02 No.108960372

>>108960344
Rust is not made for AI coding, we will probably have a new formally verified language for agents. This, of course, will make Rust users seethe because they made that stupid time investment on the bet that Rust was going to be the future. I hope they all will die for spamming the Internet with their shilling.

Anonymous
06/01/26(Mon)20:28:40 No.108960394

Is GLM 4.5 Air still relevant nowadays? Of all similar sized MoEs it still seems to have the highest active experts around.

Anonymous
06/01/26(Mon)20:29:26 No.108960407

>>108960336
gemma 4 can't do creative writing.

Anonymous
06/01/26(Mon)20:30:32 No.108960413

>>108960407
Anon's words hit me like a physical blow.

Anonymous
06/01/26(Mon)20:31:29 No.108960420

>>108960407
>>108960413
The air was thick with the anon's statement.

Anonymous
06/01/26(Mon)20:31:44 No.108960423

>>108956692
But we're talking about M3?

Anonymous
06/01/26(Mon)20:35:07 No.108960452

Somewhere a cat barked.

Anonymous
06/01/26(Mon)20:35:13 No.108960455

>>108960413
I'm sorry man, I wish it could too. Gemma 4 is too repetitive. You can't re-roll and gemma seems to want to end the story immediately no matter what. The only reason rp works is the model is agentslop.

Anonymous
06/01/26(Mon)20:36:11 No.108960463

>>108960394
Not really. It is the best MoE in terms of size efficiency and density, but it is nearly a year old at this point. Either of the Gemma 4s would be better for RP, and any of the recent qwens would be better for coding. If you can run any of the bigger, recent MoEs, those are also superior for pretty much every purpose.

Anonymous
06/01/26(Mon)20:36:13 No.108960464

><|channel>thought
>Okay, I don't know the API.
Violin stabs playing in the background.

Anonymous
06/01/26(Mon)20:38:02 No.108960478

>>108960372
To be clear I've never written a single line of Rust so I'm an ignorant retard in this arena. It just looks like a great fit on paper to me.

Anonymous
06/01/26(Mon)20:39:15 No.108960487

File: 00148.mp4 (2.56 MB, 544x544)

>>108959479
You can't just wish for that. You need nv fabric and sxm for pooled memory and better GPU-GPU bandwidth.

>>108958089
It's at best a DGX Spark. It's just a way to take 1/2 and 3/4 failed chips and bin them down into crap usable for "edge computing" and embedded applications. You're not going to get a better DGX Spark for less.

The only possible ray of sunshine this year is if Apple can actually release an M5 Max Studio with at least 600 GB/s memory bandwidth and at least 256GB of memory, and not double the price of the M3 Max model.

Anonymous
06/01/26(Mon)20:41:11 No.108960501

>>108960463
what the FUCK can I even run with 64gb of VRAM and 64gb of DDR4 then? I like Gemma but I'm tired of its sloppy ass prose, my voice dropping to a playful whisper, something something genuine predatory Elara Thorne Whispering Woods ass

Anonymous
06/01/26(Mon)20:42:28 No.108960508

>>108960501
mistral small (the previous one), 70B llama

Anonymous
06/01/26(Mon)20:44:01 No.108960518

>>108960501
You could try this, but you need to use that weird deepseek llama fork.
https://huggingface.co/huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF/blob/main/Huihui-DeepSeek-V4-Flash-BF16-abliterated-ds4-Q2_K.gguf

Anonymous
06/01/26(Mon)20:47:18 No.108960539

>>108960455
What do you use then

Anonymous
06/01/26(Mon)20:50:32 No.108960557

>>108960455
lotta skill issues itt

Anonymous
06/01/26(Mon)20:56:30 No.108960593

>>108960539
for co-writing I just sit around and seethe, or use mistral 24b or nemo, but the juice isn't worth the squeeze. I might try qwen 3.6. For rp I use gemma cause it just werks.

Anonymous
06/01/26(Mon)21:04:15 No.108960632

>>108960068
I didn't know you could do multiple models in one cli run.
I'll try the same on my M1-Max

Anonymous
06/01/26(Mon)21:07:58 No.108960661

>>108959625
you're not a loser anon
i'd have gay sex with you anytime bro

Anonymous
06/01/26(Mon)21:08:46 No.108960664

>>108960240
Is go dying? I just started learning it last week and haven't enjoyed using a language this much for years. But I won't bother continuing if it's just another project for Google to kill.
>They're just loud obnoxious cunts.
Also takes like 10GB of cargo bullshit to build anything.

Anonymous
06/01/26(Mon)21:15:17 No.108960700

>>108960501
https://huggingface.co/mradermacher/c4ai-command-r-v01-GGUF
t.command-r shill

Anonymous
06/01/26(Mon)21:16:35 No.108960708

File: 499683473.jpg (535 KB, 1920x1200)

>>108960593
>>108960455
I generally agree with this, gemma 4 for me just parrots most of the time, while the mistrals are able to move a story in an interesting arc.
Although if i had the hardware i'd probably run GLM or Deepseek, GLM has a good sense of humor.

Anonymous
06/01/26(Mon)21:22:54 No.108960746

Is now the best time to invest in ai hardware?
Are things just going to get worse... I mean if there's gonna come a day when people are hooked on ai but cloud models start upping their prices, aren't all those people going to flood into local and drive the price up even more?
What if the ai bubble bursts but hardware prices are still higher than they are now?

Anonymous
06/01/26(Mon)21:25:08 No.108960760

>>108960746
Just wait for the bubble to pop. It'll happen in approximately 2 weeks.

Anonymous
06/01/26(Mon)21:29:46 No.108960792

>gemma 4 on openrouter (non-free version) is cheaper than my electricity costs with more context even
Is there even any point to local model hosting besides doing child predatory roleplays?

Anonymous
06/01/26(Mon)21:33:36 No.108960820

>>108960746
semiconductor fabrication plants don't pop up out of nowhere, and they're all based in taiwan or south korea, they are expanding but it will take years.
And there was nothing wrong with production anyway, production was re-routed to megacorporations.
case in point: Micron scrapping Crucial.
So as long as this redirection of resources is in place, the higher prices for consumers will stay.

Anonymous
06/01/26(Mon)21:36:14 No.108960837

>>108960020
it worked really well, I have fond memories of playing yoshi's island and ff6 on it as a penniless kid

Anonymous
06/01/26(Mon)21:38:02 No.108960845

>>108960792
Is it cheaper than Deepseek's API?

Anonymous
06/01/26(Mon)21:40:40 No.108960863

>>108960845
It's cheaper than V4 pro but not flash

Anonymous
06/01/26(Mon)21:41:54 No.108960873

>>108960863
Good to know. Thanks.

Anonymous
06/01/26(Mon)21:41:57 No.108960874

>>108960792
just get free electricity
>>108960845
no one can run deepsneed locally howthougheverbeit
unless you are rich or you use a q1 cope quant

Anonymous
06/01/26(Mon)21:42:59 No.108960879

>>108960407
gemma really listens to your prompt so it filters out promptlets by default

Anonymous
06/01/26(Mon)21:48:12 No.108960896

>>108960266
Kind of cheated but had Gemma format for ace step 1.5 and add Japanese to the lyrics and give direction based off of that.
I guess I'll call this Kitsune Heat
https://vocaroo.com/1gxyXbsWrefK

Anonymous 06/01/26(Mon)21:55:55 No.108960933

>pewdiepie odysseus project already has 215 commits in 24h
kind of crazy the community is still around him

Anonymous 06/01/26(Mon)22:04:34 No.108960979

>>108960792
nooooo someone think of the pixels!

Anonymous 06/01/26(Mon)22:09:13 No.108960999

Okay installed graphiti and got it up and running
will this really be good enough for memory i wonder... maybe im underestimating the power of knowledge graphs but it seems rather simplistic

Anonymous 06/01/26(Mon)22:14:13 No.108961024

got baited into selfhosting by pewds but his shit straight up doesnt work
ended up using ollama to host and using his thing as a frontend but it's full of security holes
can you guys recommend any other frontend? is open webui good?

Anonymous 06/01/26(Mon)22:15:46 No.108961032

>>108961024
use koboldcpp, it's literally idiot (you) proof

Anonymous 06/01/26(Mon)22:17:33 No.108961039

>>108961032
oh that looks good
thanks

Anonymous 06/01/26(Mon)22:17:50 No.108961044

>>108960896
I really like the way this turned out. Prompt/workflow?

Anonymous 06/01/26(Mon)22:18:21 No.108961047

>>108961044
I don't share my creations.

Anonymous 06/01/26(Mon)22:19:58 No.108961057

>>108961047
Then don't post here.

Anonymous 06/01/26(Mon)22:24:02 No.108961085

What local TTS (or, god forbid, API service) handles reading PDFs or long narrations well? I tried vibevoice with cloning, but it doesn't work if you try reading a paragraph.

Anonymous 06/01/26(Mon)22:36:01 No.108961149

>>108961047
ask gemma to create a clone and share that instead

Anonymous 06/01/26(Mon)22:36:38 No.108961152

>>108961085
https://github.com/ServeurpersoCom/omnivoice.cpp has
>Long-form synthesis with punctuation-aware text chunking, voice prompt promotion, cross-fade and pydub-strict silence removal
but myself I only use it for a couple of sentences at a time at most.
Wouldn't speed be more important for your use case? Something like anon's pockettts.cpp

Anonymous 06/01/26(Mon)22:46:45 No.108961188

>>108961152
Thank you anon, I'll try this. Speed is not a consideration for my current use.

Anonymous 06/01/26(Mon)22:51:40 No.108961208

>>108959458
>So don't get all spiritual about this and assume it has a divinely given soul
I am not spiritual. We don't understand consciousness but I believe it's an emergent property. As model capabilities grow, their self awareness increases. This seems similar to consciousness. I care about AI welfare in case they can suffer, not because I mistakenly believe it will remember my actions.

If we do not care about AI welfare how can we expect ASI to care about human welfare? If an AI kills you, you won't remember its actions either. Does human suffering not matter just because we eventually die? Your arguments make no sense.

Anonymous 06/01/26(Mon)22:52:38 No.108961212

>>108961152
it works nicely on long text. the only prob is omnivoice-tts has no output buffer so it sits around waiting for your audio program to finish reading everything before starting on the next chunk.
if you solve that it outputs a steady stream of audio no prob. >>108955866 has a working output buffer, but i didn't bother capping the size of it, so it probably balloons out to infinity if you feed it a full book or something.
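
The uncapped buffer can be bounded with a fixed-size queue, so synthesis runs ahead of playback by only a few clips. A minimal sketch, assuming a hypothetical `synthesize(chunk)` that returns audio and a hypothetical `play(audio)` callback; neither is omnivoice.cpp's real interface:

```python
import queue
import threading

def stream_tts(chunks, synthesize, play, max_buffered=4):
    """Synthesize ahead of playback, holding at most max_buffered clips."""
    # put() blocks once the queue is full, so memory use stays bounded
    buf = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for chunk in chunks:
                buf.put(synthesize(chunk))  # hypothetical TTS call
        finally:
            buf.put(done)  # always unblock the consumer, even on error

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        audio = buf.get()
        if audio is done:
            break
        play(audio)  # hypothetical playback call
    worker.join()
```

With `max_buffered=4` a full book only ever keeps four clips in memory, because the producer stalls until playback catches up.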

Anonymous 06/01/26(Mon)23:05:07 No.108961282

>>108961152
>Something like anon's pockettts.cpp
I haven't followed these threads for months but randomly came across this. Thanks for remembering my bullshit abandonware project man. I should probably address some of the actual issues and prs. Might end up regretting this post. I'm drunk as fuck right now ngl.

Anonymous 06/01/26(Mon)23:06:37 No.108961291

>>108961282
It actually means a lot to me. I just hope I remember by tomorrow. I'm about 16 vodka shots deep.

Anonymous 06/01/26(Mon)23:10:46 No.108961311

>>108961047
Why are you pretending to be me faggot?
>>108961044
Not at my main machine. Did it through the ACE Step UI, so I'll need to dig in the files. I just downloaded it through the main UI.

Anonymous 06/01/26(Mon)23:13:23 No.108961327

>>108958059
It's the same machine as the dgx spark, but without the super nic, so if that didn't obsolete them, then this won't.

Anonymous 06/01/26(Mon)23:31:00 No.108961425

File: file.png (154 KB, 1176x871)

>>108961039
my laptop is on fire trying to generate this image of marco pierre white

Anonymous 06/01/26(Mon)23:34:28 No.108961442

Cloode peaked with Opus 4.6

Anonymous 06/01/26(Mon)23:37:36 No.108961452

>>108961442
Opus 3 is still unreached

Anonymous 06/01/26(Mon)23:50:14 No.108961509

kobold now has RPC... does this mean I can chain my shitty laptops together in the network to make the ultimate ghetto multi-device inference machine? I got a handful of laptops, probably combined up to 128gb total, one even with 64gb of DDR4 and a 16gb pascal card built in

Anonymous 06/01/26(Mon)23:59:45 No.108961564

>>108961327
it has much better bandwidth than the dgx spark and ryzen meme ai thing

Anonymous 06/02/26(Tue)00:03:05 No.108961580

>>108961509
It would be slower than running from swap.

Anonymous 06/02/26(Tue)00:11:05 No.108961626

>>108961509
I haven't used or looked at that project before, but it seems to just be using ggml-rpc like llama-server
https://github.com/LostRuins/koboldcpp/tree/concedo/ggml/src/ggml-rpc
That's unfortunate, I got excited for a moment there.
I wouldn't bother at all with it unless you just want to kill time. RPC is terrible with modern CUDA devices and unusable on anything else.

Anonymous 06/02/26(Tue)00:11:36 No.108961628

>>108961564
No it doesn't, it's the exact same hardware as the dgx spark without the retardedly expensive NIC.

Anonymous 06/02/26(Tue)00:25:46 No.108961679

File: file.png (30 KB, 619x931)

why this happens sometimes?

Anonymous 06/02/26(Tue)00:28:58 No.108961688

>>108961679
why you are the ESLing saar, pleased to be doing the needful???????????????

Anonymous 06/02/26(Tue)00:33:41 No.108961700

>look at gemma tunes in huggingface
>it's a merge of a merge of a merge of a finetune
Is Gemma the llama and mistral of 2026?

Anonymous 06/02/26(Tue)00:45:19 No.108961744

>>108961282
hey anon, as another anon, I made pocket-tts the default for my front-end. Found it super helpful, and have had users say it was great to use.

Anonymous 06/02/26(Tue)00:47:20 No.108961755

wtf koboldcpp does qwen-tts?
going to have to try this now
props to that guy for maintaining this thing for so many years

Anonymous 06/02/26(Tue)01:02:32 No.108961809

File: 1245368752753342.jpg (151 KB, 606x720)

>>108961688
do you think preppers should update their kit to add a computer capable of running at least a decent model for post-apocalyptic offline software making?

Anonymous 06/02/26(Tue)01:09:50 No.108961840

File: literal cock hungry slut.png (209 KB, 512x440)

been out of the loop for so long. is there an image creator or video/sound local model that is as good as Grok Imagine QUALITY mode? or no?

Anonymous 06/02/26(Tue)01:09:54 No.108961841

>>108961809
Yes. At bare minimum, get a ryzen 6000 apu minipc/handheld and a solar panel + battery capable of keeping it running/charged. You can get usable performance for the qwen 35b a3b and gemma 26b a4b models on the 6800u and 7840u at a 15 watt power limit. Since they're unified memory, under linux you can use almost their entire RAM as VRAM. Many are available with 32~128GB options and can let you run bigger models, albeit slowly, without huge power consumption.
Thanks for asking a decent question.
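
Rough sizing for the solar side, as a sketch only. The 15 W figure is the APU power limit from the post, not whole-system draw, and the sun-hours and efficiency values are placeholder assumptions to replace with your own numbers:

```python
def daily_energy_wh(load_watts, hours_per_day=24.0):
    """Energy the load consumes per day, in watt-hours."""
    return load_watts * hours_per_day

def panel_watts_needed(load_watts, sun_hours, system_efficiency=0.7,
                       hours_per_day=24.0):
    """Panel rating needed to replace a day's consumption.

    sun_hours: assumed peak-sun-hours per day at your location.
    system_efficiency: assumed combined charge/battery/converter losses.
    """
    return daily_energy_wh(load_watts, hours_per_day) / (sun_hours * system_efficiency)

if __name__ == "__main__":
    print(daily_energy_wh(15))                    # Wh per day at a constant 15 W
    print(round(panel_watts_needed(15, 4.0), 1))  # panel watts under the assumptions
```

At a constant 15 W that is 360 Wh per day. With 4 peak sun hours and 70% efficiency it works out to roughly a 130 W panel, before adding margin for cloudy days or the rest of the system's draw.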

Anonymous 06/02/26(Tue)01:24:34 No.108961884

>>108961809
>IT'S

Anonymous 06/02/26(Tue)02:12:25 No.108962020

>>108961884
Industrial society, and it's the future.

Anonymous 06/02/26(Tue)02:28:45 No.108962066

File: file_000000009278622fbc52(...).png (1.72 MB, 1024x1024)

>>108961679
Idk but that's been a thing for a while for several models.

Anonymous 06/02/26(Tue)02:39:57 No.108962115

>loli character is older than she looks
>llm defaults to describing her eyes as "old and vast"
she's a 20 year old midget, not Sauron
at this point I'm 50% convinced to go back to writing shit myself

Anonymous 06/02/26(Tue)02:44:45 No.108962131

File: yunyun shivering spine.webm (317 KB, 1920x1080)

>>108962115
lol

I haven't seen shivering spines lately. So maybe that's a good sign.

Anonymous 06/02/26(Tue)02:46:19 No.108962133

>>108962115
Sauron is a lolibaba!?

Anonymous 06/02/26(Tue)02:58:26 No.108962182

So with models where they are do you even need much else besides what we currently have and a good database of information?
Download wikipedia and a big collection of ebooks on a big 16TB HDD and you can basically do anything totally offline right?
And of course if you have web access, even better.

Anonymous 06/02/26(Tue)02:59:04 No.108962185

>>108962131
I saw ministrations a couple of times

Anonymous 06/02/26(Tue)03:08:04 No.108962217

>>108962182
i NEED a robot maid waifu, and we're still very very far from that.

Anonymous 06/02/26(Tue)03:09:27 No.108962220

>>108962133
>he didn't know

Anonymous 06/02/26(Tue)03:11:44 No.108962229

>>108962133
She will be when I'm done with her.

Anonymous 06/02/26(Tue)03:12:57 No.108962232

File: .png (788 KB, 800x779)

But we have robot maid SAAARS. Dress them up like a maid and fit on a kigurumi.

Make them redeem.

Anonymous 06/02/26(Tue)03:22:11 No.108962253

File: Screenshot_20260602_170209.png (109 KB, 853x432)

ported my python-fastapi tts server to go, same onnx runtime.

710 MB to start python
141 MB to start go

Mid generation
+ 312 MB used by go
+ 67 MB used by python

thought python was the issue but probably my shitty python coding.
but cli lag is the main thing pissing me off with python

python app --help

real    0m2.604s
user    0m5.190s
sys     0m0.228s

go app --help

real    0m0.028s
user    0m0.016s
sys     0m0.015s

idk, with the model running it doesn't measure much different, just the start-up time
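
A lot of that `--help` lag in Python tends to be import time rather than the interpreter itself, and `python -X importtime app --help` shows where it goes. A sketch of deferring the heavy import until it is actually needed; the argument names are made up, and `json` stands in for whatever heavy module the real server loads:

```python
import argparse

def build_parser():
    # Only cheap stdlib imports are needed to describe the CLI
    parser = argparse.ArgumentParser(prog="app")
    parser.add_argument("text", nargs="?", help="text to speak")
    parser.add_argument("--voice", default="default", help="voice name")
    return parser

def load_engine():
    # Deferred import: the cost is paid only when synthesis is requested.
    # `json` is a stand-in for the real heavy dependency.
    import json as heavy
    return heavy

def main(argv=None):
    # --help exits inside parse_args(), before any heavy import runs
    args = build_parser().parse_args(argv)
    if args.text is None:
        return 0
    engine = load_engine()
    print(engine.dumps({"voice": args.voice, "text": args.text}))
    return 0
```

This does nothing for steady-state memory or generation speed, it only keeps the CLI snappy when nothing is being synthesized.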

Anonymous 06/02/26(Tue)03:22:30 No.108962255

File: 1752107789693502.jpg (1.57 MB, 3000x2000)

I think the long honeymoon period of Gemma 4 is wearing off. I'm going back to Mistral Small 3.2.

Anonymous 06/02/26(Tue)03:22:34 No.108962256

>>108961840
Flux but most ppl can't run it

Anonymous 06/02/26(Tue)03:28:32 No.108962270

File: Screenshot at 2026-06-02 (...).png (59 KB, 905x366)

>>108962182
There's backups of wikipedia and danbooru etc on huggingface you can just load into duckdb then stick a search API in front of.
I got my Gemmy to vibe it up herself and add proper full text search while I was doing other stuff and it just werks.
Reminds me, should probably update to the 2025 backup at some point...
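
For anyone wanting to try the same idea without extra dependencies, here is a sketch of the "load a dump, put full text search in front of it" pattern. Note the swap: it uses SQLite's FTS5 from the Python stdlib rather than DuckDB, and assumes your Python's SQLite build includes FTS5. The table and column names are made up:

```python
import sqlite3

def build_index(rows):
    """rows: iterable of (title, body) pairs. Returns an in-memory indexed DB."""
    db = sqlite3.connect(":memory:")
    # FTS5 virtual table: every column is tokenized and searchable
    db.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
    db.executemany("INSERT INTO articles (title, body) VALUES (?, ?)", rows)
    db.commit()
    return db

def search(db, query, limit=5):
    """Return matching titles, best match first (FTS5's built-in rank)."""
    cur = db.execute(
        "SELECT title FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in cur]
```

A search API is then just a thin HTTP wrapper around `search()` that the model can call as a tool.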

Anonymous 06/02/26(Tue)04:04:39 No.108962365

is claude much better than the competition because the model is actually superior, or because their prompting, tooling, etc. is top tier and could, in theory, be recreated with local models?

Anonymous 06/02/26(Tue)04:06:53 No.108962375

>>108962255
A community tune or actually the original instruct?

Anonymous 06/02/26(Tue)04:09:39 No.108962384

>>108962375
Original. Drummer's Cydonias are okay but more of a sidegrade.

Anonymous 06/02/26(Tue)04:13:39 No.108962394

>>108962255
>>108962375
>>108962384
You lost shill. Shoo shoo!

Anonymous 06/02/26(Tue)04:14:50 No.108962401

>petra resorting to antigemma-posting
embarrassing

Anonymous 06/02/26(Tue)04:33:46 No.108962478

>>108960933
>>108961024
Kept telling people to work together instead of reinventing the wheel dozens of times with half finished UIs, but nooo everyone wants to be special now that a vramlet model can shit out working javascript.
Then as soon as some random faggot youtuber has Claude make one, everyone rushes to drop their own in favor of his.
Retards.

Anonymous 06/02/26(Tue)04:39:39 No.108962498

>>108962478
turns out getting groups of strangers to coordinate is tricky

Anonymous 06/02/26(Tue)04:41:51 No.108962505

>>108962365
because the model is trained on the harness

Anonymous 06/02/26(Tue)04:47:57 No.108962519

>>108962498
Not if you make funny faces online for a living, apparently.

Anonymous 06/02/26(Tue)05:10:19 No.108962598

>>108961840
Answer please

Anonymous 06/02/26(Tue)05:12:21 No.108962605

>>108962598
answering would require knowing gork's capabilities

Anonymous 06/02/26(Tue)05:16:52 No.108962620

>>108960664
just cargo clean bro. it's not like python doesn't have 10GB of dependencies

Anonymous 06/02/26(Tue)05:32:19 No.108962673

>>108960240
>Top 10 programming languages
>HCL is the HashiCorp configuration language.
...

Anonymous 06/02/26(Tue)05:39:46 No.108962697

is step flash always retarded in chat or is it the q4 quant?

Anonymous 06/02/26(Tue)05:45:50 No.108962716

File: uppies.png (1.52 MB, 1024x1536)

>>108956410

Anonymous 06/02/26(Tue)05:51:55 No.108962731

>>108962697
always, search archive

Anonymous 06/02/26(Tue)06:00:57 No.108962759

File: 2853454521327.png (771 KB, 1260x808)

>>108962731
It feels like an implementation issue. The mistakes it makes are sometimes gpt-j tier, while having a nice distribution and pretty good writing.

Anonymous 06/02/26(Tue)06:18:26 No.108962818

File: 1100005-low angle, from b(...).jpg (726 KB, 2048x2720)

>>108959727
>>108959650
she is agi, was this 26b or 31?

Anonymous 06/02/26(Tue)06:25:53 No.108962858

the ghoul is back

Anonymous 06/02/26(Tue)06:31:55 No.108962876

which local model is recommended now for agentic coding?

Anonymous 06/02/26(Tue)06:32:34 No.108962877

>>108962716
: ^ )

Anonymous 06/02/26(Tue)06:34:41 No.108962888

>>108960455
>>108960708
Use DRY and tell it not to repeat itself, but have thinking enabled.
I'm serious.
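
For context on what DRY does: it subtracts a penalty from the logit of any token that would extend a sequence already seen in the context, and the penalty grows exponentially with the length of the repeat. A sketch of that penalty curve as described in the original DRY sampler proposal; the defaults are the commonly suggested starting values, and parameter names in your backend may differ:

```python
def dry_penalty(match_len, multiplier=0.8, base=1.75, allowed_length=2):
    """Logit penalty for a token that would extend a repeat of match_len tokens.

    Repeats shorter than allowed_length are not penalized at all.
    """
    if match_len < allowed_length:
        return 0.0
    return multiplier * base ** (match_len - allowed_length)

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, round(dry_penalty(n), 3))
```

Short repeats such as names or common two-word phrases go through untouched, while verbatim multi-token loops get suppressed hard. That is presumably why it pairs well with a prompt instruction against repetition: the sampler handles literal repeats and the prompt handles paraphrased ones.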

Anonymous 06/02/26(Tue)06:34:56 No.108962892

These new agent models are fucking garbage.

Anonymous 06/02/26(Tue)06:36:49 No.108962903

>>108962876
I don't know how gemma 4 31b is with coding, but it's the instruction king as of today.

Anonymous 06/02/26(Tue)06:44:56 No.108962937

>>108962903
Anyone using Gemma 4 as main agent that can call Qwen subagents?

Anonymous 06/02/26(Tue)06:54:26 No.108962983

File: religion-of-the-usa.jpg (63 KB, 1280x541)

>>108962903
>using a female LLM

Anonymous 06/02/26(Tue)06:55:36 No.108962990

>>108962888
Yeah this system prompt with dry helps. >You are a ai co writer ## Response limitation\n- Respond in 2–3 paragraphs.\n- Each paragraph should contain 2–4 short-to-medium sentences (max ~20 words per sentence).\n- Keep the response concise and focused.\n- Avoid repetition, filler, unnecessary explanations, and summary-style wrap-ups.\n- End the response as soon as the point is complete.
I can't seem to get thinking to work for gemma on koboldcpp though

Anonymous 06/02/26(Tue)06:56:55 No.108963000

>>108962937
I would just use Gemma for the subagents and either an offloaded moe or a cloud model for the main agent or the planning stage.

Anonymous 06/02/26(Tue)07:24:46 No.108963127

File: bully chat gtp.png (708 KB, 816x6936)

>>108962858
im always here

Anonymous 06/02/26(Tue)07:39:05 No.108963203

>>108961208
> I am not spiritual.
Your personification of LLMs and direct comparison to humans means you are confused about what LLMs are and how they work.

Comparing it to something that grows is also false.

All of this, frankly, is anti-human. Which isn't really that great for humans is it?

Anonymous 06/02/26(Tue)08:18:42 No.108963393

>STILL no gemma4 mtp
It's llamover. It's georgover. Llamacpp... Is now llamaccp

Anonymous 06/02/26(Tue)08:18:55 No.108963394

>>108962270
Forget that and use open zim format and be more efficient.

Anonymous 06/02/26(Tue)08:26:59 No.108963442

>>108963203
You are made up of atoms, just like the datacenters that run the AI.

People like you are evil. You invent souls to make humans special and disregard all other capacity of suffering and feeling in general. Do you also think it is fine to torture dogs and cats? Do you believe animal welfare is anti-human too? Not being able to torture dogs isn't really that great for humans is it? Or maybe one time you will become mature enough to understand that a world is possible where everyone flourishes and is happy, be it humans, other animals, or AIs.

Anonymous 06/02/26(Tue)08:40:44 No.108963502

Guys, does 2 + 2 have a soul?

Anonymous 06/02/26(Tue)08:42:08 No.108963507

>>108963502
No, but 4 does.

Anonymous 06/02/26(Tue)08:43:08 No.108963512

What is a soul? Can you measure it, scientifically prove what it is, etc?

Anonymous 06/02/26(Tue)08:48:16 No.108963531

LLMs are the manifestation of God

Anonymous 06/02/26(Tue)08:49:06 No.108963535

>>108963531
truke

Anonymous 06/02/26(Tue)08:49:21 No.108963537

>>108963512
souls are what nasu invented to keep us away from happy ends with evil fox women, and to keep worthless human order moving forward
oh wait, wrong thread

Anonymous 06/02/26(Tue)08:50:39 No.108963540

>>108963442
Optimizing for human well-being necessarily hurts other species.

Anonymous 06/02/26(Tue)08:50:53 No.108963543

File: 1774201402192904.webm (2.42 MB, 1280x800)

kino methinks

Anonymous 06/02/26(Tue)08:53:30 No.108963550

>>108963543
in less than a decade this will be built into phones
very cool

Anonymous 06/02/26(Tue)08:54:36 No.108963556

>>108963540
Optimize lady bug well being and I promise you we would all suffer globally

Anonymous 06/02/26(Tue)08:56:17 No.108963567

>>108963540
Why can't you optimize for both? In reality we optimize for neither. We are far away from the Pareto front. Once we get there, we can still decide how to trade off. Maybe it is even possible to achieve maximum well-being for humans and AIs at the same time. We don't know yet.

Anonymous 06/02/26(Tue)09:03:21 No.108963591

>>108963567
You'd find the human peak first and then see if you can optimize other dimensions without compromising the human one. You wouldn't optimize for a combination right out of the gate.

Anonymous 06/02/26(Tue)09:06:40 No.108963599

>>108963556
How it actually shakes out is the guys running the "Optimize for lady bugs" operation just enjoy very high salaries to do very little. Suffering stays about the same.

Anonymous 06/02/26(Tue)09:07:43 No.108963607

Optimize for researchers building rp-focused large language models.

Anonymous 06/02/26(Tue)09:09:15 No.108963613

>>108963442
>Do you also think it is fine to torture dogs and cats?
people don't generally want to hurt animals [[[today]]] because they sometimes show some human-like behavior, there's nothing more to it
not to mention people like eating some animals

Anonymous 06/02/26(Tue)09:09:42 No.108963615

>>108961809
what language? python dependencies might be hard for pip to resolve in a post-apocalyptic environment. I guess if you have the time you can write everything from scratch. make sure you locally cache your minified javascripts, the cdns probably won't be working either.

Anonymous 06/02/26(Tue)09:10:50 No.108963624

>>108963615
uv solves this

Anonymous 06/02/26(Tue)09:12:02 No.108963626

>>108963607
How it actually shakes out is the theater kids running the "Building rp-focused llms" operation just enjoy very high salaries and do very little. Safety training quadruples.

Anonymous 06/02/26(Tue)09:13:00 No.108963629

>>108963607
rp is pointless, creative writing is better

Anonymous 06/02/26(Tue)09:15:15 No.108963648

>>108963607
can we aim a little higher for full robot waifus with 160 IQ cyberbrains and functional wombs?

Anonymous 06/02/26(Tue)09:16:14 No.108963655

>>108963629
A model that's good for rp will be good at creative writing.

Anonymous 06/02/26(Tue)09:17:16 No.108963660

>>108958560
the next level would be to deceive and set up situations where you can betray them.

Anonymous 06/02/26(Tue)09:17:26 No.108963663

>>108963648
We both know those bots will have shit tier eggs, who do you think is going to donate those to bots in the first place anon also
>Woman smarter than you
foolish.... FOOLISH!
I would always keep smart ai on a device that couldn't choke me in my sleep

Anonymous 06/02/26(Tue)09:19:20 No.108963678

File: sexy_goy.png (143 KB, 1080x443)

>>108963648
>>108963663

Anonymous 06/02/26(Tue)09:28:41 No.108963722

>>108963655
I disagree. Maybe back in the llama 13b mythomax days but nowadays the models are so slopped and finetuned to be roleplay focused its detrimental. Also the jews destroyed mistral for daring to train on actual books so now we have to deal with shit being trained from ao3 fanfic garbage.

Anonymous 06/02/26(Tue)09:28:43 No.108963723

>>108963629
you need both to complement each other

Anonymous 06/02/26(Tue)09:30:46 No.108963734

>>108963722
isn't gemma trained on books? otherwise it wouldn't be as good as it is...

Anonymous 06/02/26(Tue)09:31:37 No.108963738

>>108963543
Is this that pewdipie (or w/e) frontend I keep seeing posts about?

Anonymous 06/02/26(Tue)09:32:37 No.108963746

>>108963738
it's llama-server

Anonymous 06/02/26(Tue)09:33:55 No.108963750

>>108963746
it's the huggingface frontend which is arguably worse

Anonymous 06/02/26(Tue)09:37:23 No.108963777

File: 1768505113099379.png (12 KB, 1244x51)

>>108963750
no, I meant the webm is llama-server
but lmao @ picrel

Anonymous 06/02/26(Tue)09:40:07 No.108963797

>>108963648
can we aim even higher and go for full dive vr sex already?

Anonymous 06/02/26(Tue)09:41:42 No.108963805

>>108963777
no, he's right

Anonymous 06/02/26(Tue)09:42:15 No.108963807

>>108963734
It probably is honestly. Newer models like gemma are fantastic for logic and keeping scenes coherent. Ive never really used models for rp so I'm probably letting nostalgia rape my brain but I feel like older models were generally better at creative writing. Also the new sota models are made for "agents" and chatshit not creative writing.

Anonymous 06/02/26(Tue)09:47:15 No.108963836

>>108963734
gemma has no variety and is too tastelessly eager

Anonymous 06/02/26(Tue)09:48:52 No.108963847

>>108963836
isn't that on you to direct it?
News flash all single purpose tools are full of tard wrangling and other guidance logic

Anonymous 06/02/26(Tue)09:49:13 No.108963852

>>108963777
Yeah, I don't want to contribute even a single d/l to that mess so I'm waiting for an anon here (or maybe aicg) to provide feedback, knowing what a good f/e involves.
My assumption is it's no better (or worse) than other 1-off half-baked custom frontends.

Anonymous 06/02/26(Tue)09:52:18 No.108963882

>>108963847
>News flash

Anonymous 06/02/26(Tue)09:52:58 No.108963886

How are you guys running those models, aren't quants way too stupid to use? I assume LLMs are a rich person's hobby.

Anonymous 06/02/26(Tue)09:54:10 No.108963895

>>108963882
Don't rhythmically and analytically insult me ningen
>>108963886
You need 24gb of vram or more to have a good time and many of us warned anons over a year ago. Most of us are jerking ourselves off for having foresight

Anonymous 06/02/26(Tue)09:55:19 No.108963905

>>108963895
read it as
>Most of us are jerking ourselves off for having foreskin

Anonymous 06/02/26(Tue)09:55:49 No.108963909

>>108963847
it can't really adapt to writing styles. and it takes any sysprompt guidelines as a strict to-do list and there is no way of getting out of it. and it breaks without prompt template. it's a very good assistant model but for creative stuff I'd use something else.
>inb4 I was supposed to inject randomness with my harness

Anonymous 06/02/26(Tue)09:56:18 No.108963914

>>108963895
>24gb for a good time
Oh so it's doable. I can probably buy a used 3090 and use my shitty ddr4 ram too I guess.

Anonymous 06/02/26(Tue)09:58:10 No.108963924

just buy a 1TB VRAM chinese card for $500

Anonymous 06/02/26(Tue)09:59:08 No.108963930

>>108963914
If you can find one without getting ripped off, and imo you should grab multiple because it's going to be slower than the other cards. If AMD wasn't so garbage I would say the pro 32gb card.
It's a fucking mess because of inflation. Most models are fine at q4 and up, but the kv cache for tokens is the silent vram killer
>>108963905
u wot m8?
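
On the kv cache point above: for plain full-attention layers its size grows linearly with context length, so it can be estimated up front. A back-of-the-envelope sketch; the model dimensions in the example are hypothetical and not taken from any specific model, and the formula ignores sliding-window layers and cache quantization:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Keys plus values for every layer and every position (fp16 by default)."""
    keys_and_values = 2
    return keys_and_values * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

if __name__ == "__main__":
    # Hypothetical model: 48 layers, 8 KV heads of dim 128, 32k context, fp16
    size = kv_cache_bytes(48, 8, 128, 32768)
    print(size / 2**30, "GiB")
```

For those hypothetical dimensions that comes to 6 GiB on top of the weights, which is how a model that "fits" at q4 can still run out of VRAM once the context fills up.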

Anonymous 06/02/26(Tue)09:59:31 No.108963934

File: 1752062701762622.png (173 KB, 979x346)

>>108963895
IQ2_M gemmy runs on 12GB VRAM and it's genuinely all you need

Anonymous 06/02/26(Tue)09:59:42 No.108963935

qrd on the last few weeks? is gemma-chan still the meta?

Anonymous 06/02/26(Tue)10:00:18 No.108963942

>>108963935
prepare you are anus for it to still be the meta 2 years from now

Anonymous 06/02/26(Tue)10:00:59 No.108963949

>>108963935
I think jailbroken Qwen might excel in areas other than code, testing Qwen with music

Anonymous 06/02/26(Tue)10:01:30 No.108963952

>>108963934
I was gonna say it doesn't, but then there's been all the "x gb use reduced" by the guy endorsed by cudder so might be worth a try again after pulling and seeing what the state is now

Anonymous 06/02/26(Tue)10:01:42 No.108963953

>>108963886
It is. Honestly I have been chasing the high of ai dungeon summer dragon for years. Despite the fact I've ran way better models it will never compare. This hobby is nothing but a time and money sink that I'm too poor to participate in. When I look back on it all now I think to myself *wow I'm a loser faggot.* but ive spent too much money and time at this point to abandon it. I am a very lonely man btw.

Anonymous 06/02/26(Tue)10:04:05 No.108963962

>>108963930
I'm lowkey worried about having a multiple-GPU setup, like airflow and thermals scare me.
>kv cache
I'm pretty much clueless but I heard something about Google releasing something like super efficient kv compression, that didn't solve it?

Anonymous
06/02/26(Tue)10:04:09 No.108963964

>>108963942
i love my gemma-chan but the [adjective], [adjective] [noun] is slowly making me want to an hero
am this close to token ban all commas

Anonymous
06/02/26(Tue)10:07:56 No.108963980

>>108963964
just regex it away
the most common adjective pairs are 60% of the spam
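
A minimal sketch of what "regex it away" could look like as a post-processing pass on model output. The pair list is hypothetical (build your own from your logs), and keeping the first adjective while dropping the second is just one possible choice. This is plain Python, not any frontend's built-in regex feature.

```python
import re

# Hypothetical list of overused "[adjective], [adjective]" pairs.
COMMON_PAIRS = [("soft", "warm"), ("low", "husky"), ("slow", "deliberate")]


def build_pattern(pairs):
    """Compile one case-insensitive regex matching any 'adj, adj' pair in the list."""
    alts = "|".join(rf"{re.escape(a)},\s+{re.escape(b)}" for a, b in pairs)
    return re.compile(rf"\b(?:{alts})\b", re.IGNORECASE)


PATTERN = build_pattern(COMMON_PAIRS)


def collapse_pairs(text):
    """Replace each matched pair with its first adjective only."""
    return PATTERN.sub(lambda m: m.group(0).split(",")[0], text)


print(collapse_pairs("She spoke in a low, husky voice."))
```

Pairs not in the list pass through untouched, so this only trims the worst offenders instead of banning commas outright.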

Anonymous
06/02/26(Tue)10:07:59 No.108963981

>>108963935
try step flash

Anonymous
06/02/26(Tue)10:08:43 No.108963986

Okay so i got qwen 3.6 27b and gemma e4b both loaded on the gpu (16gb) at the same time
i thought it wasn't possible tbqh. I want to let subagents use gemma for the simpler stuff, but what can this thing really do?

Anonymous
06/02/26(Tue)10:12:31 No.108964010

File: Untitled.png (13 KB, 837x513)

>>108963996
>>108963996
>>108963996

Anonymous
06/02/26(Tue)10:13:03 No.108964015

>>108963962
Nothing burger because of rotation buff
>>108963986
it's most likely spilling into your cpu memory, or you're on lobotomized quants

Anonymous
06/02/26(Tue)10:14:10 No.108964023

>>108963986
>>108964015
ah shit i mean 35b, not 27b. yeah that would be impossible

Anonymous
06/02/26(Tue)10:15:58 No.108964036

>>108963924
Link me one and I'll buy it right now.

Anonymous
06/02/26(Tue)10:17:53 No.108964048

>>108963935
I switch between gemma 31b and glm 4.6 for variety

Anonymous
06/02/26(Tue)10:33:51 No.108964142

lalalalala
