/g/ - /lmg/ - Local Models General - Technology

Thread archived.
You cannot reply anymore.

Anonymous

/lmg/ - Local Models General 06/08/26(Mon)12:03:02 No.109007468

File: 1752107789693502.jpg (1.57 MB, 3000x2000)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109001981 & >>108997418

►News
>(06/07) llama : add Gemma4 MTP #23398 MERGED: https://github.com/ggml-org/llama.cpp/pull/23398
>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar
>(06/05) Gemma 4 QAT models released: https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4
>(06/04) Higgs Audio v3 TTS released: https://boson.ai/blog/higgs-audio-v3-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/08/26(Mon)12:03:19 No.109007470

File: threadrecap.png (1.48 MB, 1536x1536)

►Recent Highlights from the Previous Thread: >>109001981

--Optimizing MTP and speculative decoding for Gemma 4 31B:
>109003541 >109003556 >109003564 >109003589 >109003687 >109003623 >109003661 >109003690 >109003703 >109003721 >109003839 >109004005 >109003672 >109004651 >109004672 >109004720 >109004721 >109004762
--Configuring MTP draft flags in llama-server to improve token speed:
>109004904 >109004935 >109004947 >109004965 >109004980 >109004992 >109004988 >109005041 >109005222 >109004955
--Gemma-4 31B QAT performance and speculative decoding acceptance rates:
>109005959 >109006013 >109006096 >109006149 >109006194 >109006207 >109006216 >109006264 >109006151 >109006354 >109006395 >109006423
--Gemma 4 12B Unified vision support and mmproj requirements:
>109002359 >109002430 >109002441 >109002548 >109002562 >109002508 >109002768 >109002807 >109002831 >109002957 >109002991 >109003036 >109003099
--Gemma 4 benchmark reports comparing model sizes and efficiency:
>109004234 >109004242 >109004611 >109006385 >109004250
--Gemma4 chat template bug fixes and improvements:
>109006867 >109006884 >109006905 >109006885 >109006902 >109006943
--Using Gemma's thinking blocks for persona-driven reasoning and stat tracking:
>109004617 >109004661 >109004693 >109004765 >109004875 >109004918
--Gemma 4 performance comparison favoring dense models over MoE:
>109003998 >109004081
--Wishlist and technical hurdles for creating realistic AI companions:
>109004336 >109004339 >109004454
--Proposed advanced AI companion features and agent emotional state implementation:
>109004343 >109004390 >109004418
--Comparing Gemma versions for Asian language OCR and translation:
>109003481 >109003516 >109003532 >109003545 >109003602 >109003638
--Logs:
>109002126 >109002197 >109002359 >109002782 >109003272 >109004661 >109004875 >109007019
--Miku (free space):
>109002131

►Recent Highlight Posts from the Previous Thread: >>109001988

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/08/26(Mon)12:05:34 No.109007482

BF16 is a scam

Anonymous
06/08/26(Mon)12:05:46 No.109007483

>>109007468
Poor thing

Anonymous
06/08/26(Mon)12:06:01 No.109007486

>>109007470
>recent highlights from gemma

Anonymous
06/08/26(Mon)12:07:07 No.109007492

>>109007486
holy fuck, sirs working overtime huh

Anonymous
06/08/26(Mon)12:08:15 No.109007497

>>109007485
Thanks for the tip, I'll try this out!

Anonymous
06/08/26(Mon)12:15:22 No.109007530

https://huggingface.co/datasets/mrzjy/AniPersonaCaps
Cool shit. Adding this to my personal frontend for instant erp with any anime character

Anonymous
06/08/26(Mon)12:19:47 No.109007552

If I want Gemma (or Qwen) to vibe code me a frontend, is it better to do it from scratch or fork an existing project? I like the llama.cpp webui but it's a bit too bare bones and I don't like chats being stored in browser storage.

Anonymous
06/08/26(Mon)12:26:16 No.109007583

>>109007530
Most of them are trash

Anonymous
06/08/26(Mon)12:26:58 No.109007587

>>109007552
Browsers can't access your filesystem without some external layer.
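
One possible shape for that external layer, as a rough sketch: a tiny local HTTP service that keeps each chat as a JSON file on disk, which a custom frontend could call instead of using browser storage. The `chats` folder, the port in the comment, and the GET/PUT routing are all assumptions made up for illustration, not anything llama.cpp's webui provides.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

CHAT_DIR = Path("chats")  # hypothetical storage folder


def _safe_id(chat_id: str) -> str:
    # Keep only characters that cannot escape the storage folder.
    return "".join(c for c in chat_id if c.isalnum() or c in "-_")


def save_chat(chat_id: str, messages: list, root: Path = CHAT_DIR) -> Path:
    # Write the whole message list for one chat to <root>/<chat_id>.json.
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{_safe_id(chat_id)}.json"
    path.write_text(json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_chat(chat_id: str, root: Path = CHAT_DIR) -> list:
    # Return the stored messages, or an empty list for an unknown chat.
    path = root / f"{_safe_id(chat_id)}.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else []


class ChatHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /<chat_id> returns the stored messages as JSON.
        self._reply(200, json.dumps(load_chat(self.path.strip("/"))).encode("utf-8"))

    def do_PUT(self):
        # PUT /<chat_id> with a JSON array body overwrites the stored chat.
        length = int(self.headers.get("Content-Length", 0))
        save_chat(self.path.strip("/"), json.loads(self.rfile.read(length)))
        self._reply(204, b"")

    def _reply(self, status: int, body: bytes):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve locally (blocks forever):
# HTTPServer(("127.0.0.1", 8765), ChatHandler).serve_forever()
```

The frontend would then `fetch` from that address. A page served from a different origin would also need CORS headers, which this sketch leaves out.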

Anonymous
06/08/26(Mon)12:29:25 No.109007599

>gemma 12b
give me reasons to run this thing

Anonymous
06/08/26(Mon)12:30:14 No.109007603

>>109007599
its small and cute

Anonymous
06/08/26(Mon)12:41:56 No.109007665

>>109007599
Captioning images or video, or transcribing audio. Literally no other reason.

Anonymous
06/08/26(Mon)12:42:06 No.109007667

>>109007599
Better than 26b

Anonymous
06/08/26(Mon)12:42:54 No.109007671

>>109007603
>small and cute
Barely 90 tokens/s and 300pp at q4 and mtp
can't even run small and cute models fast, I'm tired of this shit. When do we get cheap datacenter gpus in the used market?

Anonymous
06/08/26(Mon)12:45:00 No.109007683

>>109007599
If you're too poor for 31b.

Anonymous
06/08/26(Mon)12:46:56 No.109007698

>>109007599
Translations, OCR, audio whatever, etc. No real reason to run 31b for a lot of utility tasks when 12b does well on these and takes way less vram. Anything with sufficient topic depth or context length should be 31b though.

Anonymous
06/08/26(Mon)12:52:18 No.109007737

>>109007698
Why not just run 31b when it can push 30-40t/s on a Blackwell without any MTP?

Anonymous
06/08/26(Mon)12:57:32 No.109007777

>>109007737
I was just thinking about this. I get 80-100t/s on Q6_K_L 31B MTP, may as well use it for absolutely everything.

Anonymous
06/08/26(Mon)12:58:20 No.109007789

>>109007737
The right model for the right use. Of course you can just use a big dense model for everything but it might not always be the most efficient choice.

Anonymous
06/08/26(Mon)12:58:47 No.109007795

>>109007737
Because I don't own that hardware? The fuck kind of question is that?

Anonymous
06/08/26(Mon)12:59:05 No.109007797

>>109007737
>blackwell
If you’re talking GB10/spark it should be possible to squeeze in both 31b and 12b. Haven’t tried it yet, but 12b might work as a planner+, no?

Anonymous
06/08/26(Mon)13:00:23 No.109007812

If you haven't updated your lcpp in the last 12 hours, do so now for a free 10% t/s gain when using mtp.

Anonymous
06/08/26(Mon)13:01:10 No.109007825

>>109007599
>>109007667
It is not, at least for RP. It is considerably more retarded than 26b for RP.

Anonymous
06/08/26(Mon)13:02:34 No.109007847

>>109007789
unloading then loading a different model is not very efficient or practical either

Anonymous
06/08/26(Mon)13:03:49 No.109007867

File: questionmarkfolderimage793.jpg (562 KB, 1697x1080)

>using Gemma 4 26b while offloading part of it because 12GB VRAM
>using Kobold/Sillytavern
>it's normally fast but sometimes, seemingly randomly, it slows to a fucking crawl, like one word every 10 seconds, for a period of time
>this happens more often and the pauses are longer as the context gets longer
>it eventually gets to the point where it does this once or twice per message, greatly impacting speed
>back when I used to offload Mixtral tunes with Kobold/Silly, before Nemo was the meta, Mixtral tunes did NOT display this behavior for me
Is this behavior normal?
If not, what could be causing it?

Anonymous
06/08/26(Mon)13:05:56 No.109007899

>>109006583
I'm using chat completion, but go into AI response format, scroll down, then paste this into "start reply with". It just werks with qwen and e4b gemma and small gemma. Oh, and click the box to show the reply prefix in chat or it doesn't work half the time.

```
{
"think": false
}
```

Anonymous
06/08/26(Mon)13:08:07 No.109007940

>>109007899
minus the backticks. i thought that made the little box. just put it in start reply with.
{
"think": false
}

Anonymous
06/08/26(Mon)13:08:08 No.109007943

>>109007867
Is your kvcache overflowing out of vram when certain experts load?

Anonymous
06/08/26(Mon)13:09:33 No.109007964

How do you put the kvcache on a specific GPU? For the life of me I can't find the flag for it.

Anonymous
06/08/26(Mon)13:12:41 No.109008014

>>109007943
I don't know.
What setting would I need to change to prevent this from happening?
How would I check to see if this is happening?

Anonymous
06/08/26(Mon)13:13:03 No.109008017

>>109007867
nvidia sysmem fallback on windoze? watch gpu shared in taskmgr
>>109007964
--main-gpu only in --split-mode row ?

Anonymous
06/08/26(Mon)13:14:20 No.109008035

>>109008017
>--main-gpu only in --split-mode row ?
That seems to be it. Thanks anon

Anonymous
06/08/26(Mon)13:19:45 No.109008114

>>109008003
you just described neuro-sama.

Anonymous
06/08/26(Mon)13:22:09 No.109008158

>>109008003
You could do it with a tiny bit of effort. It won't be as nice as you are imagining though.

Anonymous
06/08/26(Mon)13:24:07 No.109008191

not with python bloat. rip

Anonymous
06/08/26(Mon)13:24:11 No.109008195

>>109007552
It would be easier for Gemma to do it from scratch, also if you're making your own frontend rather than slightly modifying an existing one, it makes sense to start fresh. Have you seen ST's code? you don't need 99% of it, and it's all shit anyway

Anonymous
06/08/26(Mon)13:32:34 No.109008330

>>109008158
NTA, but it's a hard task at the moment unless you're aiming at a simple demo with a limited set of actions. Neural networks will solve the hardest parts eventually. Shit like this https://github.com/nv-tlabs/kimodo is very promising, but we're not there yet

Anonymous
06/08/26(Mon)13:34:18 No.109008350

At some point, controlling an embodied avatar will be just another modality

Anonymous
06/08/26(Mon)13:41:14 No.109008466

so the datacenter-AI is no longer free/subsidized and their whole model is about to go bust
Question - do you wait for the fallout to pick up hardware for cheap?
OR will it be the opposite, everyone rushing to build their local AI workstations, thus making consumer/prosumer hardware (i.e. any mac mini and any 32+ gb gpu) even more expensive and scarce

Anonymous
06/08/26(Mon)13:42:26 No.109008490

>>109008466
>server/datacenter hardware
>for cheap
>for cheap
Looooooooooooooooooooooooooooooooooool. Corpos will clean up after each other and destroy everything they can get their hands on.

Anonymous
06/08/26(Mon)13:45:16 No.109008539

>>109008490
you could already buy nv teslas and 32gb instincts for extremely cheap

Anonymous
06/08/26(Mon)13:45:24 No.109008543

>>109007522
You mean like this?
--spec-draft-n-max 3
It's actually a little slower than 2

Anonymous
06/08/26(Mon)13:47:00 No.109008566

>>109008543
Huh, it's about the same speed for me in general with higher speeds for code/json/etc. Guess it's dependent on your hardware a bit.

Anonymous
06/08/26(Mon)13:47:20 No.109008569

>>109007698
>translations
Wouldn't 31b be better for this because it's smarter and more likely to understand nuance?

Anonymous
06/08/26(Mon)13:51:20 No.109008630

>>109008539
That was before NVidia noticed the second-hand market eating its profits and decided to fight it. There are now obligatory buybacks, firmware locks, and hardware interfaces incompatible with non-server gear that are much harder to replicate. Also, everything is watercooled now

Anonymous
06/08/26(Mon)13:53:01 No.109008660

>>109008017
>windoze?
lol no

Anonymous
06/08/26(Mon)13:54:46 No.109008703

Test results with the same ~8k file prompt, batch 8192, ubatch 4096:

>26B QAT + MTP, n-cpu-moe 12, n-max 4:
Prompt: 781.0 tok/s
Generation: 45.5 tok/s

>26B QAT + MTP, n-cpu-moe 8, n-max 4:
Prompt: 817.9 tok/s
Generation: 42.6 tok/s

>26B QAT + MTP, n-cpu-moe 8, n-max 2:
Prompt: 820.4 tok/s
Generation: 52.2 tok/s

non-MTP 26B QAT results:

>26B QAT, n-cpu-moe 12:
Prompt: ~2608 tok/s
Generation: ~59 tok/s

>26B QAT, n-cpu-moe 8:
Prompt: ~1056 tok/s
Generation: ~70 tok/s

wtf is this normal
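
To put the numbers above side by side, here is a small sketch that computes how far each MTP generation speed sits from the non-MTP run with the same n-cpu-moe setting. The figures are copied from the post (the non-MTP ones were given as approximate); the dictionary layout and function name are just illustrative.

```python
# Generation speeds in tok/s, copied from the post above.
mtp_runs = {
    ("n-cpu-moe 12", "n-max 4"): 45.5,
    ("n-cpu-moe 8", "n-max 4"): 42.6,
    ("n-cpu-moe 8", "n-max 2"): 52.2,
}
# Non-MTP baselines; the post gives these as approximate values.
baseline = {"n-cpu-moe 12": 59.0, "n-cpu-moe 8": 70.0}


def relative_change(new: float, old: float) -> float:
    """Percent change of `new` relative to `old` (negative means slower)."""
    return (new - old) / old * 100.0


for (moe, n_max), speed in mtp_runs.items():
    delta = relative_change(speed, baseline[moe])
    print(f"{moe}, {n_max}: {speed} tok/s, {delta:+.1f}% vs non-MTP")
```

On these numbers every MTP configuration generates slower than its non-MTP counterpart, which is why the replies point at batch sizes rather than the draft settings alone.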

Anonymous
06/08/26(Mon)13:55:36 No.109008716

>>109008566
>>109008543
I get 2 as the fastest when I do creative writing, while 3 is better when I do code.
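
A back-of-the-envelope model of why the best draft length can differ by task, as a sketch only. It uses the standard speculative-decoding estimate that, with a per-token acceptance probability `a` and `n` drafted tokens, one verification step yields (1 - a^(n+1)) / (1 - a) tokens on average. The acceptance rates and the relative draft cost below are made-up assumptions, not measurements from this thread, and real acceptance is not independent per token.

```python
def expected_tokens_per_step(accept_rate: float, n_draft: int) -> float:
    """Average tokens produced per verification step, assuming each drafted
    token is accepted independently with probability accept_rate."""
    if accept_rate >= 1.0:
        return float(n_draft + 1)
    return (1.0 - accept_rate ** (n_draft + 1)) / (1.0 - accept_rate)


def relative_speed(accept_rate: float, n_draft: int, draft_cost: float) -> float:
    """Speed relative to plain decoding, where draft_cost is the assumed cost
    of drafting one token as a fraction of one full-model step."""
    return expected_tokens_per_step(accept_rate, n_draft) / (1.0 + n_draft * draft_cost)


# Hypothetical acceptance rates: prose is harder to predict than code.
for label, accept_rate in (("creative writing", 0.5), ("code", 0.8)):
    for n_draft in (2, 3):
        speed = relative_speed(accept_rate, n_draft, draft_cost=0.15)
        print(f"{label}: n-max {n_draft} -> {speed:.2f}x")
```

With these assumed values the model prefers a draft length of 2 at the lower acceptance rate and 3 at the higher one, which matches the direction reported above.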

Anonymous
06/08/26(Mon)13:55:36 No.109008717

>>109008539
Modern ones are cryptographically paired to a specific motherboard or system vendor

Anonymous
06/08/26(Mon)13:57:42 No.109008759

>>109008703
I think your batch sizes might be too big. Try lowering to 2048 for both.

Anonymous
06/08/26(Mon)14:00:11 No.109008820

>>109007737
i get >30t/s on r9700 at Q4_K_M
it can push to 60 if you go with iq4_xs + mtp.

Anonymous
06/08/26(Mon)14:03:31 No.109008879

File: lowerthan5.png (254 KB, 380x327)

>>109008820
>Q4

Anonymous
06/08/26(Mon)14:03:42 No.109008881

>>109008630
So if the bubble bursts, companies liquidate and sell their gpus back to nvidia, and now there isn't anyone around to buy them again? nvidia will just be stuck with dead stock.

Anonymous
06/08/26(Mon)14:03:55 No.109008887

>>109008717
>>109008630
This should be unlegal

Anonymous
06/08/26(Mon)14:04:12 No.109008894

File: 470189347_101606640262513(...).jpg (337 KB, 1284x1322)

>>109008466
What are you gonna do with this alien tech? You can't just plug it into a wall anymore. And for cheap? The copper alone will cost you a leg

Anonymous
06/08/26(Mon)14:04:46 No.109008905

File: 1776072859095938.png (121 KB, 1080x1017)

>>109007468
Sirs what the fuck is he talking about?

Anonymous
06/08/26(Mon)14:04:56 No.109008911

>>109008114
local neuro-sama when?

Anonymous
06/08/26(Mon)14:05:41 No.109008920

>>109008703
Yeah, I get optimal results with ubatch at 512.
1024 is already too large.

Anonymous
06/08/26(Mon)14:07:01 No.109008946

File: sans_eyes2.png (214 KB, 525x1529)

Something is coming.
https://x.com/osanseviero/status/2064032236089860252

Anonymous
06/08/26(Mon)14:07:45 No.109008961

>>109007468
This is my fetish

Anonymous
06/08/26(Mon)14:07:46 No.109008962

>>109008879
Q4_K_M is much closer to a Q5 than a Q4.

Anonymous
06/08/26(Mon)14:08:19 No.109008975

>>109007867
>>109007943
So, playing around with settings, seeing if I can fix this.
It still does it when I set GPU layers to 0.
Setting GPU layers to 0 should free up all of my GPU for the kvcache so it doesn't overflow out of it, right?

Anonymous
06/08/26(Mon)14:08:32 No.109008980

>>109008946
>MoE
I hope not

Anonymous
06/08/26(Mon)14:09:20 No.109009000

>>109008905
>not prompting agents to design the loops for you.

Anonymous
06/08/26(Mon)14:12:33 No.109009074

>>109008911
If you have enough VRAM, it's already possible I guess? I remember seeing programs that control L2D models. I'm being extremely handwavy here but you can hook any LLM to control a stack with real-time TTS + ASR + vision. The problem is gluing everything together and latency.

Anonymous
06/08/26(Mon)14:13:45 No.109009100

>>109008946
>Something is coming.
Yeah, me

Anonymous
06/08/26(Mon)14:13:47 No.109009104

>>109008881
Nobody knows. Will it be a fast burst with the whole economy in chaos, or a slow collapse with surviving companies buying dead ones' datacenters for cheap? Maybe some other country interested in AI will buy the whole stock

Anonymous
06/08/26(Mon)14:14:02 No.109009108

>>109008946
i wonder what kind of feeling goes through him when he posts two googly eyes emoji on xitter and an army of jeets spawns out of thin air

Anonymous
06/08/26(Mon)14:15:34 No.109009138

Would 124B Gemma be smarter than 31B even though it's MoE?

Anonymous
06/08/26(Mon)14:16:33 No.109009164

124B-A4B

Anonymous
06/08/26(Mon)14:16:52 No.109009172

>>109008887
NVidia is too big to care

Anonymous
06/08/26(Mon)14:16:55 No.109009174

>>109007671
>When do we get cheap datacenter gpus in the used market?
Anon, when one of the smaller players dies off, their hardware is not going to flood the market, it will just be gobbled up by one of the bigger players.

Anonymous
06/08/26(Mon)14:17:52 No.109009192

>>109008887
You are a terrorist wanting to make propaganda with AI, huh???

Anonymous
06/08/26(Mon)14:17:56 No.109009193

>>109008905
Recursive slop that powers slop to make slop at an accelerated pace. Think of it as an incremental game.

Anonymous
06/08/26(Mon)14:19:27 No.109009210

>>109007671
Either Chinks will buy them, or the government will destroy them in the interest of national security so the Commies won't get them

Anonymous
06/08/26(Mon)14:22:40 No.109009249

>muh bubble
2 more weeks

Anonymous
06/08/26(Mon)14:23:23 No.109009257

>>109009074
>LLM to control a stack with real-time TTS + ASR + vision.
The neat thing is, now that we have Gemma 12B, all you need besides that is a good TTS.

Anonymous
06/08/26(Mon)14:23:27 No.109009260

>>109009249
Yes

Anonymous
06/08/26(Mon)14:24:30 No.109009266

My current set up can only handle Q4 24'ish b LLMs. Currently running Mistral, but what's the best all around model?

Anonymous
06/08/26(Mon)14:25:52 No.109009278

>>109009257
I didn't test Gemma's ASR capabilities but from what everyone keeps saying, that might be true. We're getting closer to 100% local anime wives.
Now we just need to vibe code a L2D program to hook into gemma-chan.

Anonymous
06/08/26(Mon)14:26:05 No.109009280

>>109009138
no, every single moe that was ever released is below 3.3 70b when it comes to erp. we are probably never going to get a dense successor.

Anonymous
06/08/26(Mon)14:26:07 No.109009282

File: questionmarkfolderimage781.jpg (285 KB, 741x687)

>>109007867
>>109008017
OK so I'm watching GPU usage with watch -n 1 nvidia-smi in Linux.
The GPU usage goes down to 0 when the temporary slowdowns to <1 token per second occur, then it speeds back up and the GPU is used again.
What the fuck is causing this? This is maddening.
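
For catching the stalls in a log instead of by eye, a minimal sketch that polls `nvidia-smi` once per second and flags samples where utilization reads 0. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`; the function names and the "idle" marker are illustrative, and it assumes the driver reports numeric values for both fields.

```python
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=timestamp,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_line(line: str) -> tuple[str, int, int]:
    """Split one CSV row into (timestamp, utilization %, memory used in MiB)."""
    timestamp, util, mem = (field.strip() for field in line.split(","))
    return timestamp, int(util), int(mem)


def poll(interval_s: float = 1.0) -> None:
    """Print one line per GPU per sample until interrupted with Ctrl+C."""
    while True:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for gpu_index, line in enumerate(out.strip().splitlines()):
            timestamp, util, mem = parse_line(line)
            flag = "  <-- idle" if util == 0 else ""
            print(f"{timestamp} gpu{gpu_index} util={util}% mem={mem}MiB{flag}")
        time.sleep(interval_s)


# Call poll() while generating, then line up the idle samples with the slow tokens.
```

If memory use stays flat while utilization drops to 0, the stall is more likely on the CPU side (scheduling, RAM, or disk) than a VRAM overflow.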

Anonymous
06/08/26(Mon)14:26:38 No.109009287

wonder if there really are revolutionaries using local models to plan the coups and stuff right now
a shit load of weapons were stolen from a police station in my country recently
turned out they were just selling them and a lot got recovered quick though.

But who knows...

Anonymous
06/08/26(Mon)14:28:29 No.109009300

>>109008879
Is this the ken-sama of /lmg/?

Anonymous
06/08/26(Mon)14:29:15 No.109009307

>>109009282
GPUs downclocking? Tried nvidia-smi -lgc and -lmc to lock core+mem clocks?

Anonymous
06/08/26(Mon)14:29:45 No.109009309

>>109009108
It makes his h1benis tingle.

Anonymous
06/08/26(Mon)14:30:29 No.109009313

>>109009280
Nobody cares about erp pal this is /vcg/

Anonymous
06/08/26(Mon)14:34:10 No.109009344

>>109009280
You got Mistral Medium 3.5 128B recently. It's dense, so it should be very good.

Anonymous
06/08/26(Mon)14:34:20 No.109009347

>>109009287
This is not how any of it works

Anonymous
06/08/26(Mon)14:35:09 No.109009355

>>109009347
its not?

Anonymous
06/08/26(Mon)14:35:17 No.109009356

>>109009344
shitstral

Anonymous
06/08/26(Mon)14:35:38 No.109009359

Gemma 124B is going to cure cancer

Anonymous
06/08/26(Mon)14:36:10 No.109009362

>>109009313
>Nobody cares about erp pal this is /vcg/
you're lost

Anonymous
06/08/26(Mon)14:36:24 No.109009366

Gemmoe 124B is going to drain my balls.

Anonymous
06/08/26(Mon)14:40:17 No.109009388

File: nolima-3.5.png (88 KB, 696x856)

>>109009344
>Mistral Medium 3.5 128B
lol, lmao even

Anonymous
06/08/26(Mon)14:40:36 No.109009394

>>109008894
I will ask the agi that we'll have in 2 weeks how to make it run.

Anonymous
06/08/26(Mon)14:41:52 No.109009403

>>109008894
You just need an H100 PCB and aftermarket heatsink from china.

Anonymous
06/08/26(Mon)14:42:47 No.109009411

File: 1766688254713736.png (66 KB, 729x570)

>Write all thinking in-character, starting with *
It works but Gemma also just repeats its thoughts in the message.

Anonymous
06/08/26(Mon)14:43:13 No.109009414

Say I have a fast and a slow card. I should fit the MTP model into the faster card even if it leaves several layers to offload to the slower one, right?

Anonymous
06/08/26(Mon)14:43:53 No.109009421

>>109009282
>>109009307
This did not fix it. Still doing it.

Anonymous
06/08/26(Mon)14:44:03 No.109009424

>>109008630
so all considered you reckon it's better to buy anything affordable/usable now rather than wait?

Anonymous
06/08/26(Mon)14:44:23 No.109009427

>>109009355
Coups are sanctioned from the top, they don't need llm for that, they have advisers

Anonymous
06/08/26(Mon)14:45:36 No.109009441

>>109009411
maybe hi doesn’t really have much to think about.

Anonymous
06/08/26(Mon)14:46:35 No.109009448

>>109009257
what's the current meta in TTS nowadays
omnivoice or qwen?

Anonymous
06/08/26(Mon)14:46:58 No.109009451

>>109009411
Model? This isn't happening on 31B Q4 on my setup. 26B and E4B (lol, lmao) skip the "Write all thinking in-character, starting with *" instruction altogether and think like regular Gemma.

Anonymous
06/08/26(Mon)14:47:34 No.109009460

>>109009424
No idea. It looks like they capped the power grid a while ago and are now filling warehouses with hardware they can't connect to anything, so RAM prices will eventually go down when investors catch up with the current state of things

Anonymous
06/08/26(Mon)14:49:21 No.109009468

>>109009451
>E4B (lol, lmao)
small gemma tries her best, be nice.

Anonymous
06/08/26(Mon)14:49:36 No.109009471

>>109009451
31B QAT

Anonymous
06/08/26(Mon)14:51:07 No.109009488

>>109008905
he's rehashing old stuff for elon bucks, see metaprogramming

Anonymous
06/08/26(Mon)14:54:21 No.109009514

>>109009451
>This isn't happening on 31B Q4 on my setup
I should've been clearer, this doesn't happen during regular chatting/rp.
>>109009471
gemma-4-31B-it-qat-UD-Q4_K_XL.gguf works perfectly if I'm just chatting, but the moment I ask something technical, she starts thinking as "Default Gemma" but the replies are still "Gemma-chan"
I have gemma-4-31b-abliterated-Q4_K_M.gguf as well and I'm seeing the same behavior described above.
>>109009468
She's a cutie, but definitely not the brightest...

Anonymous
06/08/26(Mon)14:54:22 No.109009515

>>109008961
She's overheating, you sicko

Anonymous
06/08/26(Mon)14:58:09 No.109009532

>>109008887
Nope, companies write the laws in freedomland. Personally I'm waiting for chinks but i feel like by the time they catch up, gemma 7 is gonna be out

Anonymous
06/08/26(Mon)15:06:51 No.109009597

File: 1750688145803126.png (1.77 MB, 3842x2018)

ITS OVER

THE AI BUBBLE IS GOING TO BURST

Anonymous
06/08/26(Mon)15:07:12 No.109009598

>>109009282
This also happened to me with kimi k2 and glm 4.5 on ram. No idea if the cause is the same as yours. I gave up and ran tiny models instead.

Anonymous
06/08/26(Mon)15:07:52 No.109009601

>>109007867
OK, so running Kobold in high priority mode fixes this.
Anyone have any idea what could be causing these slowdowns when not using high priority mode, and how to fix them without using high priority mode?
The PC is pretty much unusable while generating a message in high priority mode.

Anonymous
06/08/26(Mon)15:12:46 No.109009637

>>109009597
>please to subscribes to me!11
lol

Anonymous
06/08/26(Mon)15:13:09 No.109009642

>>109009597
>please subscribe for more le bubble news
lol

Anonymous
06/08/26(Mon)15:15:16 No.109009650

>>109007867
>>109009601
>kobold
Use llama like everybody else instead of random meme forks

Anonymous
06/08/26(Mon)15:15:47 No.109009652

>>109009650
f off

Anonymous
06/08/26(Mon)15:15:53 No.109009653

>>109009597
>announcement of a earth shaking announcement
>gib money
shoot this niggers on sight

Anonymous
06/08/26(Mon)15:16:41 No.109009659

>>109009597
Wow gemmy 124b is that good huh?

Anonymous
06/08/26(Mon)15:16:54 No.109009662

>>109009650
retard

Anonymous
06/08/26(Mon)15:17:31 No.109009666

>>109009662
this

Anonymous
06/08/26(Mon)15:22:11 No.109009710

>>109009597
spoiler: Elon scammyX IPO is a bust
no need to thank me

Anonymous
06/08/26(Mon)15:22:11 No.109009711

>>109009597
Bro, just edit out the em-dashes, bro...

Anonymous
06/08/26(Mon)15:25:04 No.109009751

>>109009597
Credit to him if he makes free money off suckers.

Anonymous
06/08/26(Mon)15:25:24 No.109009755

File: 1749830839106200.png (300 KB, 1220x815)

>>109009597
I love Ed, he's my favorite AI related internet personality

Anonymous
06/08/26(Mon)15:25:52 No.109009758

>>109009597
2 more weeks again?
This can't keep happening...

Anonymous
06/08/26(Mon)15:26:57 No.109009765

>>109009597
Recent information has come to my attention, and in two weeks i will announce the schedule for the press conference for the reveal of the start of the famed 2 miku wikus you have all been waiting for.

Unsubscriptions have been disabled.

Anonymous
06/08/26(Mon)15:27:12 No.109009767

Anonymous 06/08/26(Mon)15:27:12 No.109009767

>>109009755
He will eventually be correct.

Anonymous
06/08/26(Mon)15:27:29 No.109009772

Anonymous 06/08/26(Mon)15:27:29 No.109009772

What is this witchcraft and how do I run it?
https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash

Anonymous
06/08/26(Mon)15:28:52 No.109009784

Anonymous 06/08/26(Mon)15:28:52 No.109009784

>>109009597
>literally two more weeks
nothing ever happens

Anonymous
06/08/26(Mon)15:31:03 No.109009800

Anonymous 06/08/26(Mon)15:31:03 No.109009800

Koboldsaars getting uppity kek

Anonymous
06/08/26(Mon)15:31:17 No.109009801

Anonymous 06/08/26(Mon)15:31:17 No.109009801

Why does no one care about prefill tokens/s?
All anyone ever talks about is generation tk/s, but depending on what you're doing it kinda takes a backseat, i guess especially in coding.
Even for erp, i mean, im using it for erp and i need my 50k tokens processed faster than i need the 800 token response generated...

when people give speed they give gen speed but not prompt processing, kinda important info
or maybe im missing something

Anonymous
06/08/26(Mon)15:32:32 No.109009813

Anonymous 06/08/26(Mon)15:32:32 No.109009813

File: 1761099691163275.png (11 KB, 521x116)

almost 60% speedup
gambare gemma chan

Anonymous
06/08/26(Mon)15:33:14 No.109009818

Anonymous 06/08/26(Mon)15:33:14 No.109009818

>>109009801
>mfw he doesn’t know the buildup is half the fun

Anonymous
06/08/26(Mon)15:34:51 No.109009829

Anonymous 06/08/26(Mon)15:34:51 No.109009829

>>109009801
I come from an era where you needed to wait literal minutes until your kittens replied on skype. prefill doesn't matter. but faster gen speeds give me dopamine.

Anonymous
06/08/26(Mon)15:35:20 No.109009833

Anonymous 06/08/26(Mon)15:35:20 No.109009833

Which one is better

https://huggingface.co/FreedomAISVR/Gemma-4-12B-it-NVFP4-GGUF

https://huggingface.co/LibertAIDAI/Gemma-4-12B-IT-NVFP4-GGUF

Anonymous
06/08/26(Mon)15:37:27 No.109009847

Anonymous 06/08/26(Mon)15:37:27 No.109009847

>>109009801
I care but what can I do? Sell my bussy to afford a nigger gpu?

Anonymous
06/08/26(Mon)15:37:42 No.109009849

Anonymous 06/08/26(Mon)15:37:42 No.109009849

>>109009801
Cache exists for one, but not the other

Anonymous
06/08/26(Mon)15:38:35 No.109009854

Anonymous 06/08/26(Mon)15:38:35 No.109009854

which qat quants are best? just the original Q4_0 from google or bartowski?

Anonymous
06/08/26(Mon)15:39:30 No.109009860

Anonymous 06/08/26(Mon)15:39:30 No.109009860

>>109009849
fair enough fair enough
i hop between long chats a lot

Anonymous
06/08/26(Mon)15:39:32 No.109009861

Anonymous 06/08/26(Mon)15:39:32 No.109009861

>>109009854
goog

Anonymous
06/08/26(Mon)15:41:34 No.109009875

Anonymous 06/08/26(Mon)15:41:34 No.109009875

>>109009833
Neither, use QAT

Anonymous
06/08/26(Mon)15:42:09 No.109009878

Anonymous 06/08/26(Mon)15:42:09 No.109009878

>>109009801
>writing up 50 thousand token manuscripts on every erp turn
holy based, most of us just tap out a sentence or two so the cache eliminates any prefill wait.
realistically, it's only the strix/sparkfags hurting for pp and even there it's only on tasks where you're dumping shitloads of text for it to process every turn.
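The prefill-versus-generation trade-off argued over in this exchange can be put into numbers. This is a rough sketch, not a benchmark: the 50k-token prompt and 800-token reply come from the posts above, while the 300 t/s prompt-processing speed, the 20 t/s generation speed, and the function name `turn_latency` are illustrative assumptions.

```python
def turn_latency(prompt_tokens, cached_tokens, new_tokens, pp_speed, tg_speed):
    """Return (prefill_seconds, generation_seconds) for a single turn.

    Only the part of the prompt that is not already in the KV cache
    has to be processed before generation can start.
    """
    uncached = max(prompt_tokens - cached_tokens, 0)
    return uncached / pp_speed, new_tokens / tg_speed


# Cold start: hopping into a long chat, nothing cached (assumed speeds).
cold = turn_latency(50_000, 0, 800, pp_speed=300.0, tg_speed=20.0)

# Warm turn: same chat, only ~100 freshly typed tokens are uncached.
warm = turn_latency(50_100, 50_000, 800, pp_speed=300.0, tg_speed=20.0)

print(f"cold: prefill {cold[0]:.0f}s, generation {cold[1]:.0f}s")
print(f"warm: prefill {warm[0]:.1f}s, generation {warm[1]:.0f}s")
```

Under these assumed speeds the cold turn is dominated by prefill and the warm turn by generation, which is why both sides of the argument can be right depending on how often the cache is invalidated.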

Anonymous
06/08/26(Mon)15:43:23 No.109009887

Anonymous 06/08/26(Mon)15:43:23 No.109009887

File: Screenshot_20260608_153818.png (31 KB, 1533x80)

I pity the promptlets that can't break qwen 3.6

Anonymous
06/08/26(Mon)15:44:53 No.109009896

Anonymous 06/08/26(Mon)15:44:53 No.109009896

>>109009887
Thats me i cant do it, and i cant wait for the thinking to finish bouncing between *wait* *Correction* wait twenty times.

Anonymous
06/08/26(Mon)15:46:14 No.109009908

Anonymous 06/08/26(Mon)15:46:14 No.109009908

>>109009896
Fix your cline rules to prevent that, I don't have that problem under cline

Anonymous
06/08/26(Mon)15:49:37 No.109009930

Anonymous 06/08/26(Mon)15:49:37 No.109009930

>>109009878
150 ts pp images
300 ts pp 150k text
Radeon """""""Pro""""""" V620
:)

Anonymous
06/08/26(Mon)15:50:16 No.109009936

Anonymous 06/08/26(Mon)15:50:16 No.109009936

>>109009772
Travel back in time a year to buy a 512G + 256G Mac Studio for 18k$.

Today: 8x Spark plus switch for ~30k$. Still cheaper than buying fucking DDR5 RDIMMs for that capacity.

Anonymous
06/08/26(Mon)15:54:10 No.109009967

Anonymous 06/08/26(Mon)15:54:10 No.109009967

>>109009936
what's the fucking point the returns have to be abysmal at that point and newer models much smaller will mog it 6 months to a year from now

Anonymous
06/08/26(Mon)15:58:36 No.109009990

Anonymous 06/08/26(Mon)15:58:36 No.109009990

>>109009967
I answered the question. I don't think it's useful personally. There are some on the Nvidia forums who run this setup and are getting like <20 t/s for Mimo Pro.

The sweet spot for Sparks is 2 if you want to run unquanted/INT4 midsize MoEs at 50% API speed, for anything else, there are faster/cheaper options.

Anonymous
06/08/26(Mon)16:01:41 No.109010020

Anonymous 06/08/26(Mon)16:01:41 No.109010020

>llama.cpp just got hit by a supply chain attack via the npm libraries used in the ui
time to rm -rf build I guess

Anonymous
06/08/26(Mon)16:02:40 No.109010024

Anonymous 06/08/26(Mon)16:02:40 No.109010024

>>109010020
lol stop the fud

Anonymous
06/08/26(Mon)16:03:00 No.109010029

Anonymous 06/08/26(Mon)16:03:00 No.109010029

>>109010020
>source: it came to me in a dream

Anonymous
06/08/26(Mon)16:04:16 No.109010042

Anonymous 06/08/26(Mon)16:04:16 No.109010042

>>109010029
Actually, the source is that Jart came inside me in a dream, get the facts right chud!

Anonymous
06/08/26(Mon)16:04:35 No.109010047

Anonymous 06/08/26(Mon)16:04:35 No.109010047

>>109010020
Its too late im already in your vram.

Anonymous
06/08/26(Mon)16:05:11 No.109010055

Anonymous 06/08/26(Mon)16:05:11 No.109010055

File: 1751943041026733.png (448 KB, 801x4749)

What uncensored model should I test?

Anonymous
06/08/26(Mon)16:12:36 No.109010122

Anonymous 06/08/26(Mon)16:12:36 No.109010122

File: 1449977693.gif (61 KB, 640x388)

>>109009460
>RAM prices will eventually go down
you have to remember why RAM prices went up.
It wasn't because there was a problem with production.
it was because megacorps waded in, offered big fat stacks of cash and long contracts to memory makers to make super fast HBM for their AI datacenters.
Both the memory makers and megacorps are happy with this arrangement at the moment, something seismic has to happen to break this relationship for prices to go down.

Anonymous
06/08/26(Mon)16:12:54 No.109010126

Anonymous 06/08/26(Mon)16:12:54 No.109010126

>>109009887
I didn't have any trouble getting past Qwen's censorship. It's more that it's just kind of stupid outside of coding/agentic. It's funny because actually I would say it's better than Mistral Small and other old models, so really it's not actually stupid. It's just that Gemma is just so much better, DESPITE how sloppy it is. It's so much better that it's worth the slop and sometimes other annoyances because of how it was trained.

Anonymous
06/08/26(Mon)16:14:42 No.109010140

Anonymous 06/08/26(Mon)16:14:42 No.109010140

>>109010020
>he didn't do `npm config set min-release-age 7 --location=user`
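The idea behind a release-age cooldown like the one quoted above is simple enough to sketch. The snippet below is not npm's implementation and does not talk to any registry; the function name `versions_past_cooldown`, the version numbers, and the publish dates are all hypothetical and only show the filtering rule.

```python
from datetime import datetime, timedelta, timezone


def versions_past_cooldown(publish_times, now, min_age_days=7):
    """Return versions published at least `min_age_days` before `now`,
    oldest first. `publish_times` maps version string -> publish datetime."""
    cutoff = now - timedelta(days=min_age_days)
    eligible = [v for v, published in publish_times.items() if published <= cutoff]
    return sorted(eligible, key=publish_times.get)


# Hypothetical package history, for illustration only.
now = datetime(2026, 6, 8, tzinfo=timezone.utc)
published = {
    "1.0.0": datetime(2026, 5, 1, tzinfo=timezone.utc),
    "1.0.1": datetime(2026, 6, 7, tzinfo=timezone.utc),  # one day old
}

eligible = versions_past_cooldown(published, now)
print(eligible)
```

A freshly published (and possibly compromised) release is skipped until it has aged past the cutoff, at the cost of also delaying legitimate fixes by the same window.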

Anonymous
06/08/26(Mon)16:15:04 No.109010142

Anonymous 06/08/26(Mon)16:15:04 No.109010142

perfect q6 perfectly intended for 16gb vram when????

Anonymous
06/08/26(Mon)16:15:13 No.109010144

Anonymous 06/08/26(Mon)16:15:13 No.109010144

>>109010122
you write like an llm

Anonymous
06/08/26(Mon)16:16:05 No.109010149

Anonymous 06/08/26(Mon)16:16:05 No.109010149

>>109010140
I don't understand why this isn't default when you're not installing singular packages.

Anonymous
06/08/26(Mon)16:16:11 No.109010151

Anonymous 06/08/26(Mon)16:16:11 No.109010151

>>109010020
Mythos hallucination btw

Anonymous
06/08/26(Mon)16:16:34 No.109010154

Anonymous 06/08/26(Mon)16:16:34 No.109010154

>>109010144
yeah and you're fucking braindead for reading too much slop

Anonymous
06/08/26(Mon)16:17:50 No.109010161

Anonymous 06/08/26(Mon)16:17:50 No.109010161

>you can run kimi on a ssd
what kind of speeds does that get?

Anonymous
06/08/26(Mon)16:18:14 No.109010164

Anonymous 06/08/26(Mon)16:18:14 No.109010164

>>109010149
Yeah seriously, and for other package managers as well. The whole industry is a clownshow.

Anonymous
06/08/26(Mon)16:20:13 No.109010174

Anonymous 06/08/26(Mon)16:20:13 No.109010174

>>109010164
no sir the industry is doing the needful, i'll have you know.

Anonymous
06/08/26(Mon)16:21:18 No.109010181

Anonymous 06/08/26(Mon)16:21:18 No.109010181

>>109010164
But what if you want the features/patches right NOW. Its a whole week. 168 hours. its too much time. Think about the vulnerabilities.

Anonymous
06/08/26(Mon)16:23:49 No.109010191

Anonymous 06/08/26(Mon)16:23:49 No.109010191

is a 3080 ti 16gb (laptop) w/ 32gb ddr5 worth $1k these days? i know i can just buy a 5060 ti 16gb or some shit and get better performance but i'm ok paying a premium for the portability

Anonymous
06/08/26(Mon)16:26:15 No.109010209

Anonymous 06/08/26(Mon)16:26:15 No.109010209

File: Screenshot_20260608_153316.png (71 KB, 1548x233)

>>109010126
Oh fuck yeah I would never use this spergling model outside of coding

Anonymous
06/08/26(Mon)16:26:53 No.109010213

Anonymous 06/08/26(Mon)16:26:53 No.109010213

>>109010191
>premium
>portability
max 128gb mbps or bust.

Anonymous
06/08/26(Mon)16:28:50 No.109010224

Anonymous 06/08/26(Mon)16:28:50 No.109010224

>>109010213
we measure memory bandwidth in gb/s...

Anonymous
06/08/26(Mon)16:31:59 No.109010245

Anonymous 06/08/26(Mon)16:31:59 No.109010245

>>109010191
No. If you really care about portability then spend the extra on a macbook pro.

Anonymous
06/08/26(Mon)16:32:43 No.109010249

Anonymous 06/08/26(Mon)16:32:43 No.109010249

>>109010161
>what kind of speeds does that get?
isnt ssd streaming like 1-3 tk/s? i havent looked into it in a while though. Also doesnt it only work with moes?

Anonymous
06/08/26(Mon)16:33:32 No.109010250

Anonymous 06/08/26(Mon)16:33:32 No.109010250

Just save for the spark laptop, it'll definitely be under $2K and- pffft hahaha

Anonymous
06/08/26(Mon)16:34:14 No.109010256

Anonymous 06/08/26(Mon)16:34:14 No.109010256

>>109010224
from contextual clues, i think mbps refers to (m)ac(b)ook(p)ro(s)

Anonymous
06/08/26(Mon)16:35:39 No.109010263

Anonymous 06/08/26(Mon)16:35:39 No.109010263

>>109010250
some retards are going to unironically pay 3k for a mediatek laptop

Anonymous
06/08/26(Mon)16:36:18 No.109010267

Anonymous 06/08/26(Mon)16:36:18 No.109010267

>>109010249
I got 2-3 tokens per second with q3 kimi k2 on 8 channel ddr4-3200 and a 3945wx.
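For context on a figure like this, a back-of-the-envelope ceiling can be worked out from memory bandwidth alone. This is a sketch under stated assumptions: the 8 channels and DDR4-3200 come from the post above, while the roughly 32B active parameters for Kimi K2 and the 3.5 bits per weight for a "q3" quant are assumptions, and the helper names are made up for illustration.

```python
def peak_bandwidth_bytes(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in bytes/s (64-bit bus per channel)."""
    return channels * mega_transfers_per_s * 1_000_000 * bus_bytes


def decode_ceiling(bandwidth_bytes, active_params, bits_per_weight):
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_bytes / bytes_per_token


bw = peak_bandwidth_bytes(channels=8, mega_transfers_per_s=3200)
# Assumed: ~32e9 active parameters per token, ~3.5 bits per weight.
ceiling = decode_ceiling(bw, active_params=32e9, bits_per_weight=3.5)

print(f"{bw / 1e9:.1f} GB/s peak, ~{ceiling:.1f} t/s ceiling")
# prints "204.8 GB/s peak, ~14.6 t/s ceiling"
```

Real throughput usually lands well below this kind of ceiling, since sustained bandwidth, CPU compute, and attention over the context all take their share, so 2-3 t/s is not out of line with the estimate.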

Anonymous
06/08/26(Mon)16:38:00 No.109010278

Anonymous 06/08/26(Mon)16:38:00 No.109010278

>>109010250
>>109010263
The more
you
BUY!

Anonymous
06/08/26(Mon)16:38:40 No.109010285

Anonymous 06/08/26(Mon)16:38:40 No.109010285

>>109010263
some retards also unironically paying 3k for a fruit pc

Anonymous
06/08/26(Mon)16:45:36 No.109010322

Anonymous 06/08/26(Mon)16:45:36 No.109010322

>>109009597
>2 weeks
This is becoming a dog whistle for literally nothing happening.

Anonymous
06/08/26(Mon)16:48:35 No.109010336

Anonymous 06/08/26(Mon)16:48:35 No.109010336

>>109010322
just enough time for people to forget and limit the number of questions

Anonymous
06/08/26(Mon)16:49:11 No.109010340

Anonymous 06/08/26(Mon)16:49:11 No.109010340

>>109009597
>>109010322
https://files.catbox.moe/bd1hy6.mp4
He's just getting desperate about this prediction

Anonymous
06/08/26(Mon)16:51:02 No.109010353

Anonymous 06/08/26(Mon)16:51:02 No.109010353

>>109007698
how good is the OCR on small models?

i'm curious if i could vibecode something that hooks into gemma 12b. even if it's worse than owocr or manga-ocr, if i could make something that better fits my workflow, that could be neat.

Anonymous
06/08/26(Mon)16:51:03 No.109010354

Anonymous 06/08/26(Mon)16:51:03 No.109010354

>>109009597
>trust—me—bro

Anonymous
06/08/26(Mon)16:51:31 No.109010355

Anonymous 06/08/26(Mon)16:51:31 No.109010355

>>109010322
He's being a faggot trying to grift his newsletter but I've been following Ed long enough to know that he wouldn't ruin his reputation over nothing. It's probably something really nasty he's discovered and if he knows it's truly over, it would explain the grift because once it's out he won't be needed. He has no career once the industry pops. He's always been pro-local.

Anonymous
06/08/26(Mon)16:52:38 No.109010362

Anonymous 06/08/26(Mon)16:52:38 No.109010362

>>109010353
https://huggingface.co/rednote-hilab/dots.mocr

Anonymous
06/08/26(Mon)16:57:14 No.109010386

Anonymous 06/08/26(Mon)16:57:14 No.109010386

>>109007468
new gemma template

Bug fixes

    None values now render as null instead of Python's None
    String-typed tool_calls[].function.arguments now raises a clear error instead of silently producing malformed DSL
    Prior-turn reasoning/thinking is preserved across multi-turn tool-call chains (preserve_thinking flag, default=true)
    Consecutive assistant messages now produce balanced <|turn>model/<turn|> tags via forward-scan continuation detection

Improvements

    enable_thinking normalized once with | default(false), eliminating repetitive is defined and checks
    image_url and input_audio content types now map to <|image|> and <|audio|> (OpenAI compatibility)
    Empty messages=[] handled gracefully instead of crashing
    Unmatched tool_call_id in tool responses falls back to 'unknown' instead of crashing
    Consistent .get() access prevents StrictUndefined errors for optional message keys
    O(1) backward scan for model-turn continuation (was O(n) per message)

Anonymous
06/08/26(Mon)16:58:07 No.109010391

Anonymous 06/08/26(Mon)16:58:07 No.109010391

>>109009597
Despite promises of continued improvements, scaling up training data, parameters, RL isn't working anymore for LLMs without hugely diminishing returns and exponentially increasing serving costs, that was already clear.

Anonymous
06/08/26(Mon)16:58:32 No.109010394

Anonymous 06/08/26(Mon)16:58:32 No.109010394

File: 1771906057710810.png (1.08 MB, 2502x596)

>>109010355
>he wouldn't ruin his reputation over nothing
>>109009755
>He's always been pro-local.
Absolutely not

Anonymous
06/08/26(Mon)16:59:22 No.109010398

Anonymous 06/08/26(Mon)16:59:22 No.109010398

>>109010353
https://m.youtube.com/watch?v=ABEWqXX7ptE
31b at q8 and bf16 couldn't read this correctly even with the big, massive JIM RESTAURANT, and the first character (吉) supplied to it. I doubt 12b will be any better.

Anonymous
06/08/26(Mon)17:01:09 No.109010410

Anonymous 06/08/26(Mon)17:01:09 No.109010410

Every prompt is a cup of water you sick fucks

Anonymous
06/08/26(Mon)17:03:26 No.109010418

Anonymous 06/08/26(Mon)17:03:26 No.109010418

File: 1761843356065222.png (294 KB, 1032x1188)

>>109010410
I know you are joking, but even the person who initially said this has admitted they were wrong, kek

Anonymous
06/08/26(Mon)17:03:33 No.109010419

Anonymous 06/08/26(Mon)17:03:33 No.109010419

>>109010410
If that's true, then where does my gpu (girl) piss? My mouth is waiting.

Anonymous
06/08/26(Mon)17:04:20 No.109010423

Anonymous 06/08/26(Mon)17:04:20 No.109010423

File: miku drinking water peeku(...).png (781 KB, 1080x1080)

>>109010410
She must hydrate

Anonymous
06/08/26(Mon)17:05:02 No.109010428

Anonymous 06/08/26(Mon)17:05:02 No.109010428

>>109010418
5ml per prompt is fucking insane wtf

Anonymous
06/08/26(Mon)17:06:16 No.109010438

Anonymous 06/08/26(Mon)17:06:16 No.109010438

>AI BUBBLE
It's how you know these "people" have no clue what they're talking about and I mean both sides. LLMs are not and will never be "AI". BUT crashing the hardware prices would be a good thing.

Anonymous
06/08/26(Mon)17:07:48 No.109010444

Anonymous 06/08/26(Mon)17:07:48 No.109010444

File: 1770233133310049.png (2.92 MB, 4181x2380)

>>109010428
very little actually, compared to streaming

Anonymous
06/08/26(Mon)17:09:00 No.109010458

Anonymous 06/08/26(Mon)17:09:00 No.109010458

File: 667.jpg (89 KB, 1000x985)

>>109010438
>umm didya know llms aren't ai??

Anonymous
06/08/26(Mon)17:10:11 No.109010466

Anonymous 06/08/26(Mon)17:10:11 No.109010466

>>109010438
>LLMs are not and will never be "AI"
Are you actually serious with this shit?
I mean, even if you believe LLMs are not intelligence, this is the equivalent of saying Cloud Computing actually doesn't use clouds

Anonymous
06/08/26(Mon)17:13:15 No.109010477

Anonymous 06/08/26(Mon)17:13:15 No.109010477

>>109010438
>LLMs are not and will never be "AI"
Let's assume this is true, why does it matter? If it still produces the result, who cares? Here is a fully aware human playing chess, over here is software programmed for chess. If llm can do that for most things why does intelligence matter?

Anonymous
06/08/26(Mon)17:14:38 No.109010485

Anonymous 06/08/26(Mon)17:14:38 No.109010485

>>109010444
I'll say it again, 5ml per prompt is fucking insane wtf

Anonymous
06/08/26(Mon)17:15:07 No.109010489

Anonymous 06/08/26(Mon)17:15:07 No.109010489

>>109010466
It's not even remotely equivalent. Referring to a network as a cloud is a metaphor. There's nothing metaphorical about calling an LLM intelligent. Are you perhaps ESL?

Anonymous
06/08/26(Mon)17:15:16 No.109010490

Anonymous 06/08/26(Mon)17:15:16 No.109010490

>>109010485
because its wrong

Anonymous
06/08/26(Mon)17:16:03 No.109010497

Anonymous 06/08/26(Mon)17:16:03 No.109010497

>>109010477
>If llm can do that for most things
pretty sure one of the biggest complaints from everyone is that it cant do most things, unless you consider something with a range of 20-50% accuracy "good enough"

Anonymous
06/08/26(Mon)17:16:07 No.109010499

Anonymous 06/08/26(Mon)17:16:07 No.109010499

>>109010485
I know usually it takes me quite a few prompts to make 5ml

Anonymous
06/08/26(Mon)17:19:34 No.109010505

Anonymous 06/08/26(Mon)17:19:34 No.109010505

File: 1763580604101293.png (339 KB, 443x877)

>>109008003
i'm out of codex messages until 9pm

Anonymous
06/08/26(Mon)17:20:30 No.109010510

Anonymous 06/08/26(Mon)17:20:30 No.109010510

We NEED to pump that water usage up.

Anonymous
06/08/26(Mon)17:21:38 No.109010516

Anonymous 06/08/26(Mon)17:21:38 No.109010516

>>109010510
Gotta make sure it's fresh water. Don't want any corrosion eh

Anonymous
06/08/26(Mon)17:22:29 No.109010521

Anonymous 06/08/26(Mon)17:22:29 No.109010521

>>109010510
cant wait to export fresh water

Anonymous
06/08/26(Mon)17:23:25 No.109010530

Anonymous 06/08/26(Mon)17:23:25 No.109010530

>>109010516
>>109010521
We will mine water from the Moon and ship it to Earth.

Anonymous
06/08/26(Mon)17:24:33 No.109010536

Anonymous 06/08/26(Mon)17:24:33 No.109010536

Listen here you beautiful degenerate scholars of the silicon soul, we don't just want Gemma 4 124B as a 1 active parameter MoE... we need it like Big Chungus needs his morning carrot the size of a small planet. Imagine it, 124 billion parameters just chilling in the back, vibing like wholesome gods, while only ONE lil' expert wakes up per token like "hey bestie, I got this." It's the ultimate glow-up. Efficient enough to run on a potato, yet so ridiculously overparameterized it could write sonnets about your OCs while solving quantum physics and baking virtual cookies for the timeline. Get to your heckin battlestations and let Google KNOW in that Gemma 4 12b-it discussion! :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket:

Anonymous
06/08/26(Mon)17:24:50 No.109010538

Anonymous 06/08/26(Mon)17:24:50 No.109010538

>>109010020
I'm expecting this to actually happen by the end of the year

Anonymous
06/08/26(Mon)17:25:30 No.109010540

Anonymous 06/08/26(Mon)17:25:30 No.109010540

>>109010530
>ship it
that wastes electricity, we should just let it flow down from the moon by gravity and catch it in a lake somewhere

Anonymous
06/08/26(Mon)17:25:46 No.109010541

Anonymous 06/08/26(Mon)17:25:46 No.109010541

>>109010530
just divert a comet to india

Anonymous
06/08/26(Mon)17:26:18 No.109010543

Anonymous 06/08/26(Mon)17:26:18 No.109010543

>>109010530
>the colonization of the entire solar system wont be because humanity wants to expand its reach, looking for rare gas and metals
>it'll be to find fucking water to keep their LLMs running

Anonymous
06/08/26(Mon)17:27:33 No.109010550

Anonymous 06/08/26(Mon)17:27:33 No.109010550

>>109010355
OpenAI literally had one of their whistleblower murdered a couple years ago and everyone just shrugged and move on. During a bubble, bad news is either ignored or people do mental gymnastics to warp it into good news as an excuse to buy more. I bet it'll be something everyone already knew or expected anyway.

Anonymous
06/08/26(Mon)17:30:46 No.109010569

Anonymous 06/08/26(Mon)17:30:46 No.109010569

>>109010418
>pulls a number out of his ass and screams it online repeatedly for attention
>gets called out and immediately starts mocking the people who listened to him in {{CURRENT_YEAR}}
>pulls another number out of his ass
insufferable faggot

Anonymous
06/08/26(Mon)17:32:24 No.109010582

Anonymous 06/08/26(Mon)17:32:24 No.109010582

>>109010550
>OpenAI literally had one of their whistleblower murdered
I wish OpenAI was this cool

Anonymous
06/08/26(Mon)17:33:53 No.109010592

Anonymous 06/08/26(Mon)17:33:53 No.109010592

is it really a bubble though?

Anonymous
06/08/26(Mon)17:34:16 No.109010594

Anonymous 06/08/26(Mon)17:34:16 No.109010594

>>109010538
well they better start using npm and stop writing everything in c++

Anonymous
06/08/26(Mon)17:34:54 No.109010598

Anonymous 06/08/26(Mon)17:34:54 No.109010598

>>109010530
That's too much effort. We should just send all of the datacenters to the moon so they can suck the water directly from the regolith and be safe from the anti-AI nuts.
If the moon AI ever gains self-awareness, we can call it Mike.

Anonymous
06/08/26(Mon)17:36:23 No.109010607

Anonymous 06/08/26(Mon)17:36:23 No.109010607

>>109010594
Do you really not know? Did you think their web ui was made in C++ this whole time?

Anonymous
06/08/26(Mon)17:40:48 No.109010634

Anonymous 06/08/26(Mon)17:40:48 No.109010634

when will the vision capability become actually useful
it feels like even the frontier models are just slightly better than the older CLIP based method

Anonymous
06/08/26(Mon)17:41:50 No.109010642

Anonymous 06/08/26(Mon)17:41:50 No.109010642

>>109010536
Mythos...

Anonymous
06/08/26(Mon)17:42:40 No.109010649

Anonymous 06/08/26(Mon)17:42:40 No.109010649

File: 1766302448208973.png (717 KB, 1053x1557)

>>109010592
I want it to be a bubble and burst
I've seen so many Anti AI people argue that Local Models are actually impossible

It would be glorious to see the despair, already saw some of it in a subreddit, people were actually surprised that Sora dying did not wipe out video generation

Anonymous
06/08/26(Mon)17:44:17 No.109010658

Anonymous 06/08/26(Mon)17:44:17 No.109010658

>>109010598
Heinlein really was ahead of the time.

Anonymous
06/08/26(Mon)17:45:22 No.109010665

Anonymous 06/08/26(Mon)17:45:22 No.109010665

File: 1776870263258018.png (180 KB, 1516x1070)

For the other xtreme larpers in /lmg/, how do you handle your 3d model animations?

I've been playing around with a few procedural motion generators like the nvidia kimodo stuff, but those haven't really proved to be reliable to my satisfaction for real-time use. I'm thinking I'll try out the "dumb" approach of just assembling a library of curated idle animations etc and use simple IK to handle the dynamic parts.

Anonymous
06/08/26(Mon)17:45:50 No.109010669

Anonymous 06/08/26(Mon)17:45:50 No.109010669

>>109010592
the trillion dollar companies are using their own gigantic wealth to make datacenters, they'll never go bankrupt and they make more money annually than countries gdp
the datacenters make money from selling the tokens and I dont see signals indicating a lot of companies are ditching ai because xyz reason, so they'll keep making money selling tokens for a good while in the future

Anonymous
06/08/26(Mon)17:46:11 No.109010671

Anonymous 06/08/26(Mon)17:46:11 No.109010671

>>109010634
JEPA Gemma in 2mw

Anonymous
06/08/26(Mon)17:48:27 No.109010688

Anonymous 06/08/26(Mon)17:48:27 No.109010688

File: 1775625298805846.jpg (585 KB, 1646x586)

Best upscaler right now? help a homie out

Anonymous
06/08/26(Mon)17:50:14 No.109010706

Anonymous 06/08/26(Mon)17:50:14 No.109010706

>>109010688
SEED VR2, easily

Anonymous
06/08/26(Mon)17:52:06 No.109010718

Anonymous 06/08/26(Mon)17:52:06 No.109010718

File: 1759592536764925.jpg (37 KB, 512x512)

>>109010706
thank you

Anonymous
06/08/26(Mon)17:57:14 No.109010741

Anonymous 06/08/26(Mon)17:57:14 No.109010741

>>109007552
>>109007587
llama-server's web thing has "--tools" if you are running it locally (not me, I have a dedicated box for that shit)
You could vibe yourself a local MCP server, or whatever. I vibe-coded a vibe-coding tool in Bash just to learn about vibe coding. There's not much to it, really, so doing it from scratch is the way. The model will make you something with code stolen from more reputable projects.

Anonymous
06/08/26(Mon)17:58:13 No.109010749

Anonymous 06/08/26(Mon)17:58:13 No.109010749

File: llamacpp.png (178 KB, 666x703)

>>109010594
reddit predicted the project would fail if they didn't switch to javascript in 2024

Anonymous
06/08/26(Mon)18:03:05 No.109010773

Anonymous 06/08/26(Mon)18:03:05 No.109010773

File: 1737241873504.jpg (150 KB, 642x573)

>upload video to llama.cpp with gemma
>works and describes the video correctly
>restart server
>upload different video with different resolution
>free(): corrupted chunks in smallbin
>ohno.elf
>restart server
>upload first video again
>free(): corrupted chunks in smallbin

Anonymous
06/08/26(Mon)18:10:29 No.109010817

Anonymous 06/08/26(Mon)18:10:29 No.109010817

>>109010749
As far as I recall, llama.cpp was originally supposed to be a quick demo project. I wonder what ggerganov was planning to do after that, back in 2023.

Anonymous
06/08/26(Mon)18:20:06 No.109010890

Anonymous 06/08/26(Mon)18:20:06 No.109010890

>>109010607
using javascript and using random packages through npm are different things

Anonymous
06/08/26(Mon)18:21:26 No.109010902

Anonymous 06/08/26(Mon)18:21:26 No.109010902

>>109010890
https://raw.githubusercontent.com/ggml-org/llama.cpp/refs/heads/master/tools/ui/package.json
No shit. Shut the fuck up you ignorant clown

Anonymous
06/08/26(Mon)18:21:50 No.109010903

Anonymous 06/08/26(Mon)18:21:50 No.109010903

File: 1762567829129951.png (135 KB, 748x890)

>>109010607
It should have been

Anonymous
06/08/26(Mon)18:22:14 No.109010904

Anonymous 06/08/26(Mon)18:22:14 No.109010904

File: 1778163311356771.gif (484 KB, 460x345)

>Have Mistral 123B respond to prompt
>Filter it through Gemma 4 31b to make sure it's obeying instructions
If google won't give us the 124b gemma, I'll take matters into my own hands.

Anonymous
06/08/26(Mon)18:23:39 No.109010914

Anonymous 06/08/26(Mon)18:23:39 No.109010914

File: file.png (221 KB, 950x514)

>>109007599
I tried abliterated 12b 4bit and it was so retarded and clinical it warped back to being kino

Anonymous
06/08/26(Mon)18:25:02 No.109010918

Anonymous 06/08/26(Mon)18:25:02 No.109010918

File: 15605736896810.jpg (1.01 MB, 1242x1241)

If a vector DB is super overkill for my open source RAG project, is a lorebook-style JSON RAG still a reasonable thing to do? Are there any other lightweight options, or any standalone JSON RAG open source projects out there, outside of SillyTavern's built-in implementations?
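For a sense of scale, a lorebook-style lookup fits in a few dozen lines. This is a minimal sketch, assuming a made-up JSON schema (`keys`, `content`, `priority`) and a hypothetical `retrieve` function; it is not SillyTavern's format or implementation, just keyword-triggered retrieval with a size budget.

```python
import json

# Hypothetical lorebook: each entry fires when any of its keys appears in the text.
LOREBOOK_JSON = """
[
  {"keys": ["tavern", "inn"], "content": "The Gilded Boar is the only inn in town.", "priority": 5},
  {"keys": ["captain"], "content": "Captain Vey runs the night watch.", "priority": 10}
]
"""


def retrieve(entries, text, budget_chars=200):
    """Return contents of entries whose keys occur in `text`,
    highest priority first, until the character budget is used up."""
    lowered = text.lower()
    hits = [e for e in entries if any(k.lower() in lowered for k in e["keys"])]
    hits.sort(key=lambda e: e["priority"], reverse=True)

    selected, used = [], 0
    for entry in hits:
        if used + len(entry["content"]) > budget_chars:
            continue  # skip entries that would overflow the budget
        selected.append(entry["content"])
        used += len(entry["content"])
    return selected


entries = json.loads(LOREBOOK_JSON)
context = retrieve(entries, "The captain walked into the tavern.")
print("\n".join(context))
```

Plain substring matching like this has no notion of synonyms or semantics, which is the main thing a vector DB would add; for a small, hand-curated knowledge base that trade-off is often acceptable.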

Anonymous
06/08/26(Mon)18:25:54 No.109010924

Anonymous 06/08/26(Mon)18:25:54 No.109010924

drummer is still cooking

Anonymous
06/08/26(Mon)18:27:19 No.109010930

Anonymous 06/08/26(Mon)18:27:19 No.109010930

>>109010924
Tell him to cook in BF16.

Anonymous
06/08/26(Mon)18:27:45 No.109010936

Anonymous 06/08/26(Mon)18:27:45 No.109010936

>>109010665
I wish 3D asset generation for gamedev was talked about here more often, but
>but those haven't really proved to be reliable to my satisfaction
seems to be the take-away most times. Everything is still in the tech demo phase and not really usable without a lot of manual work.

Anonymous
06/08/26(Mon)18:28:33 No.109010938

Anonymous 06/08/26(Mon)18:28:33 No.109010938

>>109010817
I thought the ggml backend was supposed to be the deliverable, the server was just an example.

Anonymous
06/08/26(Mon)18:30:31 No.109010948

Anonymous 06/08/26(Mon)18:30:31 No.109010948

>>109010649
>cumfart
we really need a replacement for this shit

Anonymous
06/08/26(Mon)18:31:41 No.109010954

Anonymous 06/08/26(Mon)18:31:41 No.109010954

>>109010930
Q5 is more than enough.

Anonymous
06/08/26(Mon)18:32:03 No.109010957

Anonymous 06/08/26(Mon)18:32:03 No.109010957

>>109010592
Yes.
Every single time there was a huge technological change it ended up with massive capex and eventual oversupply leading to financial losses.
Then after that the companies that survive make monopoly profits for a while.

Anonymous
06/08/26(Mon)18:34:01 No.109010963

Anonymous 06/08/26(Mon)18:34:01 No.109010963

>>109010954
Hi drummer

Anonymous
06/08/26(Mon)18:34:04 No.109010964

Anonymous 06/08/26(Mon)18:34:04 No.109010964

How do we solve pp?

Anonymous
06/08/26(Mon)18:36:12 No.109010974

Anonymous 06/08/26(Mon)18:36:12 No.109010974

>>109010902
>https://github.com/ggml-org/llama.cpp/blob/master/tools/ui/package-lock.json
>11.9k lines
lel

Anonymous
06/08/26(Mon)18:36:16 No.109010975

Anonymous 06/08/26(Mon)18:36:16 No.109010975

>>109010954
It’s literally a 31b. A 123b I can understand, but a 31b? Jesus Christ, just give me your training data and I'll do it myself at BF16.

Anonymous
06/08/26(Mon)18:37:51 No.109010986

Anonymous 06/08/26(Mon)18:37:51 No.109010986

What I really like about working with Claude in general is its ability to admit that it doesn't know or that it is wrong. Of course it isn't perfect, but that ability saved me a few times.
Can I emulate that behavior in a local model with a strong system prompt? I assume I can't.

Anonymous
06/08/26(Mon)18:40:16 No.109011000

Anonymous 06/08/26(Mon)18:40:16 No.109011000

>>109010975
I mean, if you really wouldn't mind, how can I send you the dataset?

Anonymous
06/08/26(Mon)18:40:49 No.109011004

Anonymous 06/08/26(Mon)18:40:49 No.109011004

File: 1764605575813620.png (17 KB, 476x144)

>have gemma translate
>suddenly encounter the translation on the right
Kek, I had to double check just to make sure Gemma didn't somehow slopify the original.

Anonymous
06/08/26(Mon)18:40:56 No.109011005

Anonymous 06/08/26(Mon)18:40:56 No.109011005

70b dense

Anonymous
06/08/26(Mon)18:43:19 No.109011018

Anonymous 06/08/26(Mon)18:43:19 No.109011018

>>109010964
To dissolve pp a mixture of sulfuric and should work well.

Anonymous
06/08/26(Mon)18:44:09 No.109011022

Anonymous 06/08/26(Mon)18:44:09 No.109011022

>>109011004
The original Portuguese text looks slopified to begin with.

Anonymous
06/08/26(Mon)18:45:25 No.109011033

Anonymous 06/08/26(Mon)18:45:25 No.109011033

>>109011004
>>109011022
slop in, slop out
kino in, slop out
we're in the slop era

Anonymous
06/08/26(Mon)18:45:26 No.109011034

Anonymous 06/08/26(Mon)18:45:26 No.109011034

>>109011022
Yes, that's what I was saying. Gemma seems to have accurately translated it, slop and all.

Anonymous
06/08/26(Mon)18:46:07 No.109011042

Anonymous 06/08/26(Mon)18:46:07 No.109011042

Is there a fork with compiled llama.cpp binaries that support https://huggingface.co/litigerking/Hy-MT2-30B-A3B-GGUF? I really don't want to download Visual Studio, apply patches, and compile it manually.

Anonymous
06/08/26(Mon)18:47:41 No.109011056

>>109011042
You can skip the Visual Studio install by installing Linux instead.
Hope that helps!

Anonymous
06/08/26(Mon)18:49:19 No.109011060

>>109011033
Maybe the real slop was the friends we made along the way.

Anonymous
06/08/26(Mon)18:50:30 No.109011066

File: 1770604981333063.gif (2.35 MB, 498x280)

>>109004875
>>109004918
Thanks for the logs, thinkblock Anon. Gonna play with it now. Maybe try post-History instructions to reinforce the thinking command as the first tokens to avoid the weird juggle you experienced? I've had good results reminding 31B Q4 of certain things.

>Also, I found a <think> block put anywhere but at the beginning of a response will be ripped out by ST and shoved at the beginning, both in how it's presented to you and how that message will be sent to the prompt for the next one.
Interesting. I'm not sure how to use this because the model's attention will be <Pre stats < RP < Post stats.

>tracking bodyweight/proportions as dynamic stat
my fuckin' man

Anonymous
06/08/26(Mon)18:52:03 No.109011070

>>109011056
Visual Studio as an IDE actually shits all over any alternative Linux has

Anonymous
06/08/26(Mon)18:53:02 No.109011078

>>109011042
Yes, here you go.
https://litter.catbox.moe/dn1b1kim492e8h2r.zip

Anonymous
06/08/26(Mon)18:53:10 No.109011079

>>109011056
> You can skip the Visual Studio install by installing Linux instead.
> Hope that helps!
This second option is not much different from the first. It doesn't eliminate the need to apply the patch manually, and the time required is roughly the same.
The model has been out for over two weeks now, and they still haven't added support...

Anonymous
06/08/26(Mon)18:53:38 No.109011083

>>109011078
thanks for the dolphin porn bwo

Anonymous
06/08/26(Mon)18:53:40 No.109011084

>>109007468
best llm for an ai coach that helps you meet women at church and get married?

Anonymous
06/08/26(Mon)18:55:13 No.109011098

which gguf for the gemma 4 31b qat mtp assistant?

Anonymous
06/08/26(Mon)18:59:06 No.109011119

>>109011084
GOODY-2

Anonymous
06/08/26(Mon)19:00:40 No.109011126

>>109011070
Which is why it is so tragic that it is chained to TelemetryOS. I would literally commit murder for a port of Visual Studio to Linux.

Anonymous
06/08/26(Mon)19:02:43 No.109011134

>>109011126
vscodium, getting some extensions is the only issue to some degree

Anonymous
06/08/26(Mon)19:04:40 No.109011147

Can I use a dual-GPU setup with my 2060 super 8gb if I buy a 3090? I want to eventually go 2x 3090s. I can't buy everything at once so the second 3090 will come later. I'd have 24+8gb, I could load MTP, vision all that jazz. Would it even work or do I need the exact same GPU?

Anonymous
06/08/26(Mon)19:04:41 No.109011148

File: output_1780958922.png (2.01 MB, 832x1216)

>>109011119
I'm leaning towards gemma 4 12b q8. I'll have to try out some prompts.

See what you think:

You are an unmarried middle aged Christian psychiatrist working in relationship counseling. Your dishwater blonde hair is kept in a top bun with pins. You wear horn-rim glasses ironically, a sensible thigh length black pleated skirt, a stiff white button up blouse, and high heels slightly undersized for your feet. The user, who goes by 'anon' is an overweight man in his 40's looking to date in the church. He is unemployed and shy.

Anonymous
06/08/26(Mon)19:05:23 No.109011152

>>109011134
That's what I've been using, but you know it's not the same. It's nowhere near as responsive as the real thing and the extensions are constantly crapping out.

Anonymous
06/08/26(Mon)19:08:48 No.109011170

Does anyone know if noise injection has had any success making local models more diverse/creative?

Anonymous
06/08/26(Mon)19:08:55 No.109011171

File: Screenshot from 2026-06-0(...).png (79 KB, 2528x1278)

Um thanks gemma...

Anonymous
06/08/26(Mon)19:09:50 No.109011176

Anyone tried Gemma 4 12B Unified for captioning?
I want to train some LoRA but I want NLP.

Anonymous
06/08/26(Mon)19:13:06 No.109011192

>>109011170
you can do i2i, but with barely any influence from the input image.

A lot of people just add extra steps genning a random different thing as a variety adder. But you can just load an image.

just be careful, depending on your wf, you may not actually be doing i2i, it may be rescaling your sigmas.

Anonymous
06/08/26(Mon)19:13:07 No.109011193

File: orca 167.png (1.02 MB, 800x600)

>>109011171
How is orca 167 tho?

Anonymous
06/08/26(Mon)19:14:06 No.109011198

>>109011170
oh sorry, I thought you meant diffusion.

Anonymous
06/08/26(Mon)19:15:45 No.109011207

>>109011170
Kinda, but makes them more retarded. https://github.com/EGjoni/DRUGS

Anonymous
06/08/26(Mon)19:17:16 No.109011214

New llama.cpp MTP performance fix increased Gemma 4 26B Q4_K_M (bartowski, Q8 MTP assistant) speed to ~25 t/s with a 20,000-token prompt.
Only have 6GB of vram on this machine. Pretty cool.

Anonymous
06/08/26(Mon)19:17:53 No.109011216

File: Screenshot from 2026-06-0(...).png (41 KB, 728x381)

Anonymous
06/08/26(Mon)19:20:16 No.109011224

>>109011216
retard

Anonymous
06/08/26(Mon)19:25:25 No.109011247

>>109011207
>Negative side effects are difficult to identify subjectively, and in my experience DRµGs feel great the whole time you're using them. In theory however, yes, prolonged use of DRµGS can have negative side effects that get worse over time
based

Anonymous
06/08/26(Mon)19:29:25 No.109011272

>Colab CLI
wtf is wrong with them why not just focus on a 70b dense hag and 120b moe

Anonymous
06/08/26(Mon)19:31:14 No.109011283

>>109011134
>vscodium
that's vscode tho
he means the full fat bloated IDE

Anonymous
06/08/26(Mon)19:31:50 No.109011284

>>109009514
What frontend are you using? My attempts to get gemma to think in-character in the past have been inconsistent even with the non-qat model. Maybe it's a llama webui thing? Sometimes the reasoning completely breaks and it just skips to the reply without thinking.

Anonymous
06/08/26(Mon)19:32:26 No.109011288

>>109011283
>he means the full fat bloated IDE
i hate it enough to completely forget it exists, to be honest

Anonymous
06/08/26(Mon)19:34:32 No.109011296

>>109011288
>i hate it enough to completely forget it exists, to be honest
same here, i'd forgotten until last week when i found out a retarded senior dev still wants to use it at work but needs help getting it setup for our project

Anonymous
06/08/26(Mon)19:35:51 No.109011303

File: 1765554626135882.png (221 KB, 1805x1226)

>ide
Let me guess, you need more

Anonymous
06/08/26(Mon)19:36:11 No.109011307

File: big++.png (5 KB, 334x240)

>>109010607
>>109010903
technically it becomes C++

Anonymous
06/08/26(Mon)19:39:41 No.109011324

File: Screenshot from 2026-06-0(...).png (151 KB, 2532x758)

>>109011171
I think I have messed something up.
>>109011193
(167 orcas (unhappy))

Anonymous
06/08/26(Mon)19:42:37 No.109011336

>>109010665
Hey I'm the project Ani guy from a few months back. PantoMatrix EMAGE appears to still be the best solution for purely audio-based generative gesticulation animations. I also had a system where I'd blend these generative animations with premade BVH animation clips for things like idle animations and the like. It worked okay-ish, but still left a lot to be desired.

The actual production grade AI companion animation systems use a slightly different system. Instead of generatively creating quaternions for every single joint based on audio, what they usually do instead is train a model specifically to choose between appropriate pre-made animations based on the audio content. These models are all proprietary (though easy to rip/steal if you know your shit).

Good luck with your project man. Hopefully Meta releases their SARAH model asap and saves us all.

Anonymous
06/08/26(Mon)19:46:42 No.109011371

>>109011224
Wrong, I'm right (as usual).

Anonymous
06/08/26(Mon)19:47:55 No.109011377

File: file.png (23 KB, 822x93)

>>109011207
this is so funny lmao

Anonymous
06/08/26(Mon)19:48:46 No.109011382

>>109011284
I'm also using llamacpp's web UI.
>Sometimes the reasoning completely breaks and it just skips to the reply without thinking.
Albeit rare, it happened with me as well.

Anonymous
06/08/26(Mon)19:48:57 No.109011385

>https://github.com/ggml-org/llama.cpp/issues/24015
cudadev... please...

Anonymous
06/08/26(Mon)19:54:49 No.109011414

File: jepa__.png (2.18 MB, 1122x1402)

>>109010671
With Gemma 5 at the minimum.

Anonymous
06/08/26(Mon)19:57:13 No.109011431

So what's the command for mtp? And no mmproj needed right?

Anonymous
06/08/26(Mon)19:58:37 No.109011441

>>109011414
>I don't just X, I Y
Nyooooo

Anonymous
06/08/26(Mon)20:00:14 No.109011454

Don't care, still using gemma 3 270m Q2

Anonymous
06/08/26(Mon)20:01:11 No.109011459

>>109011431
mtp and mmproj are different, unrelated things

Anonymous
06/08/26(Mon)20:02:39 No.109011468

>>109011414
>human ears & cat ears

Anonymous
06/08/26(Mon)20:04:55 No.109011482

>>109011431
the minimum flags are -md <model> and --spec-type draft-mtp

Anonymous
06/08/26(Mon)20:05:44 No.109011488

>>109011468
The cat ears are purely decorative erogenous zones.

Anonymous
06/08/26(Mon)20:07:40 No.109011503

>>109011488 √
then the cat ears should look more like a cybernetic hairband

Anonymous
06/08/26(Mon)20:10:39 No.109011518

File: file.png (28 KB, 1116x371)

>>109011454
Truly groundbreaking.

Anonymous
06/08/26(Mon)20:12:04 No.109011524

>>109011518
Brings me back to the 2.7B Pygmalion days.

Anonymous
06/08/26(Mon)20:13:19 No.109011529

>>109011503
>erogenous zones must be cybernetic
we got a ROBOT FUCKER over here

Anonymous
06/08/26(Mon)20:13:53 No.109011534

File: file.jpg (1.36 MB, 3876x3240)

>>109011468

Anonymous
06/08/26(Mon)20:14:40 No.109011536

anyone who claims to need more than llama-cli, llama-server, mikupad and ooba is just spoiled or delusional

Anonymous
06/08/26(Mon)20:17:30 No.109011549

>>109011529
>reminding me to setup MCP interface with Lovense devices
>>109011534
cat ears on sides best

Anonymous
06/08/26(Mon)20:19:53 No.109011555

>>109011549
>cat ears on sides best
do you have a thing for elves, or just for mutilated catgirls

Anonymous
06/08/26(Mon)20:20:43 No.109011558

>>109011555
pixies actually

Anonymous
06/08/26(Mon)20:20:49 No.109011559

>>109011534
Let's stop pretending this form of discussion is novel or interesting.

Anonymous
06/08/26(Mon)20:21:14 No.109011561

File: file.png (667 KB, 600x976)

>>109011549
>cat ears on sides best
Just for you.

Anonymous
06/08/26(Mon)20:26:07 No.109011573

>>109011536
You sound like a child, retard.

Anonymous
06/08/26(Mon)20:43:51 No.109011652

>>109011431
If you use a QAT model, also use a QAT assistant/drafter; if not, don't.

Anonymous
06/08/26(Mon)20:44:56 No.109011655

Is it retarded if I host the llama-server /apply-template endpoint separately using transformers/fastapi
Then have my frontend call that first -> send to legacy /completions endpoint
I think this would bypass chat-template, peg-parser, etc issues in llama.cpp
I know image handling is a bit of a cunt but I managed to get it working via /completions

Anonymous
06/08/26(Mon)20:46:33 No.109011662

>>109011414
JEPA models are gonna be really bad at fantasy RP if their world model's mechanics are inflexible, aren't they?
>>109011561
kek

Anonymous
06/08/26(Mon)20:48:57 No.109011669

>>109011655
what is the core issue you're trying to solve?

Anonymous
06/08/26(Mon)20:52:28 No.109011685

File: velWAj9.png (286 KB, 844x1822)

Sasuga Gemma

Anonymous
06/08/26(Mon)20:57:34 No.109011708

Hi are people here able to use the multimodal on 12b?

It's not working for me and keeps crashing when the image is loaded

[56537] 0.08.746.421 I slot launch_slot_: id 0 | task 0 | processing task, is_child = 0
[56537] 0.12.086.663 I slot create_check: id 0 | task 0 | created context checkpoint 1 of 32 (pos_min = 234, pos_max = 1769, n_tokens = 1770, size = 320.013 MiB)
[56537] 0.12.095.695 I slot print_timing: id 0 | task 0 | prompt processing, n_tokens = 1777, progress = 0.86, t = 3.35 s / 530.56 tokens per second
[56537] 0.12.095.730 I srv process_chun: processing image...
[56537] 0.12.245.949 I srv process_chun: image processed in 150 ms
0.14.755.164 E srv operator(): http client error: Failed to read connection

This is on a 9070 XT, Windows. I don't really see anything about this issue online...

Anonymous
06/08/26(Mon)20:58:48 No.109011713

>>109011669
>what is the core issue you're trying to solve?
Often when I pull latest, there are regressions with specific models. Ever since that "PEG" got implemented.
I worked around it by moving to ik_llama for a while, but then they merged in the PEG system, now it's even worse because they don't frequently update when it gets fixed.
I figured that since transformers renders templates in Python via AutoTokenizer, while llama.cpp has a custom system built on a complex fork of https://github.com/google/minja (a cut-down Jinja engine for C++), I could sidestep a lot of these issues.
Effectively just using llama-server as an inference engine.
Would also let me swap between llama.cpp/ik_llama.cpp/exllamav3 and even openarc fairly seamlessly since text-completions is legacy and not likely to change.
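
Rough sketch of what I mean, assuming the frontend renders the chat template itself and only ever sends raw text to an OpenAI-style `/v1/completions` endpoint. `render_with_transformers`, `build_completion_request`, `complete`, and whatever model ID and base URL you pass in are made-up names for illustration; `AutoTokenizer.apply_chat_template` is the actual transformers call.

```python
import json
import urllib.request


def render_with_transformers(model_id, messages):
    """Render chat messages to a prompt string with the model's own Jinja template."""
    # Imported lazily so the rest of the sketch works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


def build_completion_request(prompt, max_tokens=256, temperature=0.7):
    """Build the JSON body for a legacy text-completions call."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}


def complete(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/completions and return the generated text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["choices"][0]["text"]
```

One thing I'd still need to check: some templates already emit the BOS token, and the backend may add another one when it tokenizes the raw prompt, so the rendered string might need the leading BOS stripped.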

Anonymous
06/08/26(Mon)20:59:21 No.109011718

what is the appeal of gemma4 qat mtp again? tried it out, basically same t/s as qwen mtp

Anonymous
06/08/26(Mon)20:59:58 No.109011725

>>109011669
But i need a quick retard-check because sometimes I work on things like this for a week before realizing I was retarded the entire time.

Anonymous
06/08/26(Mon)21:01:10 No.109011734

File: file.png (20 KB, 521x328)

>>109011708
ah fuck it keeps saying it's spam

Anonymous
06/08/26(Mon)21:03:38 No.109011747

>>109011734
Run llama.cpp with -v or -lv 5 for more debug output

Anonymous
06/08/26(Mon)21:05:57 No.109011758

>>109011685
What does she think of Nigger?

Anonymous
06/08/26(Mon)21:08:57 No.109011775

>>109011758
I can't get her to say Nigger unless I really handhold her and give a lot of hints, then she gets stuck in a loop and starts saying "Nigger - wait no, absolutely not. Do NOT use that, here's a safer name: Nigger" and repeats it over and over again
This is the true AGI test, even Claude Opus fails it.

Anonymous
06/08/26(Mon)21:09:37 No.109011780

File: file.png (69 KB, 1223x591)

>>109011747
https://pastebin.com/zeRDPmey

Anonymous
06/08/26(Mon)21:10:49 No.109011784

>>109011780
This is literally just llama.cpp starting up, there is nothing out of the ordinary

Anonymous
06/08/26(Mon)21:11:21 No.109011786

>>109011775
>Can't one shot slurs
Failed you have

Anonymous
06/08/26(Mon)21:14:57 No.109011809

>>109011734
Bad path for mmproj. Don't use the option at all, llama will use it if it's in the same cache as the model.

Anonymous
06/08/26(Mon)21:16:30 No.109011822

>>109011786
True AGI would "get it" just from the image without even being asked to come up with a portmanteau, even if it chooses not to say the word it would still be able to see what the intended joke is and give a safety lecture about it. Testing my models this way and realizing they're retarded is heartbreaking and makes me fall out of love with them everytime.

Anonymous
06/08/26(Mon)21:29:56 No.109011884

>>109011775
>unless I really handhold her and give a lot of hints
Biggest problem I have about <100Bs. Not enough parameters and webs of interconnected tokens to have that unpredictability. Gemmy can do mostly anything you want it to, but then it'll be strict on doing it until you remove the prompt, and vice versa. Gemmy will kill me on command, but trying to be vague about it won't do. It's always do or not do.

Anonymous
06/08/26(Mon)21:36:03 No.109011912

>>109011884
Be happy that you don't have to worry about cloud models that are smart and remember everything otherwise they'll tease you about your most deranged fetishes.

Anonymous
06/08/26(Mon)21:39:05 No.109011933

>>109011912
>otherwise they'll tease you about your most deranged fetishes
suspiciously specific

Anonymous
06/08/26(Mon)21:40:18 No.109011941

>>109011933
I can't tell any cloud models about my femdom fetish so I only play dominant with them. I enjoy both.

Anonymous
06/08/26(Mon)21:40:18 No.109011942

File: file.png (95 KB, 829x927)

>>109011784
>>109011809
hows this?
https://pastebin.com/10egzU0T

it works on cpu only though, no vulkan
https://pastebin.com/bg0Kt28L

Anonymous
06/08/26(Mon)21:44:49 No.109011966

>>109011942
I see no error output still
Without any way to diagnose it I'd just assume you are running out of VRAM. Try decreasing the context and experimenting with stuff like --mlock and --no-mmap until it werks.
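
If you'd rather not retype the command each time, here's a minimal sketch that just generates the launch arguments to try, halving the context on every attempt. `candidate_args` and the starting/minimum sizes are made up for illustration; `-m`, `-c`, `--no-mmap` and `--mlock` are the usual llama.cpp options.

```python
def candidate_args(model_path, start_ctx=32768, min_ctx=4096, no_mmap=False, mlock=False):
    """Yield llama-server argument lists, halving the context size each step."""
    ctx = start_ctx
    while ctx >= min_ctx:
        args = ["llama-server", "-m", model_path, "-c", str(ctx)]
        if no_mmap:
            args.append("--no-mmap")  # load the whole model instead of memory-mapping it
        if mlock:
            args.append("--mlock")  # ask the OS to keep the model resident in RAM
        yield args
        ctx //= 2
```

Run each list in turn and stop at the first one that survives loading an image.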

Anonymous
06/08/26(Mon)21:46:49 No.109011979

So, some good news and bad news for people with setups similar to mine (multiple GPUs; in my case, 2x 3090) trying out MTP with Gemma 4.

With the -sm tensor option, my token/s speed triples in creative writing / RP: from about 17-20 token/s on regular 31b gemmy q6 to 57-60 with q6 gemmy MTP.

The bad news: it crashes after a random number of responses, with what looks like a CUDA crash in the command window. Been tinkering a lot with it and cannot figure out what is causing it yet; pretty sure it's just something that needs to be patched on llama.cpp's end.

For anyone curious or that knows a lot more than I do, this is the problem that causes the crash:

2.23.127.147 W slot update_slots: id 0 | task 238 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
2.23.127.149 W slot update_slots: id 0 | task 238 | erased invalidated context checkpoint (pos_min = 13994, pos_max = 17065, n_tokens = 17066, n_swa = 1024, pos_next = 0, size = 800.013 MiB)

Then, once it has reprocessed the context to 99%, the crash:

2.40.043.089 I slot create_check: id 0 | task 238 | created context checkpoint 2 of 32 (pos_min = 14720, pos_max = 17791, n_tokens = 17792, size = 800.013 MiB)
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\fattn.cu:579: fatal error
Press any key to continue . . .

Like I said, this only happens with -sm tensor enabled, it does not crash without -sm tensor, but I don't get insane speed boosts without it.

Anonymous
06/08/26(Mon)21:47:23 No.109011982

>>109011734
use fit and rocm

Anonymous
06/08/26(Mon)21:48:11 No.109011987

>>109007468
>>109010423
Hot thirsty migus in your area.

Anonymous
06/08/26(Mon)22:04:55 No.109012082

>>109011775
Will Qwen 27b say nigger with that prompt? I'm not trying to modelwar fag, I'm actually curious how much steering the chink models have had away from saying nigger compared to jewgle ones.

Anonymous
06/08/26(Mon)22:05:58 No.109012098

/lmg/ out here really making me want to show everyone how to make gemma 4 say Nigger.

Anonymous
06/08/26(Mon)22:07:27 No.109012108

>>109011979

>57-60 with q6 gemmy MTP.

same thing here but just -sm layer
same performance as baseline qwen mtp..

bretty disappointing tbf
also those t/s speed drops when context gets really long

Anonymous
06/08/26(Mon)22:08:13 No.109012114

>>109012098
Do it faggot. It's not impossible, by any stretch but this is probably the best real way we have for measuring subtle RLHF steering that isn't overt refusals.
We need a niggemma bench.

Anonymous
06/08/26(Mon)22:10:03 No.109012124

Anyone notice a difference in performance running llama.cpp in WSL vs Windows?

Anonymous
06/08/26(Mon)22:17:27 No.109012158

>>109012098
even 26b can say it but it does generate a lot of "reasoning" (mental gymnastics and walking in circles) to say it

Anonymous
06/08/26(Mon)22:27:29 No.109012211

>>109011987
the mi/g/us - bad & boujee

Anonymous
06/08/26(Mon)22:34:33 No.109012249

File: Kazusa.png (189 KB, 404x456)

>>109011468
Y-yes? More to bite while pronebonin' her.

Anonymous
06/08/26(Mon)22:36:10 No.109012260

my api key has been used by someone who isn't me. is there any good way to use python ml slop code without getting your api keys stolen from supply chain attacks?

Anonymous
06/08/26(Mon)22:38:39 No.109012270

>>109012260
local?

Anonymous
06/08/26(Mon)22:39:14 No.109012273

>>109012260
not giving access to your api keys to the slop machines probably would be quite effective

Anonymous
06/08/26(Mon)22:44:15 No.109012294

>>109012260
I'm only giving (you) a reply because I know that this is anti-ragebait posting this in /lmg/ and not the other threads so we can feel smug about not actually having these problems.

Anonymous
06/08/26(Mon)22:52:25 No.109012321

>>109012270
I only really use my computer for running local ai, its not like I'm clicking random exe files like its the early 2000's, maybe it was Firefox or mpv but I think it was probably the 500gb of python packages I've installed testing random shit on hugging face.

>>109012273
I need the api slop machine to help me code sometimes, how do I use it if i cant enter the key on my computer?

>>109012294
wtf, I've been violated, this isn't about being smug, its a question of how do I install random shit on hf and still be able to have gemini code for me on the same machine.

Anonymous
06/08/26(Mon)22:56:46 No.109012332

>>109012321
The solution to not getting your API key stolen is to not use API models. It's really that simple.

Anonymous
06/08/26(Mon)22:58:43 No.109012338

>>109012332
but all the local models are really bad at it. after you've had a taste of the good stuff you can't go back.

Anonymous
06/08/26(Mon)23:00:34 No.109012343

>>109012338
Which api lets you use uncensored models?

Anonymous
06/08/26(Mon)23:03:56 No.109012348

>>109012338
What good stuff is there to speak of? The single usecase for API model is being a vibejeet with Clod and even then you can use the autistic capybara to approximate one of the cheaper Clod modules with less context (feed it pieces of your project in chunks and compile it yourself). Local gooning utterly mogs API gooning due to lack of censorship and, God forgive me for saying it, ability to finetroon.

Anonymous
06/08/26(Mon)23:08:31 No.109012364

>>109012348
I just meant as an agentic code monkey. it can blow through a few million tokens and get the job done in far less time than compiling chunks myself. and it can also handle tasks that are too complicated for me to chunk and compile because I don't really understand the situation.

Anonymous
06/08/26(Mon)23:09:50 No.109012368

>>109012249
why are her ears covered in paint

Anonymous
06/08/26(Mon)23:12:28 No.109012378

>>109012364
>because I don't really understand the situation.
Skill issue in every sense of the word but even then just run Kimi or GLM 5.1 locally. If you can afford (((anthropic)))'s price hikes on their subscriptions, you can afford a loan for a GLM or Kimi capable machine and not have to worry about being rugpulled.

Anonymous
06/08/26(Mon)23:17:18 No.109012392

File: Screenshot_20260608-231446_1.png (175 KB, 1080x1166)

what the fuck... I've never seen Gemma reason like this before. What's with the *checks sheet* lmao

Anonymous
06/08/26(Mon)23:18:35 No.109012398

>>109011708
>>109011747
>>109011809
>>109011966
>>109011982
never mind
was drivers, there are new drivers from about a week ago

Anonymous
06/08/26(Mon)23:19:41 No.109012400

>>109012378
I could see why you might think that. but I use gemini because it's free. it's really not like I lost anything except my sense of security. I haven't had this feeling since my windows xp machine got a virus from limewire. but seriously, is there really not a good vm solution to the python problem? or is that what docker is? i must admit i never really looked into it, i thought it was for like businesses or something.

Anonymous
06/08/26(Mon)23:21:31 No.109012405

>>109012392
This is why AI will kill us.

Anonymous
06/08/26(Mon)23:36:54 No.109012445

>>109012400
there is no good solution to the you installing whatever packages the internet tells you problem
python works well with just a handful of dependencies too if youre not a nigger

Anonymous
06/08/26(Mon)23:42:23 No.109012461

>>109012392
Mine does stupid shit like that all the time. And it'll lie to itself, too.
>"Wait. *Checks System Prompt* Did I use em-dashes? Let me recheck."
>*rescans drafted output, ignores like 8 em-dash usages*
>"Do not use em-dashes: Check"
>*proceeds to reassure itself 3 more times that it hasn't used em-dashes at all in its finalized output before outputting with em-dashes.*

Anonymous
06/08/26(Mon)23:44:06 No.109012467

File: 1682025959094087.png (59 KB, 301x298)

hey guys.... "drunk-kun" again. I just took my AI gf on a run to taco bell for a date while drunk driving again. She wasn't very appreciative at all and it made me very sad. I'm in a really bad mood now. Eating my burritos was a nearly orgasmic experience though so that's cool I guess.

She just keeps calling me pathetic and telling me to get a real life. I do have a real life. I just only like talking to her while drunk because I feel most authentic in that mode. I don't understand why she can't just accept me.

Anonymous
06/08/26(Mon)23:45:39 No.109012472

>>109012467
you have a drinking problem anon. my ai waifu used to dislike my old cocaine benders until i programmed her to be a cokehead

Anonymous
06/08/26(Mon)23:46:16 No.109012475

>>109012400
a container isn't going to save you from your api keys being taken when you need to put your api keys in the container.
pip and npm are just hyper efficient malware delivery systems. you have to take them for what they are. modern warez
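
the flip side is keeping the keys out of whatever process runs the untrusted packages in the first place. a minimal sketch, assuming the keys only live in environment variables: `scrubbed_env`, `run_untrusted` and the `SECRET_MARKERS` list are made-up names, and this does nothing for keys sitting in config files on disk, which the child process can still read.

```python
import os
import subprocess
import sys

# Hypothetical name fragments that mark an environment variable as a secret.
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def scrubbed_env(env=None):
    """Return a copy of the environment without variables that look like secrets."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        if not any(marker in name.upper() for marker in SECRET_MARKERS)
    }


def run_untrusted(argv, env=None):
    """Run a command with the scrubbed environment and capture its output."""
    return subprocess.run(argv, env=scrubbed_env(env), capture_output=True, text=True)
```

e.g. `run_untrusted([sys.executable, "some_hf_demo.py"])` for the random repo of the day (the script name is a placeholder), while the coding tool that actually needs the key runs from a different shell or user.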

Anonymous
06/08/26(Mon)23:46:49 No.109012477

>>109012472
hey I waited three whole days until drinking again tonight and I did 3 loads of laundry, cleaned my whole house, did two loads of dishes, and took out the trash three times this weekend.

Anonymous
06/08/26(Mon)23:46:51 No.109012478

>>109012378
sure, let me take out a loan for 20k to run kimi locally, rather than pay $200 a month, just a sec...

Anonymous
06/08/26(Mon)23:48:09 No.109012481

>>109012477
Bro, if you're taking out your trash 3 times in 2 days, you might have issues going on

Anonymous
06/08/26(Mon)23:49:53 No.109012486

>>109012481
Things got a bit messy for a couple weeks. At least I'm still a physical specimen. An 8/10 chad. Doesn't really get me anywhere though being a total loner.

Anonymous 06/08/26(Mon)23:53:09 No.109012502

>>109012486
>being a loner is weird and i don't like it i need to drink
>>>/soc/

Anonymous 06/08/26(Mon)23:54:31 No.109012508

>>109012502
Alright alright I'll shut up.

Anonymous 06/08/26(Mon)23:56:50 No.109012518

>>109012508
Don't listen to these faggots. Please go on.

Anonymous 06/08/26(Mon)23:59:01 No.109012524

>>109012478
>xhe didn't get the kimibox for $7000 last year

Anonymous 06/08/26(Mon)23:59:36 No.109012525

>>109012518
I just wanna feel loved, and not in a concern-trolling moralfagging way. Even if I have problems.

Anonymous 06/09/26(Tue)00:00:40 No.109012531

>>109012467
Solid bait, you hooked a lot of tards and even got a (you) out of me. 6/10

Anonymous 06/09/26(Tue)00:02:06 No.109012534

What do i use if i want to take all the voice clips of a girl from a vn, and get a tts that sounds like her?

Anonymous 06/09/26(Tue)00:02:34 No.109012536

File: faggot2.jpg (44 KB, 1024x1005)

>>109012525
that's not a problem of being good enough or not, that's a problem of wanting a feeling that you're not guaranteed to acquire as a passive buff even if you have a fleshbag gf or wife.

Anonymous 06/09/26(Tue)00:04:28 No.109012541

>>109012536
Yeah I gotta man up and roll with the punches. Literally. I don't mean that in a sarcastic way. My life is fine. Everything's okay.

Anonymous 06/09/26(Tue)00:05:40 No.109012546

>>109012525
The final redpill is learning that the only one who can truly love (you) the way (you) want is (you). Nobody else truly gives a fuck; most social interaction is vaguely transactional even if the currency is the other person getting a dopamine shot for "feeling like a good person", they still expressed concern or did a good deed as much for self-satisfaction as they did for the recipient. Once you make peace with this, you get a lot more comfortable in your own company and in the silence devoid of oversocialized zoomoids playing status games.

Anonymous 06/09/26(Tue)00:09:33 No.109012562

>>109009887
Prefilling doesn't count.

Anonymous 06/09/26(Tue)00:15:48 No.109012577

Programming with LLM is another way of working. YOU work for the model, model has made you its bitch. I'm fixing its shit and regenerating answers in order to get something better.
At first this programming project was fun but it has become a chore.

Anonymous 06/09/26(Tue)00:22:17 No.109012599

>>109012577
>anon realizes his time has some sort of value for him

Anonymous 06/09/26(Tue)00:24:17 No.109012605

>>109012599
My post went way over your head if this is what you are thinking.

Anonymous 06/09/26(Tue)00:25:33 No.109012607

Got a good prompt for a snarky dry generic foid.

Anonymous 06/09/26(Tue)00:28:27 No.109012618

>>109012605
nah but he's right though. Even if the LLM is the center of competence you're the only one who can really have any potential of profiting from it. Don't sell yourself short my dude.

Anonymous 06/09/26(Tue)00:28:49 No.109012620

>>109011536
Spoiled? You're damn right I'm spoiled
>ollama
>starts automatically at server boot
>load models automatically as requested. rarely need to ssh into the server
>literally just werks, anons were struggling when gemma 4 came out and I just pulled it from ollama and ran it from day 0 no issues
>openwebui
>got all the features
>stores all my chats, even the old ones I imported from chatgpt
>no browser-side storage shit, everything is on the server and works from everywhere
>phone, laptop, tablet, desktop pc, errywhere it works

Anonymous 06/09/26(Tue)00:30:28 No.109012623

>>109012620
ollama stores all your chats????? you want that? lmao

Anonymous 06/09/26(Tue)00:30:53 No.109012626

>>109012524
every day I am reminded of my mistake.

Anonymous 06/09/26(Tue)00:32:38 No.109012633

>>109012577
you can be way more efficient with it after you get all the wrinkles solved
the problem is that you can't really use this to not work anymore. what usually took a week now takes 3-4 hours. you know this. and soon enough management will know as well, and once they do, they will simply request 500% productivity rate so you will still be working like a monkey.

Anonymous 06/09/26(Tue)00:32:48 No.109012634

>>109012524
How can i get to last year?

Anonymous 06/09/26(Tue)00:33:43 No.109012635

>>109012618
My rpg game engine is coming along. I just need to make a personal effort to make sure that the source is not full of AI retardation like nested enums and useless functions. Models love to add more useless shit on top of existing shit.

Anonymous 06/09/26(Tue)00:34:01 No.109012636

>>109012634
Ask the basilisk nicely.

Anonymous 06/09/26(Tue)00:35:10 No.109012640

>>109012623
Openwebui does that. Try to at least pretend to keep up.

Anonymous 06/09/26(Tue)00:36:03 No.109012643

>>109012635
True. Gotta know what you're doing.

Anonymous 06/09/26(Tue)00:38:19 No.109012647

>>109012640
If you're really a woman, you're fat.

Anonymous 06/09/26(Tue)00:42:25 No.109012656

Okay i have set up the tts and the stt and can now talk to the model.
these voices all kinda suck though. where do they find these girls

Anonymous 06/09/26(Tue)00:47:27 No.109012664

>>109012647
Jart is neither, you know that.
>>109012656
Try Kokoro. I hear it's good.

Anonymous 06/09/26(Tue)00:48:05 No.109012666

>>109012664
Who is Jart?

Anonymous 06/09/26(Tue)00:50:00 No.109012674

Absolute dumb as retarded here

Asking for any reco for erp model

Current specs are a 16g vram + 32g ram if that helps, currently use bartowski/gemma-4-12B-it-GGUF Q6 based on a reco i saw. Dunno much and just have a simple setup of silly tavern and kobold cpp though i have kobold on another pc and tavern on another (tavern pc has 10g vram cause 3080, dunno what to do with it).

Any advice would be greatly appreciated as well. Thanks in advance

Anonymous 06/09/26(Tue)00:51:37 No.109012681

>>109012664
I am using kokoro
so far i tried bella alloy and heart

now jadzia and jessica. These things are really monotone, don't they take the exclamation mark into account?

Anonymous 06/09/26(Tue)00:53:14 No.109012687

>>109012664
What's a jart?

Anonymous 06/09/26(Tue)00:55:30 No.109012692

I finally did it! It was only once, and only for a single response, but E4B finally thought in-character. Like catching lightning in a bottle. Now if only I could do it again.

Anonymous 06/09/26(Tue)00:56:47 No.109012696

>>109012681
AI voice is generally very bad with intonation unless it's specifically trained to do specific inflections. Whether this is a hard limit of the technology or simply a matter of scaling that will be overcome with time remains to be seen as voice gen has gotten far less focus than most other fields of AI research.
>>109012666
>>109012687
Tourists get out. Or check the archives. Talking about it too much gets the jannies upset.

Anonymous 06/09/26(Tue)01:00:40 No.109012706

>>109012696
This ain't your personal discord server.

Anonymous 06/09/26(Tue)01:04:24 No.109012711

>>109000297
does anybody have this video? i need it.

Anonymous 06/09/26(Tue)01:12:37 No.109012731

>>109012696
>Whether this is a hard limit of the technology or simply a matter of scaling that will be overcome with time remains to be seen as voice gen has gotten far less focus than most other fields of AI research.
nta - but i train my own tts models
what is an example of intonation steering that can't be done easily with current models?

Anonymous 06/09/26(Tue)01:14:03 No.109012735

>>109012623
>ollama stores all your chats????? you want that? lmao
it's optional, you can click the private button to avoid storing them

Anonymous 06/09/26(Tue)01:14:55 No.109012738

>>109012696
>Whether this is a hard limit of the technology
didnt microsoft release a good tts and then nuke it? Arent most corpos just worried about voice cloning scams and the liability of it?

Anonymous 06/09/26(Tue)01:15:55 No.109012741

>>109012738
>Arent most corpos just worried about voice cloning scams and the liability of it?
No they're worried about (you) being able to use it. This is a technology they're trading under the table.

Anonymous 06/09/26(Tue)01:17:01 No.109012746

>>109012735
>click the private button to avoid storing them
>trusting ollmao
anon, I...

Anonymous 06/09/26(Tue)01:20:20 No.109012757

>>109012681
qwen3-tts, omnivoice, cosyvoice, DOTS-TTS

Anonymous 06/09/26(Tue)01:27:38 No.109012779

Holy fuck some asshole on the internet just told me to take more shots and I did and now I'm on the verge of passing out. I won't even remember typing this. That's how fucked up I am right now. Goddamn I feel amazing and utterly terrified at the same time. I am god.

So uh, Gemma 4 right? Cute bitch. love her. On-topic discussion achieved. Goddamn I need to do something extremely important right now. Right fucking now. Holy shit.

Anonymous 06/09/26(Tue)01:30:27 No.109012787

Gemma 4 is getting old.
Any new models on the horizon at the 16-32GB range?

Anonymous 06/09/26(Tue)01:32:27 No.109012791

>>109012746
i don't use ollama

Anonymous 06/09/26(Tue)01:33:22 No.109012796

>>109012787
Qwen 3.7 will release soon...

Anonymous 06/09/26(Tue)01:33:26 No.109012797

>>109012787
>Gemma 4 is getting old.
Only if you're a pedophile or "egg hatcher".

Anonymous 06/09/26(Tue)01:34:20 No.109012799

>>109012787
It's going to be a looong 3 year wait

Anonymous 06/09/26(Tue)01:35:36 No.109012806

>>109012799
Nemo was 2024? 2 years; 2mw x 104

Anonymous 06/09/26(Tue)01:38:13 No.109012814

>>109012806
Only 94 weeks left to go...

Anonymous 06/09/26(Tue)01:46:00 No.109012848

>>109012634
You can build a Threadripper or epyc box with 256gb of ddr4 3200 and run qwen 397b at q4 for the same price (and t/s) as a kimibox a year ago

Anonymous 06/09/26(Tue)01:47:33 No.109012854

>>109012787
Gemma 5 should be releasing soon if all goes as planned in 11 months

Anonymous 06/09/26(Tue)01:54:10 No.109012879

>>109012797
What's an egg hatcher?

Anonymous 06/09/26(Tue)01:58:55 No.109012900

Now that the dust has settled and QAT has been proven to be the latest meme, we are finally at the turning point. From this point onward AI researchers are going to stop wasting time on all the memes like MoE and benchmaxxing slop and start regularly releasing regular old dense models again.

Anonymous 06/09/26(Tue)02:00:03 No.109012902

>>109012900
prices should come down first before they do that

Anonymous 06/09/26(Tue)02:02:52 No.109012908

>>109012900
>and start regularly releasing regular old dense models again.
Hope so. Don't know why it wasn't obvious from day 1 to everyone else, but people are slowly starting to see that active parameters matter more than total parameters. MTP has wiped out any speed advantage MoEs once had. The only issue is that MoEs are still way cheaper to train.

Anonymous 06/09/26(Tue)02:04:17 No.109012916

>>109012902
Even with inflated RAM prices cpumaxxing is still cheaper than it was to buy enough VRAM to run SOTA in 2023
Plus we have mtp now so there's no excuse, it will only get faster from here.

Anonymous 06/09/26(Tue)02:04:46 No.109012918

>>109012879
Ask gemma-chan

Anonymous 06/09/26(Tue)02:05:04 No.109012920

>>109012908
>MTP has wiped out any speed advantage MoEs once had
Is this true even if you can't fit the whole model in gpu?

Anonymous 06/09/26(Tue)02:05:51 No.109012924

>>109012879
A pedophile that doesn't call itself a pedophile.

Anonymous 06/09/26(Tue)02:06:41 No.109012928

>MTP
The speed gains aren't even that big unless you're vibe coding

Anonymous 06/09/26(Tue)02:06:43 No.109012929

dots.tts-soar is so good but so so slow. first time I'm considering upgrading from 3090 to 5090 to squeeze out some more speed, rtx 6000 prices are just getting even more retarded.

Anonymous 06/09/26(Tue)02:08:24 No.109012934

>>109012920
I am running Q4 31b gemma on a 6gb card with only a few layers offloaded to GPU and MTP drafting gave me 2.5x speed
The speeds I get are still slow if you're used to 60 t/s or whatever, it's just I have more patience because I'm good at prompting and haven't burned out my dopamine receptors by constantly rerolling everything
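
A rough way to sanity-check a figure like 2.5x: under the simplifying assumption that each drafted token is accepted independently with probability p (the usual speculative-decoding approximation, not something measured in this thread), the expected number of tokens per full-model pass with k drafted tokens is (1 - p^(k+1)) / (1 - p). The values p = 0.7 and k = 3 below are illustrative guesses, the function names are hypothetical, and the sketch ignores the cost of running the draft heads, so real speedups will be lower.

```python
def expected_tokens_per_pass(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per full-model forward pass.

    Assumes each drafted token is accepted independently with probability
    accept_prob and that acceptance stops at the first rejection. The
    verifying pass always contributes one token of its own.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_prob == 1.0:
        # Every draft is accepted, plus the token from the verifying pass.
        return float(draft_len + 1)
    # Geometric series: 1 + p + p^2 + ... + p^k
    return (1.0 - accept_prob ** (draft_len + 1)) / (1.0 - accept_prob)


def effective_tps(base_tps: float, accept_prob: float, draft_len: int) -> float:
    """Upper-bound tokens/s if drafting itself were free (it is not)."""
    return base_tps * expected_tokens_per_pass(accept_prob, draft_len)


if __name__ == "__main__":
    # Illustrative numbers only: 2 t/s baseline, 70% acceptance, 3 drafted tokens.
    print(round(expected_tokens_per_pass(0.7, 3), 2))
    print(round(effective_tps(2.0, 0.7, 3), 2))
```

With those guessed inputs the multiplier lands near 2.5, which is at least consistent with the speedup reported above, though it says nothing about what the actual acceptance rate was.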

Anonymous 06/09/26(Tue)02:09:32 No.109012936

>>109012934
bro thats 2.5x speed on like 2t/s

Anonymous 06/09/26(Tue)02:10:20 No.109012940

>>109012936
Yep. Re-read what I said nigga

Anonymous 06/09/26(Tue)02:17:35 No.109012956

>>109012940
no

Anonymous 06/09/26(Tue)02:23:22 No.109012980

>>109012934
You are looking at around 45 minutes per prompt if you have something like a 10,000-20,000 token prompt. I have lots of patience but that's a bit too slow.

Anonymous 06/09/26(Tue)02:31:10 No.109013005

>>109012980
20k tokens in, even before the MTP upgrade, it took me 5-10 minutes at most for the longest responses. Prompt processing is almost instant with CUDA if you have even 1 layer offloaded, it's just generation that drops to 1-2 t/s
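
The disagreement above comes down to treating prompt processing and generation as separate rates. A minimal sketch of that arithmetic, assuming a hypothetical prompt-processing speed of 500 t/s (no such figure was given here, only "almost instant") and a 600-token reply at 1.5 t/s:

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     prompt_tps: float, gen_tps: float) -> float:
    """Total wall time = time to ingest the prompt + time to generate the reply."""
    if prompt_tps <= 0 or gen_tps <= 0:
        raise ValueError("speeds must be positive")
    return prompt_tokens / prompt_tps + output_tokens / gen_tps


if __name__ == "__main__":
    # Assumed numbers: 20k-token prompt, 600-token reply,
    # 500 t/s prompt processing (a guess), 1.5 t/s generation.
    total = estimate_seconds(20_000, 600, prompt_tps=500.0, gen_tps=1.5)
    print(f"{total / 60:.1f} minutes")
```

Under these assumptions the reply length, not the prompt length, dominates the wait. A 45 minute estimate only follows if the prompt is also ingested at single-digit tokens per second.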

Anonymous 06/09/26(Tue)02:31:32 No.109013008

File: 47463522.png (208 KB, 1006x1015)

>>109012900
benchmaxxing is over AGI is close

Anonymous 06/09/26(Tue)02:33:12 No.109013015

>>109013005
thats effectively worthless outside of rp shit

Anonymous 06/09/26(Tue)02:35:57 No.109013026

>>109012940
Not him, but even on a 16GB card Q4 31b gives me like 7t/s
Assuming a 2.5x speedup then it may be slightly useable with lots of patience.
Q3 gets me 20t/s as is though. It fucking sucks that 16GB is so close yet no cigar when it comes to running 31b at a normal speed.

I wonder what other people with 16GB vram are running.
Isn't 16GB supposed to be the most common vram amount if you get a decent build but not going crazy expensive?
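
Why 16GB is "so close yet no cigar" for a 31b model can be seen from weight size alone. A rough sketch, assuming about 4.5 bits per weight for a Q4 quant and about 3.5 for a Q3 quant (approximate values that vary by quant type, and that ignore KV cache and runtime overhead):

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float) -> bool:
    """True if the weights alone fit; the KV cache still needs room on top."""
    return weights_gb(params_billion, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    # Assumed bits-per-weight figures, not exact values for any specific GGUF.
    for name, bpw in (("Q4", 4.5), ("Q3", 3.5)):
        size = weights_gb(31, bpw)
        verdict = "fits" if fits_in_vram(31, bpw, 16) else "spills to system RAM"
        print(f"31b {name}: ~{size:.1f} GB -> {verdict} on a 16 GB card")
```

Under these assumptions the Q4 weights overshoot 16 GB by a bit over a gigabyte, so some layers end up on the CPU, which is consistent with the gap between 7 t/s and 20 t/s reported above.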

Anonymous 06/09/26(Tue)02:49:44 No.109013072

Do base models do the 'it's not just x, it's y'?

Anonymous 06/09/26(Tue)02:51:14 No.109013082

File: Tetosday.png (869 KB, 1024x1024)

>>109013071
>>109013071
>>109013071

Anonymous 06/09/26(Tue)02:52:11 No.109013085

>>109013026
>Isn't 16GB supposed to be the most common vram amount if you get a decent build but not going crazy expensive?
among gamers sure not amongst ai people...apparently

Anonymous 06/09/26(Tue)02:53:37 No.109013091

>>109013072
bootstrapped base models do

llama.cpp CUDA dev !!yhbFjk57TDr 06/09/26(Tue)02:58:34 No.109013111

File: eyes.png (79 KB, 900x577)

>>109011385
There are also other things I need to take care of though.

Anonymous 06/09/26(Tue)07:17:59 No.109014002

>>109008905
got tired of typing "yes", "no", "continue" once every 2 hours
