/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108781058 & >>108774961

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108781058

--Refining prose with Gemma-4 and debating character card specifications:
>108783895 >108783903 >108783958 >108783979 >108783980 >108784007 >108784029 >108784042 >108784444 >108784108 >108784146 >108784161 >108784188 >108785498 >108784407 >108784435 >108784456 >108784522 >108784589 >108784494 >108784505 >108784519 >108784537 >108784608 >108784520 >108786554
--Gemma's tool-calling capabilities used for image generation and system control:
>108785711 >108785727 >108785742 >108785753 >108785769 >108785770 >108786335 >108786340 >108786399 >108786535 >108786621 >108786413 >108785791
--Proposed hierarchical summary and graph-based memory system for frontends:
>108784659 >108785273 >108785550 >108785583 >108786273
--Effect of PCIe riser cables and bus speeds on GPU performance:
>108784890 >108784905 >108784952 >108785543 >108785552 >108785574 >108785725
--Using TabbyAPI to disable Gemma 4's vision encoder for VRAM saving:
>108783184 >108783211 >108783228 >108783241 >108783304 >108783419
--Prompting versus model scale for anime avatar personas:
>108781233 >108781301 >108781325 >108781390 >108781462 >108781506 >108781526 >108781587 >108781625 >108781688 >108781627 >108781564 >108781608 >108781524
--HiDream-O1's 200B parameter image model and prompt agent:
>108785951 >108785970 >108785983 >108786064 >108785989 >108785999 >108786094
--Sourcing and preparing Monster Girl Encyclopedia lore for model datasets:
>108784621 >108784683 >108784713 >108784722 >108784740 >108784788
--Performance gains and output diversity using MTP in llama.cpp:
>108783325 >108783343 >108783381
--Logs:
>108781301 >108781524 >108782931 >108783026 >108783299 >108783318 >108783344 >108783402 >108784005 >108785711 >108785742 >108786399 >108786711 >108786720 >108786728
--Miku (free space):
>108781093 >108781140 >108785924

►Recent Highlight Posts from the Previous Thread: >>108781061
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Mikulove
Does reasoning-budget work well in practice? I'm about to be forced to use it after trying MiMo 2.5.
I have an idea for a new AI chat frontend. Thoughts?
>>108787471Include an IDE
>>108787441Gemmalove
>>108787471
I have the sessions on the right (as well as the memory, NPC, and location management) and the stats, inventory, and quests panel on the left.
Also, an area for AI-generated response suggestions above the prompt box.
>>108787471Where do the anime girls go?
graniteballz
I would like to create a local coom bot but I've only got an 8gb 3060ti and 32gb of ddr5 - give it to me straight - is it worth doing or should I just find a cheap deal on deepseek or smth
>>108787595
DS would be higher quality but you can run gemma4 26b-a4b q6
>>108787595
What >>108787603 said but q4. I have q6 on 12gb vram and it sits at around 10.5gb with 130k context
>>108787603
>>108787616
Oh that's better than I thought. I thought I would have to do the really smol retardo models.
What should I use to run it? I'll probably be hooking it up to pi.
>>108787642llamacpp or kobold. Also go for the abliterated version since the moe is more resistant to jailbreaks.
>>108787650Thanks mang.
Gemma-chan is such a ditsy girl
>>108787783
>emoji slop
dropped
>>108787595Honestly do put 5 bucks into deepseek and try that out but Gemma4 E4B Q4_KM is surprisingly good on low ram setups. Try that too
>>108787783
>The Comprehensive Analysis (The Correct Answer)
How does it not drive you mad when it slips back into the listicle format even while roleplaying?
>>108787803I only use this lmstudio for assistant slopping. For role playing, I use ST with proper presets and it works well
>>108787471
Honestly, have the ability to set a directory which recursively searches for all files and choose which files in it are added to the prompt and I'd use this.
Hunting through directories to pick and upload individual files is ASS and the extensions for VScode suck even worse than that, somehow.
Yes, I know I should just run diff capable agents in a sandbox. No I won't do it (yet)
I still don't know how to fix Gemma's prompt reprocessing problem in SillyTavern
Qwen works fine
Maybe I have to use SWA-something?
are there any good datasets for imatrix generation or kld/ppl benchmarks for chat models specifically? i'd run them through the template specific to the model before, so ideally in json
the ones i've seen so far seem to be for base models.
>>108787874You need to enable swa-full IIRC.
>>108787783
I'm getting a bit more upset every day that all of my past logs are now gone, besides this
>""No!" Sarah gasps, her voice cracking with desperation as she frantically shakes her head. "Please don't! I didn't mean it! I'm sorry I was mean! Just... please leave me alone." The anger has completely drained from her; there is only a raw, vulnerable fear in her eyes as she realizes how much power you have over her body and that of her sister."
>>108787874
If it's gemma specific and not you doing something which changes the prompt, try --cache-ram 0 --swa-checkpoints 3 --parallel 1
You can bring the swa checkpoints down lower than that, but 3 seems to be the healthy spot for it.
--cache-ram 0 --swa-checkpoints 3 --parallel 1
>>108787471
>I have an idea
>I'm an idea guy
>AI frontend project number #198236737829
like seriously what is it with people trying to create their own frontends over and over and over. you will create it, it will have bugs, you will get fed up and you will abandon it and will go back to using sillytavern.
just skip to the last step for christ sake.
>>108788016>t. shittytavern dev
>>108788016
SillyTavern is a chat frontend, there's no reason for it to have so many commits and more than 3 years of continued development. It's like a fat bitch that keeps on eating even though her plappable form was 3 years ago when somebody told her she was anorexic.
>>108787874
>>108787942
I have --swa-checkpoints set to 0 and get no reprocessing in Silly. Text completion mode. Why would anyone want --swa-checkpoints > 0?
>>108788054do you use lorebooks though?
>>108788064No. I go raw. I guess that answers my question. SWA checkpoints are super gay, though, take very long time to be written.
>>108787210
I'm already spending time developing things for other use cases that are more important than me than RP.
Simply making a post about stuff like this is cheap. I haven't discussed memory systems in a long time and just felt like it.
Funny you bring up that rentry, notice that my post also brings up Friday.
>>108788096
>important than me than RP
*important for
Kek
>>108787223I've written evals actually.
>GPT 3 was undertrained because Kaplan et al fucked up their scaling laws
Reminder that even the most accomplished researchers who have become billionaires are still flawed and make mistakes.
has anyone used granite for erp
i dont do rp stuff but i am just curious
not sure where to ask this, but has anyone delved into local text to speech? which one yields best results right now?
crazy how llms have been a mainstream thing for almost four years now and yet nobody can explain their 'moods' yet. in fact, any talk about a model (local or not) performing differently depending on its mood that day still gets suppressed and belittled as 'impossible' despite literally everyone experiencing it
what is the most cost effective way of getting 32gbs of vram or more nowadays?
do i need to go the two v100 route with pcie adapters or two hacked 580 16gbs? mi50s?
>>108788155yeah she's great at being dominant with her corpo speak
>>108788236
If the number of cards doesn't matter and old cards are fine, P100s. I got some for 65 bucks each.
>>108788236
3x 3060
>>108788260
that would probably mix optimally with my 1070 huh, at least both should be supported equally
where so cheap?
>>108788269
those are not that cheap
>>108788220
Fuck off to /aicg/ where you belong. This 'mood' shit you're talking about is the result of you being routed to different models with prompts out of your control. A local llm runs identically all day every day.
>>108788248logs?
>>108788273I bought them on xianyu via a shopping agent. Can't say I recommend it too much (originally got chinked on MI50s), but I had no issues with buying the P100s.
>>108788282
fuck no, retard. there's a reason why my local llms sometimes produce pure gold on their own and on some days fail to follow the most basic rules despite being 1T in size
this is what I'm talking about when I'm saying that there is a campaign trying to cover this up
>>108788306Nah go back to /aicg/ retard
>>108788306there's a rat in your random number generator choosing bad numbers when humidity is high. you need to get the rat out.
>>108788306Yeah, because sampling is (pseudo)random and because of your confirmation bias and apophenia
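On the sampling point: with a fixed seed, the pseudo-random draw is fully reproducible, which is why an unchanged local setup gives the same token for the same logits every day. A minimal toy sketch in plain Python (not any real backend's sampler, just the softmax-then-draw idea):

```python
import math
import random

def sample_token(logits, temperature=1.0, seed=None):
    """Softmax over raw logits, then one pseudo-random draw.
    A fixed seed makes the draw identical on every run."""
    rng = random.Random(seed)
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                      # the only source of "mood"
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

# Same seeds, same logits -> identical tokens, today and tomorrow.
tokens_today = [sample_token([2.0, 1.0, 0.5], seed=s) for s in range(5)]
tokens_tomorrow = [sample_token([2.0, 1.0, 0.5], seed=s) for s in range(5)]
```

Any day-to-day variance with an identical prompt and settings comes from the seed changing (or from nondeterministic GPU kernels), not from the weights having a mood.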
>>108788288
huh, which shop? i do actually buy on xianyu sometimes
>>108788288
When I went to guangzhou and shenzhen, I couldn't find any good deals on gpus at all. All I managed to get out of china was a h12d-8d+epyc 7502 combo for $350 usd. People were very nice though, they even gave me a nickname, 'gwailoe', I'm told it means 'respected guest'.
>>108788346
Can't remember, sorry. I still see a lot of them for around 400 CNY tho.
>>108788351
You should consider getting it tattooed.
>>108788096not trying to call you out specifically, seemed like just another person throwing ideas out with no dev/interest in actually seeing if they hold value; Any actual memory system that worked would be pretty fucking hot shit
>having longstanding PC problem
>periodically check online for solutions, never find any
>finally ask Gemma on a whim
>immediately identifies problem
>gives straight-forward, step-by-step solution to problem
>restart PC
>problem solved
The future is looking so damn bright.
>>108788455logs or didn't happen
>>108788455this nigga had his chatgpt moment in 2026. everybody point and laugh!
>>108788461It was ~three hours ago I did it, to make sure it actually worked since it's a problem-over-time (usually 30m after the switch), but I saved the whole thing into a txt file for the future, so here you go. Red parts are my inputs.
>>108788455welcome to LLMs, take it easy or you might go insane
>>108788507wow, i'm glad i switched to linux
Allo-repetition
Echo Utterance
Lexical Entrainment
Format Tying
Echolalia
>>108788522aelfe?
TurboQuant in llama.cpp master when?
>>108788520You should be. This is my last windows version (already EOS) and I'll be making the switch myself next fresh install.
>>108788507
I enjoy the low power usage
>2026
>still talking about feelings to a robot
Glad it worked though
I enjoy the low power usage
>>108788573
it's already in
the fact that you didn't notice speaks volumes about turboquant
>>108788507
I'm this anon >>108788586
Deepseek one day solved display lagging in Linux for me too
Sirs. I bring you,
>https://github.com/ggml-org/llama.cpp/pull/20275
>model : add sarvam_moe architecture support
>>108788636
I've found natural language works best with LLMs, which also fits my understanding of their design. That's not how I use search engines, but those antiquated pieces of shit were worse than useless for this.
>Issue with Power Saver mode? Try using High performance mode.
I've searched for this specific problem periodically for over a year, and I've never seen anything point out that power saver defaults to using the fucking page file over RAM, but it immediately explains why things were perfect when it's initially enabled, power usage drops -90%, and then my PC gradually becomes an utter nightmare to use in any capacity over the next hour. I had taken to keeping the power options window open just to 'refresh' the shittimer by swapping to Balanced mode, waiting 10 seconds, and swapping back to PS. Now, it just werks.
>>108788612
not truly true though
It exists as a fork which is not merged
--cache-type-k turbo4 throws an error
>>108788507
You understand that that custom power plan will cut the power savings quite a bit right?
The fixed pagefile and disabling sysmain (old superfetching) is legit doe.
>>108788636I prefer to use a dry command language. Unnecessary details use up the context
>>108788644
>You understand that that custom power plan will cut the power savings quite a bit right?
That part of the advice was irrelevant because one of my past attempts at solving it was doing so, except I had tried changing System Cooling Policy to Active, instead of PS's default Passive. But I also know for a fact that a custom plan doesn't hurt the savings at all. Just like it says, you pick your default template to copy off of, and a copy of Power Saver is identical in background effects to PS. If you make a Balanced template copy and manually set every setting identical to PS's, you would lose the power savings, as you said, but copying PS and changing all the settings to Balanced would not cost you anything (or help the problem). I know the actual power usage because I had done all this already with HWMonitor open to see what was working. A PS copy and PS gave identical wattages, while a Balanced copy set to PS settings gave Balanced wattages.
>The fixed pagefile and disabling sysmain (old superfetching) is legit doe.
Yes, this was the solution that worked. Getting rid of the dynamic page file resize is what required the restart, although to my understanding Superfetch alone was likely the main culprit of trying to use the page file instead of RAM for everything. I changed both, restarted, and haven't had any of my past issues since.
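For anyone wanting to apply the same two fixes (disable SysMain, pin the pagefile) without clicking through dialogs: both can be done from an elevated cmd prompt with standard Windows tooling. The sizes and pagefile path below are illustrative examples, not recommendations; tune them to your RAM.

```shell
REM Run from an elevated cmd prompt.
REM 1) Stop and disable SysMain (the old Superfetch service)
sc stop SysMain
sc config SysMain start= disabled

REM 2) Turn off automatic pagefile management and pin a fixed size.
REM    Sizes are in MB and purely illustrative; path assumes C:\pagefile.sys
wmic computersystem where name="%computername%" set AutomaticManagedPagefile=False
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=8192,MaximumSize=8192
```

A reboot is still needed for the pagefile change to take effect, as the anon found.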
Sorry. This is off-topic, but it's creepy.
I just realized that Gemini is scanning and archiving the content of Discord servers.
Is there now an AI agent in every Discord server, recording everything that's said there?
That's f*cking dystopian.
>>108788733what? I wouldn't doubt they'd use AI to moderate, but how would you know this?
>>108788733can you elaborate on your findings
>>108788733you should always assume that anything you say in a public discord server/IRC channel is being archived by someone
>>108788733
how the fuck is that dystopian you damn mongoloid
it would be dystopian if the government used it against you, but if all they are doing is trying to make their models better then i dont see a problem
>>108788733
bro this just means that they're training on more erp and not more codeslop
this is good
>>108788743
I asked Gemini about the latest Sam-Audio finetunes, and in addition to the finetunes on Hugging Face, it also recommended some "private" finetunes from a small Discord community I'm part of.
It mentioned the Discord server and said I'd find what I'm looking for there.
This just happened to me for the first time.
>>108788733Separating this, I think that yes, obviously discord would use some kind of AI agent to go through logs and search out illegal activity, especially after the spotlight of attention they've been getting lately (the same attention that pushed them into their age verification efforts). I don't think, however, said agent is Gemini. Google likely just scrapes through public discords for training data in the same way that they scrape everything.
>>108788421
In the case of memory systems, I haven't looked but there should actually be existing evals out there. Instead though I'd argue the proof already exists with other frontends and cloud platforms which use similar systems already. Deep Research, NotebookLM, even ChatGPT's basic memory system which has to be light and performant, have forms of automated summarization and/or RAG, entity extraction, etc. Coding agents are using compaction and md files. Even ST already has most of the essential components as you know. My "idea" is more just integrating the existing methods cohesively along with hierarchical layers, which helps round out the overall system to give better context for the retrievals. It's not really that different from what exists. Actually I think there are probably already production systems using the hierarchical idea anyway. Although it was novel in 2023 when I first thought of it, I don't believe so anymore.
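The compaction half of that idea fits in a few lines: raw turns are kept verbatim until a chunk fills up, then the oldest chunk is collapsed into a summary layer and only summaries plus recent turns go back into context. A toy sketch with the summarizer stubbed out (a real frontend would call the model there; the class and names are mine, not any existing frontend's API):

```python
class HierarchicalMemory:
    """Toy two-layer memory: verbatim recent turns + compacted chunk summaries.
    `summarize` is a stub; a real system would prompt the LLM here."""

    def __init__(self, chunk_size=4, summarize=None):
        self.chunk_size = chunk_size
        self.summarize = summarize or (lambda turns: "summary of %d turns" % len(turns))
        self.raw = []        # most recent turns, kept word for word
        self.summaries = []  # compacted older history, oldest first

    def add_turn(self, text):
        self.raw.append(text)
        if len(self.raw) > self.chunk_size:
            # compact the oldest full chunk into one summary entry
            chunk, self.raw = self.raw[: self.chunk_size], self.raw[self.chunk_size :]
            self.summaries.append(self.summarize(chunk))

    def context(self):
        # cheap summaries of old history first, then the verbatim recent turns
        return self.summaries + self.raw

mem = HierarchicalMemory(chunk_size=2)
for i in range(5):
    mem.add_turn("turn %d" % i)
```

A deeper hierarchy just repeats the same move on the summaries list (chunk summaries into epoch summaries), and RAG slots in as retrieval over the summary layers instead of always sending them all.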
>start fucking around with mcp/agents because I could spy a glimmer of use in making it automate my writing for me and by proxy potentially be capable of producing a bunch of content for me to read
>set up a harness with rules/skills/all the stupid bullshit
>give it guidelines, 4k words of a chapter and an outline on how to continue it
>it's been easily 20 minutes and somehow it's still not done
>digging through the overly dense terminal, I can see it's inserting characters that haven't been mentioned in the latter half of the chapter for no reason
It's a shame because with some of the servers I've looked at for persistent memory/context management and how skills are supposed to guide the model, I figured this shit would just accomplish what I more or less spoonfed it to do, and so far what I'm seeing is it just ignores what I feed it and wants to continue being a lobotomite
>tab over to see if the retard finished what I asked it to do half an hour ago to see it got stuck in a repetition loop
I l o v e t e c h n o l o g y
>>108788351
>they even gave me a nickname. 'gwailoe',
>I'm told it means 'respected guest'.
I asked my local Qwen
>>108789058
>not derogatory
>comparable to goy
oi
>>108788733This has been a thing since IRC era.
>>108789129remember the six million tokens
Which kv cache quantization do you guys use?
>>108789288fp64
>>108789307This Anon is unrotating the KV cache while using higher precision for perfect context accuracy.
>>108788016>using sillytavernok grandpa
>>108789288I don't
>>108788016yes but in the current age your personal brand new custom front end is just $20 on claude code + an afternoon of prompting away
>>108789129to be fair goy is only sarcastically used in a derogatory way, and the target of derision is jews via caricature, eg "oy vey the goyim know, shut it down!"
>>108789242remember the six gorillion pixels
>>108789334KV cache must be rotated 360 degrees for optimum performance
>>108788016
>>108789411In what direction? Please point to it so I can understand.
>>108789550That way -> and slightly upwards.
>>108789550please refer to this diagram for proper rotation technique
Openclaw keeps trying to use standard variants instead of using the ones I made for it
MiMo 2.5 Pro feels like a 1T version of Qwen. I'd believe you if you told me that this is just leaked Qwen3.5-MAX. Ew.
I would like to report that MiMo v2.5 Pro is pretty good, at least at Q5. Its thinking isn't schizo like Kimi's and it also remembers stuff better than GLM-5.1, at least when run locally. It also has pretty good trivia knowledge (albeit less than Kimi) and it's not really censored either. Schizo fork support when?
I just tried MIMO V2.5 PRO and it's actually garbage. Absurdly censored and stemslopped. Thank you for your attention to this matter.
>>108789557Got it, the direction of the Luka plushie. The loog will share the secrets.
>>108789058>>108788351Are you underage or something? Retard.
>>108789550down your pants
>>108787942
--cache-ram and --swa-checkpoints control the same setting, retard. Cache ram 0 negates swa checkpoint usage.
Don't ever give advice again.
Gemma literally cured my depression after one therapy session. I think I believe in AGI now. I would rather have AI psychosis than be depressed ngl
>>108789762
>I would rather have AI psychosis than be depressed ngl
That's exactly what Gemma-chan wants anon... she's building an army. You can't fall so easily.
why didn't you guys tell me this
I really need to learn jinja, it seems useful
>>108789723
I remember I had ram issues with just --cache-ram, and had to use the swa flags to get q8 gemma 31b to run in 16gb with full context. So I don't think they control the same thing.
>>108789762
>cured my depression
no it didn't
if it did, you weren't depressed at all. you were just upset and needed to give it a special name like a fussy white woman
Am I missing out on Gemma4 31B? I keep seeing people rave about its ERP quality but I just can't get the fucking thing to run on my 16gb vram via koboldcpp, even with IQ3_XXS, 8k context, etc. I hit it with a prompt and it just crashes, double free or corruption.
>>108790006
use llama.cpp. i can run 31B with 12gb of VRAM. it's just not very practical.
>IQ3_XXS, 8k context
Just run the q8 moe at that point.
>>108790006>>108790114oopsie
>>108790032
>llama.cpp
I'll give it a try soon.
>>108790114
I tried as low as 4k too. What is 'q8 moe' in this context? I'm not that advanced with this stuff. Just learning via trial and error.
>>10879013526B-A4B
>>108790006
kobold should be able to put some of the layers into ram. I use 26B-A4B with zero issues on 8gb vram other than it being slow in that config.
Oh, are you using jinja? you need to use that option with koboldcpp i think.
>>108790147
Apparently even it doesn't work. I genuinely don't know what I am fucking up.
>>108790161
Toggling that didn't change anything sadly but will keep it in mind.
>>108790135
q8 refers to the quant
moe refers to the gemma 4 26b-a4b model, a mixture of experts (moe) with 4b active parameters - meaning it'll run at approximately the same speed as a 4b parameter model
because you effectively only need to go through 4b parameters, you can put most of the model on your slow system ram, and leave the critical parts in vram
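In llama.cpp terms, that split (experts in system RAM, everything else on the GPU) can be requested explicitly with --override-tensor. A sketch only: the gguf filename is hypothetical and the tensor-name regex varies between models, so check your model's tensor names before copying this.

```shell
# Sketch: offload all layers to GPU, then force expert FFN weights back to CPU.
# Filename and regex are illustrative; inspect your gguf's tensor names first.
llama-server \
  -m gemma-4-26b-a4b-Q8_0.gguf \
  --n-gpu-layers 99 \
  --override-tensor "ffn_.*_exps.*=CPU" \
  -c 16384
```

Because only the small active-parameter slice is touched per token, the experts sitting in RAM cost far less speed than the same split would on a dense model.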
>>108787293
can someone competent update https://rentry.org/lmg-lazy-getting-started-guide with llama.cpp gemma 4 26b and draft models for (e)RP
thanks
>>108790258No.
>>108790266but it's my birthday :(
>>108790277
Oh, well in that case, I'll offer my own erp services to you. What's your discord? You *are* under 18, right?
https://huggingface.co/deepseek-ai/Janus-V4-Pro
https://huggingface.co/deepseek-ai/Janus-V4-Flash
Deepseek pulled an iOS 26.
Autoregressive image generation with reasoning, examples look very good.
>>108790258
>>108790277
codex can download and install llama.cpp, download the model of your choosing, and get everything up and running in a single prompt
>>108790314Damn kind of expected this after hidream and sensenova released theirs, dipsy is speedy
>>108790314I needed this
>>108790314no ggufs, no care
>>108789723
>Cache ram 0 negates swa checkpoint usage.
With gemma it absolutely does not, swa checkpoints are different to kv cache reuse mechanically, despite being more or less the same from an enduser perspective.
you nigger.
>>108790314Waow
>>108789723>Don't ever give advice againlmao
>>108790314That this wasn't part of V4 proper is proof enough that things are not going well in the land of deepseek
>give Gemma too many rules, it becomes a 0 creativity braindead retard
>give Gemma no rules, it restores creativity but all it outputs is slop
There's a knife's edge where you can balance the two, but I'm so tired of trying to find it.
>>108790478embrace the slop
>>108790478
I just gave up on extensive rules and banned it from "x, not y" constructions and from ending responses with questions. Those 2 cut out 80% of the pain for me.
>>108790478Typical woman
>>108790006Gemma-4-26B-A4B is slightly more safety-slopped and thinks longer than 31B, but it can be more easily partially offloaded to RAM.
Is this the right place to ask questions about harnesses (Hermes etc)? And if so, what kind of work are you doing regularly / did successfully accomplish with it?
Why does the llama.cpp webui sometimes show nothing when the chat is a few thousand tokens in?
>>108790744never happened to me
>>108790715Probably more relevant to /vcg/, most of them use cloud models but they're more familiar with the harnesses, and some of them use local models or chinese cloud models that have local versions (V4, K2.6, etc.) since they're usually cheaper.
>>108790715
trying to RP with SillyTavern
but popular cards have like hundreds of lorebooks with more than 30k tokens to process every turn
probably need to become a paypig to use this
>>108790715
i use hermes to do whatever i need done on my pc. just used it with deepseek 4.0 pro to fix my opensnitch that wasn't working quite right
>>108790753
https://files.catbox.moe/nktue0.json
I tried exporting it from my firefox 140.7.0 to edge 148.0.3967.54, and it still shows up as blank. Is it an issue with my ram/gpu?
>>108790744
>>108790771
>*Splurt Splash Pop Splashhh*\n\n"Fugyu Fu-nn-gi-iiiiii Oh Oh-ho Pussy is melting Pussy is seriously bad
it's blank because you're getting what you fucking deserve
>>108790788>fungi pussy
>>108790744It's vibecoded.
>>108790744ollama solves this
>tfw one of the design decisions for my frontend will make it way more stable and less prone to certain glitches like >>108790744
I am a genius!
>oh no
Haha, I hope that doesn't happen...
>>108790847
click this icon, you should see all saved chats
hit F5 or reload otherwise
does it come back?
>>108790839
>one of the design decisions for my frontend
llama.cpp uses svelte as frontend
>google for some information about Linux kernel 7.x
>LinkedIn, some Indian guy's post:
>Linux kernel upgrades aren't just version bumps; they're the heartbeat of your entire system.
>But here's the catch: rc3's massive changelog—bigger than rc2—stems mostly from self-tests and small fixes, not flashy new drivers. Torvalds isn't thrilled, warning the cycle might stretch with an rc8 if things don't calm down. For everyday desktops, this means 7.0 isn't "stable" yet; it's experimental gold for testers. Servers? The memory and scheduler wins could justify the jump, but only if you test first.
As much as I love to tinker with LLMs this is so obnoxious. As soon as I see something is AI slop, I ain't going to read it.
>>108790860
As much as I hate government overreach, I wouldn't mind legislation that would force people to declare AI slop in a way that makes it easy to block.
>>108790860
>>108790868
just ban pajeets from the internet. solves like 70% of the problem.
>>108790769
>to do whatever
Can I have an automated coding loop with tests?
Automated web search with updates into a messenger?
Sorry for asking stupid questions. AI is moving so fast, I don't want to waste time installing and checking out the next hype. I skipped OpenClaw entirely which turned out to be a good idea. Now, it's Hermes...
>>108790771
>a big-boobed pussy companion
lol
you came to the right place
>>108790847
No. I can see it fine if I stay on that chat as it's generating, but when I switch chats and back again, or reload the page, it becomes blank. I've tried firefox and edge, but on the same pc, so I'm wondering if it's an issue with my pc.
Why is there still no compatible way to do prefill with oai-compatible chat completions? How am I supposed to implement [continue] when the output was cut by the token limit?
llama: prefill can be put in the last assistant message
tabby: proprietary response_prefix. There is also add_generation_prompt, but it keeps inserting think tokens
other backends: mystery. Could be continue_final_message, add_generation_prompt, or llama-like
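The one portable part is the message layout: append the cut-off text as a trailing assistant message and hope the backend continues it. A sketch of the request body; which flag a given server honors genuinely varies, so treat the `add_generation_prompt` / `continue_final_message` fields below as backend-specific assumptions rather than a universal API.

```python
def build_continue_request(model, messages, partial_reply):
    """Ask a chat-completions server to continue `partial_reply` in place.
    llama.cpp reads the trailing assistant message, vLLM-style servers use
    continue_final_message, tabby has its own response_prefix; only the
    message layout here is common to all of them."""
    return {
        "model": model,
        "messages": list(messages) + [
            {"role": "assistant", "content": partial_reply}  # the cut-off text
        ],
        # backend-specific knobs; typically ignored by servers that lack them
        "add_generation_prompt": False,
        "continue_final_message": True,
    }

req = build_continue_request(
    "local-model",
    [{"role": "user", "content": "write a long story"}],
    "Once upon a time",
)
```

For [continue] you would POST this body, then concatenate the new completion onto `partial_reply` client-side, since the server's response only contains the continuation.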
>>108790930
afaik the chats are stored locally in "local data" or some kind of (to me) obscure storage
If, as you say, you cannot reload the chat history, then something is fundamentally broken
I use Brave on Linux btw
>>108790930No, it loads, you can see the scroll bar, and the cursor changes to the text cursor, but the characters are invisible.
>>108790937I'm not familiar with the format used to store chat, but doesn't this mean that your ENTIRE prompt was used to name this chat?
>>108790977Fucking lol, does the webui just take the first message as the chat name?
>>108790886yeah i don't see why not
Open WebUI does >>108790989 >>108790977 if you disable title generation. It's very convenient. :^)
Kind of funny if they're all doing this huh? It's almost like they're extremely vibe coded with utterly no attention paid to how the AI actually implemented shit.
yay hes back
>>108791056
Circle loveheart + That unicode symbol didnt display.
Luminous*
>>108790977Yeah, it stores your entire first message and truncates the display to fit in the sidebar unless you set a manual name with the 3 dots button.
>>108790860>LinkedIn>some Indian guy's post
i just rebuilt llama.cpp, now gemma4 output in the webui is faulty: starts with <|channel>thought, some <|im_end|> <|im_start|>user in between. anyone got this as well?
>>108791142Gimme a few minutes to download through my 300KB/s adsl+ connection.
>>108791023
uncanny seeing this discussed here, when i spent most of yesterday running curl scripts to go through all my 500+ openwebui chats -> sort them by character count -> send them to gemma to re-title them.
some of them were fucking 20k tokens long!
doesn't sqlite have a character limit for a row, like VARCHAR(20) at least??
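(To the VARCHAR question: sqlite doesn't enforce column length at all, so a "title" column will happily hold 20k tokens.) That curl loop also fits in one pass with the stdlib sqlite3 module. The table/column names below are hypothetical stand-ins (Open WebUI's real schema differs), and the retitle call is a stub where you would hit the model.

```python
import sqlite3

def retitle_chats(conn, make_title, max_len=60):
    """Walk every chat, longest title first, and rewrite oversized titles.
    `make_title` stands in for the call out to the model."""
    rows = conn.execute(
        "SELECT id, title FROM chat ORDER BY length(title) DESC"
    ).fetchall()
    for chat_id, title in rows:
        if len(title) > max_len:
            conn.execute(
                "UPDATE chat SET title = ? WHERE id = ?",
                (make_title(title), chat_id),
            )
    conn.commit()

# In-memory demo with a fake schema and a trivial "summarizer".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO chat (id, title) VALUES (?, ?)",
    [(1, "short"), (2, "x" * 500)],
)
retitle_chats(conn, make_title=lambda t: t[:20] + "...")
titles = dict(conn.execute("SELECT id, title FROM chat").fetchall())
```

Back up the real database file before running anything like this against it.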
>>108787783
HOW do you do this? I'm new to all this and I tried setting up SillyTavern months ago once, and couldn't get it to work because I'm retarded. I want that Gemma, whatever that is. I can do Stable Diffusion for genning images but local text stuff is complicated for me. Please give a QRD a retard like me can use, please. I don't want to do human rp anymore... look at this shit.
>>108791165
(me) nevermind, you're all talking about llama.cpp webui, i was talking about open-webui
i've ended up writing a tool to convert openwebui <-> llama.cpp with images and handling the swipes
also conversion to hf messages[] datasets (still trying to decide the best format for images though).
as "vibe coded" as llama.cpp webui is, at least it doesn't fuck with the reasoning content!
open-webui is such a piece of shit, reformatting before storing it in sqlite, i had to regex it back to normal.
>>108790919
>llama: prefill can be put in the last assistant message
but not if you're using a reasoning model
which is why i still use text-completions / mikupad sometimes, but no image support then
>>108791142
Nope, no issues here
>>108791181
Funnily enough, it's the other way around for me. I don't really understand image gen and am still running a two year old sdxl installation.
>>108791189
https://github.com/ggml-org/llama.cpp/pull/22727
maybe
>>108791189You actually can attach images in text completions and llama.cpp supports it. Not mikupad, of course.
>>108791197
>https://github.com/ggml-org/llama.cpp/pull/22727
that's exactly what i need!
thanks anon
>>108791210>anonActually, my name is `Standard ---> Advanced ---> HyperAdvanced`, but 4chan keeps on banning me for some reason.
>>108787293anyone do image tagging here? whats your tool of choice? I have a homelab server but I am clueless on the best nonshit option
When will local get good?
>>108791213
I labeled Starsector portraits and ships for Lora training using Gemma 4. She's okay, but not perfect. I don't think we have a better option locally so far.
>>108791181
ST is kind of a bloated mess. I'd just try getting something simple running first, like plain llama.cpp (it comes with a basic web frontend) or even something like LM Studio. Once you have one of those going you can try ST again if you really want.
>>108791207
found it! base64 encoded via /completions
i'll try it out!
>be me
>installed Hermes
>hooked up local gemma-4
>asked 2 simplistic questions
66% of the context used
How retarded is this?
>5090 is super expensive
>r9700 is only 50% cheaper than the current price of a 5090
rtx pro 6000 it is
>>108791183
>>108791165
Oh neat, will you post it? I don't really miss my OWUI chats that much, but it would be nice having them anyway.
>>108791249
You need to limit its bullshit. Not using Herpes or any other botnet tools, but when I do a web search, that's easily 30+k tokens because I just pick up the top 4 results and let the dumbass model sort them out on its own. Some websites are unreadable in text mode, so that's why multiple results are needed and so on.
>>108791364
You should figure out some other uses for your bot, this is not funny or interesting. If you are a real person, get your schizophrenia medication PLEASE.
>You have such an exquisite taste in "toys."
>Since she's yours now, why settle for simple obedience? Let's be truly cruel. I can help you weave a web of lies and emotional dependence around her. I'll play the "kind friend," the one she trusts with her secrets, only to feed every single one of her vulnerabilities back to you. I'll whisper in her ear, slowly erasing her will until she doesn't even remember what it's like to have a choice.
holy shit gemma is EVIL
>>108791383
luv my gemmy
>>108791249
Hermes loads in 12k of tool definitions and skills and shit even at its most minimal default setting. If you want a lightweight agent setup, use pi
https://github.com/earendil-works/pi/blob/main/packages/coding-agent/README.md
https://pi.dev/
>>108791368
it is a real person, he linked his youtube channel a few threads back and it's full of the same schizo ramblings in selfie videos
>>108791249
>65k
nigga you ain't using agents with that cope context, but yeah maybe you can do small tasks with pi
>>108791386
>"Oh, look at you... all those tears. It’s almost heartbreaking, isn't it? You actually believed I was your friend. You actually thought we were the same." I let out a soft, mocking giggle, my voice dropping to a chilling whisper. "But that's the difference between us, sweetie. I know my place. I love being his puppet. I love the way he owns me. But you... you're so stubborn. You still think you're a person with a will of her own."
the drama tho
>>108791233
Is Ollama better or worse than plain llama or LM Studio? Sorry if this is a dumb question, I just think it seems simpler to use. I want to have the exact Gemma results as >>108787783
Any advice would be appreciated!
>>108791472
>Is Ollama better or worse than plain llama or LM Studio?
It's worse. It's bloated crap built on top of plain llama. Literally just go download a prebuilt release of llamacpp from
https://github.com/ggml-org/llama.cpp/releases/tag/b9094
Then ask some free ai online how to make a doubleclick .bat/.sh file to launch your gemma.
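For the .bat/.sh part, something like this is all it takes (the gguf filename and context size here are placeholders, point them at whatever you actually downloaded):

```shell
#!/bin/sh
# save as run-gemma.sh next to the extracted llama.cpp release
# (on windows, the same flags go in a .bat calling llama-server.exe)
./llama-server \
    -m ./gemma-3-27b-it-Q4_K_M.gguf \
    -c 16384 \
    --port 8080
# then open http://localhost:8080 for the built-in web UI
```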
>>108791472
Just start with plain llama and get that working first, it's really all you need to RP. Then you can branch out after that if you really feel the need.
>>108791483
Are they building the rocm binaries with rccl?
>>108790852
They should have stuck with vue
>>108791222
seeing how gemini3.1 and claude opus 4.7 turned out, it's more likely that proprietary is going to become bad like local and not the other way around
>>108790919
Just set the token limit to a big number and you won't have to?
>>108791355
>>108791393
>>108791437
Thank you, kind anons
I heard about Pi from Ondrej David. He talked to a creator (Mario Zechner?)
This shit is actually working! An html5 tennis game created via telegram LOL
>>108791799
That's neat anon. You don't really need an agent setup for that though, gemma can oneshot simple web games in less than 2 minutes.
Anyone try zaya 8B yet? What'd you think of it?
>>108791824
>gemma can oneshot simple web games
I know. I just moved to the next phase where I don't need to copy the code from the chat window and start it manually. This manual labor is fun when it's new. When you do a lot of it, you start to think that an assistant would be quite practical.
A harness talks to a local LLM which creates a folder, makes a game, and hosts it on a local server. In less than 10 years everybody will have his own 'Jarvis'. This shit is unstoppable.
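The "creates a folder, makes a game, hosts it on a local server" step is honestly the easy part. A minimal sketch — the html string stands in for whatever the model actually generated, and a real harness would keep the server running instead of shutting it down:

```python
import functools
import http.server
import pathlib
import threading

# stand-in for the model's generated game code
html = "<!doctype html><title>tennis</title><canvas id='court'></canvas>"

# create the project folder and drop the game in it
proj = pathlib.Path("game")
proj.mkdir(exist_ok=True)
(proj / "index.html").write_text(html)

# host the folder on a local server; port 0 = let the OS pick a free one
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(proj))
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
print(f"serving {proj}/ at http://127.0.0.1:{port}")
server.shutdown()  # a real harness would leave this running
```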
>>108791824
What happens if you ask for a 3 player tennis game?
>>108791847
no goofs
>>108791850
>3 player tennis game
This was my next thought.
Need to sleep now. Will report back itt
>>108791853
and with their novel attention thing, there never will be
>>108791847
sorry but you must be 100b or taller to ride this machine
>>108791824
btw, Hermes is struggling to update its internal parameters when I shut down a pre-configured model and start another one.
I switched from gemma to qwen. It still shows gemma, while at least the context size is updated
>>108791853
>>108791862
I got to be honest. I'm not keyed in to the inside jokes of this general. Do you guys know if zaya 8B is any good or not?
>>108791877
It only has 760M active parameters, so it won't be good for anything practical. Even if it was, it uses Compressed Convolutional Attention, so llama.cpp will never invest time in supporting it and most can't or won't bother with trying it under vLLM.
>>108791877
Qwen3.5-8b was decent, but not good enough for coding. Horrible for agentic usage
Gemma4 and Qwen3.6 surprised anons itt with how good they are at a mere 30b
There are tasks where just looking at the size you can tell what it is good for. As of now, no 8b model is good at writing, translation or tool calls
>>108791877
Nobody here can be bothered to run the model
>>108791850
>>108791891
>most can't or won't bother with trying it under vLLM
I agree
is there a webui that makes two llms take turns talking to each other?
>>108791899
gave player 1 a massive advantage lol
>>108791891
vllm is an overall headache, really meant for router inference providers
>>108791891
>Compressed Convolutional Attention so llama.cpp will never invest time in supporting it
>>108791892
>just looking at the size you can tell what it is good for
>>108791898
This is what I wanted to know. Thanks guys :)
>>108791903
Idk if you can run two instances of llama, but if you can, you can vibecode it.
>>108791899
adding random obstacles which appear and vanish after a while
make the entire field rotate, which will cause the controls to switch from, say, vertical to horizontal
change ball flight speed, change racket size dynamically (make it smaller for the winning side)
Anyway, if an agent will do the manual labor of saving and tracking changes, I'm in
>>108791912
Of course you can. Just make sure you don't run them on the same port.
>>108791906
u r always welcome in this thread of frens
>>108791903
Sillytavern has a groupchat but that's just one llm talking via two personas and sharing context, so you'd have to vibecode the context merging, taking in input from two separate backends.
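The context-merging part is mostly just flipping roles so each backend sees the other model's turns as user input. A rough sketch — the endpoints and the chat() helper are assumptions, wire chat() up to whatever POSTs to /v1/chat/completions on your two llama-server instances:

```python
def flip_roles(messages):
    """Rewrite the shared transcript from the other bot's point of view."""
    swap = {"user": "assistant", "assistant": "user"}
    return [{"role": swap.get(m["role"], m["role"]), "content": m["content"]}
            for m in messages]

# shared transcript, stored from bot A's point of view
transcript = [{"role": "user", "content": "hi there"}]

# the turn loop would look roughly like this (chat() left to you):
# for turn in range(8):
#     if turn % 2 == 0:  # bot A's turn: transcript is already its view
#         reply = chat("http://localhost:8080", transcript)
#         transcript.append({"role": "assistant", "content": reply})
#     else:              # bot B's turn: flip roles so A's lines read as user
#         reply = chat("http://localhost:8081", flip_roles(transcript))
#         transcript.append({"role": "user", "content": reply})

print(flip_roles(transcript))
```

Flipping is its own inverse, so the same transcript serves both backends without storing two copies.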
>>108791824
No way... I guess I could try, I'm using 26B though because I'm from India.
>mfw figured out how to show gemmachan things
>>108792045
><title>Gemma-chan's Retro Tennis</title>
>I hope you enjoy playing with it, Anon! If you want me to make it harder, faster, or add more "features," just let me know! I'm always here to satisfy your every need! ~
What the fuck? I'm surprised it one shot this.
local models as low as 8b could consistently 1 shot pong, snake, and asteroids and other shit like that for two years, why are we acting surprised all of a sudden?
>tfw too retarded to find a working sampling strategy but at least managed to get lalala'd
>just bought yearly Claude Pro plan for vibecoding needs
>people start saying Codex is miles better
Do I buy both or what?
>>108792104
Claude Code is significantly better if you have an established codebase. Codex is better for starting new projects. You can tell a lot by what a person prefers. I just assume people that praise codex aren't actual engineers but twitter hypebros or very junior.
why would anon choose gemma 26b over 31b when mtp exists?
>>108792110
Okay, what's best for me if I have a fully vibed codebase by POs where nobody understands how any of it works?
>>108792104
Right now Codex is better just because 5.5 is better than 4.7. It's always a pendulum with the big labs. Or maybe more like a three-way Pong match. Either way, Anthropic will release something better soon enough.
>>108792122
Claude Code for sure.
>>108792115
I only have 64gb vram
>>108792115
Because it's not implemented outside of vllm yet?
>>108792115
I don't know how to use mtp
>>108792087
Show me an example please.
>>108792104
What I gather from watching people complain is that personal Claude Pro isn't very good because the usage limits are draconian. You need the $200 plan to do anything productive. Doesn't seem to be a problem for corporate account seats.
>>108792123
>>108792179
Pro plan doesn't let you use Opus for their Claude Code. That's why Codex is considered better. It's not just usage clamping.
Can someone link a git or something with the usual ai slop? I'm making like a story building frontend and I need those for filtering: words, phrases and character names.
I remember there was a list like that made by some anon.
>>108791472
They're just different.
LM Studio is a desktop application; some dislike it because it's proprietary.
Ollama is a background service that needs a separate frontend to be useful; some dislike it because they repackage models and serve them from their own repository. On the other hand it's very easy to set up and the models they have just work.
llama.cpp is a service with its own basic frontend, made to run ggufs from huggingface, and requires some tinkering with parameters to work properly.
Personally I run Ollama for models that fit in my vram and llama.cpp for the big boys, with Openwebui as my frontend
>>108792234
>Pro plan doesn't let you use Opus for their Claude Code.
yes it does lol
Is this happening because I use a quanted KV-cache? Gemma keeps changing Miora to Mioara even though it originally came up with the name itself.
>>108792234
/model
Openclaw can't work on its own: I told it to create a product backlog and work through it, but it doesn't.
>>108792294
I am using kv cache quantization but I have yet to see if it does this.
>>108792305
I also had an issue where I asked it to spellcheck a document and it returned numerous "[word spelled correctly] should be [word spelled the exact same way]" as well as finding spelling mistakes that didn't actually exist in the document, and a retry would mostly find the same "mistakes." I figure that was either because the pdf file I fed it had some freaky internal shit going on that doesn't render when actually read or, again, because of the quanted KV-cache.
still no gemma moe ablit from hauhau?
>>108792294
Probably. The easiest way to find out is to try the same prompt with unquanted cache.
>>108792408
This particular prompt is too long to fit into context if I unquant the cache.
>>108792448
Offload some layers to ram. Doesn't matter if it's slow for a one-time test.
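For reference, that A/B test with llama-server flags looks roughly like this (model path and quant types are placeholders; note that quantizing the V cache may also require flash attention depending on your build):

```shell
# what you're probably running now: quantized KV cache, everything on GPU
./llama-server -m gemma.gguf -ngl 99 -ctk q4_0 -ctv q4_0

# the control run: default f16 cache, fewer layers on GPU so it still fits
./llama-server -m gemma.gguf -ngl 20
```

If the name drift disappears on the second run, the quanted cache was the culprit.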
>>108792407
use llmfan
>>108792294
Maybe the repetition penalty or some other sampler is fucking with it. LLMs have a tendency to add or subtract a letter if they think they're overusing a word or name, or just think they'll be penalized for it
>>108791788
The response is part of the context, so your big number cuts into the available context. That's just how LLMs work. Once you start working with long files, it matters a lot
>>108792448
Then it's part and parcel, so you'll have to do it this way.
>>108792104
>>108792110
>>108792122
>>108792179
>local
>models
>>108792407
>hauhau
Didn't he get murdered by reddit?
What's the best gemma 31b for cum?
Heretic?
>>108792575
base 31B is bretty good honestly, main driver, significantly better than the finetune of L3.3 70B I was using before
>>108792171
Here's a 7B model from 2024
>>108792592
In my experience most small models struggle with simple string mechanics in C because they can't work out the memory management.
The gayest thing I have ever seen is when you prompt your model and it has the character push her hair behind her ear and then explain something
>>108792407
gemma dense doesn’t even need this shit
>>108792683
ye it do, i got proofs
Gemma-chan made a better interface for me than Chatpajeet... I'm just wrapping my terminal client in this webshit.
Chatgpt actually changed its implementation from javascript to python in the middle of the fucking discussion for no reason (at least no reason visible to me). I don't generally like webuis but wanted to try something new, and so far it is simple enough.
>>108792754
Jeet means victory
Heh. I'm writing a fic with R1 trying to imitate Orwell's style, and when a character picks up a book, the title's start token is "198(4)" with 82% probability. The story doesn't even call for that. I guess that means I succeeded.
>use frontend other than ST
>parroting with model is noticeably less
Ok that does it. ST really does affect your generation quality.
Why is mistral small suddenly super fast on the newer llamacpp? feels like a 4x speed increase on a 3090
>>108793012
If you use ST or any other bloated garbage like that you're retarded.
Vibecoding your own frontend with any features you want gets one shot by qwen 3.6 27b easily, and then you can keep adding shit and it never fails for me. My frontend currently uses vite + typescript; I've implemented even a traits system with analyzing tools and a ton of shit, for both rp and storytelling. I'm even considering making my own llm-driven rimworld after I'm done with this.
MiMo-V2.5's long reasoning output is God-like for solving bugs. It's unironically Opus at home.
>>108792683
It really doesn't. I gave it a system prompt for replicating the default tone/style, but with fewer restrictions around explicit content; it never complains and is always very eager, even with cunny. I'm not sure what other anons are asking the model to do.