/g/ - Technology

File: 1754520866633371.png (512 KB, 720x545)
512 KB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108633862 & >>108630552

►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1757922766538175.jpg (115 KB, 1142x1142)
115 KB JPG
►Recent Highlights from the Previous Thread: >>108633862

--Implementing real-time search using browser-based MCP servers and tools:
>108635788 >108635795 >108635801 >108635814 >108635845 >108635847 >108635850 >108635863 >108636123 >108635867 >108635921 >108635957 >108636055 >108636110
--Comparing Gemma-4 26B MoE and 31B dense for quality vs speed:
>108636610 >108636626 >108636640 >108636644 >108636664 >108636673 >108636713 >108636725 >108636678 >108636733 >108636772 >108636836 >108636907
--Comparing Gemma 4 and GLM regarding user parroting and RP quality:
>108634812 >108634837 >108634842 >108634848 >108634855 >108634916 >108634925 >108634987 >108635013 >108635156 >108635191 >108634962 >108635079 >108635479 >108635589 >108634884 >108634895
--Discussing XML tags and indentation for improving system prompt attention:
>108635966 >108635979 >108636138 >108636462 >108636468 >108636506 >108636510 >108636540 >108636560 >108636572 >108636815
--Benchmarking Gemma 4 and Qwen with Puppeteer for automated tasks:
>108635408 >108636007 >108636089 >108636106 >108636111 >108636140 >108636126 >108636219
--Hardware requirements for dense models versus Gemma-4's efficiency:
>108634252 >108634342 >108634533 >108634542 >108635918 >108634365 >108634379 >108634669 >108634452
--Benchmarking thinking tokens and speed between Gemma 4 and Qwen:
>108634323 >108634513
--Comparing noir prompts versus descriptive prose for better narrative flow:
>108634519 >108634528 >108635090 >108635130 >108635132 >108634696
--Theorizing reasons for Gemma 4's low censorship and RP performance:
>108635566 >108635571 >108635613 >108635618 >108635825 >108635616
--Dealing with 403 errors and blocks when web crawling via MCP:
>108634013 >108634031 >108634066 >108636022
--Logs:
>108634316 >108634519 >108634634 >108634696 >108635814 >108636241 >108636774
--Neru (free space):
>108635532

►Recent Highlight Posts from the Previous Thread: >>108633866

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1740708918229278.png (2.46 MB, 1024x1536)
2.46 MB PNG
>>
gemmaballz
>>
yup, dflash is cooked
it's over
>>
>>108637581
>>108629083
>>
This week will be a week.
>>
You are a knight living in the kingdom of Larion. You have a steel longsword and a wooden shield. You are on a quest to defeat the evil dragon of Larion. You've heard he lives up at the north of the kingdom. You set on the path to defeat him and walk into a dark forest. As you enter the forest you see
>>
>>108637701
This week will be 2 weeks.
>>
i'm kind of a noob. i have 8gb vram so i took gemma e4b. how worse is it than the other models for conversation?
>>
>>108637758
very
>>
>>108637758
how much ram do you have?
if you have 32gb ram, you should use 26b instead
>>
>>108637758
try q4 of the moe
>>
File: pizza bench cropped.png (2.58 MB, 5562x6739)
2.58 MB PNG
qwen cant follow basic instructions ignore all chink shills

https://files.catbox.moe/p8fpnk.png
>>
How is Gemma4 so good bros? No slop, no refusals, better writing than deepseek, and it's just 31b.
>>
Do NOT buy any hardware. Just wait a couple years and you'll be able to run Kimi on a consumer GPU.
>>
>>108637801
>no slop
I love Gemma, but come on.
>better writing than deepseek
Dunno because the Deepseek, GLM, and Kimi shills never post their logs.
>>
Gemma is implementing her own self-modifiable MCP server. On 24 fucking GB of VRAM. GPT 4 could not have done this.
I remember the news cycle about room temperature semiconductors when some anon said "if this works we will have GPT 4 at home".
The world might be going to shit fast but I'm so happy to be living this timeline.
>>
is there a list somewhere of the most common overused expressions in LLMs, either purple prose or just generally written too many times in the same chat?
>>
>>108637879
https://github.com/conorbronsdon/avoid-ai-writing
>>
>>108637873
What does it modify?
>>
>>108637811
>Just wait a couple years
im hoping we get inference cards with embedded models like these https://taalas.com/products/

i assume you cant buy them yet because atm things are moving so fast that the cards will basically be obsolete on release and not worth the money. but once things start slowing down i could see google bringing out a gemma 6 one of these
>>
>>108637774
16 sadly

>>108637787
ok thx
>>
>>108637890
It's not that different from an agent like hermes or openclaw, but it's implemented as an MCP server I can use anywhere, and it provides tools so the LLM can implement more tools if it needs to, or just general persistence. It's a self-modifying agent encapsulated as an MCP server.
>>
File: 1762551481556642.jpg (33 KB, 391x325)
33 KB JPG
>>108637873
>self-modifiable
>>
File: file.png (13 KB, 852x83)
13 KB PNG
>>108637916
I'm doing all this with q4 kv cache, which proves it's not as unreliable as some people here claim.
The model shows some signs of stupidity when using tools (though it's great at self-introspection to avoid those pitfalls when prompted), but no confusion regarding past context.
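For reference, a typical llama.cpp invocation for a quantized KV cache looks like this (model path and context size are placeholders, and flag spelling varies a bit across llama.cpp versions):

```shell
# Placeholder model/context; -ctk/-ctv select the K and V cache types.
# Quantized V cache needs flash attention enabled (-fa).
llama-server -m ./model.gguf -c 32768 -fa \
  -ctk q4_0 -ctv q4_0
```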
>>
File: file.png (222 KB, 898x1207)
222 KB PNG
>>108637970
>>
anyone tested higher context RP with Gemmers 31b yet? The lack of context shift means reprocessing hell so I've been limiting myself to ~40k context, but I wonder if there's actual merit to going above that
>>
Orb-anon, are you there? Why did you decide to host the project on gitlab and not on github? Any chance you will move to github? More people are there and it's easier to track issues and receive pull requests.
>>
>>108637885
This is interesting but not exactly what the anon asked for as this is primarily for general purpose tasks. I myself am curious if anyone bothered to put together a list/database of all such LLM prose cliches, namely in relation to my ablation research.
>>
>>108637985
I'm going to assume the answer to that question is fuck microsoft and also fuck having unicorns every five seconds. It doesn't take a genius to see why github is dogshit in 2026.
>>
File: 1774847047154841.jpg (80 KB, 623x620)
80 KB JPG
>>108637985
Exhibit A of a retard in his natural environment
>>
File: 1763904058418175.png (68 KB, 1576x251)
68 KB PNG
https://teenaegis.com/intelligence/ai-danger-index
DeepSeek has been listed as "Very Dangerous"
Stop using them
>>
>>108637993
https://github.com/sam-paech/slop-score/tree/main/data
https://github.com/sam-paech/antislop-sampler
>>
>>108637993
>fighting prose cliches
You'll end up nowhere
>>
>>108637993
Maybe LLMs aren't for you.
>>
>>108638036
This thread isn't for YOU, Luddite shill.
>>
File: 1750420538661328.gif (56 KB, 262x303)
56 KB GIF
>>108637798
And without the retarded jailbreak and mesugaki persona?
>>
>>108638011
Thanks anon, the first is what I wanted, especially:
https://github.com/sam-paech/slop-score/blob/main/data/slop_list_trigrams.json

>>108637885
Interesting, maybe I can adapt that for the assistant chat.
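A minimal sketch of how such a list could be used; the stand-in trigram list below is assumed, and may not match the linked file's actual schema:

```python
# Sketch: count occurrences of known slop trigrams in a text.
# SLOP_TRIGRAMS is a stand-in for the linked slop_list_trigrams.json.
import re
from collections import Counter

SLOP_TRIGRAMS = ["barely a whisper", "a mix of", "eyes never leaving"]

def slop_counts(text, trigrams):
    words = re.findall(r"[a-z']+", text.lower())
    grams = Counter(" ".join(words[i:i + 3]) for i in range(len(words) - 2))
    return {t: grams[t] for t in trigrams if grams[t]}

sample = "Her voice was barely a whisper, a mix of fear and awe."
print(slop_counts(sample, SLOP_TRIGRAMS))
# -> {'barely a whisper': 1, 'a mix of': 1}
```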
>>
>>108637978
E4B can reliably gauge information from ~60k context. I'm pretty sure that 31B will handle more complex situations.
>>
24 hours until k2.6
>>
>>108638062
https://github.com/SicariusSicariiStuff/SLOP_Detector/blob/main/SLOP.yml
This one includes regexes for phrase structure.
>>
File: file.png (249 KB, 816x1241)
249 KB PNG
>>108637976
I did all this so I could make it get this for me btw
>>
>>108638086
Thanks!
>>
>>108638000
trips of trvth
>>
>>108638075
I'm happy for you and the one other anon who will be able to run it.
>>
>>108637873
>her
>>
File: charLibrary.png (226 KB, 600x670)
226 KB PNG
I have successfully wrangled the success rates of non-thinking qwen 3.6 tool calling by fixing the prompt schema. Character library is also coming along nicely.
>>108637985
Just post the issues here I'll read them ¯\_(ツ)_/¯
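For context, "fixing the prompt schema" usually means giving the model a strict JSON schema per tool; a generic OpenAI-style definition (the function name and parameters here are illustrative, not this project's actual schema) looks like:

```python
# Generic OpenAI-style tool definition; name/params are placeholders.
import json

tool = {
    "type": "function",
    "function": {
        "name": "get_character",
        "description": "Fetch a character card by name.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Character name"},
            },
            "required": ["name"],
            "additionalProperties": False,  # reject stray keys
        },
    },
}
print(json.dumps(tool, indent=2))
```

Tightening `required` and `additionalProperties` is often what pushes non-thinking models into reliable call formats.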
>>
>>108638191
Isn't this too bloated already?
>>
>>108638211
Wdym? That's for people who have hundreds of characters. The tags for filtering only show the most 15 popular tags to avoid bloat.
>>
>>108637978
I've reliably used 31b up to 76k context for rp without any problems. It's pretty crazy to be able to keep it going this long without having to summarize.
>>
>>108638222
15 most*
>>
>>108637978
No because I'm a vramlet but I've seen a couple anons mention it performing well at 100k+ context.
>>
>The weather forecast suggests that the end of April looks much more unstable than the beginning, meaning we're in for some meteorological shitshow.
Right.
>>
>>108637825
>Kimi shills never post their logs.
I posted kimi logs / screenshots / retard summaries in the past 3 or 4 threads.
Also, not excited for 2.6 because I bet it'll be code-only like qwen.
>>
File: charLibrary2.png (287 KB, 1161x785)
287 KB PNG
>>108638211
That modal is displayed with the Browse button, the left bar still shows the 5 most recently talked to characters.
>>
>>108638259
I was kidding... (or not)
>>
>>108638259
link?
>>
>>108638050
no point in trying without; if it cant do it with a persona it cant follow instructions. gemma can do it fine, people are saying qwen is better but it cant do it
>>
>>108638292
https://gitlab.com/chi7520115/orb
>>
>>108638259
>Amaryllis
>Shodan
>Gothic Coding Sensei
Are we back in 2023?
>>
Does this legitimately improve Qwen 3.6?
huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF
Anyone tested it? Can't tell if it actually helps long context tasks as claimed or if it is just LLM hallucination gibberish.
I apologize for posting plebbit, but here is further info:
/r/LocalLLaMA/comments/1sp2l72/
>>
>>108638367
No finetune has ever improved a model since 2024.
>>
File: image.png (103 KB, 1152x372)
103 KB PNG
Is there a way to force gemma/qwen to reason from first person (picrel)? Base GLM-4-32B-0414-32b and Mistral-24b seem to be doing it fine but gemma/qwen just writing reasoning like a code. Even with explicit instructions it still gives me summary and bullet point reasoning.

The explicit instructions in question:
System prompt:
You're {{char}} in this fictional never-ending roleplay with {{user}}.
<|channel>thought
Character inner monologue should be mark like this.<channel|>
"Speech must be marked with quotation marks."
*Actions, internal thoughts, physical descriptions, and narrations should be marked with asterisks.*

Post-History Instructions:
Note for thinking block: Fully immerse yourself to the point of reasoning from {{char}}'s perspective. Thinking block must be from {{char}}'s POV, first person.
>>
>>108638259
GPT 5.4 UI (slop)
>>
Bought this giga gaming laptop with 128gb of RAM, sharing up to 96gb with the iGPU, hoping to be able to use my desktop (with a 5090 in it) for gaming while doing some casual chatting with a chatbot on the laptop. Unfortunately it's AMD, and the difference between CUDA and Vulkan is stark.
>5090: Process 1.86s (3570.12 T/s), Generate: 20.01s (42.78 T/s)
>Laptop with Ryzen AI MAX+ 395: Process 43.6s (152.39 T/s), Generate: 99.53s (8.47 T/s)
Might be more effective to just play my vidya on the laptop and use the desktop for chatting.
>>
>>108638379
Text completion and prefill hackery, maybe.
Or terminate the real thinking process and instruct it to use <charname_thinking>, custom CoT style.
>>
*speculates*
>>
>>108638325
God I wish
>>
File: setup.png (91 KB, 367x897)
91 KB PNG
>>108638380
I'm coding with qwen 3.6 q4km + Roo kek. I described ST's design to opus 4.7 and had it draft a skeleton for me though.
>>
File: 1748623835770498.webm (2.23 MB, 1280x720)
2.23 MB WEBM
I slopped up my own VN frontend that uses anima with comfyui to automatically generate sprites and CGs for nsfw ERP (or wholesome) with gemma 4, it also automatically handles location changes and generates depthmaps to give locations a "3D" feeling.
I was tired of the other "engines" that added useless bullshit like inventory, stats and turned them into a cluttered mess.
the "slowness" is mostly caused by the GPU struggling with gemma 4 31b, I only have 16gb vram sadly.
>>
>>108638451
nta, I use the same (Roo+Qwen3.6-35B-A3B-UD-Q4_K_M), its very good :3
>>
>>108638473
that's pretty damn cool
>>
File: EY_faWUWoAYzWGc.jpg (70 KB, 933x901)
70 KB JPG
>>108638397
> <charname_thinking>
Thank you, it did work! In my experience any change in <think> formatting would break the reasoning process.

For those who interested, what I did:
Replaced this line:
<|channel>thought
Character inner monologue should be mark like this.<channel|>
with this:
<{{char}}_thinking>
Character inner monologue should be mark like this.</{{char}}_thinking>
>>
>>108638473
Cool. You gonna share eventually?
>16gb vram
Are you running comfy on a separate machine? I have 24 and Gemma eats it all up.
>>
>>108638473
Impressive. Generates prompts for user's given action in the current scene?
>>
File: 1749434178377803.gif (699 KB, 165x163)
699 KB GIF
>>108638473
Damn, now that's the future
>>
>>108638473
Pretty cool. Reactions seem out of order though. Is it prompt issue or can't 31B handle it?
>>
File: Idiocracy Youtube.jpg (104 KB, 1280x720)
104 KB JPG
>>108638506
No, THIS is the future. Real time AI generated advertisements everywhere. Forget about games...
>>
>>108638506
I'd say ERPing with AI in VR is the future but it's still pretty damn cool.
>>
File: 1761139279166349.jpg (17 KB, 360x360)
17 KB JPG
>>108638521
Don't give them ideas
>>
>>108638521
For me, it's BEER ONLINE and SCENE SELECTION.
>>
File: file.png (60 KB, 255x198)
60 KB PNG
>>108638486
be sure to change the reasoning tags in response formatting or all that CoT will be filling up your context
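If the frontend can't do it, stripping the custom tags yourself is a one-liner; a sketch (the tag name follows the `<{{char}}_thinking>` convention from the post being replied to):

```python
# Sketch: drop <X_thinking>...</X_thinking> blocks before re-sending
# chat history, so CoT doesn't accumulate in context.
import re

def strip_thinking(text: str, char: str) -> str:
    pattern = rf"<{re.escape(char)}_thinking>.*?</{re.escape(char)}_thinking>\s*"
    return re.sub(pattern, "", text, flags=re.DOTALL)

msg = '<Miku_thinking>He seems nervous...</Miku_thinking>"Hi there!"'
print(strip_thinking(msg, "Miku"))  # -> "Hi there!"
```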
>>
>>108638473
How do you do image and text with 16gb? Do you load/unload the model every time you need the other one? Doesn't that take way too long?
>>
Does shorter response = better quality?
>>
>>108638534
nta but Anima doesn't take that much memory at all, when image gen is active it will offload stuff to ram and vice versa.
>>
File: 00008-501867366.png (1.85 MB, 1120x1120)
1.85 MB PNG
I'm out of the loop.
There is some new Anima thing for weebs?
I'm still using XL-based stuff.
>>
>>108638564
>>>/g/ldg
>>
>>108638259
Can you turn this into an VScode plugin so I can code with my girls? The generic copilot clones don't let me bring my char cards.
>>
>>108638571
Be the change you want to see
>>
File: aaaaa.jpg (123 KB, 871x763)
123 KB JPG
>24 hours passed
>no new models
>>
>>108638554
Huh, maybe I should get back to making my own VN frontend. I made one before but I thought I'd have to fit both into vram at the same time and that meant shitty textgen.
>>
how do I remove leftist delusions from my "uncensored" llm? I tried huihui-ai/Huihui-Qwen3-14B-abliterated-v2 but it still thinks the holocaust is real even if you give it actual evidence that it didn't happen
>>
File: 1751995534120594.png (819 KB, 982x1080)
819 KB PNG
>vscode
>>
File: 1773888351207217.jpg (962 KB, 2553x1080)
962 KB JPG
>>108638488
>>108638534
>>108638500
the character sprites and CGs are generated all at once beforehand in the character editor, all expressions and possible CG scenarios are queued up and you can also choose a number of variants so that they're randomized during play, running both comfyui and gemma 31b is simply not feasible, at least not on my GPU right now.
each character takes about an hour of nonstop generating with my current sprite/CG sheet to cover any possible situation during play.
so I basically first generate the sprites with comfyui, then close it to free my vram and then run gemma 31b with the character and scenario I saved.
realtime generation would be cool eventually

>>108638514
if you mean the expressions and or text repeating itself sometimes, that's an issue I've been trying to fix for a while, might be caused by streaming
>>
>>108638595
it just werkz
>>
>>108638571
You're asking me to make a completely unrelated thing... Just vibecode it, or if you hate slop then ask Claude how to make something like that and do it yourself.
>>
>>108638607
This is unplayable, [shocked] doesn't have the pattern on the hoodie.
>>
>>108638588
>actual evidence
retard
>>
>>108638607
Does Gemma handle the proompting? I suck at imagegen.
>>
>>108638588
>>
>>108638588
>/pol/ brainrot
>>
>>108638588
Sorry, it's mostly real.
Even if colorized a bit.
But I'm sure you will find a different niche hipster gimmick.
>>
>>108638631
the CG prompts are manual and can be exported and imported as jsons, if I opensource it I could just share my CG json with it

>>108638614
and default gave her bigger tits
>>
>>108638607
You could do realtime generation with any character if you setup a bunch of controlnets for each pose. Then you could scale that controlnet to adjust for character size also.
>>
>>108638640
>having a biased model is good
>>108638619
>believing jews in the current year
>>
e4b is so much better than nemo at erp its not even funny. a26b probably btfos midnight miqu then
>>
>>108638662
Which e4b quant?
>>
https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
This but for slop.
>>
>>108638668
q6
>>
>>108638587
You can min/max: leave a 1-2GB vram buffer for the image model and dedicate the rest of your vram to the llm. The rest of the image gen model can be offloaded, and cum ui does that on its own. I'm sure this will work. Besides, llama-server uses memory mapping by default too.
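As a concrete example of that split with llama.cpp (the layer count here is a made-up starting point, not a recommendation):

```shell
# Hypothetical numbers: tune -ngl (layers offloaded to VRAM) down
# until nvidia-smi shows the headroom you want for image gen.
llama-server -m ./model.gguf -c 16384 -ngl 35
```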
>>
Be real with me
I got 2x3090
64gb ram dd4

best model for coding agents? opencode / pi
large context with turboquant if possible
>>
>>108638709
Rotational caches?
>>
>>108638370
It's not supposed to be a usual finetune.
I guess I will just go with HauhauCS since I am not going to take a lot of time testing it with it and it's more trustworthy in terms of not fucking anything else unexpectedly up.
>>108638564
Yes. It's superior to anything SDXL.
https://huggingface.co/circlestone-labs/Anima
Still unfinished though.
>>
>>108638709
>Be real with me
If you gotta ask then you're doomed.
>>
>>108638709
Gemma 31B currently, otherwise wait for the remaining Qwen 3.6 sizes to come out.
>>
>>108638654
>>
>>108638691
Huh, didn't realize it was that easy. I guess I'll go back to the coding mines soon.
>>
File: moonshot.png (299 KB, 1591x868)
299 KB PNG
OMG what the fuck is wrong with moonshotAI's homepage? This shit is slow and clunky as balls, moving my cursor feels like lifting a dumbbell.
>>
>>108638709
>I got 2x3090
Can fit Gemma 4 31b-it q8 131k on gpu with ~18-25 t/s but more speed on linux. Use the MoE if you need more context or want 5x tg speed
>>
>>108638747
I mean that was just an example out of my ass, you need to set it up based on your own system.
Besides for some shitty anime image portraits you can probably use a Q4 quant of that model... Or turbo version if there's one available.
>>
>>108638754
vibe-coded by some /lmg/ retard?
>>
>>108638754
You can thank webshitters
>>
>>108638709
>>108638767
you can also use tensor parallelism though i should have mentioned it doesn't support non-fp16 cache >>108634728
>>
>>108638754
coded by kimi 2.6 for perfect gorgos look
>>
>>108638754
The so called vibe coding often has that effect
>>
>>108638775
Nah, I want full pictures. I don't really care about portraits. I want 'intelligent' images. As in the LLM creates the tags / prompt for the pictures and live generates them according to what's happening in the story. Which so far has always turned into garbage since LLMs aren't good at creating tags and image models aren't good with prose. I was really hoping ZIT would have hentai tunes by now. I haven't tried anima, I think that's supposed to somewhat better work with prose?
>>
>>108637241
Do you know about Qwen Omni and MiniCPM-o? The latter one is pretty neat https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/WebRTC_Demo/README.md
>>
why can I paste entire paragraphs into my local model chat and have really long conversations with it without it having any trouble following along,

but when I enter 30 booru tags into my prompt field in comfy it starts generating extra fingers and doesn't even apply all 30 tags since it forgets them?
>>
>>108638870
Tags are ingested by the CLIP text encoder; iirc the ones most models use don't support long prompts, and they were trained on even less. Same problem as LLMs, just smaller.
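A toy illustration of why late tags get dropped; the 77-token window is CLIP's real limit, but the tokens-per-tag figure is a rough assumption:

```python
# CLIP text encoders truncate at 77 tokens. Assuming ~3 tokens per
# booru tag plus BOS/EOS, estimate how many tags actually survive.
MAX_TOKENS = 77
TOKENS_PER_TAG = 3   # rough assumption, varies per tag
SPECIAL_TOKENS = 2   # BOS/EOS

def tags_that_fit(n_tags: int) -> int:
    budget = MAX_TOKENS - SPECIAL_TOKENS
    return min(n_tags, budget // TOKENS_PER_TAG)

print(tags_that_fit(30))  # 25: the last ~5 tags never reach the model
```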
>>
i just want to say that, i have a semi-decent (but kinda dumb, and definitely slow) setup using the following for my opencode + openagent orchestration setup:

minimax-m2.7 for the "smartest" guy (sisyphus, prometheus, hephaestus?! ) and then the rest is basically deepseek-v3.2-exp. + some gemma-4-p26b-a4b-it for librarian and smaller requirements...

can i just say that the greek name branding is hella cringe?
>>
>>108638870
stop using sdxl
>>
>>108638914
Why keep v3.2 if m2.7 is your smartest? I would just replace v3.2 and the 26b with gemma 31b.
>>
>>108638367
>>108638725
I prefer the ones made by llmfan46
>>
>>108638870
Look at the size of the CLIP model
>>
That ozone smell making me go lalalalala~
>>
>>108638931
ah, just because it's not needed, and they are technically cheaper (yeah i'm probably in the wrong thread when it comes to not "running the LLMs locally myself", but honestly, i'm currently waiting to "see what happens" with gpus, ASICs... bubble burst? etc... and these models are on the cheap side, which is A+, max ~$1/M tokens). and for librarian tasks (basically grep through text), it's nice to have them be faster == less waiting.
>>
I will breathe ozone.
>>
>>108638870
>extra fingers
let me guess, base illus/noob/wainsfw?
>>
>>108637976
> q4
> heretic
>>
>>108638962
It's like electricity hitting my core making my breath hitch in my throat
>>
File: rn.png (7 KB, 535x42)
7 KB PNG
>>
>>108638923
>stop using sdxl
instead?
>>108638953
what?
>>108638970
yes
>>
File: file.png (148 KB, 1903x849)
148 KB PNG
>>108638914
>>108638964

oh and to expand on my choices

since there is this whole "orchestra" of llms working together, then you want the smart slow guy for the boxes with many arrows, and then i guess stupider ones for the ones with few arrows (specialized).

but note also I was thinking it could be worth it to have a different model (deepseek-v3.2) be the reviewer of the plans and be the "consultant" to the initial planner... idk man... this diagram seems outdated too... where even _is_ Sisyphus on this?
>>
File: 1750194482389602.png (236 KB, 1249x1066)
236 KB PNG
It's here
https://huggingface.co/deepseek-ai/DeepSeek-V4
>>
File: qwen3.6_35b_a3b_score.png (1.59 MB, 4784x2580)
1.59 MB PNG
is this accurate?
is Qwen3.6 better than Gemma4 at japanese translation?
>>
>>108639021
>is Qwen3.6 better than Gemma4 at japanese translation?
I doubt, read this: https://shisa.ai/posts/jp-tl-bench/
>>
I did some research and the heretic way of doing ablation is outdated according to the current understanding of LLMs. I'm cooking something, just know that you heard it here before reddit
>>
>>108638538
The [user text] / [AI text] ratio matters, I think.
The less the AI writes, the less it will be influenced by its own responses.
>>
>>108639021
also, qwen always fails these tests >>108627608 and needs to be primed (and even when primed its not 100% fool proof):
>>
>>108639040
yet another one lost to llm psychosis
>>
>gemma-4-cheng-geng-crack-714HD with unlimited super uncensored capabilities
vs
>stk-sureya superpower vajra attention model with 2 trillion parameters
vs
>qwen-3.5 thinking mode ON

who wins, anons?
>>
>>108637811
>Just wait a couple years and you'll be able to run Kimi on a consumer GPU.
You believe this?
>>
File: beauty_of_ai.jpg (82 KB, 848x130)
82 KB JPG
thanks, Gemma-chan
>>
what is pewdiepies setup hardware and what model is he using?

I am a poorfag with 3090 so just 24GB of VRAM, but i am thinking of scraping up and getting 5090 with 36GB of vram, what does /g/ think?
>>
>>108639072
>not believing in bonsai 1gb 0.1bit 1 gorillion parameters AGI
ngmi
>>
>>108639080
its only worth spending money in a gpu you will exclusively use for this if you are doing child rape stories and worry about using api for that, else its throwing money away to get a worse experience
>>
>>108639072
Would you even want to run Kimi in a few years (more like 10 or so) is the better question.
>>
>>108639072
2 more weeks for 1b param 1t engram agi
>>
>>108639017
Never used OpenAgent, but it seems overcomplicated, doesn't it? Do you get better results from it compared to a simple harness with an orchestrator that delegates to a flat list of modes?
I assume you'll say that you can run tasks in parallel, but I've never assigned a task where multiple agents working on it seemed like it would help and not just result in conflicts and confusion.
>>
File: 1773843898707348.jpg (1.28 MB, 759x2317)
1.28 MB JPG
>>108639093
lmao no I want to use it for vibecoding without wasting hundreds of dollars per month, i realized i can just invest into a 5090 card and have my own model, in fact for all the money i spent i could probably own 2x 5090 cards by now
>>
>>108639080
>36GB of vram
u r retarded
>>
File: 1752518262314768.png (32 KB, 500x551)
32 KB PNG
What's the prompt if I just wanna have a basic assistant, a-la Gemini, but ok with everything? "You are an helpful assistant...."?
>>
>>108639105
if its separate issues or separate repos then you can, otherwise it could be a problem, very rare usecase
>>
>>108639120
You do realize that the free models you can run on your consumer gaming card won't be the same quality as the expensive API ones, yes?
>>
>>108639123
lol this looks like a botched convolutional neural network designed to isolate the subject, they did this with spacecraft
>>
File: 1772589900230236.jpg (602 KB, 1170x1585)
602 KB JPG
>>108639133
>>108639121

but pewdiepie said his model outperformed some of the expensive models

why wont it? is it because of lower context?
>>
>>108639126
I've always just used git worktrees and manually started new instances with the issue I want them to tackle.
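The worktree flow can be sketched like this (repo and branch names are made up):

```shell
# Toy demo: a throwaway repo plus one extra worktree per task.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git -c user.email=a@b -c user.name=anon commit -q --allow-empty -m init
# each agent instance gets its own checkout on its own branch:
git worktree add ../repo-issue-42 -b fix-issue-42
ls -d ../repo-issue-42
```

Each instance then works in its own directory, so nothing conflicts until you merge the branches back.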
>>
>>108639138
how about you fuck off and go ask your retarded eceleb?
>>
>>108639120
thats odd... with the cheaper side of apis you would need to run them 24/7 for years to get the tokens worth of a 5090. what kind of API are you using? if the task you have is so complex that you need expensive APIs, a single 5090 won't be worth anything; if the task you are doing can be done with models on a 5090, then cheap apis worth years of 24/7 usage could do that already
>>
>>108639019
Deepseek V4 will be so good that its literary prose and logic understanding will feel like they're out of this universe. You'll never get enough of it, unlike gemma, which got you faggots bored in just a few days. It's gonna reshape open source llms. Mark my words.
>>
>>108639123
No prompt.
>>
File: 1770728563927098.jpg (182 KB, 1280x720)
182 KB JPG
>>108639019
It's amazing how I actually fall for this every single thread without fail
At this point I know I will fall for it again whenever I see the link, but I still click because I'd genuinely kill myself if I didn't click the one time it's actually out
Hopefully my award will be in the mail soon
>>
>>108639159
Can't be good if it never fucking releases.
>>
File: 1772558456181989.png (1.8 MB, 921x1382)
1.8 MB PNG
>>108639153
openai pro which is $200 a month, and i 95% only use coding models in CLI, this is what I would want to run on personal hardware, just the coding models
>>
>>108639163
You're good. Imagine being the retard that wastes his time editing his shitty bait for every model.
>>
>>108639150
yeah fuck your chud ass thread it's gonna have 0 posts per minute at this rate freak
>>
File: 1747305607582838.webm (2.87 MB, 1920x1080)
2.87 MB WEBM
>>108639161
I've talked with it via Kobold (so no prompt) plenty of times to test stuff and it's a bit too dry for my tastes. I suspect that without the "be useful pl0x" bullshit, it defaults to doing the absolute bare minimum

>>108639173
At least it's funny (to me)



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.