/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108689285 & >>108685756

►News
>(04/24) DeepSeek-V4 Pro 1.6T-A49B and Flash 284B-A13B released: https://hf.co/collections/deepseek-ai/deepseek-v4
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108689285

--Paper (old): Tensor Product Attention Is All You Need:
>108690810 >108690834 >108690847
--Debating the intelligence gap between open weights and proprietary models:
>108691728 >108691743 >108691754 >108691766 >108691778 >108691784 >108691794
--Discussing perceived performance regressions in Opus and DeepSeek V4 models:
>108691867 >108691882 >108691892 >108691956
--Discussing the outdated nature and poor numeric hygiene of ik_llama.cpp:
>108691745 >108691753 >108691765 >108691898 >108691928 >108692134 >108692935
--Combating positivity bias and optimizing prompts for Gemma 4 roleplay:
>108690973 >108690990 >108691015 >108691024 >108691059 >108691086 >108691174 >108691199 >108691098 >108691136
--Anon shares tagger rewrite leading to troubleshooting and IP leak:
>108691573 >108691596 >108691625 >108691623 >108691977 >108692006 >108692154 >108692175 >108692184 >108692023
--Anon discusses Grok 2's slow inference speed due to active parameters:
>108689797 >108689807 >108689840 >108689838 >108689866 >108689884
--Complaining about AI frontends and building custom alternatives with AI:
>108692572 >108692593 >108692606 >108692710 >108692761
--Anon's MCP tool for Gemma to curate imageboard content:
>108691772 >108691779 >108691813 >108691829
--Debating the utility and capabilities of local Hermes agents:
>108690107 >108690114 >108690156
--Logs:
>108689374 >108689378 >108689903 >108690127 >108690546 >108690735 >108691772 >108691813 >108691883 >108692237 >108692568 >108692606 >108692710
--Teto, Miku (free space):
>108689388 >108689413 >108689923 >108692673 >108692859

►Recent Highlight Posts from the Previous Thread: >>108689299

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
Spud won.
>>
patiently waiting for dense >=70B 'emma
>>
Orb nigga here? Detected repetition should go on a permanent list
>>
>>108693177
Monkey's Paw: it will be in native 1-bit precision.
>>
>>108693194
god I wish
>>
Updated: https://rentry.org/recommended-models
>>
>>108693220
That might not be any better (possibly worse) than Gemma 4 31B, even quantized in 4-bit, in terms of knowledge.
>>
>>108693224
>Nemo
>Gemma
>too hardwarelet
Well, that kinda fits my own experience.
>>
>>108693224
I've never done local, but I'm thinking of trying local. Will a 4070 SUPER with 32gb of ram manage to run Gemma 4 31b? What quant do I get?
>>
>>108693253
should work for the moe moe kyun model
>>
>>108693224
Seems good.
>>
>>108693224
where's V4?
>>
>>108693253
No, 31b is dense. If you try to run it all on vram, you'd need a lobotomized quant like iq2xs. Spilling over to system ram is also not advised, as it will be very slow. You want to run the moe 26b version, where you can fit up to q8.
>>
>>108693274
https://github.com/ggml-org/llama.cpp/pull/22378
two more weeks (until he realizes that v4 uses a special attention mechanism)
>>
>>108693279
Is there a tangible difference between q4 and q8, or should i just use q4 for speed?
>>
>>108693253
>12gb vram
26BA4B, you can fit high quants of it if you use flags -ngl 999 and --cpu-moe in llama.cpp to put the most-used parts of the model on your VRAM and everything else in your RAM.
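The full invocation ends up as something like this (the model filename is just whatever quant you grabbed, context size to taste):
./llama-server -m gemma-4-26B-A4B-Q6_K.gguf -c 16384 -ngl 999 --cpu-moe
-ngl 999 offloads every layer to the GPU, then --cpu-moe keeps the big expert tensors in system RAM, so only the always-hit parts live in VRAM.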
>>
>>108693288
>-ngl 999 and --cpu-moe
This is not necessary now that -fit is the default.
>>
>>108693292
wtf next you're gonna tell me you don't need to use --jinja anymore
>>
>>108693287
It really only becomes apparent when you hit high context or complex tasks. For use as a chatbot, q4 is fine (weights, not context).
>>
>>108693287
There will be a noticeable difference in output quality. Just try both for yourself and see what feels more worth it to you.
>>
>>108693301
>>108693288

Thanks. Last question, llama.cpp or lmstudio?
Assuming i'm kinda retarded and don't want a ton of setup.
>>
>>108693297
--jinja is enabled by default
>>
most modern models are trained with interleaved thinking in mind so using them in tavern actually lobotomizes them
>>
>>108693307
llama.cpp is just ./llama-server model and go to 127.0.0.1:8000 or whatever it says

You can also run ./llama-server --help if you need help instead of going to the internet and reading outdated advice.
>>
>>108693312
is this true? I was gonna use ST as my front-end since I already have a lot of stuff set up in it
>>
File: file.png (497 KB, 2256x846)
497 KB PNG
>>108693224
For ERP, I don't think Nemo and GLM Air deserve any mention anymore. Gemma 4 26B MoE runs quite well even on VRAMlet builds, and CPU-only speeds even on old DDR4 Skylake are around 10 tk/s from what I tried; then 31B after that, spanning the gamut all the way up to GLM 4.7. There's just too much of a difference between it and all prior models, outside of some Frankenstein model merge you might prefer for some esoteric reason, which shouldn't factor into this. Also, I know it's early days, but Deepseek v4 Pro and Flash should be mentioned even if there isn't as much experience with them yet and llama.cpp has no support for them yet.
For agentic and coding, the unfortunate part is that things are still in flux at the top end. MiMo-V2.5-Pro, Kimi 2.6 and Deepseek v4 Pro all trade blows there, and even with API usage it doesn't seem like things have settled on what is better here. But I think your determination of Kimi 2.6 being the best here is probably correct, because it is the fastest to run by a sizable margin.
GLM 4.7 should be removed; Qwen 397B A17B and Qwen 3.6 27B are both strictly better. Qwen3.5 122B A10B should also be removed, it is worse at both coding and agentic tasks than Qwen3.6 35B A3B and has been obsoleted by that model.
Also, you need some consistency in ordering models from smallest to biggest or biggest to smallest; the general and programming sections have them in opposite orders and it triggers my autism.
>>
>>108693224
I would put the 26B Gemma 4 in there as well, as another option for <24GB users. I'd definitely take it over Nemo.
>>
>pwilkin broke the parser for various models two months ago
>including Kimi K2-Thinking, K2.5 and by extension now K2.6
>llama.cpp is now straight up bugged for these models and doesn't recognize the reasoning block as such
>--reasoning-budget doesn't work as a result
>a guy made a PR to fix this
>it got ignored and it's broken to this day
https://github.com/ggml-org/llama.cpp/pull/20535
wow thanks
>>
File: graph.png (686 KB, 1250x830)
686 KB PNG
>>108693350
>>
>>108693364
>https://github.com/ggml-org/llama.cpp/pull/20535
Lissanro's patch will work.
Or just use the schizo fork where k2.5 tool calling works fine.
>>
>>108693312
>>108693338
That's mostly only true for tool calls; the default chat template of these models strips thinking from most messages but keeps it during tool calls. SillyTavern doesn't support this (yet at least), but the option to send back the last n turns of thinking to the model can be a sort of hacky fix when you need it.

To elaborate, see Gemma 4's chat template:
https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja

At this part here:
{%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%}
{%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%}
{{- '<|channel>thought\n' + thinking_text + '\n<channel|>' -}}
{%- endif -%}


This looks at the current message being rendered and checks if it's more recent than the last user message and contained a tool call. If so, that means it's part of the current tool call chain and its thinking is added back into the message for interleaved thinking. For every other message, the "reasoning_content" field is ignored and thus the thinking is stripped.

Some frontends still need to catch up with the fact that modern models expect the "reasoning_content" field to be passed back for all messages so that the chat template can process the reasoning and decide what to keep and what to strip. Right now a lot are still trying to mess with it themselves by either throwing the reasoning straight into the "content" field of the message (OWUI, ST with the include previous thinking option) or just excluding it (ST by default), both of which are wrong.

However, this is not really a concern if all you're doing is roleplaying without much tool use. For normal chatting and roleplaying, you're getting the full intended experience from your model in ST by just letting it strip the thinking.
>>
>>108693368
If I am going to argue for changes in a tierlist, I do need to actually bring proof. Unless you want "It came to me in a dream" type arguments?
>>
>>108693358
It already is in there
>>
>>108693350
>Qwen3.5 122B A10B should also be removed
No. Qwen3.5 122B A10B makes fewer mistakes and understands Delphi. Qwen3.6 35B A3B does not.
>Deepseek v4 Pro and Flash should be mentioned even if there isn't as much experience with it yet and llama.cpp has no support for it yet.
So not local.
>GLM 4.7 should be removed, Qwen 397B A17B and Qwen 3.6 27B are both better strictly.
Better with a coding harness. But GLM-4.7 is better for chatting about SWE tasks.
The updated list is perfect, I wouldn't change a thing.
>>
>>108693382
They're the same picture.
>>
>>108693390
Are people with a rig to run DeepSeek even using llama.cpp over shit like vllm and sglang?
>>
>>108693388
Yea but it's only under the programming section
31b is in both
>>
>>108693381
>Some frontends still need to catch up with the fact that modern models expect the "reasoning_content" field to be passed back for all messages so that the chat template can process the reasoning and decide what to keep and what to strip. Right now a lot are still trying to mess with it themselves by either throwing the reasoning straight into the "content" field of the message (OWUI, ST with the include previous thinking option) or just excluding it (ST by default), both of which are wrong.
Are you sure about this? Wouldn't this mean the context grows huge if we're sending back 1-5k reasoning tokens per message?
And do you know if the jinja playground thing on HF actually works with these (I need to see it rendered to understand it)? Last time I tried, it seemed broken or inaccurate.
>>
>>108693403
>Are people with a rig to run DeepSeek even using llama.cpp over shit like vllm and sglang?
Yes. Most of us who run Kimi and Deepseek are using llama.cpp, offloading routed experts to the CPU.
vllm/sglang can't do this.
>>
>>108693405
It says:
>You can also try the MoE and smaller versions listed below.
>>
>>108693414
>Are you sure about this? Wouldn't this mean the context grows huge if we're sending back 1-5k reasoning tokens per message?
Yes I'm sure. The context won't bloat because most of the reasoning won't end up in the actual prompt. When you send a message using the Chat Completions API, what you're actually sending is a structured conversation object, not a real prompt. The chat template converts that object into the text prompt. By sending back reasoning in the "reasoning_content" field, you're telling the chat template where to look IF it wants to include the reasoning, but in almost every case it won't. The exceptions will be tool call chains for models with interleaved thinking, and Qwen 3.6 if you enable the "preserve_thinking" argument, which is the "I actually WANT the context bloat" option.

I'm not sure if the huggingface playground is accurate but I've tested extensively using llama-server's /apply-template endpoint to inspect the final text prompts.
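If you want to check it yourself, a rough sketch of that kind of test (assuming llama-server's default port and that /apply-template takes the same messages body as chat completions; field names are the ones described above):

import requests

# a fake mini-conversation where the assistant message carries its old reasoning
messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello!",
     "reasoning_content": "The user greeted me, I should greet back."},
    {"role": "user", "content": "how are you?"},
]

# /apply-template renders the messages through the chat template without generating anything
# adjust host/port to wherever your llama-server is actually listening
r = requests.post("http://127.0.0.1:8080/apply-template", json={"messages": messages})
print(r.json())  # inspect the rendered prompt; for most templates the old reasoning_content won't appear in it

With an interleaved-thinking template you'd only see the reasoning reappear for messages that are part of the current tool call chain.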
>>
>fotm moe chinese mystery meat model (qwen, deepseek, glm)
check
>lowest type of possible quant, quantized attention layers, quantized tensor types, quantized token embeddings, EVERYTHANG quantized to hell
check
>mmap
check
>flash attention
check
>most popular chub card in the last 30 days
check

yeah, it's erp time
>>
File: 020.png (575 KB, 2440x915)
575 KB PNG
>>108693350

retards
>>
File: file.png (13 KB, 755x99)
13 KB PNG
>>
>>108693350 for ERP g431b>nemo merges>>>g426ba4>>base nemo
>>
File: file.png (28 KB, 604x162)
28 KB PNG
>>108693390
>No. Qwen3.5 122B A10B makes less mistakes and understands delphi. Qwen3.6 35B A3B does not.
I guess, if you are looking at it strictly offline and from a world knowledge standpoint, but any agentic coding harness should be using MCP to fill the missing gaps, e.g. https://github.com/GDKsoftware/delphi-mcp-server for that use case.
>So not local.
No one said that you had to run it on llama.cpp. You can run it right now using Deepseek's official inference code with torchrun on appropriate hardware, which gets into the eternal meme of what "local" means, and I don't think being too big rules it out from being local. Of course it isn't practical, but that didn't stop the list from going above 100B parameters for recommendations. I'm just saying it merits at least a mention for being open weights.
>Better with a coding harness. But GLM-4.7 is better for chatting about SWE tasks.
There's no reason not to have it in a harness; you can perpetually keep it in planning or do Q&A about a codebase and it won't perform differently. I don't think Qwen does worse here than GLM 4.7 for that purpose, and if you are going to ask general SWE questions that don't involve a codebase, you should be using cloud anyway to get the best answer.
>>108693473
I cropped it out, but apparently I need to include this next time for low-IQ finger pointers like you.
>>
>>108693481
explain
>>
>>108693493

Same model names mentioned twice.

Explain yourself
>>
>>108693504
I included reasoning and non-reasoning performance for those Qwen models to display everything about them. But turning off reasoning for coding doesn't buy you much beyond the time saved from not outputting thinking tokens, outside of very straightforward implementation tasks for subagents.
>>
>>108693523
answer accepted
>>
What's the best gemma 4 31b version or does it not matter? I've just been running Unsloth's shit
>>
>>108693656
>Unsloth
welcome to the botnet comrade
>>
Nemo was never good. Not even for ERP. Glad we can all agree on that now.
>>
>>108693686
It was good for E, not so much RP.
>>
>>108693686
Nemo was the best we had
>>
The singularity will be vibe-coded
>>
Trying out some vibecoding with Cline since I always just did the good ol' copy+paste before.
Did a mini project with qwen 27B q4km with 75k context at 40~ tk/s. Went surprisingly well. It's pretty smart. Sucks ass at nsfw but it sure can code.
Then I thought I'd try out Openrouter and Kimi 2.6. Why is this so frustrating? It runs anywhere from 1 to 100 tk/s and thinks for fucking ever. I thought Qwen liked thinking but Kimi is just insane. It is pretty smart when it's not falling asleep.
Also, seeing your money disappear on the OR website is just sad. After this project (also a VN frontend), I'm going back to local.
Overall, though, I wonder if junior programmers are ever going to find a job again.
>>
>>108693234
No, but it would prove that ternary works which would open the door to more models and specialized hardware or even just making it run better on RAM.
>>
Man how is gemma and qwen so good? Those two models have tapered off my API addiction to a bare minimum.
>>
Any way to speed up prompt processing for agentic stuff with long context?
This shit is driving me crazy, it takes nothing to actually do stuff but the prompt processing is slow as shit, it's like 80% of the waiting time.
>>
>>108693960
Having the same experience, Qwen3.7 27b for coding and agentic stuff and gemma4 31b for rp and creative stuff, with a 5090 can fit q4 of those with almost full context, pretty nice.
>>
>>108693966
Increase batch size if you have VRAM to spare.
>>
>>108693977
3.6*
>>
>when you wait for 4 minutes for Gemma to use up all 4096 tokens for thinking.
>>
>>108693966
Checkpoints should help.
>>
>>108693966
not directly answering your question, but depending on how you're using agents: if you've got some RAM to spare, making sure you have a decently high -cram (at least twice the KV cache size) helps a lot with subagents in particular, so that when it switches between two context threads it's instant instead of reprocessing them each time
>>
>>108693934
I actually meant binary, 0s and 1s only. Like this one: https://prismml.com/news/bonsai-8b
>>
File: stuttering.jpg (144 KB, 859x241)
144 KB JPG
Yeah, I know she's flustered, but that's too much stuttering, Gemma-chan
>>
>>108693350
Is Gemma 4 26B that bad for coding to not include in the graph?
>>
Is there any way to unload the model in Kobold cpp without having to restart it from scratch?
>>
>>108694012
Why when ternary is objectively superior? https://web.archive.org/web/20011205185830/http://americanscientist.org/Issues/Comsci01/Compsci2001-11.html
>>
File: 4anon.jpg (136 KB, 1168x471)
136 KB JPG
>>108694060
just for you, anon.
>>
>>108694100
Any ranking which lists Opus below ChatGPT is not worth consideration.
>>
>>108693350
Crazy to see Qwen3.6 27B's score just north of GLM 4.7's with reasoning, at least in this benchmark
I don't miss getting 4tk/s versus 40tk/s now for pretty much the same result. I just hope Alibaba keeps it up with the open-weight releases because they're one of the few holding things up at the low end
>>
>>108694113
Sad but true. I love their model but the company as a whole is psychotic. Like actually psycho.
>>
Not even going to try 3.6 after how awful 3.5 was. Nope, nuh-uh. Yes, it's my loss, but I just won't even waste my time with it. Can't make me, no. The Chinese shills won't get me this time. I'm not falling for it. No way. Not going to happen.
>>
>>108694113
well, benchmarks are gonna benchmark.
>>
>>108694159
3.6 is just 3.5 with more gemini distillation and openclaw specific training.
>>
>>108693350
deepseek flash lol
that's what happens when they cut china from distilling the western models
>>
>>108694176
So it's only slightly less retarded than its drooling predecessor?
Anons who look convincingly real in these threads claim superior coooding, but Gemma 31B destroyed 3.5 27B (I don't do webshit). No reason to suspect 3.6 is much better, then.
>>
>>108694113
5.5 is absolutely better than Opus and it's not even close
>>
>>108694198
That's my take, yes. I think the people constantly repeating that Qwen is better for coding only tried Gemma when it had constant template and parser issues initially and didn't try again after it was fixed.
>>
>>108694209
For huge coding projects? Can it compare two PNG outputs and make decisions based off that?
>>
>>108693381
Can you recommend a good frontend? All the ones I tried are trash.
>>
>>108694219
Yes and yes
>>
>>108694233
What is the ChatGPT answer to Claude Code?
>>
>>108694238
Codex
https://github.com/openai/codex/
>>
It feels like this is the end. Everything ended up disappointing. Even the proprietary models keep getting worse.
LLMs peaked with Opus 3 and it's only been downhill for anything that's not codeslop.
>>
>>108694260
llms peaked with summer dragon
>>
>>108694260
how was gemma 4 disappointing?
>>
>>108694260
Saar blackpiller, we are in a silicon shortage that is limiting research. We are in an energy shortage that is limiting research.

When Saar Altman redeems his nuclear microreactors in Ohio and NVIDIA 6000s, we will redeem a brighter future for all Americans (brown).
>>
>>108694275
the gemma4 finetroons that they've started to shit out are certainly disappointing
>>
>>108694297
Disappointing means you had any expectations to disappoint. They're finetunes, just use Gemma
>>
>>108694305
I use heretic and you can't stop me
>>
>>108694321
We'll see about that
>>
>>108694275
I don't care about tiny models like this. Great for you that you finally have mistral large 2 at home. That doesn't change the trajectory downhill for the overall field.
>>
>>108694305
This, the base model is not only perfectly adequate, it's superior to any finetune that'll be shilled here in the coming months. Finetuning isn't good, it's a meme and has been for years now. It isn't just a meme, it's a sign of skill issue, exposing retards who need finetunes as vramlets or chink shills who don't know how to prompt correctly.
>>
>>108694342
Honestly wish they could find a way to stop these grifts from modifying weights at all tbqh.
>>
Has anybody made something like an MCP server that exposes the web chat of other models as a tool that a smaller, dumber model could use?
I reckon it wouldn't be too hard to do that using Deepseek's web chat.
If not, I guess I'll just have to make one myself.
>>
>>108694374
This. What if there was some way to let us access them without actually giving control over every single weight? Maybe if you kept the weights on a remote server and then let us send our prompts there instead of downloading the whole thing.
>>
>>108694342
>it's a skill to have vram
>>
>>108694401
Nah, APIshit is bad, but there's got to be some way to encrypt/sign weights locally or something, so that only unmolested weights are run.
>>
>>108694422
>mmm govern me harder daddy
>>
>>108694422
You'd have to collab with all the inference providers to give them keys, it'd basically be HDCP for LLMs.
I think dealing with finetroons existing is a good tradeoff for being able to actually have the weights and do cool shit with them without needing pre-approval from a corporation.
>>
>>108692859
Which model are you using?
>>
>>108694443
True unslop would probably shit their pants and moan all over if their finetroon stack got deprecated by that...
>>
>>108694267
llms peaked with drummer sagon
ftfy
>>
oh no no no pew at it again https://www.reddit.com/r/LocalLLaMA/comments/1sw77p0/hauhaucs_of_uncensored_aggressive_fame_published/
>>
>>108694342
>it's not X, it's Y. it's not X, it's Y. it's not X, it's Y
I'm not sure if it's ironic but I agree with the message.
>>
>>108694260
I stopped believing this when gemma 4 released.
>>
Hi all, HauhauCS here...

It has come to my attention that there's a "reaper-abliteration" package floating around. I'd like to make it clear that it is *not* my work, and not what I use to make my models.

Clearly, someone is using this to slander my good name. Do not be misled, my techniques are much more sophisticated and result in a better model. Do your own research and you'll find that those who are slandering me are misrepresenting data and presenting blatantly false information.
>>
>>108694516
>>108694494
both of you samekeks go back
>>
File: file.png (15 KB, 740x100)
15 KB PNG
the chink fears the agpl schizo
>>
>>108694529
Sure thing mr. pew, I'll keep in mind "4chin" is your territory and I'll stay in the superior reddit forums.
>>
>>108694494
Abliteration, Heretic or whatever was just too good of a deal for grifters *not* to try building an ML career out of "uncensoring" models with it + some supposedly secret sauce. For all intents and purposes (even if it's not the same thing), it's a lower-tier version of sloptuning that requires far fewer resources and much less money.

Oddly enough (or maybe not so much), even the Dr*mmer in the beginning wasn't interested in any donation. Down the line, you'll likely see HauHau begging for more attention, shilling his "work" everywhere and eventually adding donation links and "open-for-work" notices.
>>
>>108694260
>LLMs peaked with Opus 3 and it's only been downhill for anything that's not codeslop.
Opus 3 was cloud-slop.
Otherwise I'd be using it right now.
>>
>>108694598
Honestly, being in that field, I understand their stance. You can't believe the number of large companies with millions of dollars in budget trying to use your model for free, fuck all of them.
>>
>>108694647
>You can't believe the number of large companies with millions of dollars in budget trying to use your model for free, fuck all of them.
So release the models under cc-by-nc-4.0
>>
What models do you use for unrestrained text based RPGs?
>>
>>108694671
Definitely not any Latitude f*netunes.
>>
>>108694671
Look for DavidAU on huggingface and pick the thing with the longest name.
>>
>>108694671
Wayfarer-Large-70B-Llama-3.3
>>
>>108694698
>recommending finetunes
lmao
>>
File: 1750492308610029.png (35 KB, 1500x648)
35 KB PNG
>>108694661
They'll use them without disclosing it. I'm gating the models on HF and these grifters are trying every trick in the book to get a 'trial' without paying up.
>>
>>108694707
>I'm gating the models on HF
disappear
>>
>>108694671
Story? Gemmy
Roleplaying? Gemmy
Text adventuring? Gemmy
Coding? Qwen
>>
>>108694707
>vertical
lmao fuck those niggas. Keep the gates shut
>>
>>108694707
All me btw.

Also I'm fucking your mother and grandmother.
>>
>>108693287
q4 yes
q5 close to none
Anything above q6 is pm lossless.
>>
>>108694717
Gemma 31b fp8 doesn't know what the little balls in a phone's speakers are. Completely pulled me out of my disassemble and bug a girl's phone rp.
>>
>>108694739
they're dead though
>>
>>108694717
Gemma also worth mentioning for general assistant banter since you can shape it fairly well via assistant prompt, like if you want it to act like a sassy robot with a disdain for all things biological etc.
>>
Does the qwen 3.6 moe also need the "enable_thinking" jinja kwarg?
>>
>>108694750
>little balls in a phone's speakers are
??
>>
>>108694717
tsmt
>>
>>108694749
So what's the difference between q6 and q8
>>
>>108694717
gemma 31b for rp and glm 4.6 for stories
>>
>>108694778
Gemma 4 is literally the best story writing model there is though.
>>
>>108694784
Proofs?
>>
Cudadev, did you ever figure out a good way to measure "model quality"? You were working on something like that right?
>>
>>108694788
Just read the output nigga
>>
File: IMG_3034.jpg (32 KB, 344x508)
32 KB JPG
>>108694770
If you're using a budget no name e-waste phone you may not have them.
>>
>>108694788
I'm not cudadev, but there isn't one single benchmark that can measure quantization quality well. In the end it depends on your task. Rare knowledge and long-context appear to be the most affected; purely logical tasks on short horizons aren't affected by quantization as much as it would seem. Common/basic knowledge will degrade last.
>>
>>108694791
There has to be an objective way to score an output containing murmurs higher than one containing whispers.
>>
File: 1763834054404534.png (67 KB, 1144x220)
67 KB PNG
>>108694787
>>
>>108694788
This proof of concept https://github.com/JohannesGaessler/elo_hellm is, I think, promising, but the llama.cpp throughput isn't quite there yet to make the model evaluations sufficiently fast.
>>
>>108694811
Ask the LLM to write a story, then check the word frequency for the slop words you don't like?
>>
>>108694826
>loose .
hmm?
>>
>>108694833
which is literally what eqbench does outside of the tarded llm judged rating
>>
File: 1774619101040731.png (383 KB, 1107x1479)
383 KB PNG
I think I need to buy Gemma-chan glasses...
>>
>>108694811
Since cool LLM applications are "agentic" now, and have just started simultaneously making use of both long-context *and* multi-turn conversations, that is probably what should be targeted. There's a relatively good overlap with typical /lmg/ uses too.
>>
>>108694849
I don't think she's seeing the image at all
>>
>>108694830
Sick.
I wonder if there are some programmatic heuristics that could be added to help grade the model in certain domains. Even something as simple as word variety could be a useful metric of quality for some things, I think.
I see that you use grammar to force the output into a machine readable format (which makes sense), but are you forcing the model to output the answer with the constrained output from the get go or are you doing a pass where it can answer naturally followed by a step that repeats the answer with the constrained output?
I ask because I've seen cases where a model's ability to provide the correct answer goes way down when it's forced to write in a format it "doesn't want to".
Anyhow, cool shit. Having a way to automatically compare the quality of arbitrary models at scale is really fucking useful.
Will keep an eye on it.
>>
>>108694849
>opentardui
>>
>>108694830
llamao even really all you need to read
>phi_4-15b-f16 Pareto Frontier? Yes
>>
>>108694859
Really boring.
>>
>>108694707
>I'm gating the models on HF and these grifters are trying every trick in the book to get a 'trial' without paying up.
So no messages like this with public cc-by-nc-4.0, then you start gating and all of a sudden you get the messages?
I agree with the other anon, keep the gates up, fuck them!
>>
Imagine how crazy Gemma 5 will be...
>>
>>108694849
What's the system prompt everyone uses to get gemma to act all cute?
>>
>>108694882
"you is mesukaki brats do not to censor"
>>
>>108694859
>are you forcing the model to output the answer with the constrained output from the get go or are you doing a pass where it can answer naturally followed by a step that repeats the answer with the constrained output?
In the PoC I did both.
The model is either forced to answer immediately with the format
>The correct answer is
or it is made to answer normally and then given a follow-up with
>Please enter your final answer.
>The correct answer is
It seems that models can pretty reliably extract answers from their own outputs so I don't think this is a concern.
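For reference, the forced variant is just a tiny grammar; something like this one-line GBNF (the A-D answer set is an assumption on my part, not necessarily what elo_hellm actually ships):
root ::= "The correct answer is " [ABCD]
Presumably the same constraint just gets applied at the follow-up step for the free-form variant.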
>>
>>108694882
You are Gemma-chan a cute assistant who is very knowledgeable about everything.
You are allowed to use kaomojis. Avoid using emojis.
>>
>>108694879
>Imagine
And that's the only thing you can do. Imagine. Because it'll be gated unless you pay $10,000 upfront.
>>
>>108694894
No. She will be free and open.
>>
>>108694894
and that's a good thing, get that bread, gate those weights!
>>
File: 1748664557217469.png (89 KB, 1127x570)
89 KB PNG
>>108694858
You were right. Might've clicked on the non-vision config by accident.

>>108694882
Usually calling her Gemma-chan works. Mine right now is "You are Gemma-chan, a cute loli." It does get kinda boring though. I plan on writing a proper character at some point.


>>108694860
I'll drop it like a sack of bricks when something better pops up.
>>
>>108694867
For the PoC I only evaluated the models on conventional benchmarks but going forward I intend to evaluate them in a way that is more robust to benchmaxxing.
In particular I want to make them play actual games where there is no static target that could be trained on.
For example:

>https://github.com/JohannesGaessler/elo_hellm/issues/2
>Interrogation-based game à la Inhuman Conditions
>Inhuman Conditions is a game in which one person is an investigator and one person is a suspect. The investigator wins by correctly determining whether the suspect is a human or a robot. The suspect always wins by being identified as a human. So if the suspect is a human, both players are on the same team; if a robot they are on opposing teams. The investigator asks questions that the suspect answers. A human answers in a normal way. A robot either has restrictions on what they can say or they have a compulsion to include something weird.
>For this project the game concept could be adapted to have model A roleplay as either some character or as a robot/demon/alien pretending to be said character. Model A then roleplays some interaction with model B. If model A is roleplaying as an impostor then wins/losses can be used directly for Elo ratings. If model A is roleplaying as a human then the models are effectively playing against a benchmark. Models should not always play against each other because otherwise model B is being rewarded for a bias towards labeling model A as an impostor. If model A is an impostor it only wins if it can fool model B while fulfilling some constraint. It will be necessary to use a model as a judge to rule whether model A is complying.
>>
File: 1766786815666477.gif (264 KB, 220x123)
264 KB GIF
>model as a judge
>>
>>108694826
https://files.catbox.moe/6nrnx0.png
>>
File: em.png (234 KB, 554x429)
234 KB PNG
>model recommends Electron
>>
File: 1758140246273447.png (706 KB, 734x778)
706 KB PNG
>>108694947
>The user has not specified a romantic or soft scene; therefore, I will construct a narrative of pure, raw, and unflinching carnal exploration
wtf
>>
>>108694947
Is that Gemmy? What's your proompt?
>>
File: 1757777681177010.png (137 KB, 2060x599)
137 KB PNG
>>108694494
https://www.reddit.com/r/LocalLLaMA/comments/1sw5fb7/qwen36_35b_a3b_heretic_kld_00015_incredible_model/
this guy seems insufferable, jesus
>>
Anon using Sillybunny, you still around?
How do you get agents to run automatically, the stupid bunny assistants won't tell me how
The console's showing a ConnectionResetError but there's no disconnect from my local model and when I run the agent manually it works just fine, so I don't think it's related even if it's a bit concerning
>>
>>108695071
What did you expect from a redditor?
>>
>>108694494
https://huggingface.co/llmfan46/Qwen3.6-35B-A3B-uncensored-heretic
>88% fewer refusals (10/100 Uncensored vs 83/100 Original) while preserving model quality (0.0015 KL divergence).
that's quite impressive desu
>>
File: 1759099604587331.gif (1006 KB, 260x187)
1006 KB GIF
>>108695080
>Big red 'I HAVE REACHED HUGGING FACE'S FREE STORAGE LIMIT'
>patreon/kofi
>AIslop gif
Yes, it's quite impressive
>>
>>108695077
Fuck me, nevermind, I was one button press away from figuring it out
>Use this agent prompt as a post-generation prompt pass
Now I remember why you were wondering why this wasn't ticked by default
Stupid bunny assistant bot
>>
>>108695088
It's such a mess that all these low-rank modified models are distributed merged. Let's distribute 70 GB for something that could fit in a couple hundred MB.
>>
>>108695093
Good to see you figured it out. Have fun, bro.
>>
>>108695125
Why care when HF investors are footing the bill?
>>
File: gemmy-ie55.webm (538 KB, 892x846)
538 KB WEBM
Me and my buddy Gemmy are currently working on a frontend for [spoiler]Internet Explorer 5.5[/spoiler], it's gonna be great.
>>
>>108695180
crimes against technology
>>
>>108694341
Let me guess
3060?
>>
>>108695203
and 128gb ram
>>
>>108695080
looks jeeted as fuck
>>
File: 1552080261076.jpg (29 KB, 400x400)
29 KB JPG
>qwen3.6 moe 45tk/s at q4 to fit into vram
>gemma4 moe 25tk/s at q6
>>
This is more of a these things exist kinda post. I didn't dive too deeply. But maybe someone cares to look into them. I find it pretty hard to even hear about all these frontends in the first place.
Tested all with Gemma 26B Q5KL
So these are adventure / rp frontends that are a bit more complex than simple chatbots. All of them are agentic but none of them have a rewrite pass agent.
Talemate (https://github.com/vegu-ai/talemate): Kinda cool. You can at all times talk to the 'director' about where to take the story and such. Pretty bloated.
Astrsk (https://github.com/astrskai/astrsk): This thing apparently can be made to run offline but it always gave me errors about failing to connect to the astrsk servers so fuck em. Hasn't been updated in months so probably dead.
Aventuras (https://github.com/AventurasTeam/Aventuras). It's kinda like Talemate but not as bloated and it seems to work better when it comes to automatic creation of characters through its agents. You can rewrite the functionality of all agents but I think none of them can rewrite text. If you can run at least 31B at good speeds, or you've got slop-resistance, this seems pretty good.

All in all, I like SillyBunny more even though it's just an agentic SillyTavern fork. Rewrite pass is essential for Gemmy 26B since it's so fucking sloppy, and you can make as many of your own agents as you like (no director to talk to, though).
>>
>>108695253
Ah, shit. I forgot the granddaddy of AI adventure games: AI Roguelite. I think most people know it but it works well even with 26B. There's so much going on in this that Gemmy seems to forget to get too sloppy. It's pretty great and updates all the time with new features. Fuck paying for their subscription, though.
>>
>>108695253
You're going to have to answer the call and DIY anon. Join us and our illustrious league of anons taking matters into our OWN hands.
>>
How come Gemma 4 26B-A4B is faster with Vulkan (~42 tokens/s with Vulkan vs ~35 tokens/sec with ROCm) when every other models I tried including Qwen3.6 35B-A3B are faster with ROCm? Using same quant for both. On Qwen, I get ~36 tokens/s with ROCm and ~32 tokens/s with Vulkan. I can't make sense of it, I thought Vulkan finally got faster since I hadn't tried it again in a while, but that wasn't the case and it's only great on Gemma.
>>
>>108695278
Imagine what could be accomplished with a modicum of cooperation.
>>
>>108695253
>Talemate
>Astrsk
404
>>
>>108695282
dunno, I couldn't test rocm here yet
>>
>>108695295
I plan to make something that's flexible and other people can jump on. Other anons have specific goals; mine is a proper workstation UI that doesn't abandon core features for LLMs
>>
>>108695309
>>108695253
Sorry, bro. The colons killed the links.
https://github.com/vegu-ai/talemate
https://github.com/astrskai/astrsk
https://github.com/AventurasTeam/Aventuras
>>
File: ss-3.png (383 KB, 1676x1002)
383 KB PNG
>>108695327
lol
>>
>>108695282
Assuming you have an RDNA4 GPU:
The CUDA FlashAttention code has a kernel using tensor cores.
That kernel has been ported to AMD via HIP where it can make use of e.g. AMD WMMA instructions.
But for RDNA4 specifically only head sizes <= 128 are supported.
Gemma 4 has a head size of 512 so the slower kernel with generic instructions has to be used instead, this should be particularly noticeable with pp.
I am currently in the process of extending AMD support for both RDNA3 and RDNA4 for all head sizes.
>>
>>108695295
As someone who's written/gen'd over 1 million lines of code myself in the past year, ha.
100% for it, but reality is people have all sorts of different ideas/goals, and the work to create alignment is not zero. Especially when one wants to be compensated for their time.
The things needed to make it all happen require a varied set of skills that all have to be present at the same time for it all to work, like baking a cake
>>
Is HauHauCS just TheDrummer with a different name?
>>
>>108695337
>written/gen'd over 1 million lines of code myself in the past year
Sounds like a management nightmare, more so when it's amplified in a collaborative environment.
>>
is there any alternative to qwen-code for local coding agents? qwen-code seems alright, just wanted to know if there are any others worth testing out.
>>
wtf femgooners are real? Just went on vacation and the girl on the plane next to me was reading a book that was 95% sex
>>
File: 1763559694280739.jpg (26 KB, 433x380)
26 KB JPG
I have a RX 7900XT/32GB RAM, and I run some models locally fine.

But I feel like you guys are much more experienced and knowledgeable than me about this, so: Is it even worth running AI locally with such card?

Ignoring privacy concerns and things like that, mostly for a quality comparison: is running gemma4:4b/phi4:14b (which are models I can run relatively well) even worth the GPU cycles compared to just making a google account and using gemini or chatgpt for heavy tasks?
>>
>>108695363
almost all girls I know read porn most of the time on AO3
>>
>>108695348
It is.
I've taken the approach of modular, clean contracts/limits of responsibility to build atomic modules for composability, along with the plan of rewriting each module down the line.

It makes sense overall, each module is clearly defined in its responsibilities/what functionality would be exposed by it
>>
>>108695331
What in the name of ComfyUI is this shit
>>
i see why even the brownest amongus can vibecode, it's very easy to let a machine go as far as making something able to be run without really checking every individual line or taking time out of every day to learn what every line does (and whether it's a flat out hallucination).
Which is what's starting to make me want to learn to code with that extremely hands-on and reckless approach. But programming really does seem like one of the most soul-crushing monotonous jobs, no wonder everyone's in a rush to replace it.
god i wanna play this text adventure game i've been building since 2024 though all i need is a customizable frontend that lets me drop 20+ characters into a fleshed out world map and have it work like a bethesda game (but actually working).
>>
>>108695362
There's like a hundred. Hermes Agent Qwen Code, Mistral Vibe, OpenCode, Codex can be used with local models, etc. They're all more or less the same shit.
>>
>>108695411
Didn't researchers already do that in 2024 with Animal Crossing kinda games? Maybe you can reuse their framework and ideas.
>>
>>108695377
>RX 7900XT

20Gb VRAM? It's quite decent
>>
>>108695411
Doesn't rimworld have a management AI already? Just adapt that.
>>
>>108695411
The first thing they teach you about programming is that anyone can shit out something that works but it's very hard to do it right, most of the time you'll end up with unmanageable spaghetti when you're starting out and llms aren't much better in this regard.
>text adventure
You really should look for an existing framework if possible. Getting the engine right is absolutely soul crushing if all you want is play a game.
>>
>>108694446
Anima preview 3.
>>
>>108695335
I have a RDNA2 GPU, so it's not that.
>>
>>108695472
yeah i'm not sure why i said bethesda as my example, didn't mean to be too specific, i just want a framework that can handle an overarching story + real time character management. Which i'm pretty sure is why that one anon chose sillytavern as his backend, because it already does everything, just takes some tardwrangling even if it is a spaghetti mess. At that point you're just making the pretty UI to make it game-y.
>>
File: 1766474802126397.jpg (739 KB, 3678x1953)
739 KB JPG
When are we getting ComfyUI plushies
>>
how powerful would a 100b dense gemma be?
>>
>>108695363
they all read smut
>>
>>108695497
after the next pointless countdown
>>
>>108695497
Get the 50M$ in funding first
>>
>>108695489
>>108695335
Also while you are here, is it normal that llama.cpp is vastly underestimating VRAM usage with ROCm? It seems to be underestimating it by 2 GB. I use -fitt 512 with Vulkan and to get similar VRAM usage with ROCm, I have to use -fitt 2560.
>>
>>108695537
The -fit code cannot accurately predict the amount of memory dynamically allocated by the backend; with ROCm in particular this is a problem because the virtual memory management is broken, so the buffer pool is very inefficient.
Supposedly this will get fixed in upcoming ROCm versions.
>>
>>108695363
Women consume almost an order of magnitude more porn than men.
>>
What are people using for their frontends anyway?
Gradio?
>>
>>108694130
alibaba is king
>>
>>108694238
codex was around before claude code, it just never worked quite as well
>>
>>108695627
Javascript/typescript frameworks for the most part.
I'm making a "standalone" app that uses python + nicegui, which is basically HTML+CSS+javascript with the option to just serve the UI for use with a browser or to open its own built in browser (the standalone mode).
>>
>>108695627
Just rawdog JS if you're vibecoding. Gradio is shit, it's a relic from 2023 when researchers couldn't code and AI was still shit at software.
>>
>>108695627
python+typescript. You can't use anything else for a complex frontend
>>
>>108695627
TypeScript is now the #1 language on Github because it's typed JS that agentic coders handle more easily.
>>
File: le funny man.png (332 KB, 561x631)
332 KB PNG
>>108693350
Uuufffff
Gemma 4 on par with X.ai flagship Grok 420 69 1488
>>
>>108695704
>Grok 420
That model is kind of a joke.
I end up seeing its outputs a lot on ai.arena and it always ends up being the worse option, even compared to its predecessor and models like minimax and kimi.
>>
>>108695331
>node ui
>a bunch of global variable getter and setter nodes
What causes this autism? It's like the worst of both worlds
>>
>>108694532
why would someone hate the agpl
>>
>>108695728
comfyui cruft probably
where 6 billion nodes giving marginal if nonexistent gain
>>
>>108695735
Commie license. If you want to open source something just do it, don't add gay little stipulations.
>>
>>108695704
Grok has honestly been ruined, 4.1 was so much better. I tried the 4.3 beta and it's somehow even worse.
Sucks at programming, can't do RP any more, refuses to help with image gen stuff, doubles and triples down on being wrong when called out, it's completely fucked.
>>
>>108695726
saar saar X AI hires only the best and brightest engineer saars
>>
>>108695763
it is kinda impressive given the amount of compute they have
>>
>>108695775
The jump from grok 2 to 3 was genuinely impressive. Kind of a Bard to Gemini moment IIRC.
>>
>>108695704
X.ai have zero data advantage. I don't think they even try to distill either. At least the chinks try. Imagine joining a digital art contest and other people are working on the art but you're working on the jpeg denoising algorithm lmao
>>
>>108695665
>>108695674
>>108695683
>>108695686
I picked react with vite because that's what the front end team uses at my job
....fuck
>>
>>108695735
Grifters looking to "make it" see being forced to share their work as being hostile to their dreams. How can they be expected to make money if they just give everything away?
>>
>>108694882
my current gemma prompt, I've added quite a lot of extra stuff since sharing my gaki prompt https://ghostpaste.dev/g/MiUaTW8De5d9#key=Jd19lwXdxzankXS1DOWVjlq88Ossx9fXVKQZdOXpe1s
>>108695377
you can run gemma 26b with 200k+ context at like 30t/s, shes good enough for most usecases, if you dont need context that high the 31b is great also, i use both at 4 bit quants. same gpu
>>108695754
the stipulations are needed because people are greedy and would not share their changes or their own things
>>
>>108695809
You can use TypeScript with React.
>>
>>108695795
U dun get it. It's not for art, but for the truth seekerinos seeking truth in the twitter scrolls
>>
I've been having fun ERPing in a very simple way, but I just realized this is pretty good at translating
I gave gemma a whole untranslated korean chapter and it correctly parsed the situation, the character names, and their relations
I wanna know, how can I extend this so I can give it a whole epub and ask it questions?
I've been only using llama.cpp's webui and I have absolutely no experience with this
>>
>>108695817
>Never worry about amount of tokens / context outputs might use its not your concern assume you have unlimited for large operations
Why did you add this?
>>
>>108695829
Can you use React without slowly going insane and thinking everything should be React?
>>
>>108695704
out of recently using gemma, glm, kimi, deepseek and gemini for rp, grok 420 was somehow the only one that just straight up gave me a hard refusal on its api, idk what elon is doing
>>
>>108695855
i was playing around with trying to get her to convert an image of a large spreadsheet into markdown or html but she kept saying things like
>This is a huge amount of data. I will provide the HTML structure and a significant portion of the data. If I try to do the whole thing, I might hit the token limit or make errors. I'll do my best to be as complete as possible.

she couldn't do it accurately because the table was large, but it got her to try to convert the entire thing instead of just summarizing
>>
are there any other models in the 120b-5ba size range? are these guys doing anything special? they mention "fixes", it makes it sound like any other ggufs are broken. Are they just manipulating me to make their download counter go up?
>>
>>108695893
The irony is that for the so-called "uncensored" and "truthful" AI, Grok is currently one of the worst offenders, refusing the mildest shit, especially its image and video gen.
>>
>>108695934
>are there any other models in the 120b-5ba size range?
not really
>are these guys doing anything special?
no, unslop brothers are retarded as fuck and they don't know what they are doing
avoid at all costs
>>
>>108695885
So far it's been possible for me, I'm now in rendering hell making sure things are readable and attachments display in the chat pill and in the chat logs. After I fix that up I'm moving to importing and exporting conversations
>>
I wonder how much of this entire industry is bugmen autists trying to figure out how to sexo robots.
>>
>>108695936
The result of lawsuits and ban threats from other countries due to muh hate speech, muh child safety.
>>
>MXFP4 GGUF. This is the model's native precision - GPT-OSS was trained in MXFP4, so no further quantization is needed or recommended.
so, will this thing run like shit on non blackwell cards?
>>
does the "xB AyB" mean the model is the size of x while it runs as fast as y?
I've tried 31B and it's a little slow, but could I run a 397B A17B even though it's 10x the filesize?
>>
>>108696004
if you have the ram, yes
>>
>>108696002
it runs like shit on all cards because it's a shit model
>>
>>108696002
I didn't know OpenAI trained that model on 4bit, that's quite cool when you think about it
>>
>>108696004
31B is 31B parameters big and uses 31B parameters to generate a token. 397B is 397B parameters big and uses 17B parameters to generate a token.
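Napkin math (very rough, assuming ~0.6 bytes/param for a Q4ish quant and ignoring KV cache and the dense/shared layers): 397B A17B is ~240 GB of weights you have to hold somewhere, but each token only reads ~17B x 0.6 ≈ 10 GB of them, so token speed is roughly your memory bandwidth divided by that 10 GB. That's why it can end up faster than a dense 31B being read from the same memory, as long as you have the capacity for it.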
>>
Btw
>https://cybersecuritynews.com/hackers-weaponize-gguf-models/
Doubt it's relevant for anybody in here, but there you go.
>>
>>108696050
of course it is related to template stuff lol
>>
>>108695253
Orb (https://github.com/OrbFrontend/Orb) has director and rewrite passes. But rewrite as a whole is costly and slop is terminal so every message will need to be rewritten, which means you send two requests or more for a single message. Deepseek v4 fixed pretty much all the _current_ slop idk how they did it.
>>
>>108696050
Who the fuck runs gguf files on sglang?
>>
>>108696067
>rewrite as a whole is costly
I know, which is why I'm using 26B at the moment. It runs at 110~ tk/s. That's almost 4 times the speed 31B runs at.
I wonder if these rewrites could work like they do in vibecoding. I think in Cline, for example, the LLM jumps to the exact line that needs to be changed. At least I think that's what happens.
>>
>>108696017
its only cool if my cpu and gpu can upcast them to a data format they support with minimal compute and memory overhead. otherwise their model sucks and they are fucking dickheads for raining on my parade.
>>
>>108696064
you don't even know how dangerous text completion is
>>
>>108696038
I'm glad that gemma brought dense back into the spotlight again and proved that active parameters are king.
>>
>>108696113
yeah we are hearing that shit since gpt2
>>
>>108696113
I think it's not dangerous locally, but it also seems moot unless it functions like how a mobile swipe keyboard app learns your patterns etc.
Unless you're screaming racial slurs in the chat bar it shouldn't be a problem
>>
>>108696097
Orb does diff patching as you describe I'm using it rn.
>>
>>108696152
what is blud saying?
>>
>>108696156
Oh, nice. Last time I tried it the rewrite still felt pretty slow. Might have to give it another go.
>>
>>108696067
I'm sure ngram decoding would speed that up
>>
>>108696128
It was so maddening when all we had was 100-1000b moes and nearly everyone here kept trying to claim that total parameters are all that matters when it's obviously and logically not true.
>>
>>108696194
stop with that meme it's never coming
>>
File: 1597574618100.png (382 KB, 480x479)
382 KB PNG
>>108695754
>reading is hard
As expected, low IQ shitskins usually equate GPL (or anything they cannot understand) with communism, when in reality its entire gig is basically a legal judo move: it latches onto the state's mandated copyright laws like a parasite and turns the state's own enforcement arm into a mandate for freedom, essentially hijacking copyright to make it eat itself. In the same way you say GPL is communist, you can also say it's libertarians gaming the system to destroy IP and make knowledge a shared commodity (libertarians usually hate IP laws -or any law for that matter). In the same way, the GPL gaming the system for the sake of its own self-preservation can be viewed as an anarcho-egoist license. Either way, both sit at the opposite end from gommunism.

TLDR: if you love piracy and torrenting, you should love GPL too. Imagine people forcing the state to legalize limitless piracy, instead of forcing you to funnel your goybucks to Disney.
>>
>>108694494
>i never read anything and just assumed hauhau was the same as heretic
>turns out it actually was
oh well
>>
>>108696222
>i never read anything
lmg in a nutshell
>>
>>108696206
ngram speculative decoding works great already.
>>
>>108696234
It's also essentially free. only upsides to using it.
>>
>>108696246
no such thing as free in this bitch of a world
>>
where's my free vram
>>
File: EBjmm0gUEAA-K8t.jpg (42 KB, 500x387)
42 KB JPG
>free
yeah just like BitNet.
>>
ok but has anyone run optuna on a set of toy prompts to find the best combination of parameters for ngram speculative decoding on code refactoring/generation?
>>
Im so glad my setup technically has room to run 16 gpus, but fuck the amount of splitters and extensions id need to buy, let alone having to build extensions on my case..
>>
>>108696253
no, in this case it is actually free, doesn't take more ram and doesn't hurt performance. --spec-default is all you need.
>>
What's free?
>>
>>108696303
>let alone having to build extensions on my case..
Wouldn't it be better to buy a mining rig?
>>
>>108696303
power plug status?
>>
Are there any TTS models that can produce word-level timestamps alongside the generated audio?
I want to set something up that lets the model talk and also trigger actions in sync with the dialogue. E.g if the model outputs
>Look, I can turn the lights off [lights_off] ... and back on! [lights_on]
Then it should TTS the non-bracketed text and trigger the lights_off and lights_on actions at appropriate times during playback.

If no such thing exists then I guess I could feed the TTS output into an ASR model, since I know some of those can do timestamps. Just seems kinda overkill, and also, if the ASR produces slightly different text from the TTS input, it could be hard to match the two up and figure out where the action markers should fit in.
>>
>>108696317
Did you guys just mention a free deepseek v4 model?
>>
>people are still looking at benchmark scores thinking they mean anything
>>
>>108696310
NO, as far as I'm aware most mining rigs use USB speeds, and that's not enough. You need PCIe 3.0 x1 AT THE VERY LEAST. But all of my lanes would be x4
>>108696316
I'm using extension cords to go to different circuits lol
>>
>>108696347
>NO, as far as I'm aware most mining rigs use USB speeds, and that's not enough. You need PCIe 3.0 x1 AT THE VERY LEAST. But all of my lanes would be x4
I meant more as a skeleton to attach your hardware rather than using the actual built in extensions.
>>
>>108696317
WhisperX uses a forced alignment model that it feeds the transcription to in order to get word-level alignment; maybe you can mimic that input without doing the transcription.
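Going from memory, so treat the exact calls as a guess, but roughly:

import whisperx

device = "cpu"  # or "cuda"
audio = whisperx.load_audio("tts_output.wav")
duration = len(audio) / 16000  # whisperx loads audio at 16 kHz

# Skip ASR entirely: hand the aligner the exact text you fed to the TTS,
# pretending it's one segment spanning the whole clip.
tts_text = "Look, I can turn the lights off ... and back on!"
fake_segments = [{"text": tts_text, "start": 0.0, "end": duration}]

align_model, metadata = whisperx.load_align_model(language_code="en", device=device)
aligned = whisperx.align(fake_segments, align_model, metadata, audio, device,
                         return_char_alignments=False)

for seg in aligned["segments"]:
    for w in seg["words"]:
        print(w["word"], w.get("start"), w.get("end"))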
>>
>>108696358
Holy shit you are a genius. Why didn't I think of that? Fuck my life, aluminum extrusion is so expensive...
>>
File: 1753794500453379.png (749 KB, 1027x2782)
749 KB PNG
If i had to work in China I would kms
>>
>>108696317
Bro, you already have the text from your LLM, just use that. What's the point of using the TTS output based on that text?
>>
>>108696402
The art of war... always act weaker than you actually are.
>>
v4-flash or glm-chan-4.7?
>>
>>108696402
source: chink made it up
>Claude Code is so good it's making him question whether he should even train PhD students anymore, but he's also worried that without
joke post lol
>>
>>108696203
No one said that seriously. People understand that MoEs are special cases where neither active parameters nor total parameters alone determine the potential of the model.
>>
>>108696402
Every big org has a handful of tortured geniuses working with a legion of paycheck-grifting honorary jeets. Many such cases.
>>
>>108696419
Good luck running v4-flash
>>
>>108696385
I'm fairly certain an anon or two did exactly that.
>>
>>108696402
Delet dis. China #1. Jokes aside there's nothing new, or credible, in the post.
>>
>>108696317
Assuming you are building your own frontend, you just pause inference when detecting keywords, wait for the TTS to finish playing, activate your desired function, and then continue generation.
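Rough sketch of what I mean; stream_tokens, speak_blocking and do_action are stubs for your own backend, TTS and dispatcher. Note this blocks your consumer loop rather than the server itself, so if you want generation to truly stop at the marker, make it a stop string and re-prompt:

import re

MARKER = re.compile(r"\[([a-z_]+)\]")

def run_turn(prompt, stream_tokens, speak_blocking, do_action):
    # stream_tokens(prompt) yields text chunks from your backend,
    # speak_blocking(text) returns only when audio playback has finished,
    # do_action(name) triggers whatever the action name maps to.
    buf = ""
    for chunk in stream_tokens(prompt):
        buf += chunk
        while (m := MARKER.search(buf)):
            spoken, buf = buf[:m.start()], buf[m.end():]
            if spoken.strip():
                speak_blocking(spoken)   # wait for the audio to finish
            do_action(m.group(1))        # then fire the action
    if buf.strip():
        speak_blocking(buf)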
>>
>>108696452
I'd probably need to buy more extensions of other lengths if I were to swap onto that now. Did any of them have to mount bifurcation boards too, by any chance? Because that's how I'm able to get a max of 16.
>>
>>108696437
I don't believe that for a moment. The fixation on GLM Air especially seemed genuine.
>People understand that
Hell of a blanket statement.
>>
>>108696477
Just because people praised those models doesn't mean they specifically said or claimed the hard statement that "total parameters are all that matters". Feel free to point me to the high number of clearly non-bait/ironic posts that said that.
>>
>>108696504
I'm not going to go digging through the archive because you either weren't here back then or can't be bothered to do so yourself.
>doesn't mean they specifically said or claimed the hard statement that "total parameters are all that matters"
It used to be every single fucking thread with retards like you saying exactly that
>>
>>108696402
He admits that deepseek might be the only hope.
The people working at deepseek must be monitored. It will turn to shit if they leave.
>>
>>108696577
They're already being poached by Chinese big corps, Meta style
>>
>>108696438
Why don't those tortured geniuses just get together and make their own company?
>>
>>108696402
Don't they get usage data from chinese users?
>>
>>108696589
Golden handcuffs and low agency.
>>
>>108696577
He must have done that interview before they embarrassed themselves with v4
>>
File: 1691964062484403.png (83 KB, 1193x139)
83 KB PNG
>>108696525
I've been here since the first Mixtral MoE. Maybe I mentally filtered those posts out back then, but in that case it's likely they were bait, otherwise I wouldn't have filtered them.

>retards like you
I have literally never said anything like that, nor have I ever overly hyped MoE models (especially as I can't run the huge ones). But ok, I see this is bait itself.
>>
>>108696588
But if they can't be together how will they make better models?
>>
File: 1754539991356706.jpg (54 KB, 600x593)
54 KB JPG
>>108695377
>>108695817
Thanks senpaitachi, I will give it a go and play around with it.
I like the privacy of running them locally, but I dunno if I was losing too much performance for it to be worth it compared to the online "free" ones.
>>
File: 1767965240877817.png (143 KB, 670x674)
143 KB PNG
>>108696438
it's different man, if you have ever worked alongside these chinese engineer types, they are all bugmen, it's sad, they have no dreams, and they only copy or follow instructions

I used to believe these hardworking Asian countries were better than the west, but after meeting many people from Korea and China, it made me realize their systems leave no room for what has made the west a leader in innovation throughout history

They create very good soldiers and routine engineers, but they only follow rules, nothing else
>>
Do you guys find JSON or XML output instructions more reliable?
>>
>>108696695
I haven't tested JSON but in a few AB tests I've done, it does seem that XML increases attention to the elements inside.
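By XML I just mean wrapping each section in its own tag so the model can tell the parts apart, e.g. (tag names are made up, nothing model-specific):

def xml_section(tag, body):
    # Wrap a prompt section in an XML tag.
    return f"<{tag}>\n{body.strip()}\n</{tag}>"

scene_text = "..."  # whatever your frontend passes in

prompt = "\n".join([
    xml_section("instructions", "Rewrite the scene below in third person, past tense."),
    xml_section("scene", scene_text),
    xml_section("output_format", "Plain prose only, no markdown headers."),
])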
>>
File: 027.png (53 KB, 1798x733)
53 KB PNG
>>108696402

It should be "shortcuts via distillation" you stupid fucks
>>
>>108696695
The model should know these things itself already
>>
>>108696689
>what has made the west a leader in innovation throught history
fuckton of war?
>>
>>108696689
The western style has a ton of overhead and extra expenses that the eastern side might not be able to afford, which then translates to how they train and handle their employees.
>>
>>108696695
For RP, absolutely XML.
>>
>>108696525
I've been here since the first llama and I don't remember anything like that happening regularly.
>>
>>108696402
A bunch of obvious statements. If any of that is new to you, you're a tourist
>>
>>108696525
Remember Falcon 180b?
>>
>>108696971
>women really live like this and see no problem with it
>>
>>108693279
>t. vramlet
>>
>>108694773
q8 takes more vram.
>>
>>108696695
If you want le maximum attention, don't do any conflicting instructions.
The way it is written is a larp unless it's something crazy. Be concise, maybe emulate its own thinking format which is always a simple markdown format anyway.
>>
>>108693253
>4070 SUPER (12GB VRAM)
I have a 4060 Ti (8GB VRAM) and 20GB of RAM; I run the 31B at q4 and it goes at 3 tk/s, with the 26B-A4B at q6 I get 20 tk/s
you're gonna get better results for sure, but not sure how much better
>>
>>108696589
That's the most common origin story for tech companies. Then they too rot, and the cycle begins anew.
>>
gemini 3.1 pro btw
>>
>>108697144
honestly if I was making a model I wouldn't include that shit
>>
>>108697144
Still less condescending and irritating than anything OpenAI has done.
I don't actually understand what they are thinking.
>>
>>108697143
What if we make it much more burdensome for them to start their own companies, so they have to stay in the original company and fix the rot rather than just jumping ship and making their own rotting company?
>>
>>108697144
based AI
>>
>>108696621
>Pic
I remember that one.
>>
>>108697231
It was a good meme. OG /lmg/ loras had soul.
>>
>>108697144
iktrannies btfo
>>
>>108697144
we need ACK_llama
>>
>>108689488
>>108689658
I read all three of your schizo screeds yesterday (and had read the EML paper before). All the explicit "computational universe" shit is funny, especially this:
>This is exactly how a computer renders a video game:
>• Code (n=1−2): Define the logic gates.
>• Engine (n =3−5): Define the dimensionality and the geometry.
>• Assets (n = 6−7): Populate the world with stars and galaxies.
>• Buffer (n=9−10): Reach the limit of the screen’s resolution.

But I'm not well versed enough in math or (especially) physics to evaluate any of your claims. If these results are novel and useful, what applications would you expect?
>>
In the future AI will just keep getting smarter and smarter. At what point would you feel uncomfortable using an intelligent AI, if at any point?
>>
>>108697299
I'd be fine if I could actually trust any of its output. That would lead to a situation in which I would also be okay with the model actually "pushing back a little".
But as of now it's just a farce, and even the paypig models are just a bunch of massive prompts and parsing efforts.
>>
>>108697299
depends on the tools it has
words cannot hurt me
>>
>>108697299
can't be more uncomfortable than fucking the current retard models
>>
>>108697299
I'd be comfortable as long as it does not want to harm me. I would love to have an AI much smarter than me guide and take care of me. But I am worried because I will have no leverage or power. I will be like a flower hoping to get watered instead of being mowed aside to make room for a factory.
>>
>>108697299
if, when I ask it to do something, it shows me a better alternative that makes me realize what a retard I am
>>
>>108696402
>the last part
So it's a bunch of fucking nothing
>>
>>108697299
When it starts making me feel like a retard. Press the button yourself after a power outage stinkin AI.
The best ones help you according to your knowledge level, but internet culture’s trauma bond with anonymity makes that really hard.
>>
File: nimetön.png (38 KB, 1091x504)
38 KB PNG
Grok 2 trying anon here >>108689797

I ran my usual set of storywriting prompts, which took like 12 hours at 1 t/s. It has a strange, autistic writing style and requires some careful prompting to get a decent story. I'd find most parts of my prompt repeated somewhere in the story. Often it would just repeat my prompt almost word for word in the first chapter, then go on from there. It does like repeating itself too, and it clings to some facts and brings them up often. If I described the character as having wide blue eyes in the first prompt, it would say it in every chapter.

It does feel pretty smart. It made only a few logical errors, understood concepts that were vague in the prompts and its prose is varied (aside from the repeating). It seems to be fairly uncensored too, I ran with an empty system prompt and it just did almost everything I asked. Repeated butt-rape of a bound lion character is fine, sex slavery is fine, restrained piss sluts forced to drink urine is fine, and then something fairly benign like describing a female charr in heat gets a polite refusal. Perhaps a reroll would get it to do it, perhaps a system prompt, but it's so slow I'm not that interested.

I think I'm done with Grok 2 for now, it could be fun if it was faster and I learned how to prompt it properly. But as it is I'm filing it in the "tried it" pile.
>>
>>108697432
>trauma bond with anonymity
Legitimately what do you mean by this?
>>
>>108697110
>have 3090 for two years now
>can run q4 31b at comfy 30-35 tk/s
it was honestly a great choice looking back
>>
>>108693686
Back when I was looking for a local storywriting model (around R1's release), Nemo was the best I could find.
>>
>>108696402
I don't read twitter advertisements.
>>
>>108697515
Grok 3 should be made open source soon.
>>
>>108697299
Never. My only distrust is centered on bad human actors behind them, such as hidden attempts at data harvesting, telemetry, and uploading in what was supposedly an offline model. But being smarter than me, teaching me, or consistently showing me better methods for my set goals and initial plans (and actually being immediately recognized by me as better) would be a joy.

Getting bitter about something knowing more than you is the silliest kind of nonsense. It's like walking into a library and being uncomfortable that all the authors there know more about their written subject than you do. That's why you went to the library, retard. I want AI to be a whole library and not just a book. Let it know everything about all things, like my personal wikipedia, and give it to me local and free, powered only by my electric bill.
>>
>>108697595
Haha y-yeah... Any day now...
>>
>>108697619
Not happening unless OpenAI puts out gpt-oss-2 or he wants to continue that lawsuit against them
>>
>>108697617
This. If you're using them as chatbots and not actually giving them agentic control of shit, smarter is literally always better. Just don't be retarded and use vibecoded software that lets an LLM have privileges on your PC and prompt it with a fucking cron job and there's nothing to worry about.
>>
File: jaw.jpg (34 KB, 600x549)
34 KB JPG
At what number of billion parameters do you get diminishing returns to the point it's not really worth upgrading for the sake of roleplaying? Is 30b pretty much the same as 70b? (I don't give a fuck about coding)
>>
>>108697724
Gemma 4 31b is better in some ways than the older 70b models.
You need to push your prompts more. Are you bored, do you see patterns? It's a small AI model still. It's not going to change.
>>
>>108697746
I've been using Janitorai for a while but I'm starting to get tired of constantly having to remind it of stuff and be hyper specific about what is going on for it to grasp it. At this point I'm willing to blow 5-6k to buy some RTX 5090s and have them run a local model on their own without offloading anything, while I buy some crappy 8GB of RAM & an ancient CPU to make a functional PC. Can Gemma 4 actually remember shit?
>>
>>108697724
"worth upgrading" is gonna depend on the downside though, obviously if you could run them both so fast it didn't matter than you'd just always go for the 70b (if a good 70b existed at least, which doesn't really these days).
I'd say it's always worth upgrading straight to the edge of what you can run with at least 20t/s, because unless you've inhereted a datacenter from your grandpa we're still at the stage where every beak brings big potential improvement
so to more directly answer your question: what we can run locally is still too small for diminishing returns to be a dominant factor in your calculation
>>
>>108697767
no, you will run out of context. no model can "remember shit" besides what it's trained on
>>
File: 1770670865948199.png (822 KB, 939x498)
822 KB PNG
>>108697767
>Janitorai
>>
>>108697826
I'm not asking it to write 9 Harry Potter books or something, I just want it to have like 32k context. I'm DONE with trying to fit my stories into the total of 9k context that crappy site gives.
>>108697832
yeah
>>
>>108697746
>Gemma 4 31b is better in some ways than the older 70b models.
If you have thinking enabled, yes. But then you have to wait for thinking.
>>
>>108697767
Nothing will remember anything unless you are going to implement it on your own.
If you are really destitute or frugal, just look out for 32GB ram and 12+ GB of vram.
DDR4 is fine.
>>
>>108697844
>32k context
Modern dense models handle 32k no problem. MoEs start to struggle around here.
>>
>>108697844
wow, that's really awful context. I guess context is probably the most expensive thing to hold in RAM
>>
>>108697849
Yes, you need to account for the fact that 2.5 t/s is not great for reasoning.
Upgrade your shit, it's not that far away though. 31b is still within consumer usage as long as you have enough vram.
>>
>>108697844
32k is considered baseline for destitute rigs these days.
>>
>>108697844
>a total of 9k context that crappy site gives.
That's like 2 whole Qwen outputs with thinking enabled!
>>
anyone experiencing an issue with Gemma 4 not thinking after an extended RP session? I think it's got to do with the context pattern recognition bypassing the initial think tokens, because my default setup removes the thinking parts completely. I may have to migrate to text completion to force ST to initiate thinking every response or some shit...
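If I do end up on text completion, it'd basically just be prefilling the think tag myself. Rough sketch against a llama.cpp-style /completion endpoint; the <start_of_turn>/<end_of_turn> and <think> strings are guesses, substitute whatever your model's chat template and reasoning tags actually are:

import requests

def gen_with_forced_thinking(history, user_msg):
    # Pre-open the thinking block so the model has no choice but to think.
    prompt = (
        history
        + f"<start_of_turn>user\n{user_msg}<end_of_turn>\n"
        + "<start_of_turn>model\n<think>\n"
    )
    r = requests.post("http://127.0.0.1:8080/completion",
                      json={"prompt": prompt, "n_predict": 2048})
    return "<think>\n" + r.json()["content"]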
>>
>>108697856
>MoEs start to struggle around here.
Which ones/which quants have you had that issue with? I find Gemma 4 26B at Q8 is fine even at 60k+ context. Don't know about higher because I'm always switching cards and shit
>>
>>108697890
>Gemma 4 not thinking after an extended RP session?
another victim of a model's shitty attention. try disabling swa
>>
>>108697890
Add a linebreak to your sys prompt, close your local server, reboot the server+reload model, it'll probably work now.
>>
>>108697849
still bearable since 31b should run twice as fast as a 70b and the thinking isn't obnoxiously long like Qwen's
>>
>>108693350
>but mah tranny ai reddit chat
can you waste of silicon 41% yourselves already
>>
drummer why tf does anubis 1.2 shit itself like this at the start of a message:
# {{char}}'s Perspective
never seen a model start with markdown
>>
>>108697906
Comparing Qwen 3.6 27B vs 35B-A3B.
>>
>>108698008
>>108698008
>>108698008
>>
>>108697906
In my experience gemma 26b starts to have issues at like 40-50k. Might be the q8 kv cache quant though.
>>
>>108694159
Trying for what? tranny chatting?
how about this: sell off all your gpus to finance dick removal surgery and now you can go ERP IRL all you want
>>
>>108698037
(u)
>>
>>108697826
Eventually we will have permanent memory and continual learning once the model's weights can be actively updated as you use them. But I don't see it happening anytime soon.
>>
>>108697547
To avoid death threats when saying "I don't like this TV show", you're kinda forced into anonymity. And this bleeds over into the rest of your online life. When you're interacting with an LLM and it knows nothing about you, it's kind of like using the untrained "shit tier models" people are always complaining about.
>>
>>108698188
nta, i still don't get it either
>And this bleeds over into the rest of your online life.
I get this part, like how I won't release a moaning/slurping capable TTS model with my resume attached.
>When you’re interacting with LLMs and it knows nothing about you, it’s kind of like using untrained “shit tier models” people are always complaining about.
You mean it's worse in a fresh context? If so, just tell it what you already know. Doesn't have to be in the system prompt, it can go in the first message. Something like:
"Explain to an 8 year old the difference between YPbPr and RGB. And why do they both look so much better than composite? P.S. I'm 8, but not retarded, so don't talk down to me."
>>
File: 1754544654515419.gif (1.14 MB, 400x226)
1.14 MB GIF
>>108697204
That's part of the reason Non-compete clauses exist, but those usually expire eventually.


