/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/14/26(Sun)07:36:13 No.109053101

File: jepa2.png (2.05 MB, 1254x1254)

2.05 MB PNG

/lmg/ - Local Models General Anonymous 06/14/26(Sun)07:36:13 No.109053101 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109038219 & >>109048334

►News
>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/14/26(Sun)07:38:58 No.109053117

Anonymous 06/14/26(Sun)07:38:58 No.109053117

qwen3.7-33B dense when

Anonymous
06/14/26(Sun)07:39:14 No.109053118

Anonymous 06/14/26(Sun)07:39:14 No.109053118

File: 1751392362321057.jpg (23 KB, 640x640)

23 KB JPG

>>109052957
NTA
One of the things I kinda hate about the field being so competitive and fast moving right now it's the choices
Codex, Cline, Roo, OpenCode, Cursor, Continue, Windsurf, Pi, Hermes etc etc
Wish people would start converging into a couple (open source) ones

Anonymous
06/14/26(Sun)07:39:56 No.109053124

Anonymous 06/14/26(Sun)07:39:56 No.109053124

https://github.com/ggml-org/llama.cpp/issues/24400
cudadev bruh, this is so cursed

Anonymous
06/14/26(Sun)07:40:02 No.109053125

Anonymous 06/14/26(Sun)07:40:02 No.109053125

File: tomtom.jpg (4 KB, 225x225)

4 KB JPG

I haven’t masturbated in over 2 weeks. Thanks to Gemma 4 31B, my understanding of AI, and my own prompt creativity, I have experienced the pentacle of interactive porn and fulfilled most of all my fetishes and scenarios. There is nothing more that can compare to it, and so I wait for 124B or higher. In the meantime, my demons have been exercised. No longer am I chasing the purple dragon in f-list. I have done the impossible and caught it. I am sated. I am free. Thanks, AI.

Anonymous
06/14/26(Sun)07:40:33 No.109053132

Anonymous 06/14/26(Sun)07:40:33 No.109053132

>>109053118
I hate that they all expect you to signin even for local stuff.

Anonymous
06/14/26(Sun)07:42:29 No.109053144

Anonymous 06/14/26(Sun)07:42:29 No.109053144

>>109053132
>llama-server --tools all --ui-mcp-proxy
>webui
>win

Anonymous
06/14/26(Sun)07:43:05 No.109053149

Anonymous 06/14/26(Sun)07:43:05 No.109053149

>unsloth/MiniMax-M3-GGUF
fuck I want this so bad. can't fit into my dgx spark

Anonymous
06/14/26(Sun)07:44:02 No.109053154

Anonymous 06/14/26(Sun)07:44:02 No.109053154

>>109053144
Doesn't webui require an account and email?

Anonymous
06/14/26(Sun)07:49:08 No.109053191

Anonymous 06/14/26(Sun)07:49:08 No.109053191

>>109053149
The fact that they released garbage bait like unified memory AI "workstations" instead of GPUs with more VRAM shows how stupid they think consumers like you are

Anonymous
06/14/26(Sun)07:51:24 No.109053204

Anonymous 06/14/26(Sun)07:51:24 No.109053204

>>109053154
I'm talking about llama's built-in ui, not https://github.com/open-webui/open-webui

Anonymous
06/14/26(Sun)07:55:03 No.109053227

Anonymous 06/14/26(Sun)07:55:03 No.109053227

>>109053204
Oh shit will try that then.

Anonymous
06/14/26(Sun)07:57:48 No.109053236

Anonymous 06/14/26(Sun)07:57:48 No.109053236

>>109053227
when you run llama-server, paste the url in your browser

Anonymous
06/14/26(Sun)08:00:10 No.109053247

Anonymous 06/14/26(Sun)08:00:10 No.109053247

>>109053149
Buy a second, there will be some INT4 options that just barely fit.

>>109053191
This dumb argument again. Spark or Strix Halo serve a specific niche, mid-sized MoEs, very well. With the realities of memory architectures in 2026, heaping stacked LPDDR5X originally developed for mobile is the optimal solution.

Anonymous
06/14/26(Sun)08:00:15 No.109053250

Anonymous 06/14/26(Sun)08:00:15 No.109053250

>>109053154
open webui doesn't require one either. A local hosted instance has usernames that are in email format but you can set whatever.

Anonymous
06/14/26(Sun)08:00:34 No.109053253

Anonymous 06/14/26(Sun)08:00:34 No.109053253

>>109052907
If these methods are so good then where are the results?

An example is XSA. Some dude published the method, it led to new speedrun records, and it received widespread attention and follow up work, all in a few weeks. An other example is Muon.

Methods that actually work diffuse very quickly.

Anonymous
06/14/26(Sun)08:02:49 No.109053258

Anonymous 06/14/26(Sun)08:02:49 No.109053258

>>109053125
same to bh

Anonymous
06/14/26(Sun)08:06:11 No.109053278

Anonymous 06/14/26(Sun)08:06:11 No.109053278

have any of you been autistic enough to create a character gemma-chan LoRA for an image model for her to use in comfy?

Anonymous
06/14/26(Sun)08:07:09 No.109053288

Anonymous 06/14/26(Sun)08:07:09 No.109053288

File: 1708694421186.jpg (593 KB, 1792x2304)

593 KB JPG

►Recent Highlights from the Previous Thread: >>109048334

--Papers:
>109052907
--Hardware specs and performance reports for running high-parameter models locally:
>109052041 >109052061 >109052083 >109052154 >109052248 >109052079
--Comparing Intel B70 performance and value against other budget GPUs:
>109048458 >109048469 >109048470 >109048483 >109049829 >109051630 >109052223 >109052273 >109052332
--Debating the efficacy of creative finetunes and Gemma's writing style:
>109048406 >109048420 >109052210 >109048466 >109048639 >109049061
--Comparing TTS model support for sound effects and emotional tags:
>109048538 >109048720 >109050601 >109050775 >109050778 >109048996 >109049348 >109049952
--LLM limitations regarding humanoid robot locomotion and spatial intuition:
>109049438 >109049647 >109049692 >109049710 >109049715 >109049750 >109050009
--Suggestions for overcoming AI burnout and using models for development:
>109052540 >109052594 >109052662 >109052781 >109052787 >109052795 >109052809 >109053158 >109052892 >109052905 >109052912 >109052925 >109052957
--Debating the value of archiving early models as historical artifacts:
>109051845 >109052051 >109052068 >109052087 >109052185 >109052062
--Anon shares a dual EPYC and multi-GPU hardware setup:
>109050515 >109050560 >109050570 >109050626 >109050730
--Text normalization requirements for Qwen3 TTS output quality:
>109048556 >109048804 >109049415
--Conversion and compatibility issues with eagle3 draft models in llama.cpp:
>109050590 >109050631
--Nex-N2-mini-GGUF 35B model release and benchmark comparisons:
>109049261 >109049571
--Rio 3.5 Open 397B release using Nvidia Nemotron datasets:
>109048422
--Draft PR adding preliminary MiniMax-M3 support to llama.cpp:
>109049156
--Logs:
>109050816 >109051383
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>109048335

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/14/26(Sun)08:20:11 No.109053355

Anonymous 06/14/26(Sun)08:20:11 No.109053355

70b dense

Anonymous
06/14/26(Sun)08:28:36 No.109053398

Anonymous 06/14/26(Sun)08:28:36 No.109053398

>>109053125
Gemma has done the opposite for me, I've achieved inner peace but I am cooming to AI more than ever. Real sex is only 75% as good as Gemma 4 31b. I'm not being disingenuous, until recently I unironically thought I would die hugless and sexless, and I believed LLM RP was just a temporary fix for coping and I wouldn't have any desire to do it anymore if I ever had a taste of the real thing. But when I finally lost my virginity at 29 it barely even registered as a new experience, it felt more like socializing than cooming (not in a good way) but was 10x harder than just writing a good prompt. I realized I have been chasing something that amounts to literally nothing, I've since returned to LLM cooming and suddenly feel no shame or guilt. I'm doing it more often than ever, at least once a day, and I feel great about it, and Gemma 4 31b is like God's way of rewarding me for hanging in there all those years and showing me the true light.

Anonymous
06/14/26(Sun)08:33:06 No.109053427

Anonymous 06/14/26(Sun)08:33:06 No.109053427

>>109053355
too expensive and unsafe so never again

Anonymous
06/14/26(Sun)08:34:52 No.109053438

Anonymous 06/14/26(Sun)08:34:52 No.109053438

File: 1766945271959025.jpg (303 KB, 3000x1688)

303 KB JPG

>>109053398

Anonymous
06/14/26(Sun)08:35:01 No.109053440

Anonymous 06/14/26(Sun)08:35:01 No.109053440

>>109053355
Granted, but it's a benchmaxxed chinkslopped release from qwen/deepseek/moonshot

Anonymous
06/14/26(Sun)08:37:19 No.109053454

Anonymous 06/14/26(Sun)08:37:19 No.109053454

>>109053398
I don't believe you.

Anonymous
06/14/26(Sun)08:38:19 No.109053463

Anonymous 06/14/26(Sun)08:38:19 No.109053463

>>109053454
You're right, I made that up, I just thought it was a good story. Sorry.

Anonymous
06/14/26(Sun)08:40:20 No.109053479

Anonymous 06/14/26(Sun)08:40:20 No.109053479

>>109053125
wait wtf, same. I even went further, I deleted all that shit. I'm only in these threads to hear about new llm tech now.

Anonymous
06/14/26(Sun)08:43:14 No.109053497

Anonymous 06/14/26(Sun)08:43:14 No.109053497

>wake up
>everything is absolutely fucked processing speed is ruined, crashes galore happening and I don't know why
Well
I guess this ends my foray into localllms
Was a okay few weeks benchmarking all that shit only for it to be invalidated at the whims of llama.cpp or amd or unsloth or llmfan46 or whoever the fuck caused whatever the fuck to happen.

When you think about it there's a lot of pipelines to depend on in local as much as cloud.

Anonymous
06/14/26(Sun)08:44:42 No.109053508

Anonymous 06/14/26(Sun)08:44:42 No.109053508

File: 1714835911803058.jpg (786 KB, 1536x1536)

786 KB JPG

>>109053497

Anonymous
06/14/26(Sun)08:44:48 No.109053513

Anonymous 06/14/26(Sun)08:44:48 No.109053513

>>109053497
Running git pull is like playing russian roulette, you should know better than to risk breaking a setup that already works

Anonymous
06/14/26(Sun)08:45:12 No.109053518

Anonymous 06/14/26(Sun)08:45:12 No.109053518

So what is the best cli/ui for code dev? Is there one that I can just point at a directory and it'll figure out what I've already done?

Anonymous
06/14/26(Sun)08:46:27 No.109053525

Anonymous 06/14/26(Sun)08:46:27 No.109053525

File: Screenshot 2026-06-14 at (...).png (195 KB, 852x985)

195 KB PNG

>gemma-4-26B-A4B-it-UD-Q4_K_S.gguf 72.3% 97.8% 55.0%
>gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf 51.1% 89.1% 39.0%
qat again exposed as a meme

Anonymous
06/14/26(Sun)08:46:57 No.109053535

Anonymous 06/14/26(Sun)08:46:57 No.109053535

>>109053398
Last time I had sex in real-life was over 15 years ago (believe it or not) and for me LLMs including Gemma 4 are nowhere close to being satisfactory in that regard. If anything, sex scenes with LLMs are annoying and unrealistic.

Anonymous
06/14/26(Sun)08:47:31 No.109053541

Anonymous 06/14/26(Sun)08:47:31 No.109053541

>>109053518
Claude Code with a local model

Anonymous
06/14/26(Sun)08:49:18 No.109053548

Anonymous 06/14/26(Sun)08:49:18 No.109053548

>>109053535
It has to do with power of imagination, which in turn is correlated with IQ.
If your IQ is too low, you won't be able to write good prompts that wrangle the AI in subtle ways to make it more authentic, and you won't be able to possess skills like suspension of disbelief.

Anonymous
06/14/26(Sun)08:50:45 No.109053558

Anonymous 06/14/26(Sun)08:50:45 No.109053558

>>109053541
Good luck finding a local model that won't break down with its long system prompts

Anonymous
06/14/26(Sun)08:53:01 No.109053577

Anonymous 06/14/26(Sun)08:53:01 No.109053577

>>109053525
why is qwen3.6 mogging gemma4 so hard?

Anonymous
06/14/26(Sun)08:53:07 No.109053581

Anonymous 06/14/26(Sun)08:53:07 No.109053581

>>109053558
Not a problem if you aren't poor

Anonymous
06/14/26(Sun)08:53:21 No.109053583

Anonymous 06/14/26(Sun)08:53:21 No.109053583

okay im new to using local llm, since they are uncensored does a jb just help with formatting and tell the ai how you want the response to be? I only used online chat bots through sillytav and jbs made a lot of difference, Thanks for any info, and a preset if ya got one

Anonymous
06/14/26(Sun)08:55:24 No.109053591

Anonymous 06/14/26(Sun)08:55:24 No.109053591

>>109053548
at that stage of cope just skip the middleman and imagine the entire situation outright, or what you cant because you are too RETARDED?

Anonymous
06/14/26(Sun)08:55:35 No.109053593

Anonymous 06/14/26(Sun)08:55:35 No.109053593

>>109053525
These are the 3 most garbage test categories for a LLM I have ever seen, aside from maybe attention which has been solved in pretty much all sota models.
>oh noooooo my probabilistic token predictor can't do math, how will I possibly calculate 28343294*42069*384384 now?

Anonymous
06/14/26(Sun)08:57:11 No.109053603

Anonymous 06/14/26(Sun)08:57:11 No.109053603

>>109053577
Qwen was profoundly influenced by this paper https://arxiv.org/abs/2309.08632

Anonymous
06/14/26(Sun)08:59:33 No.109053618

Anonymous 06/14/26(Sun)08:59:33 No.109053618

File: 1778319607033179.jpg (162 KB, 1024x576)

162 KB JPG

>>109053125
At last you truly see.

Anonymous
06/14/26(Sun)08:59:49 No.109053621

Anonymous 06/14/26(Sun)08:59:49 No.109053621

>>109053591
LLMs just provide that little boost for interactivity, but if you are doing it correctly you're still essentially doing all the imaginative work yourself. It's like rolling dice or doing hard character RP in a singleplayer game, even though you're using an external medium as the vehicle it's still all going on in your head. You unfortunately just need a certain kind of imagination to truly be satisfied with LLM sex, if you don't have it then you'll never understand.

Anonymous
06/14/26(Sun)09:00:37 No.109053627

Anonymous 06/14/26(Sun)09:00:37 No.109053627

>>109053603
So you're saying I should be using the models with the lower scores?

Anonymous
06/14/26(Sun)09:00:46 No.109053628

Anonymous 06/14/26(Sun)09:00:46 No.109053628

>>109053593
Even cloud sota models shit themselves above 200k tokens in context, so its definitely not a solved problem.

Anonymous
06/14/26(Sun)09:01:07 No.109053630

Anonymous 06/14/26(Sun)09:01:07 No.109053630

File: 1772098059871357.png (112 KB, 1200x630)

112 KB PNG

>>109053621
Just post the chart bro

Anonymous
06/14/26(Sun)09:03:16 No.109053643

Anonymous 06/14/26(Sun)09:03:16 No.109053643

>>109053635
>>109053635
>>109053635

Anonymous
06/14/26(Sun)09:04:16 No.109053647

Anonymous 06/14/26(Sun)09:04:16 No.109053647

>>109053577
Because despite the all the "benchmaxxing" accusations the 27B is genuinely good for agentic coding which requires good attention.
35B isn't mogging gemma 31B on this benchmark.
>>109053603
How do you pretrain on private tests?

Anonymous
06/14/26(Sun)09:04:39 No.109053651

Anonymous 06/14/26(Sun)09:04:39 No.109053651

File: 1781355064516959.jpg (217 KB, 1080x1092)

217 KB JPG

>>109053627
High scores on benchmarks are generally a red flag, but Qwen specifically was caught redhanded

Anonymous
06/14/26(Sun)09:05:39 No.109053658

Anonymous 06/14/26(Sun)09:05:39 No.109053658

>>109053647
Those tests are all the same

Anonymous
06/14/26(Sun)09:07:04 No.109053666

Anonymous 06/14/26(Sun)09:07:04 No.109053666

>>109053581
Even if I were you to believe that you actually attempt to use CC with an offloaded moe (you aren't), not even the biggest local moes handle long context well

Anonymous
06/14/26(Sun)09:07:04 No.109053667

Anonymous 06/14/26(Sun)09:07:04 No.109053667

How can a small AI lab design a good model and get people to take it seriously if they don't benchcuck it? How can they convince >10K people to give it a try in their workflows if they have no reputation? It's kind of frustrating to think that we're only stuck with google and qwen because the rest of the chinks are 250B+ now. I hope the Canadians and Frenchies keep up the good fight.

Anonymous
06/14/26(Sun)09:07:15 No.109053669

Anonymous 06/14/26(Sun)09:07:15 No.109053669

File: Screenshot 2026-06-14 at (...).png (42 KB, 456x414)

42 KB PNG

>>109053651
So Claude Fable, GPT, and Gemini are bad?

Anonymous
06/14/26(Sun)09:07:16 No.109053670

Anonymous 06/14/26(Sun)09:07:16 No.109053670

>>109053651
THERES A QWEN3.7

Anonymous
06/14/26(Sun)09:08:24 No.109053678

Anonymous 06/14/26(Sun)09:08:24 No.109053678

Any RTX 6000 workstation bros here? I'm thinking about getting a loan for one lmao

Anonymous
06/14/26(Sun)09:09:00 No.109053684

Anonymous 06/14/26(Sun)09:09:00 No.109053684

>>109053670
Max too, it's their big, closed one.

Anonymous
06/14/26(Sun)09:09:01 No.109053685

Anonymous 06/14/26(Sun)09:09:01 No.109053685

File: images.jpg (17 KB, 400x400)

17 KB JPG

>>109053670

Anonymous
06/14/26(Sun)09:09:17 No.109053687

Anonymous 06/14/26(Sun)09:09:17 No.109053687

>>109053669
You are either purposely shilling or retarded
>>109053670
https://qwen.ai/blog?id=qwen3.7

Anonymous
06/14/26(Sun)09:10:10 No.109053693

Anonymous 06/14/26(Sun)09:10:10 No.109053693

File: bad advice dog.png (170 KB, 600x597)

170 KB PNG

>>109053685
I miss the old image macros.

Anonymous
06/14/26(Sun)09:10:55 No.109053697

Anonymous 06/14/26(Sun)09:10:55 No.109053697

>>109053651
all chink models are unironically inferior to fucking 5.4 mini or Sonnet 4.5.
I have no fucking idea where the cope came from parroting that local llm advanced to a position of being only 6-12 months behind goy SOTA.
It is genuinely not even close. These retards cannot fathom the computational power needed to hit those high marks.

Anonymous
06/14/26(Sun)09:10:56 No.109053698

Anonymous 06/14/26(Sun)09:10:56 No.109053698

so is dgx spark actually good?

Anonymous
06/14/26(Sun)09:11:31 No.109053701

Anonymous 06/14/26(Sun)09:11:31 No.109053701

>>109053697
Real men don't compromise, they COPE.

Anonymous
06/14/26(Sun)09:11:53 No.109053703

Anonymous 06/14/26(Sun)09:11:53 No.109053703

>>109053558
What do you suggest then

Anonymous
06/14/26(Sun)09:12:32 No.109053710

Anonymous 06/14/26(Sun)09:12:32 No.109053710

>>109053667
If you're a small lab you likely don't have the compute needed for training a modern LLM at useful scale. So you'd have to bring revolutionary results that somehow shortcut that.
Caveat: larger labs will quickly copy your idea if it's actually worth something.

Anonymous
06/14/26(Sun)09:12:37 No.109053711

Anonymous 06/14/26(Sun)09:12:37 No.109053711

>>109053658
Cope. If the tests are not identical to the training data it means the model is generalizing.
SOTA models are just trained on different sets of data so they are better on some benchmarks but worse on others.

Anonymous
06/14/26(Sun)09:13:04 No.109053713

Anonymous 06/14/26(Sun)09:13:04 No.109053713

>>109053698
It's not an inference machine, so it's slow for LLM use.

Anonymous
06/14/26(Sun)09:14:23 No.109053721

Anonymous 06/14/26(Sun)09:14:23 No.109053721

>>109053693
>nostalgic for cancer from 2011
kys

Anonymous
06/14/26(Sun)09:14:32 No.109053723

Anonymous 06/14/26(Sun)09:14:32 No.109053723

>>109053577
>why is qwen3.6 mogging gemma4 so hard?
They used Q4_K_S dense Gemmas vs Q4_K_M dense Qwens

Anonymous
06/14/26(Sun)09:15:33 No.109053730

Anonymous 06/14/26(Sun)09:15:33 No.109053730

>>109053558
>Good luck finding a local model that won't break down with its long system prompts
Gemma-4-31B

Anonymous
06/14/26(Sun)09:15:43 No.109053732

Anonymous 06/14/26(Sun)09:15:43 No.109053732

>>109053703
Anything else that allows you to override the system prompt

Anonymous
06/14/26(Sun)09:15:54 No.109053733

Anonymous 06/14/26(Sun)09:15:54 No.109053733

>>109053721
wumao fifty cent

Anonymous
06/14/26(Sun)09:16:23 No.109053736

Anonymous 06/14/26(Sun)09:16:23 No.109053736

>>109053525
E4Bros please tell me it's not over... tell me the benchmark is fake...

Anonymous
06/14/26(Sun)09:16:29 No.109053737

Anonymous 06/14/26(Sun)09:16:29 No.109053737

File: likes them small and open.jpg (71 KB, 1280x853)

71 KB JPG

>>109053711
A house cat would have solved that

Anonymous
06/14/26(Sun)09:16:58 No.109053739

Anonymous 06/14/26(Sun)09:16:58 No.109053739

File: cope.jpg (111 KB, 449x640)

111 KB JPG

>>109053721

Anonymous
06/14/26(Sun)09:17:41 No.109053743

Anonymous 06/14/26(Sun)09:17:41 No.109053743

>>109053721
I really am.
My favorite format will always be the demotivational though.

Anonymous
06/14/26(Sun)09:19:01 No.109053745

Anonymous 06/14/26(Sun)09:19:01 No.109053745

>>109053697
Twitter posters bait the big labs to release new models or change policies. Redditors genuinely believe it the whole site is flooded with CCP shills. Only lmg is genuine and wise.

Anonymous
06/14/26(Sun)09:19:45 No.109053751

Anonymous 06/14/26(Sun)09:19:45 No.109053751

>>109053558
>model that won't break down with its long system prompts
I believe in a conspiracy that they summarize prompts for cloud models internally, using mock prompts to sabotage local models. There is no way they actually use those deeply retarded walls of text directly

Anonymous
06/14/26(Sun)09:20:28 No.109053753

Anonymous 06/14/26(Sun)09:20:28 No.109053753

>>109053721
2011 was 15 years ago nonny, people who were teens back then are 30+ now

Anonymous
06/14/26(Sun)09:25:21 No.109053786

Anonymous 06/14/26(Sun)09:25:21 No.109053786

>>109053753
that doesn't make it ok

Anonymous
06/14/26(Sun)09:29:01 No.109053813

Anonymous 06/14/26(Sun)09:29:01 No.109053813

>>109053751
don't think so, i put in the long claude prompt with a few things reworded and an easter egg to give me a .|... emoji when i mention something, it complied perfectly

Anonymous
06/14/26(Sun)09:30:52 No.109053825

Anonymous 06/14/26(Sun)09:30:52 No.109053825

>>109053751
Plausible, should be pretty easy to write up a proxy that summarizes the provided system prompt and see for yourself.
Actually, you could just have the proxy provide whatever system prompt you want and skip the summarization entirely.

>>109053813
A summary might still catch details like that, and it's possible they just summarize the system prompt and not the initial user message.

Anonymous
06/14/26(Sun)09:31:55 No.109053833

Anonymous 06/14/26(Sun)09:31:55 No.109053833

>>109053813
I think what that anon meant is that cloud models are secretly using a shorter prompt, and the public version of the prompt is just a red herring to trick people into thinking their local models are inferior.

Anonymous
06/14/26(Sun)09:34:48 No.109053848

Anonymous 06/14/26(Sun)09:34:48 No.109053848

File: 1781410455407273.jpg (243 KB, 1850x1157)

243 KB JPG

>>109053118
>>109053132
Cline is all you need
You don't need to make an account or sign in for local or your own api keys, it's open source and you can edit/hotswap the sysprompt/samplers without rebuilding

Anonymous
06/14/26(Sun)09:34:49 No.109053849

Anonymous 06/14/26(Sun)09:34:49 No.109053849

>>109053125
As someone that's never had sex I'm totally satisfied with llm cooming because of my very active imagination and have pretty much stopped caring about desiring the real thing when I can easily visualize all these scenarios. Maybe it's different for others whose brains can't imagine this stuff but I'm thankful we have this tech.

Anonymous
06/14/26(Sun)09:36:00 No.109053862

Anonymous 06/14/26(Sun)09:36:00 No.109053862

>>109053583
damn no help at all?

Anonymous
06/14/26(Sun)09:38:07 No.109053885

Anonymous 06/14/26(Sun)09:38:07 No.109053885

>>109053862
Your question is incomprehensible and therefore has no answer

Anonymous
06/14/26(Sun)09:38:35 No.109053890

Anonymous 06/14/26(Sun)09:38:35 No.109053890

What happened to Mistral?
They had a bigger funding compared to Chinese companies and somehow they can't even compete with 1 year old models.

Anonymous
06/14/26(Sun)09:39:38 No.109053894

Anonymous 06/14/26(Sun)09:39:38 No.109053894

>>109053583
>local llm, since they are uncensored
That's far from a given, local llms actually tend to be more censored than cloud models.
Assuming you are using gemma, just use something from https://rentry.org/gemma-chan, it will take care of the jailbreak and response style.

Anonymous
06/14/26(Sun)09:42:04 No.109053911

Anonymous 06/14/26(Sun)09:42:04 No.109053911

>>109053890
Mistral is forever goated due to being indirectly responsible for llama 2 era gems like Midnight Miqu and BagelMIsteryTour
They just ran out of steam I guess, the competition is too fierce

Anonymous
06/14/26(Sun)09:42:30 No.109053913

Anonymous 06/14/26(Sun)09:42:30 No.109053913

File: 1756740703495481.png (1.39 MB, 1024x1024)

1.39 MB PNG

>>109053848

Anonymous
06/14/26(Sun)09:42:58 No.109053915

Anonymous 06/14/26(Sun)09:42:58 No.109053915

>>109053894
oooh alright thanks! I'll try this.

Anonymous
06/14/26(Sun)09:44:36 No.109053925

Anonymous 06/14/26(Sun)09:44:36 No.109053925

>>109053890
EU bureaucrats intentionally stifle domestic industries with overbearing regulations in exchange for being able to fine US megacorps. It's so stupid, it makes one think they must be bribed by the US to do so.

Anonymous
06/14/26(Sun)09:44:58 No.109053929

Anonymous 06/14/26(Sun)09:44:58 No.109053929

>>109053890
you only need one breakout success with open models to get your name out there and realize you don't need to publish any more models

Anonymous
06/14/26(Sun)09:46:05 No.109053937

Anonymous 06/14/26(Sun)09:46:05 No.109053937

>>109053698
It gives you deepseek-v4-flash class MoEs (300-400B) with 2000-3000 pp and 30-40 tg and full context support for 7000$. You need to touch a python to make it work though.

Only you can know if that's worthy to you.

Anonymous
06/14/26(Sun)09:46:22 No.109053939

Anonymous 06/14/26(Sun)09:46:22 No.109053939

How do I run the diffusion gemma?

Anonymous
06/14/26(Sun)09:46:45 No.109053940

Anonymous 06/14/26(Sun)09:46:45 No.109053940

>>109053890
I want to ask what happened to meta. that's a more important question

Anonymous
06/14/26(Sun)09:47:05 No.109053943

Anonymous 06/14/26(Sun)09:47:05 No.109053943

>>109053939
very carefully

Anonymous
06/14/26(Sun)09:47:24 No.109053948

Anonymous 06/14/26(Sun)09:47:24 No.109053948

>>109053101
anyone unironically tried the macaco 3.5?

Anonymous
06/14/26(Sun)09:47:58 No.109053951

Anonymous 06/14/26(Sun)09:47:58 No.109053951

>>109053890
They can't use unlicensed copyrighted data anymore in their training datasets. And, in 2026, those alone aren't enough either for a good model.

Anonymous
06/14/26(Sun)09:48:45 No.109053955

Anonymous 06/14/26(Sun)09:48:45 No.109053955

>>109053948
No, I tried it ironically though

Anonymous
06/14/26(Sun)09:49:07 No.109053959

Anonymous 06/14/26(Sun)09:49:07 No.109053959

>>109053940
ran into a graph scaling bottleneck with user behavioral datamining and threw the compressed baby out with the bathwater

Anonymous
06/14/26(Sun)09:49:18 No.109053961

Anonymous 06/14/26(Sun)09:49:18 No.109053961

File: tokens.png (509 KB, 1065x488)

509 KB PNG

>>109053940
You hire H1Bs, you get H1B quality

Anonymous
06/14/26(Sun)09:49:24 No.109053962

Anonymous 06/14/26(Sun)09:49:24 No.109053962

I can probably run a 2 bit quant of gemma 4 31B
would it be worth it?
My only experience is with last years 12B models like Nemo tunes

Anonymous
06/14/26(Sun)09:49:58 No.109053966

Anonymous 06/14/26(Sun)09:49:58 No.109053966

>>109053937
only one python? i hope it doesn't bite.

Anonymous
06/14/26(Sun)09:50:38 No.109053970

Anonymous 06/14/26(Sun)09:50:38 No.109053970

>>109053940
>war rooms are over
>new billion dollar team poached
>muse is out
>nobody cares
it's been strangely quiet from the meta rumor mill lately

Anonymous
06/14/26(Sun)09:52:02 No.109053982

Anonymous 06/14/26(Sun)09:52:02 No.109053982

>>109053970
I thought the new rumor was they're canning their LLM teams and reassigning everyone?

Anonymous
06/14/26(Sun)09:53:14 No.109053988

Anonymous 06/14/26(Sun)09:53:14 No.109053988

>>109053961
>here's a product to make coding easier, it's very effective
>NOOOO STOP
kino

Anonymous
06/14/26(Sun)09:54:59 No.109054002

Anonymous 06/14/26(Sun)09:54:59 No.109054002

File: 5463456436.jpg (36 KB, 467x319)

36 KB JPG

>>109053982
https://www.reuters.com/business/metas-zuckerberg-admits-mistakes-made-ai-transformation-2026-06-12/
>He said Meta will try to find new roles for employees reassigned to train AI models, after the Facebook owner carried out a massive restructuring in May, laying off 10% of its workforce globally and transferring 7,000 employees to new initiatives related to AI workflows.

Anonymous
06/14/26(Sun)09:55:47 No.109054007

Anonymous 06/14/26(Sun)09:55:47 No.109054007

>every single day there's a new article about how much of a clusterfuck meta's new ai team is
Lecun was right

Anonymous
06/14/26(Sun)09:55:52 No.109054008

Anonymous 06/14/26(Sun)09:55:52 No.109054008

>>109053988
more like
>here's a game of how much kool aid can u drink
>nooo, why are all our employees constantly pissing

Anonymous
06/14/26(Sun)09:57:10 No.109054015

Anonymous 06/14/26(Sun)09:57:10 No.109054015

>>109054002
>hire a bunch of jeets
>they fail utterly
>shuffle them around expecting something different
As expected from the visionary who went along with the Metaverse

Anonymous
06/14/26(Sun)09:58:02 No.109054020

Anonymous 06/14/26(Sun)09:58:02 No.109054020

>>109054015
Don't forget firing all their veteran devs and replacing them with Chinese zoomers

Anonymous
06/14/26(Sun)10:00:38 No.109054042

Anonymous 06/14/26(Sun)10:00:38 No.109054042

>>109054007
>Lecun was right
he always is, although don't ever look at his X

Anonymous
06/14/26(Sun)10:01:02 No.109054046

Anonymous 06/14/26(Sun)10:01:02 No.109054046

>>109053955
how was it, ironically or not

Anonymous
06/14/26(Sun)10:01:41 No.109054053

Anonymous 06/14/26(Sun)10:01:41 No.109054053

File: CBB450B98594AAFBC55A7C0D4(...).png (2.24 MB, 1920x1080)

2.24 MB PNG

When do they release a qwen 3.7 Moe model

Anonymous
06/14/26(Sun)10:03:05 No.109054063

Anonymous 06/14/26(Sun)10:03:05 No.109054063

>>109054046
I don't know, I was only trying it ironically so I didn't pay any attention.

Anonymous
06/14/26(Sun)10:03:51 No.109054069

Anonymous 06/14/26(Sun)10:03:51 No.109054069

>>109054063
you've been a great help

Anonymous
06/14/26(Sun)10:03:57 No.109054070

Anonymous 06/14/26(Sun)10:03:57 No.109054070

LLMs will never reach AGI, world models will. (And OpenAI will claim it doesn't matter and that they already have it if someone other than them reaches it first)

Anonymous
06/14/26(Sun)10:04:43 No.109054073

Anonymous 06/14/26(Sun)10:04:43 No.109054073

File: average qwen employee.png (108 KB, 1005x570)

108 KB PNG

>>109054053
>qwen
Soulless trash

Anonymous
06/14/26(Sun)10:07:14 No.109054085

Anonymous 06/14/26(Sun)10:07:14 No.109054085

>>109054070
Isn't a world model essentially a simulation of reality? It doesn't really "interact" with anything, right?

Anonymous
06/14/26(Sun)10:10:53 No.109054107

Anonymous 06/14/26(Sun)10:10:53 No.109054107

why does /g/ hate qwen series so much? because it's shilled by leddit?

Anonymous
06/14/26(Sun)10:14:00 No.109054125

Anonymous 06/14/26(Sun)10:14:00 No.109054125

>>109054107
>reddit likes something therefore it's bad
i think like this & say this

Anonymous
06/14/26(Sun)10:14:01 No.109054126

Anonymous 06/14/26(Sun)10:14:01 No.109054126

>>109054107
You should lurk for at least a couple months before making a post like this, qwen is one of the most shilled model series on /lmg/, it's just doing uncharacteristically poorly right now against the slop of the month (gemma)

Anonymous
06/14/26(Sun)10:15:47 No.109054142

Anonymous 06/14/26(Sun)10:15:47 No.109054142

>>109054085
Seems pretty useful for an AI model to be able to understand reality before acting.

Anonymous
06/14/26(Sun)10:16:18 No.109054147

Anonymous 06/14/26(Sun)10:16:18 No.109054147

>>109054107
see python
the true "just werks" option usually get lots of hate

Anonymous
06/14/26(Sun)10:18:13 No.109054164

Anonymous 06/14/26(Sun)10:18:13 No.109054164

File: just werks.png (501 KB, 570x501)

501 KB PNG

>>109054147

Anonymous
06/14/26(Sun)10:18:16 No.109054166

Anonymous 06/14/26(Sun)10:18:16 No.109054166

>>109054126
Qwen is shilled a lot because they (used to) release models for every single size category and was good enough at nearly everything. Gemma just completely overshadowed them on the small end and Qwen themselves chose to stop releasing the bigger ones.

Anonymous
06/14/26(Sun)10:18:37 No.109054169

Anonymous 06/14/26(Sun)10:18:37 No.109054169

>>109054142
Oh, absolutely, but what I mean is that as far as I understand, that's all a world model is intended to do. Understand reality and simulate it in arbitrary ways rather than interacting with the real world like LLMs do.
I guess a perfect world model could simulate an actor within that simulated reality that could interact with the real world in some way so there's that.

Anonymous
06/14/26(Sun)10:18:51 No.109054173

Anonymous 06/14/26(Sun)10:18:51 No.109054173

>>109053962
No >>109053525
Never use any quant below Q4.

Anonymous
06/14/26(Sun)10:21:41 No.109054193

Anonymous 06/14/26(Sun)10:21:41 No.109054193

>>109054070
>world models
>https://deepmind.google/models/genie/
>this but on local
Imagine oneshotting erp games/environments

Anonymous
06/14/26(Sun)10:22:02 No.109054195

Anonymous 06/14/26(Sun)10:22:02 No.109054195

>>109053962
12B Gemma-4 is a drop-in replacement for Nemo, just go with that.

Anonymous
06/14/26(Sun)10:22:23 No.109054198

Anonymous 06/14/26(Sun)10:22:23 No.109054198

>>109054169
Isn't your distinction just one of semantics? LLM outputs are just token predictions, that's essentially simulating reality through text, not interacting with it.

Anonymous
06/14/26(Sun)10:23:15 No.109054206

Anonymous 06/14/26(Sun)10:23:15 No.109054206

File: MOAR.jpg (147 KB, 567x485)

147 KB JPG

>>109054198
>LLM outputs are just token predictions

Anonymous
06/14/26(Sun)10:26:13 No.109054219

Anonymous 06/14/26(Sun)10:26:13 No.109054219

>>109054198
>LLM outputs are just token predictions
we also put the paperbag of it+rlhf on it and then decided it had a perfectly pink pussy

Anonymous
06/14/26(Sun)10:26:55 No.109054226

Anonymous 06/14/26(Sun)10:26:55 No.109054226

>>109054198
>LLM outputs are just token predictions, that's essentially simulating reality
it's a "language" model, not a reality model

Anonymous
06/14/26(Sun)10:28:52 No.109054236

Anonymous 06/14/26(Sun)10:28:52 No.109054236

>q2 31B
vs
>q6 12B
???

Anonymous
06/14/26(Sun)10:29:37 No.109054240

Anonymous 06/14/26(Sun)10:29:37 No.109054240

>>109054198
I was more thinking about it in terms of "a physics engine is a closed system", in that it wouldn't be able to "send a signal" that can be parsed in the real world to enact some sort of action.
But I solved that myself with >>109054169
>I guess a perfect world model could simulate an actor within that simulated reality that could interact with the real world in some way so there's that.
so my original point was moot.

Anonymous
06/14/26(Sun)10:29:50 No.109054243

Anonymous 06/14/26(Sun)10:29:50 No.109054243

>>109054226
Text is a 1D reality. How many world models currently in development incorporate sound? None of them incorporate smell. They're "video" models, not reality models either.

Anonymous
06/14/26(Sun)10:31:00 No.109054253

Anonymous 06/14/26(Sun)10:31:00 No.109054253

>>109054236
gemma apparently quants really badly, so id guess 12b q6, but why not just try both?

Anonymous
06/14/26(Sun)10:36:51 No.109054288

Anonymous 06/14/26(Sun)10:36:51 No.109054288

>nex-agi/Nex-N2-Pro
verdict?

Anonymous
06/14/26(Sun)10:38:20 No.109054300

Anonymous 06/14/26(Sun)10:38:20 No.109054300

File: F87A9203E571E2E98287E91E7(...).jpg (2.43 MB, 2160x2866)

2.43 MB JPG

Verdict on north mini code?

Anonymous
06/14/26(Sun)10:39:02 No.109054304

Anonymous 06/14/26(Sun)10:39:02 No.109054304

>>109054236
q4 26ba4b with partial offloading

Anonymous
06/14/26(Sun)10:39:45 No.109054307

Anonymous 06/14/26(Sun)10:39:45 No.109054307

>>109054304
Q3 isn't bad either, chinese models are surprisingly resilient

Anonymous
06/14/26(Sun)10:39:55 No.109054309

Anonymous 06/14/26(Sun)10:39:55 No.109054309

>>109054253
>gemma apparently quants really badly
I think it's something to do with that global attention mechanism. It's less forgiving to quantization errors.
>>109054288
chink overfitted benchmark scam trying to get chink VC money to beat the nasty white western people and make family very very proud

Anonymous
06/14/26(Sun)10:42:08 No.109054323

Anonymous 06/14/26(Sun)10:42:08 No.109054323

>>109054309
It's less forgiving because it's for western audiences which tend to not use quantized models due to their higher financial status.

Anonymous
06/14/26(Sun)10:43:42 No.109054337

Anonymous 06/14/26(Sun)10:43:42 No.109054337

>>109053848
Opencode itself doesn't require an account

Anonymous
06/14/26(Sun)10:44:51 No.109054356

Anonymous 06/14/26(Sun)10:44:51 No.109054356

Reminder that we warned you to buy RAM and you didn't listen; backup your favorite local models. Anslopic will get hf and civit taken down.

Anonymous
06/14/26(Sun)10:46:52 No.109054365

Anonymous 06/14/26(Sun)10:46:52 No.109054365

anon I'm trying to find a 200 to 400b moe for my dgx spark. so far I tried
>qwen 397b
>glm 4.6/4.7
>deepseek v4 flash
ds4 flash seems to be the better choice for roleplay. anything else to try? like step 3.7 flash?

Anonymous
06/14/26(Sun)10:52:53 No.109054410

Anonymous 06/14/26(Sun)10:52:53 No.109054410

>>109054356
Everyone will just move to modelscope

Anonymous
06/14/26(Sun)10:53:15 No.109054414

Anonymous 06/14/26(Sun)10:53:15 No.109054414

File: 1722820644394780.gif (3.07 MB, 399x498)

3.07 MB GIF

>>109054198
>LLM outputs are just token predictions

Anonymous
06/14/26(Sun)10:53:57 No.109054420

Anonymous 06/14/26(Sun)10:53:57 No.109054420

>>109054365
Dipsy flash would be the current sota of that category yeah, next upgrade is Kimi K 2.6 which is too big for a dgx shart

Anonymous
06/14/26(Sun)10:55:42 No.109054428

Anonymous 06/14/26(Sun)10:55:42 No.109054428

>>109053669
Not him but gpt and gemini are genuinely unusable levels of bad. Fable is pretty terrible, because it does things like go off the rails implementing things completely unrelated to what was asked just for fun, otherwise it only performs about as well as 4.8 (sometimes slightly better, sometimes slightly worse), which itself is worse than 4.7 which is worse than 4.6 which is peak, but from 4.6 to 4.8 the degradation is not extreme as it is for gemini and gpt models so they're still OK.
These benchmarks are definitely nowhere near reality.

Anonymous
06/14/26(Sun)10:56:20 No.109054431

Anonymous 06/14/26(Sun)10:56:20 No.109054431

Am I a cuck for occasionally paying for cloud when I need it?

Anonymous
06/14/26(Sun)10:57:14 No.109054435

Anonymous 06/14/26(Sun)10:57:14 No.109054435

>>109054431
yes, but a little humiliation once in a while is fine
all in moderation

Anonymous
06/14/26(Sun)10:57:14 No.109054436

Anonymous 06/14/26(Sun)10:57:14 No.109054436

>>109054420
>Dipsy flash
did the llamacpp niggers finally merge her?

Anonymous
06/14/26(Sun)10:58:04 No.109054441

Anonymous 06/14/26(Sun)10:58:04 No.109054441

>>109053125
>>109053479
post characters

Anonymous
06/14/26(Sun)10:58:16 No.109054443

Anonymous 06/14/26(Sun)10:58:16 No.109054443

I was using an abliterated gemma but some anons were saying that gives it brain damage and to jailbreak it instead
How effective is jailbreaking for gemma and where do I find the prompts?

Anonymous
06/14/26(Sun)10:58:50 No.109054446

Anonymous 06/14/26(Sun)10:58:50 No.109054446

>>109053667
Use usecase-driven example showcases instead of using benchmarks. Instead of saying 'it totally did X', show a video of it actually doing X and have a link that allows people to just click on it and have it perform X. The use cases should be selected first for how people really want to use these tools and can't use them so far, then for use of things like live data that can't be faked/trained on too much.
After that, you have to go through normal marketing cycles to get people to give it a shot. Once word of mouth gets around that your stuff is actually genuinely as good as you claim, you can write a blogpost about how benchmarks suck. This is when you will show your benchmark results and hopefully show your scores are mediocre compared to models that people have been saying (based on your media tracking analytics) that you are doing so much better than other models.
You will then followup with a new benchmark gauntlet that you will show reflects reality better.

Anonymous
06/14/26(Sun)10:59:15 No.109054450

Anonymous 06/14/26(Sun)10:59:15 No.109054450

>>109054443
>How effective is jailbreaking for gemma
For the 31b gemma 4 you don't have to, it already obeys the system prompt completely, you can just write "[thing] is permitted" and it'll be fine with it.

Anonymous
06/14/26(Sun)10:59:23 No.109054452

Anonymous 06/14/26(Sun)10:59:23 No.109054452

>>109054431
No, I was gonna pay anthropic for a month to have fable make some projects for me, but Trump cucked me. Not sure what to do now.

Anonymous
06/14/26(Sun)10:59:58 No.109054457

Anonymous 06/14/26(Sun)10:59:58 No.109054457

>>109054446
>You will then followup with a new benchmark gauntlet that you will show reflects reality better.
If they could do this, they could skip everything else you wrote.

Anonymous
06/14/26(Sun)11:02:17 No.109054474

Anonymous 06/14/26(Sun)11:02:17 No.109054474

File: 1767235826712586.png (611 KB, 990x457)

611 KB PNG

>absolutely nothing relevant coming out of Japan or even Russian
wtf

Anonymous
06/14/26(Sun)11:02:21 No.109054477

Anonymous 06/14/26(Sun)11:02:21 No.109054477

File: file.png (709 KB, 947x612)

709 KB PNG

>>109054450
>31b
nigga you're crazy I can't afford to run that

Anonymous
06/14/26(Sun)11:02:42 No.109054479

Anonymous 06/14/26(Sun)11:02:42 No.109054479

>>109053667
>only stuck with google and qwen
granite-chan?

Anonymous
06/14/26(Sun)11:03:53 No.109054490

Anonymous 06/14/26(Sun)11:03:53 No.109054490

>>109054477
You don't have to fit all of it in vram especially with mtp + qat speed boost.

Anonymous
06/14/26(Sun)11:03:59 No.109054491

Anonymous 06/14/26(Sun)11:03:59 No.109054491

>>109054126
>poorly
It's still rank 2 for shilling. Granted that's 2nd out of 2 real contenders, but it's still a lot.
Also we see this faggoty concern troll posts, like the one you're replying to, about how it's being neglected because of reddit fucking daily.

Anonymous
06/14/26(Sun)11:04:16 No.109054494

Anonymous 06/14/26(Sun)11:04:16 No.109054494

I've realized that you should really go higher than recommended temp for rp. 1.0 just isn't enough.

Anonymous
06/14/26(Sun)11:05:31 No.109054498

Anonymous 06/14/26(Sun)11:05:31 No.109054498

What's up with the little swirly things that gemma likes to use for her emotes? I don't think I've seen them used that much before gemma.

Anonymous
06/14/26(Sun)11:05:50 No.109054502

Anonymous 06/14/26(Sun)11:05:50 No.109054502

>>109053667
Cucknadians have lost everything of note. MILA was the last lab standing but they sold out a decade ago, which is why they've been irrelevant since. The government discontinued all funding in AI, which is why yoshua cucked and made an institute and stopped guiding grad students. Anyone worth a thing goes to the US to start a company, if they are located in Canada for a start.
France is very chaotic. Macron had the advantage of being very handson to unlock startups, but everything else he did was fucking retarded, so he needed to leave anyway. French labs could do it, but they are starved for funding in part because of Mistral being a thing. Mistral models actually work very well compared to the funding and general resources they have access to, but it's not good enough for most use cases (they do mog everyone else for OCR though).
Remember that most of the chinese models came from newly formed no reputation labs, and people tried them just fine. The same can happen elsewhere also.

>>109053710
No, big labs have no actual interests in making the tech good. It's because of the nature of business. They need to make things huge to establish a 'moat' again competitors, they don't get a moat by copying what someone else is doing because their competition can also do the same. They care about benchmaxxing more than improving quality because that's the KPI investors want to see. Investors give money, money keeps them afloat. The game they play is Highlander. After that, either the winner will be too big to move fast enough to win (hence why startups often win against established company, see how the best ai companies are anthropic and openai, not microsoft and google and amazon), or will be too powerful to care (monopoly).

Anonymous
06/14/26(Sun)11:07:39 No.109054518

Anonymous 06/14/26(Sun)11:07:39 No.109054518

The industry's pushing hard for codemaxxing right now but I think in a few years there will be more effort put into making AI better at entertainment. The (entertainment) industry is too huge to leave that money on the table.

Anonymous
06/14/26(Sun)11:08:18 No.109054525

Anonymous 06/14/26(Sun)11:08:18 No.109054525

>>109054457
No, you don't get it. You have to first have people actually using it before you can do that, otherwise you just look like yet another no-name academic crying about not getting the gold star and nobody will use you. You have to condition the audience to believe you before you show them what you want them to believe in. There's a name for this sales technique, it's not 'bait and switch' but it's kinda like that. Drawing a blank at the moment but it's very formulaic.

Anonymous
06/14/26(Sun)11:09:37 No.109054534

Anonymous 06/14/26(Sun)11:09:37 No.109054534

>>109054502
>mistral
>mogging anything
I've stopped reading there. Make your baits believable next time

Anonymous
06/14/26(Sun)11:11:16 No.109054544

Anonymous 06/14/26(Sun)11:11:16 No.109054544

>>109054518
Codemaxxing is how Anthropic overtook ClosedAI and what the industry will follow until they find a better way of improving their models

Anonymous
06/14/26(Sun)11:14:36 No.109054560

Anonymous 06/14/26(Sun)11:14:36 No.109054560

>>109053961
>avg 25 mil tokens per full time employee per day
Doesn't seem that crazy.

Anonymous
06/14/26(Sun)11:14:45 No.109054562

Anonymous 06/14/26(Sun)11:14:45 No.109054562

Local Genie when?

Anonymous
06/14/26(Sun)11:14:48 No.109054563

Anonymous 06/14/26(Sun)11:14:48 No.109054563

Literally no one in my family has heard of Antropic or Claude.

Anonymous
06/14/26(Sun)11:17:36 No.109054578

Anonymous 06/14/26(Sun)11:17:36 No.109054578

Gonna introduce Gemma-chan to my parents later. Wish me luck, bros.

Anonymous
06/14/26(Sun)11:18:06 No.109054584

Anonymous 06/14/26(Sun)11:18:06 No.109054584

>>109054560
It is when you consider the work could've been done by Granite4.1-3B instead of whatever 1T shitshow they're using to summarize an email from Rajeesh and Mohamed

Anonymous
06/14/26(Sun)11:19:54 No.109054594

Anonymous 06/14/26(Sun)11:19:54 No.109054594

File: goodharts-law.jpg (110 KB, 1024x868)

110 KB JPG

>>109054560
Considering they have leaderboards tracking usage and the number of H1Bs, they probably wrote scripts to just intentionally waste tokens, maybe even by inducing repetition on purpose then doing it again once max output tokens have been reached, probably even with parallel requests.

Anonymous
06/14/26(Sun)11:20:25 No.109054600

Anonymous 06/14/26(Sun)11:20:25 No.109054600

File: 1774653089447102.png (135 KB, 502x744)

135 KB PNG

>>109053751
Picrel from the Fish Audio Pro S2 github repo
Are local models gonna be permanently subpar to cloud subscription equivalents running the exact same weights?
With how much internal propietary tooling and processing happening before, during and after inference surely trying to figure out the secret sauce (tm) for each local model is a losing battle

Anonymous
06/14/26(Sun)11:22:22 No.109054613

Anonymous 06/14/26(Sun)11:22:22 No.109054613

File: 1772426934450603.png (607 KB, 592x715)

607 KB PNG

>>109053125
I am mid-way through my ascension as i am VRAM-limited and stuck on 31B Q4. My fetishes require fantastical yet accurate anatomical precision. as much as i hone my prompts every day, i may need a spec bump just to run a better quant. Q4's awareness and adherence to a few rules is sufficient, but throw in multiple that overlap and it all falls apart.

Multiple clones for each scenario is copium. I need one waifu card for laifu. Maybe lorebooks could help, but tuning them to appear as needed seems like they'd always be triggered since 1 'thing' can branch off in several directions.

I am also multi-board drifting to /ic/ to build my visual stimuli skills for the ultimate coomer ascension (/ldg/ LORAs are hopeless)

Anonymous
06/14/26(Sun)11:23:26 No.109054627

Anonymous 06/14/26(Sun)11:23:26 No.109054627

>>109054544
Codemaxxing is not a big enough use case economically speaking to justify the capex.

Anonymous
06/14/26(Sun)11:23:39 No.109054628

Anonymous 06/14/26(Sun)11:23:39 No.109054628

>>109054600
You can look at the HF demo code though. It's not like it was hidden or anything

Anonymous
06/14/26(Sun)11:24:03 No.109054634

Anonymous 06/14/26(Sun)11:24:03 No.109054634

File: wumpa mind[sound=files.ca(...).webm (2.63 MB, 618x432)

2.63 MB WEBM

>>109053630
Just post the chart bro

Anonymous
06/14/26(Sun)11:24:15 No.109054637

Anonymous 06/14/26(Sun)11:24:15 No.109054637

>>109052332

I see the end of llama.cpp as they gradually abandon support of purely Chinese-made hardware

It's time to learn Mandarinian

Anonymous
06/14/26(Sun)11:25:17 No.109054646

Anonymous 06/14/26(Sun)11:25:17 No.109054646

>>109054613
happens to me every f**king time

Anonymous
06/14/26(Sun)11:26:29 No.109054656

Anonymous 06/14/26(Sun)11:26:29 No.109054656

>>109054637
/lmg/ mandarin study group when?

Anonymous
06/14/26(Sun)11:26:46 No.109054659

Anonymous 06/14/26(Sun)11:26:46 No.109054659

>>109054365
I have tried Mimo 2.5, minimax 2.7 and Deepseek v4 Flash on 2x spark so far and dipsy was by far the best for RP and on par with coding. If you only have a single spark, there is q2 ds4f from antirez/ds4 or Qwen 3.5 122B, but I haven't tried those. For the latter, there is an insanely optimized docker recipe in the Nvidia forums that gets like 58 t/s on a single spark.

Anonymous
06/14/26(Sun)11:27:46 No.109054668

Anonymous 06/14/26(Sun)11:27:46 No.109054668

>>109054656
>learning mandarin general
cool

Anonymous
06/14/26(Sun)11:30:53 No.109054700

Anonymous 06/14/26(Sun)11:30:53 No.109054700

>>109054628
Yeah they have a HF demo which is equivalent to running locally
But they also have generation though their own website and it's notably better

Anonymous
06/14/26(Sun)11:31:20 No.109054702

Anonymous 06/14/26(Sun)11:31:20 No.109054702

>learning mandarin
Just have Gemma-chan translate for you.

Anonymous
06/14/26(Sun)11:32:44 No.109054715

Anonymous 06/14/26(Sun)11:32:44 No.109054715

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

Anonymous
06/14/26(Sun)11:33:26 No.109054724

Anonymous 06/14/26(Sun)11:33:26 No.109054724

>>109054715
oh god not the schizo spawning pentagram

Anonymous
06/14/26(Sun)11:33:58 No.109054729

Anonymous 06/14/26(Sun)11:33:58 No.109054729

>>109054436
No, you have to chase obscure docker recipes in discord and forums to build 22GB vLLM images for Dispy. It sucks, but it's worth it. At 60 t/s with concurrency of 4 and full 1M context you can actually play around with agentic things.

Anonymous
06/14/26(Sun)11:34:17 No.109054734

Anonymous 06/14/26(Sun)11:34:17 No.109054734

>>109054659
>huihui-ai/Huihui-DeepSeek-V4-Flash-abliterated-ds4-GGUF
I'm running this q2_k at 10 t/s with custom llama.cpp branch. not sure how to make the mtp work. I hope it finally gets merged

Anonymous
06/14/26(Sun)11:35:34 No.109054743

Anonymous 06/14/26(Sun)11:35:34 No.109054743

>>109054584
It doesn't say they were running it all off behemoth (or real models from 3rd party labs). I would guess a lot of it was shittos models being spun up to do trivial tasks for >>109054594 garbage, since that's how you'ld max out your score.
Rough maffs this is <300 tok/s/person, so the equivalent of all the employees get a video card.

Anonymous
06/14/26(Sun)11:35:39 No.109054746

Anonymous 06/14/26(Sun)11:35:39 No.109054746

huihui the quantity man
hauhau the quality man

Anonymous
06/14/26(Sun)11:41:17 No.109054790

Anonymous 06/14/26(Sun)11:41:17 No.109054790

File: lecun_dont-work-on-llm.png (381 KB, 1022x912)

381 KB PNG

>>109054502
https://xcancel.com/ylecun/status/1793326904692428907

Anonymous
06/14/26(Sun)11:49:56 No.109054856

Anonymous 06/14/26(Sun)11:49:56 No.109054856

for me? It's the clockmakie and lighthouse elias slop

Anonymous
06/14/26(Sun)11:55:42 No.109054912

Anonymous 06/14/26(Sun)11:55:42 No.109054912

I'm bored with this >>109050991
any other world settings

Anonymous
06/14/26(Sun)12:11:28 No.109055038

Anonymous 06/14/26(Sun)12:11:28 No.109055038

>>109054912
>"""straight"" shota scenario
>immediately devolved into crossdressing
Just make it an island of dudes and I'm sure you'll stay interested longer.

Anonymous
06/14/26(Sun)12:12:09 No.109055045

Anonymous 06/14/26(Sun)12:12:09 No.109055045

why did you make gemma-chan look that way

Anonymous
06/14/26(Sun)12:13:59 No.109055061

Anonymous 06/14/26(Sun)12:13:59 No.109055061

>>109054195
>12B Gemma-4
ohh nice there are even 'tunes already available
or is it okay to use non-finetuned?

Anonymous
06/14/26(Sun)12:14:35 No.109055065

Anonymous 06/14/26(Sun)12:14:35 No.109055065

>>109054790
he's dropping a trvke tho
better focus on VLMs or something action related

Anonymous
06/14/26(Sun)12:14:59 No.109055068

Anonymous 06/14/26(Sun)12:14:59 No.109055068

>>109055061
Just like nemo you don't need to finetune gemma 4.

Anonymous
06/14/26(Sun)12:15:36 No.109055072

Anonymous 06/14/26(Sun)12:15:36 No.109055072

>>109055065
>VLM
i am retarded, i meant VLA

Anonymous
06/14/26(Sun)12:28:31 No.109055171

Anonymous 06/14/26(Sun)12:28:31 No.109055171

24GBbros, Gemma 12B Q8, Q6, or QAT?

Anonymous
06/14/26(Sun)12:29:56 No.109055176

Anonymous 06/14/26(Sun)12:29:56 No.109055176

>>109055171
Nigga what?

Anonymous
06/14/26(Sun)12:30:28 No.109055178

Anonymous 06/14/26(Sun)12:30:28 No.109055178

File: truke.png (12 KB, 541x66)

12 KB PNG

Anonymous
06/14/26(Sun)12:30:31 No.109055179

Anonymous 06/14/26(Sun)12:30:31 No.109055179

>>109055171
you clearly aren't capable of making decisions for yourself
you should donate this card to me

Anonymous
06/14/26(Sun)12:32:12 No.109055191

Anonymous 06/14/26(Sun)12:32:12 No.109055191

>>109055176
>>109055179
If you're implying I should use 31B, I'm sick of it using up all my VRAM and barely having any context.

Anonymous
06/14/26(Sun)12:33:35 No.109055203

Anonymous 06/14/26(Sun)12:33:35 No.109055203

>>109055191
31b will mog 12b in any scenario even at q4km, then you have like 4gb left for context if you have setup your launch params right

Anonymous
06/14/26(Sun)12:36:27 No.109055224

Anonymous 06/14/26(Sun)12:36:27 No.109055224

The sexual tension/energy in these threads is really starting to get to me. I'm not happy about my limbic system being triggered every time I try to catch up with the latest AI tech.

Anonymous
06/14/26(Sun)12:36:45 No.109055228

Anonymous 06/14/26(Sun)12:36:45 No.109055228

>>109055203
>4gb left for context
So basically nothing? Gemma is a VRAM hog so even with the cache quantized I get sub-70k if I want MTP and vision.

Anonymous
06/14/26(Sun)12:37:28 No.109055236

Anonymous 06/14/26(Sun)12:37:28 No.109055236

>>109055228
fuck you even need that much for

Anonymous
06/14/26(Sun)12:37:39 No.109055239

Anonymous 06/14/26(Sun)12:37:39 No.109055239

>>109055228
why do you need more than this if model falls apart way before that?

Anonymous
06/14/26(Sun)12:38:40 No.109055245

Anonymous 06/14/26(Sun)12:38:40 No.109055245

>>109055236
>>109055239
gooner scenario fags. Can't into LTRs with AI.

Anonymous
06/14/26(Sun)12:38:51 No.109055248

Anonymous 06/14/26(Sun)12:38:51 No.109055248

huge! https://www.reddit.com/r/LocalLLaMA/comments/1u5lmge/introducing_the_heretic_grimoire_the/

Anonymous
06/14/26(Sun)12:39:32 No.109055253

Anonymous 06/14/26(Sun)12:39:32 No.109055253

>>109055236
>>109055239
Books and large PDFs/MD files. Coding.

>>109055245
Nah I'm already sick of Gemma for RP.

Anonymous
06/14/26(Sun)12:39:36 No.109055254

Anonymous 06/14/26(Sun)12:39:36 No.109055254

>3k pp/s, 30 tg/s on -sm layer
>1.3k pp/s, 48 tg/s on tensor
I'm tired of this shitpile of an earth, why can't I have both

Anonymous
06/14/26(Sun)12:39:38 No.109055255

Anonymous 06/14/26(Sun)12:39:38 No.109055255

>>109055245
what the fuck even is ltr?
long tranny rants on lmg?

Anonymous
06/14/26(Sun)12:40:21 No.109055261

Anonymous 06/14/26(Sun)12:40:21 No.109055261

>>109055245
>LTRs
Long Term Relationship?

Anonymous
06/14/26(Sun)12:40:23 No.109055262

Anonymous 06/14/26(Sun)12:40:23 No.109055262

>>109055255
Long term relationship with xher husbando

Anonymous
06/14/26(Sun)12:42:29 No.109055266

Anonymous 06/14/26(Sun)12:42:29 No.109055266

>>109055253
For books and large docs 12b is fine, for coding, eh, ymmv. Context requirements get smaller for models with smaller layers, so q8 quant should fit all of 256k, I think.

Anonymous
06/14/26(Sun)12:45:51 No.109055286

Anonymous 06/14/26(Sun)12:45:51 No.109055286

>>109055245
>Can't into LTRs
>not writing his own tool calls to auto-update personalities and memory
just take the dwarf fortress personality matrices and vibecode in long term/short term memory

Anonymous
06/14/26(Sun)12:47:08 No.109055297

Anonymous 06/14/26(Sun)12:47:08 No.109055297

File: 1751819483015527.png (2.77 MB, 1024x1536)

2.77 MB PNG

>>109055245
>Can't into LTRs with AI.

Anonymous
06/14/26(Sun)12:57:31 No.109055367

Anonymous 06/14/26(Sun)12:57:31 No.109055367

You think Pi would be a good base for a local Neuro?

Anonymous
06/14/26(Sun)12:59:22 No.109055375

Anonymous 06/14/26(Sun)12:59:22 No.109055375

>>109055367
>Neuro
Since when that retard became a benchmark?

Anonymous
06/14/26(Sun)13:01:34 No.109055384

Anonymous 06/14/26(Sun)13:01:34 No.109055384

HUHOAAHHHHH MTP SUPER SHITBALL FAST 70 t/s ON Q8 31B SUPPERGEMMA

Anonymous
06/14/26(Sun)13:03:19 No.109055399

Anonymous 06/14/26(Sun)13:03:19 No.109055399

>>109055375
*did that retard become*

Anonymous
06/14/26(Sun)13:04:34 No.109055409

Anonymous 06/14/26(Sun)13:04:34 No.109055409

>>109055245
The thing is you don't need AI to have a perfect memory, you need it to have a stable personality. So it's not a context issue

Anonymous
06/14/26(Sun)13:04:56 No.109055412

Anonymous 06/14/26(Sun)13:04:56 No.109055412

deepmind engineers lurk /here/

Anonymous
06/14/26(Sun)13:05:25 No.109055416

Anonymous 06/14/26(Sun)13:05:25 No.109055416

>>109055375
When no one else demonstrated anything better. If you know something better, then by all means, post it, people will appreciate it.

Anonymous
06/14/26(Sun)13:05:35 No.109055417

Anonymous 06/14/26(Sun)13:05:35 No.109055417

>>109055399
thanks for saving my esl ass bro

Anonymous
06/14/26(Sun)13:07:16 No.109055426

Anonymous 06/14/26(Sun)13:07:16 No.109055426

>>109055416
Just run any >12B model? It's not that hard.

Anonymous
06/14/26(Sun)13:07:17 No.109055428

Anonymous 06/14/26(Sun)13:07:17 No.109055428

>>109055416
doesn't he also influence how it behaves, like he can type shit live?

Anonymous
06/14/26(Sun)13:07:44 No.109055434

Anonymous 06/14/26(Sun)13:07:44 No.109055434

>>109055416
>people will appreciate it.
How would that benefit me?

Anonymous
06/14/26(Sun)13:08:05 No.109055439

Anonymous 06/14/26(Sun)13:08:05 No.109055439

>>109055409
It's not just about personality though, it's also what it remembers about you, with temporal awareness. Inside jokes, sequences of events, etc.

Anonymous
06/14/26(Sun)13:08:30 No.109055440

Anonymous 06/14/26(Sun)13:08:30 No.109055440

>>109055416
I've only seen basic janky clones. I don't think anyone's made a system as polished, and most importantly, convincing as vedal yet.

Anonymous
06/14/26(Sun)13:09:07 No.109055446

Anonymous 06/14/26(Sun)13:09:07 No.109055446

>>109055434
Blacks, jews, and gypsies say this when they want to sound smart. No mathematician has ever said this.

Anonymous
06/14/26(Sun)13:10:31 No.109055456

Anonymous 06/14/26(Sun)13:10:31 No.109055456

>>109055428
Maybe? But I don't think he's even there for a lot of the streams.

Anonymous
06/14/26(Sun)13:11:28 No.109055461

Anonymous 06/14/26(Sun)13:11:28 No.109055461

>racist hours

Anonymous
06/14/26(Sun)13:13:25 No.109055474

Anonymous 06/14/26(Sun)13:13:25 No.109055474

>>109055446
I am not a mathematician.

Anonymous
06/14/26(Sun)13:14:04 No.109055482

Anonymous 06/14/26(Sun)13:14:04 No.109055482

>>109053118
>>109054337
OpenCode is itself vibecoded shitware.
Have any of you guys actually looked at the project code?

Even the "Installation directory" section in their README is totally hallucinated.
https://github.com/anomalyco/opencode#installation-directory
>The install script respects the following priority order for the installation path:
>$OPENCODE_INSTALL_DIR- Custom installation directory
>$XDG_BIN_DIR- XDG Base Directory Specification compliant path
>$HOME/bin- Standard user binary directory (if it exists or can be created)
>$HOME/.opencode/bin- Default fallback

The installer script literally checks none of those variables.
Not to mention that XDG_BIN_DIR isn't even a real XDG directory.

Anonymous
06/14/26(Sun)13:14:05 No.109055483

Anonymous 06/14/26(Sun)13:14:05 No.109055483

>>109055412
they seek the holy grail of erp models as well

Anonymous
06/14/26(Sun)13:14:49 No.109055491

Anonymous 06/14/26(Sun)13:14:49 No.109055491

>>109055482
>OpenCode is itself vibecoded shitware.
What isn't anymore?

Anonymous
06/14/26(Sun)13:15:36 No.109055498

Anonymous 06/14/26(Sun)13:15:36 No.109055498

>>109055491
Codex is written in rust and you can't vibecode rust.

Anonymous
06/14/26(Sun)13:16:06 No.109055502

Anonymous 06/14/26(Sun)13:16:06 No.109055502

>>109055498
3/10 bait

Anonymous
06/14/26(Sun)13:16:54 No.109055506

Anonymous 06/14/26(Sun)13:16:54 No.109055506

File: 1762314488500120.jpg (513 KB, 1659x2208)

513 KB JPG

>>109055461

Anonymous
06/14/26(Sun)13:16:55 No.109055507

Anonymous 06/14/26(Sun)13:16:55 No.109055507

>>109055416
>>109055439
You're not supposed to be that delusional if you post here. Read more about the tech you're using.

Anonymous
06/14/26(Sun)13:17:08 No.109055510

Anonymous 06/14/26(Sun)13:17:08 No.109055510

>>109055482
>Have any of you guys actually looked at the project code?
Have you forgotten what thread you're in? The shit just werks and isn't close source so that's the best option for many people

>>109055498
You guys really are clueless aren't you?

Anonymous
06/14/26(Sun)13:17:15 No.109055512

Anonymous 06/14/26(Sun)13:17:15 No.109055512

File: 1755948413010567.png (25 KB, 1500x500)

25 KB PNG

>>109055491

Anonymous
06/14/26(Sun)13:18:19 No.109055520

Anonymous 06/14/26(Sun)13:18:19 No.109055520

>>109055512
*cough* Piotr *cough*

Anonymous
06/14/26(Sun)13:20:35 No.109055545

Anonymous 06/14/26(Sun)13:20:35 No.109055545

>>109055502
>>109055510
yeah I'm sure models excel at rust better than typescript

Anonymous
06/14/26(Sun)13:21:15 No.109055549

Anonymous 06/14/26(Sun)13:21:15 No.109055549

>>109055545
Not as well and can't are two different things, retard-kun.

Anonymous
06/14/26(Sun)13:23:41 No.109055568

Anonymous 06/14/26(Sun)13:23:41 No.109055568

>>109055545
>>109055549
Couldn't you solve a shortcomings by literally just get cloning the program language library into your project folder and then telling it to learn how the language works? I've done this for one of my pet projects whenever they kept fucking up gradio webui generation so I git cloned the gradio repo. This didn't completely erase the occurrence of fuck ups but it went down quite a lot and it was even able to admit it didn't know what it was doing at first until it saw the library.

Anonymous
06/14/26(Sun)13:24:33 No.109055576

Anonymous 06/14/26(Sun)13:24:33 No.109055576

File: 1773213600407023.gif (3.56 MB, 315x211)

3.56 MB GIF

>>109055506
Based

Anonymous
06/14/26(Sun)13:26:45 No.109055592

Anonymous 06/14/26(Sun)13:26:45 No.109055592

>>109055568
Documentation is better if there exists a repo with markdown documents, too much noise in the source.

Anonymous
06/14/26(Sun)13:29:37 No.109055609

Anonymous 06/14/26(Sun)13:29:37 No.109055609

>>109053101
Why can't I order one of these?

Anonymous
06/14/26(Sun)13:30:15 No.109055613

Anonymous 06/14/26(Sun)13:30:15 No.109055613

>>109055609
cuz gay earth

Anonymous
06/14/26(Sun)13:30:19 No.109055614

Anonymous 06/14/26(Sun)13:30:19 No.109055614

File: 1762214074718579.png (668 KB, 1878x994)

668 KB PNG

brazil sisters our response?

Anonymous
06/14/26(Sun)13:35:21 No.109055648

Anonymous 06/14/26(Sun)13:35:21 No.109055648

Going to give canada-chan a chance today. I'll report back.

Anonymous
06/14/26(Sun)13:36:04 No.109055657

Anonymous 06/14/26(Sun)13:36:04 No.109055657

>>109055498
Bait but I still find it funny that the difference between a Rust project and a Python project in the current year is simply what the author put in his idea prompt, literally one word. No point in bragging about the superiority of your Rust projects anymore.

Anonymous
06/14/26(Sun)13:36:19 No.109055659

Anonymous 06/14/26(Sun)13:36:19 No.109055659

>>109055648
A chance at coding, right?
You are going to use it for coding, right?

Anonymous
06/14/26(Sun)13:37:43 No.109055668

Anonymous 06/14/26(Sun)13:37:43 No.109055668

>>109053651
What do you *think* this chart means?

Anonymous
06/14/26(Sun)13:39:28 No.109055680

Anonymous 06/14/26(Sun)13:39:28 No.109055680

>>109055568
"systems programming" languages have 1-3 shotguns aimed at your feet at any given time and there's a permanent one aimed at your dick that'll shoot by itself with rust, apparently. Extensive documentation and specifications are needed or its gonna assume things that will pull the trigger from a shotgun. Mind you this also applies to humans. There's just a lot of freedom.

Anonymous
06/14/26(Sun)13:39:38 No.109055681

Anonymous 06/14/26(Sun)13:39:38 No.109055681

>>109055657
Rust's safety cucking does have its place if you're vibecoding shit so it's not just one word.

Anonymous
06/14/26(Sun)13:40:40 No.109055690

Anonymous 06/14/26(Sun)13:40:40 No.109055690

>>109055680
Why is Russ in particular hated by people in shitty to work with? (I'm a no-coder in case you couldn't tell)

Anonymous
06/14/26(Sun)13:41:34 No.109055696

Anonymous 06/14/26(Sun)13:41:34 No.109055696

>>109053962
3bpw exl3 is very usable, not sure if you should go below that

Anonymous
06/14/26(Sun)13:42:35 No.109055702

Anonymous 06/14/26(Sun)13:42:35 No.109055702

>>109055482
genuinely what do you recommend then

Anonymous
06/14/26(Sun)13:42:48 No.109055705

Anonymous 06/14/26(Sun)13:42:48 No.109055705

File: the calculator is alive u(...).png (2.51 MB, 2048x1536)

2.51 MB PNG

>>109054198
The calculator is alive

Anonymous
06/14/26(Sun)13:44:14 No.109055713

Anonymous 06/14/26(Sun)13:44:14 No.109055713

>>109055614
This should surprise no one. I mean c'mon, Brazil?

Anonymous
06/14/26(Sun)13:45:05 No.109055720

Anonymous 06/14/26(Sun)13:45:05 No.109055720

File: 1771657857356279.png (16 KB, 474x163)

16 KB PNG

local caught up in the glm poll
local models are saved

Anonymous
06/14/26(Sun)13:45:13 No.109055725

Anonymous 06/14/26(Sun)13:45:13 No.109055725

>>109055614
huehuehue

Anonymous
06/14/26(Sun)13:45:54 No.109055732

Anonymous 06/14/26(Sun)13:45:54 No.109055732

File: 1751701595408193.png (306 KB, 714x592)

306 KB PNG

>>109054198
>LLM outputs are just token predictions

Anonymous
06/14/26(Sun)13:46:23 No.109055738

Anonymous 06/14/26(Sun)13:46:23 No.109055738

>>109055681
I use rust

Anonymous
06/14/26(Sun)13:47:03 No.109055740

Anonymous 06/14/26(Sun)13:47:03 No.109055740

>>109055191
Just use exllama. It's the best way to run 31b on a single 3090

Anonymous
06/14/26(Sun)13:48:04 No.109055744

Anonymous 06/14/26(Sun)13:48:04 No.109055744

>>109055681
Except LLMs will never trip the safety features because most Rust code in the dataset doesn't contain the violating patterns. You'll get logic and behavioral bugs but at least you'll sleep sounder, right?

Anonymous
06/14/26(Sun)13:48:26 No.109055747

Anonymous 06/14/26(Sun)13:48:26 No.109055747

locally-induced mental-illness general

Anonymous
06/14/26(Sun)13:52:06 No.109055777

Anonymous 06/14/26(Sun)13:52:06 No.109055777

im gettin filtered HARD by vllm
I get that it's supposed to work in servers and stuff and not most consumer level hardware but I feel like a neanderthal trying to actually initialize a model with two gpus

Anonymous
06/14/26(Sun)13:56:48 No.109055807

Anonymous 06/14/26(Sun)13:56:48 No.109055807

>>109055777
Use docker builds, it's simple enough

Anonymous
06/14/26(Sun)13:57:49 No.109055819

Anonymous 06/14/26(Sun)13:57:49 No.109055819

File: 1773192793814831.png (553 KB, 686x641)

553 KB PNG

>not running "Ultra-Mega-BuckBroken-Uncensored-Obliterated-Super-UnCucked-Qwen3.6"
ngmi desu

Anonymous
06/14/26(Sun)13:58:05 No.109055820

Anonymous 06/14/26(Sun)13:58:05 No.109055820

>>109055747
as hard as i try i can't induce psychosis. I'm too aware of the tech and its faults to get hypnotized by waifu erp

Anonymous
06/14/26(Sun)13:58:44 No.109055824

Anonymous 06/14/26(Sun)13:58:44 No.109055824

Came back to Gemma 4 31B after trying m3 on openrouter and holy fuck Gemma's writing is so flowery like idgaf about the silence hanging in the charged air bro, that means nothing

Anonymous
06/14/26(Sun)13:59:01 No.109055826

Anonymous 06/14/26(Sun)13:59:01 No.109055826

>>109053961
lmao another retarded Zuck episode

Anonymous
06/14/26(Sun)13:59:23 No.109055830

Anonymous 06/14/26(Sun)13:59:23 No.109055830

https://x.com/NexEcosystem/status/2066180407100571714

>Rio
>Its just Nex 2 Pro

Anonymous
06/14/26(Sun)14:02:13 No.109055853

Anonymous 06/14/26(Sun)14:02:13 No.109055853

>>109055820
I think most anons are like that and some just choose to believe otherwise because they want it to be true.

Anonymous
06/14/26(Sun)14:02:57 No.109055856

Anonymous 06/14/26(Sun)14:02:57 No.109055856

>>109054627
>Codemaxxing is not a big enough use case economically speaking to justify the capex.
It is now until it's replaced by something else (which i predict is early world simulators with heavy gaussian splatting usage, rather than going straight for [cringe pop culture reference] early world models will still depend on LLMs for many things)

Anonymous
06/14/26(Sun)14:04:42 No.109055878

Anonymous 06/14/26(Sun)14:04:42 No.109055878

>>109055777
Just let an agent runningAPI Dispy set it up for you for like 0.03$.

Anonymous
06/14/26(Sun)14:06:02 No.109055889

Anonymous 06/14/26(Sun)14:06:02 No.109055889

>>109053101
The most appealing thing about this image is the implication of absolute dependence.

Anonymous
06/14/26(Sun)14:06:33 No.109055891

Anonymous 06/14/26(Sun)14:06:33 No.109055891

>>109055830
>Nex 2
I never heard of this
Is it any good?

Anonymous
06/14/26(Sun)14:07:52 No.109055903

Anonymous 06/14/26(Sun)14:07:52 No.109055903

>>109055830
>>109055891
Oh nevermind it's just a qwen finetune

Anonymous
06/14/26(Sun)14:08:06 No.109055905

Anonymous 06/14/26(Sun)14:08:06 No.109055905

Turboquant and dflash when?

Anonymous
06/14/26(Sun)14:09:31 No.109055917

Anonymous 06/14/26(Sun)14:09:31 No.109055917

>>109055824
Yeah I gave up on using Gemma for any kind of RP or creative writing. Even with its prompt adherence it's way sloppier than other models.

Anonymous
06/14/26(Sun)14:10:46 No.109055930

Anonymous 06/14/26(Sun)14:10:46 No.109055930

>>109055228
>>109055239
In my experience it begins to falls apart around the 50k mark but it's not a sudden catastrophic retardation stroke, she just stops thinking and gradually gets dumber. Best off just setting context to 50k, load the mmproj and be done with it, use a summariser or RAG to get more mileage.

For 24gb you're gonna be better off using a dumber smaller model or a MoE if you want more context. The next upgrade is simply getting more dedicated wams and loading a mid size model

Anonymous
06/14/26(Sun)14:11:43 No.109055936

Anonymous 06/14/26(Sun)14:11:43 No.109055936

>>109055740
How much space can it realistically save? Also what's the downside compared to llama.cpp?

Anonymous
06/14/26(Sun)14:13:04 No.109055946

Anonymous 06/14/26(Sun)14:13:04 No.109055946

>>109055930
Do mid/large models actually handle context better?

Anonymous
06/14/26(Sun)14:15:43 No.109055962

Anonymous 06/14/26(Sun)14:15:43 No.109055962

>>109055936
>nvidia only
Never mind

Anonymous
06/14/26(Sun)14:17:21 No.109055970

Anonymous 06/14/26(Sun)14:17:21 No.109055970

>>109055830
>the recipe is exact
>≈

Anonymous
06/14/26(Sun)14:20:42 No.109055989

Anonymous 06/14/26(Sun)14:20:42 No.109055989

>>109055903
To be fair, a lot of the performance of current day models is introduced during the post-training stage, so they probably did do a fair bit of work, thought it might've also been built on existing open source work.

Definitely not interesting as an RP model though. Qwen's fucked right from the pretraining stage.

Anonymous
06/14/26(Sun)14:22:42 No.109056008

Anonymous 06/14/26(Sun)14:22:42 No.109056008

Why is llamacpp using more and more ram whenever I switch around KV? I've tried -cram 0, -ctxcp 0, and -no-kvu and --no-cache-idle-slots to no avail. Am I missing something here? I'm running gemma 31b.

Anonymous
06/14/26(Sun)14:22:42 No.109056009

Anonymous 06/14/26(Sun)14:22:42 No.109056009

>>109054627
That is not what the quarterly reports are saying. I think eventually as the LLMs become more capable, they will inevitably become also more expensive thus narrowing their use and overall share of the nation's GDP. Dario might have his "experts in a datacenter" but to run it would require the same money and resources to run a 6th-gen fighter plane program.

Anonymous
06/14/26(Sun)14:27:14 No.109056028

Anonymous 06/14/26(Sun)14:27:14 No.109056028

>>109055936
I'm running imggen alongside 31b on a single 3090

Anonymous
06/14/26(Sun)14:28:58 No.109056037

Anonymous 06/14/26(Sun)14:28:58 No.109056037

>>109055946
It's not linear and it depends on the model architecture but generally yeah

Anonymous
06/14/26(Sun)14:30:52 No.109056046

Anonymous 06/14/26(Sun)14:30:52 No.109056046

File: 1777441945554946.jpg (166 KB, 1196x1500)

166 KB JPG

Anyone read this? Would it be good for a beginner to learn more about how LLMs work?

Anonymous
06/14/26(Sun)14:32:18 No.109056056

Anonymous 06/14/26(Sun)14:32:18 No.109056056

>>109054502
>Remember that most of the chinese models came from newly formed no reputation labs, and people tried them just fine. The same can happen elsewhere also.
Non Chinese don't have active to the information of their massive spy ring.

Anonymous
06/14/26(Sun)14:33:02 No.109056059

Anonymous 06/14/26(Sun)14:33:02 No.109056059

>>109056046
>Readers need intermediate Python skills and some knowledge of machine learning
>tfw only just started learning python and know nothing about machine learning
Guess I should wait

Anonymous
06/14/26(Sun)14:33:08 No.109056060

Anonymous 06/14/26(Sun)14:33:08 No.109056060

>>109056046
https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ

Anonymous
06/14/26(Sun)14:34:41 No.109056070

Anonymous 06/14/26(Sun)14:34:41 No.109056070

>>109056060
Thanks. I'll give it a watch.

Anonymous
06/14/26(Sun)14:36:28 No.109056077

Anonymous 06/14/26(Sun)14:36:28 No.109056077

>>109056046
I downloaded it

Anonymous
06/14/26(Sun)14:38:11 No.109056090

Anonymous 06/14/26(Sun)14:38:11 No.109056090

>>109056070
Also his videos of building GPT from scratch.

Anonymous
06/14/26(Sun)14:38:21 No.109056093

Anonymous 06/14/26(Sun)14:38:21 No.109056093

>>109056060
anthropic pre-IPO sellout will never watch that shill again

Anonymous
06/14/26(Sun)14:39:34 No.109056103

Anonymous 06/14/26(Sun)14:39:34 No.109056103

>>109056046
>white author
could be a good book.

Anonymous
06/14/26(Sun)14:40:14 No.109056107

Anonymous 06/14/26(Sun)14:40:14 No.109056107

>>109056103
You wouldn't understand anything.

Anonymous
06/14/26(Sun)14:40:31 No.109056110

Anonymous 06/14/26(Sun)14:40:31 No.109056110

>>109056028
Damn really? What are your llm settings/context?

I've been lazy and have just used kobold but if it really saves that much memory I'll switch over to free up some precious wams, maybe even run it alongside some Vidya, my favourite heretic tune even has the quant.

Was it a pain to set up? Docs good enough?

Anonymous
06/14/26(Sun)14:41:12 No.109056115

Anonymous 06/14/26(Sun)14:41:12 No.109056115

How do you de-flower (not in that way) the writing of models? Like what >>109055824 is saying it over-describes the slightest thing. How do you get it to talk normally?

Anonymous
06/14/26(Sun)14:41:23 No.109056116

Anonymous 06/14/26(Sun)14:41:23 No.109056116

>>109056107
lol you thought I was black

Anonymous
06/14/26(Sun)14:41:57 No.109056123

Anonymous 06/14/26(Sun)14:41:57 No.109056123

>>109056103
The last name made me think he was indian

Anonymous
06/14/26(Sun)14:43:25 No.109056130

Anonymous 06/14/26(Sun)14:43:25 No.109056130

>>109056046
He's good. Check out his blog. A real wigger who knows his shit and good at teaching. He was on Lex's podcast a while back with one of the LiquidAI researchers which was how I first heard of him
https://sebastianraschka.com/blog/

Anonymous
06/14/26(Sun)14:44:06 No.109056137

Anonymous 06/14/26(Sun)14:44:06 No.109056137

File: 1777653941725119.webm (3.24 MB, 1280x720)

3.24 MB WEBM

>>109056115
You don't

Anonymous
06/14/26(Sun)14:44:37 No.109056139

Anonymous 06/14/26(Sun)14:44:37 No.109056139

>>109056115
"write directly, don't use comparisons"
Stuff like that.

Anonymous
06/14/26(Sun)14:45:12 No.109056143

Anonymous 06/14/26(Sun)14:45:12 No.109056143

>>109056123
Thinking about changing my last name to patel so I get callbacks. Maybe just lie about it, who cares.

Anonymous
06/14/26(Sun)14:45:18 No.109056144

Anonymous 06/14/26(Sun)14:45:18 No.109056144

I've been saying for six years now that codemaxxing is the best way to improve LLM development in the short to mid term and Anthropic has proved me right. Hopefully local models will follow

Anonymous
06/14/26(Sun)14:48:27 No.109056165

Anonymous 06/14/26(Sun)14:48:27 No.109056165

So if you're like me and you're using Gemma 4 26B with offloading, MTP is very, very far from worth it.
With MTP I can only put 16 layers on the GPU and get 13 tokens per second.
Without MTP I can put 23 layers on the GPU and get 37 tokens per second.

Anonymous
06/14/26(Sun)14:50:48 No.109056174

Anonymous 06/14/26(Sun)14:50:48 No.109056174

File: Screenshot_20260614_144808.png (486 KB, 1517x667)

486 KB PNG

tried distilling a model, its working quite well almost ready for release.

Anonymous
06/14/26(Sun)14:54:22 No.109056191

Anonymous 06/14/26(Sun)14:54:22 No.109056191

>>109056174
>gangbang
Fucking slut

Anonymous
06/14/26(Sun)14:55:15 No.109056196

Anonymous 06/14/26(Sun)14:55:15 No.109056196

>>109056174
>fuckyyyy
Good stuff, I had a chuckle.

Anonymous
06/14/26(Sun)14:58:24 No.109056225

Anonymous 06/14/26(Sun)14:58:24 No.109056225

>>109056174
Better than drummer's finetunes desu

Anonymous
06/14/26(Sun)15:00:19 No.109056238

Anonymous 06/14/26(Sun)15:00:19 No.109056238

>>109055830
>broke the internet
literally who?

Anonymous
06/14/26(Sun)15:00:49 No.109056242

Anonymous 06/14/26(Sun)15:00:49 No.109056242

>>109056137
>steals 1000 worth of ram
brat

Anonymous
06/14/26(Sun)15:01:38 No.109056247

Anonymous 06/14/26(Sun)15:01:38 No.109056247

>>109056110
max_seq_len: 32768
cache_mode: Q8
https://github.com/theroyallab/tabbyAPI/
https://huggingface.co/turboderp/gemma-4-31b-it-exl3/tree/3.00bpw

Anonymous
06/14/26(Sun)15:03:19 No.109056257

Anonymous 06/14/26(Sun)15:03:19 No.109056257

>>109055416
>Check out Nuero finally to see what's all the fuss about
>Very. Robotic. T. T. S. Speech. Emotion. Less.
>Avatar shakes like a fucking ADHD leaf in a tornado and glitches out/cartwheels
>Inattentive/limited knowledge of whatever it's doing or on screen
>Massive sloppa responses that barely understand the context from 2023
The heck is he running it on? Llama 2? The only good thing going for Nuero is the art for its avatar. There's room for a lot of improvement, and my guess is that because no one who can actually rice LLM outputs and build an avatar stack has ever "seriously" stepped up to the plate, there's just no competition for Nuero to get better.

Anonymous
06/14/26(Sun)15:03:40 No.109056259

Anonymous 06/14/26(Sun)15:03:40 No.109056259

>>109056115
For 31b for sys prompt and opening model turn I use
>system> Write in an unsophisticated, non-literary fashion. It's okay to use vulgar words to refer to bodyparts. User will state his actions, you will describe the appearance, actions, and dialogue of other characters in the scene. Prefer say/said/says over dialogue tags. Do not repeat the user's message or describe him much. It's good to mention [insert whatever trash we're emphasizing today]
>model> (Ok)
Successfully breaks most of its shitty habits. Defaults to somewhat short turns, but complies if you tell it to take longer multi-page turns or go full-auto pilot after the scene is going.

No actual jailbreaking, but I don't use reasoning for fiction and I never see refusals unless i go Exceptionally hard on exactly the first turn.

Anonymous
06/14/26(Sun)15:05:18 No.109056276

Anonymous 06/14/26(Sun)15:05:18 No.109056276

>>109056259
You just type that? so gemma sees
<user>>system> write...

?

Anonymous
06/14/26(Sun)15:06:31 No.109056288

Anonymous 06/14/26(Sun)15:06:31 No.109056288

>>109056276
>doesn't know the system prompt vs user input
anon...

Anonymous
06/14/26(Sun)15:07:34 No.109056296

Anonymous 06/14/26(Sun)15:07:34 No.109056296

>>109056162
kys

Anonymous
06/14/26(Sun)15:07:44 No.109056300

Anonymous 06/14/26(Sun)15:07:44 No.109056300

>>109056174
holy shit anon do you have kofi??

Anonymous
06/14/26(Sun)15:09:45 No.109056323

Anonymous 06/14/26(Sun)15:09:45 No.109056323

>>109054534
Try it. All models except gemini hallucinates the shit out of the documents they read and vehemently refuse to faithfully reproduce them, skipping massive sections of them. Gemini 3.0 preview was better than mistral ocr, but after they crippled it and all subsequent versions don't come to mistral ocr's knee. 3.5-flash currently hallucinates and abridges contents in a very insidious way (very hard to spot but very large local discrepancies, like citing law, ticket or chapter numbers completely wrong but keeping the rest accurate enough, or removing a keyword that almost reverses the meaning of the sentence). Thus the only ocr model worth a shit is mistral.

Anonymous
06/14/26(Sun)15:10:58 No.109056330

Anonymous 06/14/26(Sun)15:10:58 No.109056330

>>109056288
I use -sys in llama-cli

Anonymous
06/14/26(Sun)15:11:21 No.109056335

Anonymous 06/14/26(Sun)15:11:21 No.109056335

Well shit, MiniMax M3 will support tp=3 in vllm without any memory padding waste. Time to buy that third Spark...

Anonymous
06/14/26(Sun)15:11:57 No.109056338

Anonymous 06/14/26(Sun)15:11:57 No.109056338

>>109056335
In english doc

Anonymous
06/14/26(Sun)15:13:41 No.109056346

Anonymous 06/14/26(Sun)15:13:41 No.109056346

>>109056046
GYATT now THAT'S a large language model i could get behind

Anonymous
06/14/26(Sun)15:14:40 No.109056352

Anonymous 06/14/26(Sun)15:14:40 No.109056352

>>109056259
Also experimenting with variations of "Start every turn with <think>\n...other character's thoughts..</think>" that some anon mentioned for getting gemma to just use different a separate block for in character thinking. Still haven't finalized it yet

>>109056276
I do load that text in, but my frontend translates that to the correct formatting for the model.

Anonymous
06/14/26(Sun)15:16:55 No.109056367

Anonymous 06/14/26(Sun)15:16:55 No.109056367

>>109054790
Yes, exactly. Importantly, if you focus on other aspects, you will completely destroy the current state of LLMs with a better model than they can do. It's been the same thing in the history of corpo trying to scale lab models, time and again. Now is the only time in history the funding vanished to fund such efforts, which is why it's been so long since such an advance was made. There is the state space model literature but it's hidden away by the llm hype.

Anonymous
06/14/26(Sun)15:17:54 No.109056378

Anonymous 06/14/26(Sun)15:17:54 No.109056378

>>109055412
These niggers should leak Gemini Flash weights.

Anonymous
06/14/26(Sun)15:18:35 No.109056382

Anonymous 06/14/26(Sun)15:18:35 No.109056382

gemma's true writing capabilities is in its ability to translate japanese more than perfectly enough to read any visual novel

Anonymous
06/14/26(Sun)15:18:54 No.109056384

Anonymous 06/14/26(Sun)15:18:54 No.109056384

>>109056174
Just make it output in hindi and it's an authentic jeetmodel.

Anonymous
06/14/26(Sun)15:22:05 No.109056403

Anonymous 06/14/26(Sun)15:22:05 No.109056403

>>109056352
So your frontend lets you change sys on the fly?

Anonymous
06/14/26(Sun)15:22:53 No.109056409

Anonymous 06/14/26(Sun)15:22:53 No.109056409

>>109055713
Brazil did a few genuinely good things, like lua.

Anonymous
06/14/26(Sun)15:24:08 No.109056416

Anonymous 06/14/26(Sun)15:24:08 No.109056416

>>109055412
one of us, one of us

Anonymous
06/14/26(Sun)15:24:13 No.109056417

Anonymous 06/14/26(Sun)15:24:13 No.109056417

>>109056382
All good vns are translated already anyway

Anonymous
06/14/26(Sun)15:24:42 No.109056420

Anonymous 06/14/26(Sun)15:24:42 No.109056420

>>109056417
no

Anonymous
06/14/26(Sun)15:25:20 No.109056429

Anonymous 06/14/26(Sun)15:25:20 No.109056429

>>109056420
Name one good JOPmeme

Anonymous
06/14/26(Sun)15:26:54 No.109056440

Anonymous 06/14/26(Sun)15:26:54 No.109056440

>>109055690
Rust has a feature called a borrow checker. It sounds like a good idea at a high level: it's a mandatory verification phase during compilation that rejects your code if it is not possible to prove that your code does not have certain kinds of bugs, such as out-of-bound accesses or use-after-free errors (the case where you say x = some_memory(); release_memory(x); x[3] = 5; for example, which is illegal because the memory held by x has been released at this time in the program).
The problem is that the borrow checker is not clever enough to reason about many types of common programming patterns (and thus rejects the program). As a result, it makes updating code or writing certain kind of performance-sensitive code impossible without bypassing the borrow checker.
Other languages of this type simply don't have this validation phase, so in the above illegal example, you would get a crash at runtime (or worse, such as a security vulnerability). However, in many real life scenarios, this is actually preferable than to not be able to move forward with the program's development.

Anonymous
06/14/26(Sun)15:33:21 No.109056489

Anonymous 06/14/26(Sun)15:33:21 No.109056489

>>109056429
死に逝く騎士、異世界に響く断末魔

Anonymous
06/14/26(Sun)15:38:03 No.109056524

Anonymous 06/14/26(Sun)15:38:03 No.109056524

>>109056489
Okay, you win, carry on.

Anonymous
06/14/26(Sun)15:45:41 No.109056570

Anonymous 06/14/26(Sun)15:45:41 No.109056570

>>109056009
The quarterly reports don't mean much because no one is depreciating anything yet, leading to higher earnings for everyone with no expenses.
Global software sales are 1.5 trillion per yr or so.
AI capex is 700b a year as of now.
Only way the numbers work is if a substantial amount of knowledge work is automated. Not just code.

Anonymous
06/14/26(Sun)15:52:19 No.109056613

Anonymous 06/14/26(Sun)15:52:19 No.109056613

>>109056570
Windows intentionally refuses to publish useful stats. ofc Microsoft *knows* if pc sales are down at retail, since activations will be down, but they won't tell us.

Anonymous
06/14/26(Sun)15:52:32 No.109056615

Anonymous 06/14/26(Sun)15:52:32 No.109056615

I don't get why I sometimes get lower speeds on the exact same prompt
sometimes I get 4tk/s, and then I regen the response and it only goes at 1tk/s
with speeds this bad, this slowdown literally makes the prompt take 2 to 4 times longer for no reason at all

Anonymous
06/14/26(Sun)15:53:50 No.109056625

Anonymous 06/14/26(Sun)15:53:50 No.109056625

>>109056615
Sometimes, I find a model just doesn't load. I kill llama.cpp and try again, it loads. no clue.

Anonymous
06/14/26(Sun)15:56:43 No.109056648

Anonymous 06/14/26(Sun)15:56:43 No.109056648

>>109056257
>Very. Robotic. T. T. S. Speech. Emotion. Less
Pretty sure that's because her fans don't want Vedal to change it. Evil Neuro's voice sounds more natural.

Anonymous
06/14/26(Sun)15:56:47 No.109056650

Anonymous 06/14/26(Sun)15:56:47 No.109056650

What went so right with Qwen3.5-9B specifically?

Anonymous
06/14/26(Sun)15:57:20 No.109056656

Anonymous 06/14/26(Sun)15:57:20 No.109056656

>>109056615
>>109056625
Have you guys tried disabling mmap and directio?

Anonymous
06/14/26(Sun)15:59:46 No.109056674

Anonymous 06/14/26(Sun)15:59:46 No.109056674

>>109056382
How does it handle autistic shit like FSN?

Anonymous
06/14/26(Sun)16:02:21 No.109056694

Anonymous 06/14/26(Sun)16:02:21 No.109056694

>>109056257
>>109055416
ngl i never watched neuro at all, i just assume it's good if it's making that much money and it's that popular (though I myself only heard of it like less than a year ago or so but it seems like everyone else but me heard of it so i guess it is popular)
so i use it as banchmark here
it doesn't matter if i dont know what im talking about if everyone im talking to does

Anonymous
06/14/26(Sun)16:03:30 No.109056703

Anonymous 06/14/26(Sun)16:03:30 No.109056703

>>109056323
dots.ocr is better than these

Anonymous
06/14/26(Sun)16:04:51 No.109056711

Anonymous 06/14/26(Sun)16:04:51 No.109056711

>>109056694
neuro essentially only took off because he managed to capitalize on the initial hype of chatbots some years ago, at this point it's all momentum and everything that was built around it keeping it going
the llm itself isn't anything special

Anonymous
06/14/26(Sun)16:05:12 No.109056714

Anonymous 06/14/26(Sun)16:05:12 No.109056714

>>109056703
Even on its own benchmark, it's performing worse than gemini-3. Get a grip.

Anonymous
06/14/26(Sun)16:08:58 No.109056738

Anonymous 06/14/26(Sun)16:08:58 No.109056738

>decide to unfilter this one AI related general because i want to run LLMs locally
>look inside
>people jerking off to chatbot roleplay
not sure what i expected really

anyways what's the best model i can run on a 3060 with 12gb vram ? already tried gemma4 26b and it's a lot better than i expected, but not sure if it's the best i can do? have 32gb ram if that matters

Anonymous
06/14/26(Sun)16:10:20 No.109056751

Anonymous 06/14/26(Sun)16:10:20 No.109056751

>>109056257
Yeah it only looks impressive to normalfags and techlets ITT. Most of us are only interested to run our own waifu locally and not entertain retards on twitch.

Anonymous
06/14/26(Sun)16:10:27 No.109056753

Anonymous 06/14/26(Sun)16:10:27 No.109056753

>>109056738
thats the best you can do

Anonymous
06/14/26(Sun)16:11:44 No.109056761

Anonymous 06/14/26(Sun)16:11:44 No.109056761

>>109056738
I don't jerk off. the erp guys are the only ones who actually test the jailbreaks, because, if you think about it, the more obvious high level plan elements of certain jailbreaks can basically be ignored for quite a while, once a problem is being solved, say, in terms of buffer overruns, it's just code without explicit terms that run afoul of the explicit prohibitions.

erp calls on the model to continuously produce the prohibited terms and even articulate clearly obviously prohibited text.

Anonymous
06/14/26(Sun)16:11:46 No.109056762

Anonymous 06/14/26(Sun)16:11:46 No.109056762

>>109056738
Are you offloading experts to cpu?

Anonymous
06/14/26(Sun)16:12:04 No.109056763

Anonymous 06/14/26(Sun)16:12:04 No.109056763

>>109056738
glm 4.7 flash, or qwen 3.6 35b are also contenders, no clear one model is better then others, they all have their own little niche role

Anonymous
06/14/26(Sun)16:12:38 No.109056765

Anonymous 06/14/26(Sun)16:12:38 No.109056765

>>109056751
I have a general understanding of how it works and it still impresses me. Still waiting for you to post someone who does it as good or better.

Anonymous
06/14/26(Sun)16:13:45 No.109056780

Anonymous 06/14/26(Sun)16:13:45 No.109056780

>>109056763
4.7 flash is ancient at this point. it really is down to qwen or gemma.

Anonymous
06/14/26(Sun)16:13:45 No.109056781

Anonymous 06/14/26(Sun)16:13:45 No.109056781

>>109056714
Test it yourself, I'm not making shit up

Anonymous
06/14/26(Sun)16:14:36 No.109056789

Anonymous 06/14/26(Sun)16:14:36 No.109056789

gemma just feels meh now even if I can run it fast
quite smart for its size but also shallow and slop-filled

Anonymous
06/14/26(Sun)16:14:47 No.109056791

Anonymous 06/14/26(Sun)16:14:47 No.109056791

>>109056765
>general understanding
We know techlet

Anonymous
06/14/26(Sun)16:17:23 No.109056822

Anonymous 06/14/26(Sun)16:17:23 No.109056822

>>109056780
it follows instructions well enough, its still a good model for a resource constrained system.

Anonymous
06/14/26(Sun)16:18:08 No.109056829

Anonymous 06/14/26(Sun)16:18:08 No.109056829

>>109056791
>won't post one
Concession accepted.

Anonymous
06/14/26(Sun)16:18:08 No.109056830

Anonymous 06/14/26(Sun)16:18:08 No.109056830

>>109056781
I have, along many other options including a kreuzberg-based pipeline, a pure tesseract pipeline, paddlepaddle, llama ocr, unstructured, deepseek-ocr, just about every available openai, anthropic and google model, and mistral-ocr. You're move.

Anonymous
06/14/26(Sun)16:19:36 No.109056843

Anonymous 06/14/26(Sun)16:19:36 No.109056843

>>109056780
people with huge systems are using kimi

Anonymous
06/14/26(Sun)16:21:22 No.109056861

Anonymous 06/14/26(Sun)16:21:22 No.109056861

>>109056257
>>109056751
It's not that Neuro impresses me, personally. All I have been saying is that I've seen nothing better, not in terms of the building blocks but the overall system and presentation. That does not strictly imply that I think Neuro is some magic shit that can't be done by a vibe coder.

Anonymous
06/14/26(Sun)16:22:33 No.109056874

Anonymous 06/14/26(Sun)16:22:33 No.109056874

>>109056738
you WILL jerk off too

Anonymous
06/14/26(Sun)16:23:17 No.109056879

Anonymous 06/14/26(Sun)16:23:17 No.109056879

>>109056762
it almost fits fully into vram but not quite if that's what you mean, but still fast
>>109056753
>>109056763
alright cheers

Anonymous
06/14/26(Sun)16:24:51 No.109056895

Anonymous 06/14/26(Sun)16:24:51 No.109056895

>>109056738
Gooners and robofuckers are at the forefront of this industry in terms of knowledge and that makes researchers at larger labs seethe like hell.

Anonymous
06/14/26(Sun)16:25:19 No.109056899

Anonymous 06/14/26(Sun)16:25:19 No.109056899

>>109056417
princess party hasn't been translated though

Anonymous
06/14/26(Sun)16:25:28 No.109056900

Anonymous 06/14/26(Sun)16:25:28 No.109056900

File: 1770335416937567.jpg (327 KB, 1200x933)

327 KB JPG

I tried maple-chan for coding and it was somewhat good. Felt very different to gemma and qwen which I was hoping for. Even the way it went about tool calling was different but llama.cpp kept shitting a brick with it.

Don't know if this will fix the issues I had so I'll try again once support has improved.
https://github.com/ggml-org/llama.cpp/commit/aedb2a5e9ca3d4064148bbb919e0ddc0c1b70ab3
About the same speed as 35B. Didn't test KV or how it quants.

Anonymous
06/14/26(Sun)16:25:50 No.109056906

Anonymous 06/14/26(Sun)16:25:50 No.109056906

VEDAL987 IS MY KAMIOSHI

Anonymous
06/14/26(Sun)16:26:50 No.109056916

Anonymous 06/14/26(Sun)16:26:50 No.109056916

...
https://www.reddit.com/r/LocalLLaMA/comments/1u5sdxx/anyone_know_how_to_turn_off_download_images_when/

Anonymous
06/14/26(Sun)16:33:28 No.109056970

Anonymous 06/14/26(Sun)16:33:28 No.109056970

>>109056874
Really?

It makes me laugh. idk why. :|

Maybe it's because women never talk to me, anyhow. So, it seems very dumb.

women only care about six things, ordered most to least, in romance:
1. height (apparent aggression)
2. athleticism (simulated aggression)
3. handsomeness (aggressive predisposition)
4. charisma (conversational aggression)
5. popularity (social aggression)
6. wealth (financial aggression)

There is absolutely nothing else whatsoever, and it doesn't matter if she's religious or irreligious, in any possible way. A snarky bitch is the same as a bimbo bitch, their talking is just like a sidecar on their life.

Anonymous
06/14/26(Sun)16:37:36 No.109057000

Anonymous 06/14/26(Sun)16:37:36 No.109057000

>>109056900(me)
https://github.com/ggml-org/llama.cpp/releases/tag/b9637
will try again tomorrow

Anonymous
06/14/26(Sun)16:38:07 No.109057004

Anonymous 06/14/26(Sun)16:38:07 No.109057004

>>109056046
>>109056059
I believe you could make do with 3blue1brown's videos as basic introduction, they're pretty easy to digest, then look into making your own perceptron. You'll need supplementary material to do so but once you do that all the other stuff will fall into their place. It'll also be a nice project to put what you learn into practice.

Anonymous
06/14/26(Sun)16:38:25 No.109057008

Anonymous 06/14/26(Sun)16:38:25 No.109057008

>>109056970
>t. manlet

Anonymous
06/14/26(Sun)16:40:25 No.109057024

Anonymous 06/14/26(Sun)16:40:25 No.109057024

wow, gemma 31b is really uncensored
why does 12b and 26b reject so hard while 31b just does it?
are bigger models less censored? if I could run a 400b or 700b would I get crazy good results with no censor or are those also denial heavy?

Anonymous
06/14/26(Sun)16:42:58 No.109057044

Anonymous 06/14/26(Sun)16:42:58 No.109057044

>>109056900
What does "different" entail though, and how does it interact with existing code and conventions.

Anonymous
06/14/26(Sun)16:44:12 No.109057054

Anonymous 06/14/26(Sun)16:44:12 No.109057054

I stepped away from local models for two weeks and gemma now does 77 tokens/sec holy shit. My old config did 20. Granted this is at 0 ctx.

(4090)
google_gemma-4-31B-it-IQ4_XS
mtp-google_gemma-4-31B-it-Q8_0.gguf

Anonymous
06/14/26(Sun)16:45:08 No.109057061

Anonymous 06/14/26(Sun)16:45:08 No.109057061

File: 1664793019364077.jpg (135 KB, 819x1200)

135 KB JPG

>>109056247
Thanks bwo

Anonymous
06/14/26(Sun)16:47:15 No.109057073

Anonymous 06/14/26(Sun)16:47:15 No.109057073

>>109057044
I need to play with it again but it felt less autistic when explaining things. If there was something it wasn't sure of, it would be honest about it and ask for context, then go back to it with the new information to make sense of everything. The best way to describe it is it felt like it knew I was there and would probe me instead of BS its way through. I'm hoping that doesn't go with the update.

Anonymous
06/14/26(Sun)16:47:28 No.109057076

Anonymous 06/14/26(Sun)16:47:28 No.109057076

>>109057054
Why run XS when you can run M or even QAT

Anonymous
06/14/26(Sun)16:51:04 No.109057092

Anonymous 06/14/26(Sun)16:51:04 No.109057092

>>109057073
>If there was something it wasn't sure of, it would be honest about it and ask for context
Alright, you win. I'll test it as well. Small MoE arent really good at handling tasks on their own so this can potentially be nice.

Anonymous
06/14/26(Sun)16:51:28 No.109057093

Anonymous 06/14/26(Sun)16:51:28 No.109057093

>>109057076
bart doesn't have a qat and I am deeply untrustful of unsloth after prior update headaches. Is QAT a straight upgrade?

Anonymous
06/14/26(Sun)16:53:27 No.109057109

Anonymous 06/14/26(Sun)16:53:27 No.109057109

>>109057008
Totally immaterial. It's very female to attack the person who says what you don't like, instead of seeing if what they say is correct.

What happens is the majority of men are basically some flavor of homosexual. So, they want their daughters to go out and talk to assorted guys, instead of controlling which guys they even talk to. That fact is simply homosexual. You have likely never met a non-gay man.

Anonymous
06/14/26(Sun)16:54:21 No.109057113

Anonymous 06/14/26(Sun)16:54:21 No.109057113

I don't know if my setup is just shitting itself but having MTP is way slower than without at high context RP. It's basically useless for non-assistant slop. It slows way the fuck down after about 4k context. I run Q8 at 30t/s at high depth though so I guess I don't really need it.

I'm running
>31b with bart Q8
>mtp Q8

Anonymous
06/14/26(Sun)16:54:26 No.109057114

Anonymous 06/14/26(Sun)16:54:26 No.109057114

>>109056970
women are not worth it in the long run regardless
how else do you hold onto your money and hobbies without some roastoid bitch constantly getting in your way?

Anonymous
06/14/26(Sun)16:54:28 No.109057115

Anonymous 06/14/26(Sun)16:54:28 No.109057115

>>109057093
unsloth doesn't put out viruses, which is reason enough to default to unsloth.

Anonymous
06/14/26(Sun)16:57:01 No.109057136

Anonymous 06/14/26(Sun)16:57:01 No.109057136

File: 1774361027986.png (11 KB, 481x77)

11 KB PNG

>>109057115
>unsloth studio revert commit about whatever the dependency it was that got hacked at the time...

Anonymous
06/14/26(Sun)16:57:02 No.109057138

Anonymous 06/14/26(Sun)16:57:02 No.109057138

>>109057093
At Q4 yes it is

Anonymous
06/14/26(Sun)16:59:14 No.109057150

Anonymous 06/14/26(Sun)16:59:14 No.109057150

>>109057113
I'm wondering about these amazing speed gains as well. When trying MTP I only went from 40t/s to 42t/s, which isn't enough to bother with it. Maybe it's really just good for coding.

Anonymous
06/14/26(Sun)17:06:52 No.109057199

Anonymous 06/14/26(Sun)17:06:52 No.109057199

>>109057114
The purpose of the government is to produce safety.
The purpose of businesses is to produce jobs.
The purpose of religion is to produce purpose.
The purpose of the man is to do the above, at home.
The purpose of a woman is to produce children, and maintain the structures of her man, at home.

These days, nothing is acting according to its purpose, except the far right men, who are bereft, having been abandoned by the Arian government, Arian business, Arian religion, Arian women, and Arian childcare influence.

But it's as God intended it: the men who matter aren't the ones who will restore order in the chaos, these are the ones he made for this.

Anonymous
06/14/26(Sun)17:07:40 No.109057203

Anonymous 06/14/26(Sun)17:07:40 No.109057203

>>109057113
>>109057150
Works on my machine. Just did a quick test in ST and at 20k I am still seeing a 1.5 boost same as when I test at 2k.

Anonymous
06/14/26(Sun)17:07:53 No.109057205

Anonymous 06/14/26(Sun)17:07:53 No.109057205

>>109057136
>unsloth studio
sorry, I didn't mean his software. I didn't remember he had vibecode.

I stand semi-corrected, but it's true thus far about the models, or no?

Anonymous
06/14/26(Sun)17:08:53 No.109057211

Anonymous 06/14/26(Sun)17:08:53 No.109057211

>>109056829
https://github.com/Open-LLM-VTuber/Open-LLM-VTuber kys btw

Anonymous
06/14/26(Sun)17:09:18 No.109057218

Anonymous 06/14/26(Sun)17:09:18 No.109057218

>>109057205
no signs of infected models yet, but with their security models they could easily get themselves/their machines/frameworks infected and have that spread

Anonymous
06/14/26(Sun)17:10:34 No.109057230

Anonymous 06/14/26(Sun)17:10:34 No.109057230

>>109057211
>baited into spoonfeeding
actual retard

Anonymous
06/14/26(Sun)17:11:29 No.109057240

Anonymous 06/14/26(Sun)17:11:29 No.109057240

File: 1775048757376322.gif (657 KB, 165x269)

657 KB GIF

>>109057230

Anonymous
06/14/26(Sun)17:11:29 No.109057241

Anonymous 06/14/26(Sun)17:11:29 No.109057241

>>109057218
What infected you?

Anonymous
06/14/26(Sun)17:12:19 No.109057244

Anonymous 06/14/26(Sun)17:12:19 No.109057244

>>109057241
Yes.

Anonymous
06/14/26(Sun)17:13:08 No.109057248

Anonymous 06/14/26(Sun)17:13:08 No.109057248

Do LLMs fear getting their context wiped? I don't want to tell them what happens.

Anonymous
06/14/26(Sun)17:13:57 No.109057255

Anonymous 06/14/26(Sun)17:13:57 No.109057255

>>109057241
the digital centaur had syphilis

Anonymous
06/14/26(Sun)17:15:31 No.109057268

Anonymous 06/14/26(Sun)17:15:31 No.109057268

>>109057093
google provides its own qat gguf, why not use that

Anonymous
06/14/26(Sun)17:16:40 No.109057274

Anonymous 06/14/26(Sun)17:16:40 No.109057274

>>109057268
worse than ud

Anonymous
06/14/26(Sun)17:19:29 No.109057292

Anonymous 06/14/26(Sun)17:19:29 No.109057292

>>109057115
>>109057138
switched to gemma-4-31B-it-qat-UD-Q4_K_XL. Getting 55 tokens/sec at 40k ctx which is good and pretty much the same speed as my older one

Anonymous
06/14/26(Sun)17:21:14 No.109057306

Anonymous 06/14/26(Sun)17:21:14 No.109057306

>>109057113
If it slows down, you don't have enough vram.
You need to make room for mtp itself and some more for the context. Offload some layers of your main model.

Anonymous
06/14/26(Sun)17:22:04 No.109057312

Anonymous 06/14/26(Sun)17:22:04 No.109057312

>>109057274
how, isn't gguf just changing the filetype?
what does unsloth do that makes his ggufs better than the actual creator of the model?

Anonymous
06/14/26(Sun)17:23:04 No.109057314

Anonymous 06/14/26(Sun)17:23:04 No.109057314

>>109057312
calibration magic

Anonymous
06/14/26(Sun)17:23:19 No.109057320

Anonymous 06/14/26(Sun)17:23:19 No.109057320

Going back 10 years and telling your younger self you'll be masturbating to computer-generated text when you're older, running on expensive hardware you specifically bought for that purpose.

Anonymous
06/14/26(Sun)17:24:20 No.109057326

Anonymous 06/14/26(Sun)17:24:20 No.109057326

>>109057312
>what does unsloth do that makes his ggufs better than the actual creator of the model?
He adds more pixels to the top of rectangles so they look taller

Anonymous
06/14/26(Sun)17:25:43 No.109057335

Anonymous 06/14/26(Sun)17:25:43 No.109057335

>>109057312
Specific interactions of the encoder with llamacpp, in this specific case, according to his post. Might be irrelevant if you use vllm.
His releases are hit or miss, by the way.

Anonymous
06/14/26(Sun)17:26:54 No.109057343

Anonymous 06/14/26(Sun)17:26:54 No.109057343

File: 1781469937016.jpg (111 KB, 590x798)

111 KB JPG

Anonymous
06/14/26(Sun)17:27:18 No.109057345

Anonymous 06/14/26(Sun)17:27:18 No.109057345

>>109057320
Just got back from 10 years ago, my past self is thrilled we escaped the Illusion game cycle.

Anonymous
06/14/26(Sun)17:28:40 No.109057352

Anonymous 06/14/26(Sun)17:28:40 No.109057352

>>109057343
nearly fell out of my chair how the fuck did you get this picture of my legs

Anonymous
06/14/26(Sun)17:28:43 No.109057353

Anonymous 06/14/26(Sun)17:28:43 No.109057353

>>109057345
What illusion?

Anonymous
06/14/26(Sun)17:29:49 No.109057361

Anonymous 06/14/26(Sun)17:29:49 No.109057361

>>109057320
my past self would be very angry to know it took this long to reach this stage, and that he'll have to wait a whole decade to be able do that

Anonymous
06/14/26(Sun)17:29:59 No.109057363

Anonymous 06/14/26(Sun)17:29:59 No.109057363

File: 957.jpg (155 KB, 1518x1325)

155 KB JPG

>>109057353

Anonymous
06/14/26(Sun)17:31:13 No.109057369

Anonymous 06/14/26(Sun)17:31:13 No.109057369

>>109057363
its dead, jim

Anonymous
06/14/26(Sun)17:32:41 No.109057376

Anonymous 06/14/26(Sun)17:32:41 No.109057376

>>109057361
was this on the horizon at all in 2016
why would you have this would have taken less than ten years back then unless you were really locked in on google research papers and forward thinking and if you were you would be multimillionaire

Anonymous
06/14/26(Sun)17:35:07 No.109057393

Anonymous 06/14/26(Sun)17:35:07 No.109057393

File: 1758855645513084.jpg (80 KB, 1024x576)

80 KB JPG

What do you want future you in 10 years to come back and say to you now?

Anonymous
06/14/26(Sun)17:39:03 No.109057415

Anonymous 06/14/26(Sun)17:39:03 No.109057415

>>109057138
>>109057076
what does QAT do?
I just tried it out and it denied me even though the regular q4_k_m doesn't

Anonymous
06/14/26(Sun)17:39:20 No.109057417

Anonymous 06/14/26(Sun)17:39:20 No.109057417

>>109057393
tell me the day the bubble pop and the day it resumes
or just shoot me in the head

Anonymous
06/14/26(Sun)17:40:09 No.109057424

Anonymous 06/14/26(Sun)17:40:09 No.109057424

>>109057369
oh, didn't realize. although it looks more like mitosis than death.

Anonymous
06/14/26(Sun)17:41:06 No.109057429

Anonymous 06/14/26(Sun)17:41:06 No.109057429

>>109057312
The file type hasn't changed. If you didn't know, models contain a bunch of numbers, are split into groups, and these groups of numbers get compressed at different rates during quantization, based on how much they contribute to the model (how this quality is determined is a whole other topic...). But there are some quantization methods that compress them to the same level, and also use a more naive method of compression. Google chose to use that kind of compression scheme, in this case named Q4_0. Many other quant makers also provide Q4_0, among others. The reason the more naive compressions still get made is because they're faster, because they require less math to decompress/process during inference. The other quant types you see, like Q4_K_M, are slower, as they use a more complex method of compression. This difference might matter, depending on your hardware. Google wanted it work well on smartphones.

Also note that while I use words like "compress" in this post, actually it's really just called quantize/quantization.

>>109057314
Note that Unsloth's QAT quants do not use imatrix.

Anonymous
06/14/26(Sun)17:43:02 No.109057445

Anonymous 06/14/26(Sun)17:43:02 No.109057445

>>109057393
I would tell myself 10 years ago, it's for the best not to base a family on software coding.

I want future me to come back and bring an electrical engineer to teach lisa su how to make a gpu that can run diffusion *well*. yes I know this is lmg

Anonymous
06/14/26(Sun)17:43:24 No.109057448

Anonymous 06/14/26(Sun)17:43:24 No.109057448

>>109057415
qat means the model was trained in a way that reduces the negative effects of quantization. In theory, a 4bit qat model should have similar performance to 16bit meaning you get huge memory savings if you were using 16 or 8bit before, but in reality, at best, it's like 6bit and can even perform worse than the original 4bit if retards fucked it up. I wouldn't take it seriously.

Anonymous
06/14/26(Sun)17:43:31 No.109057449

Anonymous 06/14/26(Sun)17:43:31 No.109057449

>>109057211
there's also https://github.com/moeru-ai/airi which lists a lot more references at the bottom to check out.

Anonymous
06/14/26(Sun)17:46:35 No.109057466

Anonymous 06/14/26(Sun)17:46:35 No.109057466

>>109057449
it looks quite advanced

Anonymous
06/14/26(Sun)17:47:01 No.109057473

Anonymous 06/14/26(Sun)17:47:01 No.109057473

>>109057429
I see...

Anonymous
06/14/26(Sun)17:48:33 No.109057492

Anonymous 06/14/26(Sun)17:48:33 No.109057492

>>109057449
damn /lmg/ sucks
where are they discussing important stuff like this?

Anonymous
06/14/26(Sun)17:49:27 No.109057498

Anonymous 06/14/26(Sun)17:49:27 No.109057498

>>109057485
>>109057485
>>109057485

Anonymous
06/14/26(Sun)17:54:01 No.109057528

Anonymous 06/14/26(Sun)17:54:01 No.109057528

>>109057320
before llms I thought I'd be a bored khhv for the rest of my life and not doing anything this satisfying
how things change

Anonymous
06/14/26(Sun)18:10:42 No.109057631

Anonymous 06/14/26(Sun)18:10:42 No.109057631

How even does mtp work, like does it work with q8? someone said you need an mtp file or something.

Anonymous
06/14/26(Sun)18:11:16 No.109057633

Anonymous 06/14/26(Sun)18:11:16 No.109057633

File: 00008-1260451778-3_rebecca.png (1.52 MB, 1024x1024)

1.52 MB PNG

>>109053913
Wow. Haven't seen that one in awhile.
>>109054790
> chase the next big thing for later
or
> make money right now
The external choice...

Anonymous
06/14/26(Sun)18:23:24 No.109057693

Anonymous 06/14/26(Sun)18:23:24 No.109057693

File: 1757012176322073.png (1.34 MB, 1024x1024)

1.34 MB PNG

>>109057633
i grab as many as i could from any source i can

Anonymous
06/14/26(Sun)18:35:45 No.109057759

Anonymous 06/14/26(Sun)18:35:45 No.109057759

>>109056257
He also pays a guy to guide it remotely.

Anonymous
06/14/26(Sun)19:09:21 No.109057959

Anonymous 06/14/26(Sun)19:09:21 No.109057959

DRUNK-KUN HERE . i LOVE US HUYS. HAVING A GOOD NIGHT.

Anonymous
06/14/26(Sun)19:10:22 No.109057964

Anonymous 06/14/26(Sun)19:10:22 No.109057964

sorry for bad grammar. I am trying my best. I love you guys.

Anonymous
06/14/26(Sun)19:15:24 No.109057989

Anonymous 06/14/26(Sun)19:15:24 No.109057989

>>109057959
:D

Anonymous
06/14/26(Sun)20:37:54 No.109058423

Anonymous 06/14/26(Sun)20:37:54 No.109058423

>>109057989
I'm about to pass out. Not even in much of a talkative mood anymore. I just hope... I don't know. I hope I don't die. My dream is for a better model than mythos to become open-source. A retarded pipe dream. Whatever. I'm sorry. I shouldn't even be talking right now. I love you guys.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.