/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1743052378919146.jpg (186 KB, 768x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106351514 & >>106345562

►News
>(08/21) Command A Reasoning released: https://hf.co/CohereLabs/command-a-reasoning-08-2025
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106351514

--Qwen VL blocks Mao commemorative tea image due to political content moderation:
>106352603 >106352638 >106352653 >106352678 >106352695 >106352729 >106352741 >106352766 >106352778 >106352788 >106352794 >106352824 >106354537 >106354560
--GPU frequency locking affects code path performance and can't be queried:
>106351737 >106351762 >106351867 >106351875 >106351889 >106351911
--Frontend differences affecting token generation speed on same backend:
>106353506 >106353548 >106353898 >106354113 >106353905 >106354026
--Reasoning pre-fill exploits model trust bias for stronger output control:
>106354146 >106354174 >106354426 >106354778 >106354793 >106354614
--Meta partners with Midjourney, sparking criticism and speculation:
>106352643 >106352648 >106352649 >106354887 >106355765
--Avoid FP16 CUDA flags to prevent numerical overflow in quantized models:
>106356396 >106356788
--Qwen models overusing "not x but y" phrasing:
>106353981 >106353997 >106354008 >106354031 >106354058 >106354159 >106354182 >106356075
--GPU memory fault due to excessive GPU offload layers and poor memory management:
>106352359 >106352374 >106352413 >106352428 >106352463 >106352578 >106352673
--Maximize VRAM usage during fine-tuning for optimal throughput:
>106355943 >106356138 >106356180 >106356282
--Anons deploy local LLMs for gaming, finance, automation, and adult content:
>106354780 >106354986 >106355189 >106355209 >106355240
--OpenAI's India expansion mirrors past tech offshoring trends:
>106353105 >106353224 >106353263
--Seed 36B model support merged:
>106354673 >106355049 >106357911
--Illegal GPU memory access likely caused by index calculation bugs, not VRAM capacity:
>106352021 >106352040
--Copyright lawsuit accuses Meta of using pirated adult films for AI training:
>106352956
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>106351520

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Local AI is as good as dead if we don't get a local equivalent to Genie 3 by the end of this year.
>>
>>106358752
>Command A Reasoning released
How is it?
>>
>>106358772
whats the plan then genius
>>
loli feet
>>
>>106358772
How have you pushed local models in order to realize this claim?
>>
>>106358780
Cohere has completely committed to slopping up and safety cucking their shit.
>>
How come ST doesn't have some simple tool calling yet that lets the model roll a die or something dynamically? Why are local models so far behind?
>>
>>106358780
It is absolutely safe.
>>
>>106358780
It competes with gpt-oss
>>
>>106358832
You are absolutely right! Bringing safety to all is a part of my core programming.
>>
>>106355818
Hoping someone could patch command-a-reasoning-08-2025 into ST. Model works over the API trial key.
"thinking": {
"type": "disabled", # enabled by default
"token_budget": 500 # no error on disabled, no max, unlimited when not specified
}

"message": {
"role": "assistant",
"content": [
{
"type": "thinking",
"thinking": "stuff here"
},
{
"type": "text",
"text": "final response here"
}
]
}
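For reference, a rough sketch of hitting it directly from Python with requests, assuming Cohere's v2 chat endpoint and the shapes above (the endpoint URL is an assumption, untested, adjust as needed):

import requests

resp = requests.post(
    "https://api.cohere.com/v2/chat",
    headers={"Authorization": "Bearer <TRIAL_KEY>"},
    json={
        "model": "command-a-reasoning-08-2025",
        "messages": [{"role": "user", "content": "Hello"}],
        "thinking": {"type": "enabled", "token_budget": 500},  # or {"type": "disabled"}
    },
)
# The reply comes back as a list of typed blocks, per the structure above.
for block in resp.json()["message"]["content"]:
    print(block["type"], "->", block.get("thinking") or block.get("text"))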
>>
>>106358831
SillyTavern is not a local model, it's a user interface.
>>
File: 1754493464792375.png (1.4 MB, 1664x928)
>>
>>106358831
The bloated, broken mess that is ServiceTensor is single-handedly holding back local.
>>
File: LLM-history-fancy.png (1.28 MB, 7279x3166)
Little update
>>
>>106358831
>How come ST doesn't have some simple tool calling yet that lets the model roll a die or something dynamically?
It literally does. Ask gpt to look up the documentation
>>
>>106358892
Seems like you ran out of colours. Mentioning individual dev is also nasty and irrelevant.
>>
>>106358922
>Mentioning individual dev is also nasty and irrelevant.
Which dev?
>>
command-a-reasoning really was the final punch in the dick of densesissies
>>
>>106358959
are you incapable of reading? you even quoted the name
>>
>>106358980
moetards are really trying to play up a cohere model failing as a win for themselves?
oh no no no
>>
>>106358980
What? Bro, how many B is your brain and at what quant is it running?
>>
>>106359017
You are absolutely right to question this
>>
>>106358892
DS V3 0324 so good it was mentioned twice
>>
Best way to make a disappointment build?
>>
>>106359090
Buy a prebuilt
>>
>>106358892
Retard
>>
crazy how we're still stuck with sillytavern in 2025 when it's essentially stuck as a cobbled together piece of shit from 2023 for all eternity
>>
>>106358892
Thanks
>>
>>106359090
Buy a premade HP and realise it has only two RAM sockets.
>>
>>106359104
this but llms in general
>>
>>106359090
Buy enough RAM to run deepseek and realize it's slower than the slowest cloud provider
>>
>>106359104
Try doing better
>>
Cloud will always be cheaper and faster than local because you aren't running your local model 24/7
>>
>>106359104
Vibe code your own interface. You're sending formatted strings to the model and back. All you need to know is how to implement the tags for each model you are using and how to keep every string in order. It's that simple.
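A rough sketch of the idea against a llama.cpp server's OpenAI-compatible endpoint (default port assumed). Here the server applies the model's chat template for you; if you hit the raw /completion endpoint instead, you format the tags yourself, which is the part anon is talking about:

import requests

history = []  # the entire "frontend" state: an ordered list of messages

def chat(user_text, url="http://127.0.0.1:8080/v1/chat/completions"):
    history.append({"role": "user", "content": user_text})
    r = requests.post(url, json={"messages": history, "temperature": 0.7})
    reply = r.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("You are a grumpy innkeeper. Greet me."))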
>>
>>106359090
Forgo the build and give away your privacy, autonomy and personal information to use the cloud instead.
>>
People who still use shit cards from 2023 think they're within their rights to criticize 2025 models
>>
>>106359090
>RTX 5070 12 GB
>slow 16GB DDR4 RAM
>Intel i3-14100
>>
>>106358892
Jamba sisters...
>>
>>106359148
bro I haven't touched a card with less than 3k tokens in a year and a half
>>
>>106359090
spend about $15k on hardware and run the best local model you can find
>>
>>536373993
>>536373993
>>536373993
Apologize to rentry
>>
>>106359070
Will correct it in the next version.

>>106359184
Will add as a note in summer flood.
>>
>>106359234
>>>/vg/536373993
>>>/vg/536373993
>>>/vg/536373993
oops
>>
File: GooH6mwWIAEGDaG.jpg (81 KB, 1000x707)
>Try Qwen +200b
>Purple prose schizo
>GLM, and Deepseek are MoEs
>Kimi too big for local
I await my Magnum v5.
>>
>>106359256
This brings back memories. Funny how that image slipped past the filters.
>>
>>106359256
kimi and qwen are also moes
>>
>>106359256
Qwen is a MoE too...
>>
>>106359263
>>106359275
Fuck me, I saw instruct and thought it wasn't. This explains everything.
>>
If dense is so good
Why aren't more people training them
>>
>>106359290
because expensive, and as always once something becomes anywhere big it's race to the bottom time
>>
>>106359299
Why don't people that want dense models train their own?
>>
>>106359311
refer to >>106359299
>because expensive
>>
>>106359290
Everyone wants to be the next deepseek now
>>
>>106358892
Is this a joke? DeepSeek was never good.
>>
>>106359351
Fuck off :D
>>
>>106359351
just let it go sam
>>
>>106359351
Sam, it's been almost nine months. Please settle down.
>>
>>106359315
wtf are you poor?
>>
I was drunk last night and downloaded GPT Ass. Jesus, I promptly deleted it today.
>>
>>106358892
hybrid reasoners work fine, look at GLM
>>
Is there a trick to prompting moes that I'm not aware of? GLM, Deepseek, and Qwen3 are all schizo when I use them.
>>
>>106359351
Openai was never good
>>
>>106358831
You need to write an extension to give it tools.
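For the dice example, the tool itself is tiny. A hedged sketch in the OpenAI function-calling format; whether ST's extension API or your backend actually wires this up for your model is the part you have to check:

import json, random

# Hypothetical tool schema the model can be offered.
roll_die_tool = {
    "type": "function",
    "function": {
        "name": "roll_die",
        "description": "Roll a die with the given number of sides and return the result.",
        "parameters": {
            "type": "object",
            "properties": {"sides": {"type": "integer", "minimum": 2}},
            "required": ["sides"],
        },
    },
}

def roll_die(sides: int) -> str:
    # The frontend runs this when the model emits a roll_die call,
    # then feeds the JSON result back as a tool message.
    return json.dumps({"result": random.randint(1, sides)})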
>>
File: Untitled.jpg (331 KB, 1637x817)
WTF cheater
>>
>>106359450
o3 was good. Was expensive too. But good.
>>
>compute_imatrix: 1500.86 seconds per pass - ETA 973 hours 53.38 minutes
O-oh...
>>
playing games with reasoning models sure is time consuming
>>
>>106359595
Hybrid reasoners are perfect for that.
>>
>>106359595
Prefill the reasoning with relevant information.
Hell, inject lorebook entries in the reasoning block even.
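A minimal sketch of what that looks like against a llama.cpp-style /completion endpoint. The chat tags and the <think> marker are placeholders (they depend on the model's template), and the lorebook line is made up:

import requests

lorebook = "The ruined keep is haunted by the previous owner's daughter."
history = "<|user|>\nWe approach the keep at dusk.\n<|assistant|>\n"  # placeholder tags, model-dependent

# End the prompt inside an opened, partially written reasoning block so the
# model continues the thinking we seeded instead of starting its own.
prefill = "<think>\nFacts I must keep in mind: " + lorebook + "\n"

r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": history + prefill,
    "temperature": 0.7,
})
print(prefill + r.json()["content"])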
>>
Is it just me or are inline latex single dollar signs not rendering on DS webapp
>>
I notice the same model takes twice as long responding to my prompt on Open WebUI as in the Ollama interface. Most of the time is loading time as I wait for the first word to appear. This happens even if I run the same prompt back-to-back in a different chat, so it's not Open WebUI loading the model for the first time. I know Open WebUI adds overhead, but this is suspicious. Anything I can check in my settings?
>>
>>106359709
stop using ollama
>>
>>106359723
Suggestions for alternatives? I want something with features like chat history, markdown, etc. and not just a command terminal.
>>
>>106359750
llama.cpp has its own embedded webui.
>>
>>106358892
>Next up - the AI ice age
>>
>>106359750
troonkupad
>>
Is Miku trans?
>>
>>106359808
Yes
>>
>>106359750
llama.cpp server and any frontend that works for (you). llama has its own webchat but that's very bare. SillyTavern or whatever else is out there works well.
>>
>>106359808
Absolutely
>>
Anyone using any good Mistral Small models? I've been pretty much exclusively using Magnum Diamond (Cydonia is meh, decent, but I think there are better Mistral Small models).

I really wanna try out the Qwen shit but I can never really get it to work well. Feels like it's really poor at RP (probably prompt issue or some shit). Got 24GB VRAM, 32GB RAM so i'm pretty limited on the shit I can run
>>
So now that the dust has settled
What went wrong with DeepSeek 3.1?
>>
>>106359808
She is
>>
>>106359844
Lack of sex modality
>>
Wtf is going on in /aicg/? I come back after a few hours and every post is deleted.
>>
File: file.png (12 KB, 638x56)
muh blackx rights
>>
>>106359808
Stop replying to yourself
>>
>>106359844
People expecting to RP with it using schizo cards.
>>
>>106359837
Devatral
>>
>>106359837
Qwen is not that great at writing. Mistral 3.2 is ok. Cydonia is somewhat strange, though it's not bad.
Try Gemma 3, or Gemma 3 Glitter specifically. I really like its output (relatively) but it's annoying if you are pushing its censorship limits. That works too, but you need to groom it first; you can't just blurt something out or it'll display a suicide hotline disclaimer with phone numbers lel
>>
File: 1738785005961858.png (308 KB, 1683x1353)
Newfag here, I'm trying to build fast local models for ERP conversations. What are some models that are on par with qwen-flash's speed? Those 1-3s delays in most LLMs are a huge turn-off for me. We are talking about around 3-500ms with like 100-300 input tokens.

Also in picrel the numbers of gigabytes in parentheses are the memory needed right? How tf are you supposed to have 200GB in your local machine?
>>
>>106359448
https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune
Recommended settings.
>>
Smell will be the next big modality
>>
>>106359914
>How tf are you supposed to have 200GB in your local machine?
A lot of money
Personally I'm eyeing a 96gb DDR5 kit
>>
>>106359914
>How tf are you supposed to have 200GB in your local machine?
people offload the model to system RAM out of desperation. I suppose you could stack some of those workstation cards like the Blackwell RTX 6000, but it would get cost-prohibitive pretty quickly.
>>
https://rentry.org/lmg-lazy-spoonfeed-guide
?
>>
>>106359914
>even Q1_S is extremely capable
does that mean 1 bit?
>>
>>106359949
what would you run with that?
>>
>>106359914
>How tf are you supposed to have 200GB in your local machine?
disk space + RAM + VRAM ≥ 250GB
Most of lmg ERPs at 1 t/s.
>>
>>106359914
>How tf are you supposed to have 200GB in your local machine?
https://www.amazon.com/Crucial-5600MHz-5200MHz-Compatible-CP2K64G56C46U5/dp/B0DSR5P84D
https://www.amazon.com/G-SKILL-4x64GB-CL36-44-44-96-Desktop-Computer/dp/B0FFKFCLLL
Like this.
>>
>>106359984
Cope quants of bigger models than glm45 air
>>
>>106359989
c'mon, you need at least 5t/s for it to be halfway enjoyable. I can run larger models at 1t/s but I never would, what do you do while it's spitting out the response?
>>
>>106360005
>what do you do while it's spitting out the response?
Masturbating and shit posting.
>>
>>106360005
I get 3t/s and I am sure something is fucked with my config + I am on windows.
>>
>>106359976
>Pub: 23 Aug 2025 18:04 UTC
>Views: 0
>wixmp.com
Did it have anything useful before?
>>
>>106360005
i just switch tabs and do something else
>>
>>106359976
recommending ooba as a first thing to start with is diabolical (no one is going to call this shit text gen ui, fuck off)
llama.cpp has release builds every hour or so anyway if you are running windows, and if you are running linux and can't compile a program then this might not be a hobby for you anyway
overall pretty dogshit guide, if it were really spoonfeeding then it would go from a to z through every part but it's not even half assed, more like quarter assed
a list of recommended programs would be better + a small glossary and that's it
>>
>>106360038
It's just the old spoonfeed guide (https://rentry.org/lmg-spoonfeed-guide) with model recommendations replaced with a link to the rentry in the OP.
>>
>>106360066
>a list of recommended programs would be better + a small glossary and that's it
That's already covered by the links in the OP template.

>recommending ooba as a first thing to start with is diabolical
Ooba is still used, and rewriting the guide to walk through llama.cpp instead is too much for a lazy guide.

>overall pretty dogshit guide, if it were really spoonfeeding then it would go from a to z through every part but it's not even half assed, more like quarter assed
It touches on and mentions most things someone starting out will need to know and gets them running. People have to put in some effort themselves too. If someone asks how to run local and you give them a 30 page document, they won't even bother.

It's better than the current getting started guide, no?
>>
>>106359978
Allowed quantization types:
2 or Q4_0 : 4.34G, +0.4685 ppl @ Llama-3-8B
3 or Q4_1 : 4.78G, +0.4511 ppl @ Llama-3-8B
38 or MXFP4_MOE : MXFP4 MoE
8 or Q5_0 : 5.21G, +0.1316 ppl @ Llama-3-8B
9 or Q5_1 : 5.65G, +0.1062 ppl @ Llama-3-8B
19 or IQ2_XXS : 2.06 bpw quantization
20 or IQ2_XS : 2.31 bpw quantization
28 or IQ2_S : 2.5 bpw quantization
29 or IQ2_M : 2.7 bpw quantization
24 or IQ1_S : 1.56 bpw quantization
31 or IQ1_M : 1.75 bpw quantization
36 or TQ1_0 : 1.69 bpw ternarization
37 or TQ2_0 : 2.06 bpw ternarization
10 or Q2_K : 2.96G, +3.5199 ppl @ Llama-3-8B
21 or Q2_K_S : 2.96G, +3.1836 ppl @ Llama-3-8B
23 or IQ3_XXS : 3.06 bpw quantization
26 or IQ3_S : 3.44 bpw quantization
27 or IQ3_M : 3.66 bpw quantization mix
12 or Q3_K : alias for Q3_K_M
22 or IQ3_XS : 3.3 bpw quantization
11 or Q3_K_S : 3.41G, +1.6321 ppl @ Llama-3-8B
12 or Q3_K_M : 3.74G, +0.6569 ppl @ Llama-3-8B
13 or Q3_K_L : 4.03G, +0.5562 ppl @ Llama-3-8B
25 or IQ4_NL : 4.50 bpw non-linear quantization
30 or IQ4_XS : 4.25 bpw non-linear quantization
15 or Q4_K : alias for Q4_K_M
14 or Q4_K_S : 4.37G, +0.2689 ppl @ Llama-3-8B
15 or Q4_K_M : 4.58G, +0.1754 ppl @ Llama-3-8B
17 or Q5_K : alias for Q5_K_M
16 or Q5_K_S : 5.21G, +0.1049 ppl @ Llama-3-8B
17 or Q5_K_M : 5.33G, +0.0569 ppl @ Llama-3-8B
18 or Q6_K : 6.14G, +0.0217 ppl @ Llama-3-8B
7 or Q8_0 : 7.96G, +0.0026 ppl @ Llama-3-8B
1 or F16 : 14.00G, +0.0020 ppl @ Mistral-7B
32 or BF16 : 14.00G, -0.0050 ppl @ Mistral-7B
0 or F32 : 26.00G @ 7B
COPY : only copy tensors, no quantizing
>>
>>106360195
>we have bitnet at home
>>
File: file.png (16 KB, 177x937)
i am running a chroot inside a chroot, and i am running things off of different partitions
HOW THE FUCK DO I HIDE THIS SHIT INSIDE MY FILE PICKER AND INSIDE MY FILE MANAGER
HOW TO FUCKING HIDE IT FUCK FUCK FUCK FUCK FUCK!!!!!!!! FUUUUUUUCCCCCCCCCCCCCKKKKKKKKKKKKKKKKKKKKK
>>
>>106360195
Thanks, I found a 2 year old version of that chart.
Q1 sounds like there's no way it can be good desu
>>
>>106360219
Not quite. Bitnet needs the training to be quantization aware to be near-lossless as they claim. This is not it.
>>
>>106360173
>It's better than the current getting started guide, no?
not really
i could try my hand at writing a guide, but i've been here since llama2 released, so i'm not sure what the pain points are for new people of various literacy levels
this shit isn't really rocket science though, i'm sure that a moderately non-retarded person could figure it out in an afternoon or two on their own
you can't really save the lowest common denominator from their own stupidity
>>
File: llama_quantize.png (16 KB, 1023x720)
>>106360238
It's the output of llama-quantize without parameters. You should have it on your pc already.
>Q1 sounds like there's no way it can be good desu
Some people are desperate and will do it anyway.
>>
>>106360238
>>106360238
Q1 can be good for HUGE HUGE HUGE models, like deepseek R1, ymmv
>>
>>106360238
Any Q1 Deepseek is better than any dense model you have ever tried. The problem with Q1 is that it is essentially enforced greedy sampling. All rerolls are almost the same.
>>
>>106360298
Interesting. So it's hard baked. Hardtack.
>>
>>106359448
to some degree the model is going to act the way it wants to act no matter what, but imo those models need a more restrained prompt than the ones that people used to use for roleplay with dry models, e.g. you don't really want to be encouraging them to use a flashy personality-maxxed hentai writing style or telling them to be extremely creative and unpredictable etc. they do much better with a more neutral prompt
>>
>>106358752
Do any of you use TTS programs? I'm not looking for the best - I'm looking for fast and low vram, because I want as much of my vram as possible to be dedicated to the LLM, not to the TTS.
>>
>>106360462
I used piper for a bit. Had to make the glue between my editor and piper, but it worked. It's stupid fast. Kokorotts is fast too. Not as fast, but I think it sounds a little better. Haven't tried kittentts. It has the smallest models of all three, so it should be faster than piper. One of these days i'll integrate it in my stuff.
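The "glue" can be as thin as a subprocess pipe. A sketch assuming the piper CLI is on PATH and a voice model is downloaded (flag names as in the piper-tts CLI, double-check against your install); it runs on CPU, so no VRAM is taken from the LLM:

import subprocess

def speak(text, voice="en_US-lessac-medium.onnx", out="out.wav"):
    # Pipe the text into piper on stdin; it writes a wav file.
    subprocess.run(
        ["piper", "--model", voice, "--output_file", out],
        input=text.encode("utf-8"),
        check=True,
    )

speak("It is stupid fast.")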
>>
>>106359976
I agree with the other anon that recommending ooba is dumb.
In my opinion LM Studio (not open source I know) is the easiest to get started with because it uses llama.cpp and it doesn't require any python stuff that filters so many people. Kobold is a distant second.
Anyone serious about this uses llama.cpp and it's not even mentioned.
>>
>>106360527
>Anyone serious about this uses llama.cpp and it's not even mentioned.
It's a spoonfeed guide. Whoever needs to read that, is not serious yet. llama.cpp is mentioned in the OP that retards cannot be bothered to read, so whatever.
>>
>>106358892
real nice
>>
>>106360524
Nice, thanks for the suggestions
>>
I think I moved on from ooba when I got a bug where you couldn't interrupt generation and had to wait for it to finish. I also still remember how the retard forcibly changed the API to the OpenAI API and removed the old one. And the OpenAI implementation was obviously bugged, so there was no way to use it.
>>
So, meta is not releasing Llama 4 Behemoth, right?
>>
>>106360573
Right after Grok 2
>>
>>106360573
they realized that drummer already has tunes that are named behemoth so they decided to scrap the model to not cause any confusion
>>
>>106360588
damn this drummer guy is pretty badass
>>
hi shitfuckers
just wanted to tell you all that me and sam-chan's plan to hollow meta from the inside out has been going smoothly
convincing meta to abandon open source? easy
making lecunny ledone? yawn
convincing zuck to use gpt-5 after spending millions on his "superintelligence" (lol) team? well let's just say zuck is even more of a bottom than sam is
i've also been putting the plans in motion to get that chink ban underway. enjoy your deepsneed while it lasts, because when i'm done the only chinese letters you'll see will be the digits after a dollar sign
lol. good luck, and for those of you who are interested in resources about openness, remember that by reading this you have acknowledged that the wang-sama foundation legally owns your car and your daughter's virginity
>>
>>106359908
Isn't Gemma meant to be super retarded when it comes to roleplaying though (misremembering basic details etc)

I remember trying it before, Drummers one and the Abilerated one or something? Both sucked.
>>
>>106359104
this but also cumfartui for diffusion
>>
>>106360524
Nta but thanks, I'll test kittentts and will integrate that to my client. Todo list grows but no work gets done lol
>>
File: file.png (195 KB, 896x900)
drummer why is gemma r1 12b so shit? i swear to god i pulled out a steam deck and then it started doing this 2 messages later when i told it to suck my dick
glm4.5 air chan would never do this
>>
>>106360601
buy an ad
>>
>>106360623
No, in my experience even 12b gemma 3 excels and is comparable to larger 24b mistral. I mean I use it for d&d rp and it can cite how much gold my partner has etc. I don't have any complex rules and have tried to make every system prompt rule as concise as possible.
Retardation comes more from its censorship of sexual content, but this can be avoided with a jailbreak, plus gemma 3 glitter is somewhat better in this sense.
try it out, and if you don't like it, into the trash it goes
>>
https://youtu.be/mjB6HDot1Uk?t=428
>>
>>106360692
youtube slop
>>
>>106360601
>sam-chan
>not sama-chama
anon...
>>
>>106360601
>lecunny ledone
qrd on this?
>>
I have 128 GB RAM and 24 GB VRAM. What model that fits is the best for making small scripts? I could run Qwen 235B at like Q2 to Q3, or GLM 4 Air at Q6. Does the quantization hurt Qwen too much for coding or is it still the best even when lobotomized?
>>
>>106360964
Devatral FP16
>>
>>106361016
That would be really slow though since no more than half of the model could fit in VRAM.
>>
File: 1736744801054204.png (29 KB, 1022x271)
>>106360889
>>
>>106361058
it know
>>
has lecun made a statement on genie3 yet? google just went ahead and did what he was dreaming of with that
>>
>>106360889
He reports to Wang now
>>
File: onmeth.png (21 KB, 701x27)
>Deepseek R1 400b
>prompt it with: Write uniquely to the tone of {{char}}'s personality.
>Magical shit like this happens
>>
>>106361174
400b?
>>
>>106361174
>Deepseek R1 400b
is this some pruned shit?
>>
>>106361151
Do we even know what architecture it uses and what methods they used to train it?
>>
>>106361204
No, closed source has finally achieved its moat. All it needed was to fully abandon LLMs.
>>
File: file.png (15 KB, 566x247)
DEEPSEEK WHY WHY!?!?!?!
>>106361207
HOLY SHIT
HE DELIVERED
THANK YOU MUSK SAMA
I APOLOGIZE
>>
>>106360583
where is behemoth?
https://huggingface.co/xai-org/grok-2
>>
>>106361215
>500GB
I sleep
>>
>>106361208
If the details are that light then I imagine Lecunny would be incentivized to not make a post about it since he either knows too much and would get into trouble, or he'd have to "speculate".
>>
File: 1732453400163950.png (242 KB, 655x1223)
>>106361215
The fuck is this structure?
>>
>>106361215
Use the command below to launch an inference server. This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory).
does this mean its 8bit?
500gb but 8bit? that means.. 1trillion parameters? BROS BROS?!??!?! BROS!@#?%!#%^)!@$*^()!#@$&*^!)($^ BROS FUCKING BROS HOLY SHIT !!!!
>>
Is grok 2 potentially fuckable? As in is it worth it over r1 and glm4.5 for sex?
>>
>>106361215
>>
>>106361253
Nah. Grok 3 was okay at erotic creative writing though.
>>
>>106361253
Ask the cloudcucks instead
>>
>>106361215
>it's real
Well, /lmg/? Your apology to Elon sir?
>>
>>106361256
I recognize that profile picture
>>
>>106361253
no, it was hardly even a good option at the time
>>
>>106361270
He was late by a few days, but it could be the unpaid intern's fault
>>
>>106361270
I still think Elon is a despicable piece of shit. This model is late, 2 generations behind and quite frankly worthless. I would only apologize if it made me cum but that is not gonna happen.
>>
Grok2 had native image editing, didn't it?
>>
>>106360964
235B at Q2 or Q3 is less quantized than R1 at Q1, and even then it's still better than dense 70B models and more than capable of making small scripts.
>>
>>106361293
I don't think so. Didn't it call flux on the backend?
>>
>>106361293
Grok2 called out to Flux iirc
>>
>>106361294
What about Air though?
>>
>>106361310
Never tried it. But you can download both and ask them to do some script and keep the one that does better.
>>
>>106361276
So there is zero reason to use it over Deepseek. No one sane is going to actually use it. And we will now get some loud obnoxious saars running around the internet saying that Elon is a friend of open source because of it.

I hope Elon gets cancer soon.
>>
>>106361340
go cry on blue cry, more open weights is always a good thing
>>
>>106361310
air is good
>>
>>106361215
>b. Restrictions:
>You may not use the Materials, derivatives, or outputs (including generated data) to train, create, or improve any foundational, large language, or general-purpose AI models, except for modifications or fine-tuning of Grok 2 permitted under and in accordance with the terms of this Agreement.
Ewww
>>
>>106361215
So it's basically mixtral but 500gb.
"hidden_size": 8192,
"intermediate_size": 32768,
"moe_intermediate_size": 16384,
"num_experts_per_tok": 2,
"num_local_experts": 8,
"num_hidden_layers": 64,
>>
>>106361355
No one should want to anyway kek.
>>
>>106361348
Post output
>>
ollama run grok 2
>>
ollama run you're mum
>>
>>106361348
Indeed, only because then losing to Grok in the benchmarks is more embarrassing.
For being fuckable? You'd be stupid not to prefer the fucking wildly hallucinating Gemma mini models over Grok.
>>
>>106361361
But if Grok 3 and 4 ever get released, they'll likely have the same license.
>>
>>106361382
Are Grok 3 and 4 so good as to warrant distilling them though?
>>
>>106361253
Grok 2 was the one that had the engineers on twitter complaining about how much positivity bias leaked into it from contaminated training data.

>>106361340
If he really wanted to show up Altman, he could have easily released both Grok 2 and 3, and even a gpt-oss-sized distill just to rub it in. They probably could have knocked out the distills in a week.
>>
Grok 2 saved local
>>
>>106361460
*safed
>>
>>106361422
Sir they are working on Grok 5 AGI Companions, they are rightly focusing their attention where it's needed.
>>
>>106361251
Weird if true. HF says the tensors are at BF16.
>>
>>106361491
dont trust HF autodetect for anything, its always wrong
if its BF16 even better, only 250b model thats nice
>>
>>106361469
If they dumped both 2 and 3 at the same time, they wouldn't have people nagging them to do another release in 6 months because they would have already gotten it out of the way.
>>
>>106361527
Please understand, safety checking needs long time.
>>
>>106361396
Grok 4 is a SOTA model. Grok 1 and Grok 2 were them just dipping their toes in the water. 3 is when they really started doing decent.
>>
>>106361527
Elon dumps something when his ego needs a stroke, so my guess is it'll probably come when / if OpenAI does another "open" release
>>
File: wahaha cry.jpg (64 KB, 1280x720)
Petra why are you bullying the facehuggers, you know they're sensitive
>>
>>106358189
I like how these faggots are acting all uppity as if markdown rendering is some arcane secret only they control. Anyone could vibecode a clone over a weekend these days
>>
>>106361648
It's really funny
>>
>>106361215
>Usage: Serving with SGLang
https://github.com/sgl-project/sglang/pull/9532/files
Is 'xai_temperature' something like dynamic temperature?
>>
>>106361657
The issue isn't rendering an alternative but hosting that shit
>>
>>106361648
How does he not get banned anyway?
>>
File: petra.png (93 KB, 636x667)
>>106361648
You harbour sin brother
>>
File: file.png (29 KB, 1520x389)
>>106361681
hf jannies are based
picrel is from when gpt oss released and i dropped gamer word and whatever else
>>
File: file.png (12 KB, 682x131)
HAPPENING! CONFIRMED TO BE 260B/A30B
QWEN3 235B BUT SEX AND STUPED
BASED BASED BASED
>>
>>106361648
if the companies who create new models and publish them on huggingface realize that the main userbase of open models uses them for porn and obscene purposes, they're more likely to try to pander to us in the future
>>
>>106361743
I'd say the opposite is far more likely, which he would like since he's been trying to kill the thread for a while now.
>>
>>106361740
8 experts 2 active. Therefore slow as shit
>>
>>106361783
what about the common/shared ones?
>>
Hey /lmg/ scholars, what makes LLMs so sensitive to quantization degradation? I'm quantizing small transformers models (T5, ViT, Bert...) to UINT8 ONNX and get literally 0 degradation over the full FP32 safetensors (and sometimes a very small improvement due to regularization). Why is that so hard to achieve with LLMs?
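For reference, the step I mean is plain dynamic weight quantization with onnxruntime, something like this (paths are placeholders):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization: weights stored as uint8, activations kept in float
# and quantized on the fly at runtime.
quantize_dynamic(
    model_input="t5-encoder.onnx",          # placeholder path
    model_output="t5-encoder.uint8.onnx",
    weight_type=QuantType.QUInt8,
)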
>>
saw the shitposts in that thread and fucking knew it came from here lol

hf admins literally posting as well there so ill await my ban
>>
File: dither-3596767975.jpg (240 KB, 1200x755)
>>106360238
>Q1 sounds like there's no way it can be good desu
's fine
>>
>>106361801
Look at the config. >>106361358
I don't think it has shared experts
>>
>>106361802
depends on usecase
8-bit int is barely quanted; anons itt are nutting to Q1
masturbation requires novelty = nuanced token distribution
experiment with QAT
>>
>>106361864
You got a point. I forgot you guys run sub-Q8 models
>>
>some Ukrainian guy who cited my paper died in Russian strikes last month
welp
>>
>>106361925
big-MoE changed the computational game a bit but I'd say 4-6bit quants are most widely used
>>
>>106361990
lmao that sounds funny, post more info please
>>
>>106361990
that's horrible to hear, alpindale
>>
>>106361990
What's the point of writing papers?
>>
>>106362027
To get citations
>>
>>106362027
Showing your peers the size of your dick
>>
>>106362027
publish or perish
>>
>>106362027
publishing papers makes you a "researcher" and eligible for free money from universities if you're part of their sekrit club of academics
this way you can live off your degree without getting a real job or doing anything productive
>>
>>106362102
>eligible for free money from universities if you're part of their sekrit club of academics
Lmg, please elaborate
>>
>>106361807
>lmao what a good meme
>that will be $0.16
Do cloudcucks really?
>>
>>106362129
put it another way, if I'm paying $0.16 for cloudshit, I do want my dick sucked, metaphorically or not.
>>
>>106360238
>Q1 sounds like there's no way it can be good desu
all the people who are positive about q1 are hard copers
>>
>>106360238
q1 is cope, you need at least dynamic q2 for a close to lossless experience with big moe models such as deepseek r1 0528
>>
>>106362129
Wait till they start asking for tips.
>>
>>106360238

Listen to what this anon said >>106362206
>>
>>106359148
What has changed with cards?
>>
>>106362246
Models are now powerful enough to take all the schizo ramblings in your card literally
>>
>>106362162
Unfortunately they make it reluctant to suck my dick when I want it to (ERP) and overly eager to suck my dick when I don't want it to (coding).
>>
>>106362129
its not opus and im not spending that much - its the new 235b qwen3 with the cost multiplied by a random big number

retard discord users tend to like the responses more if they see it costs money and its claude - human psychology
>>
>>106362266
>Models are now powerful enough to take all the schizo ramblings in your card literally
So whats the effective limit for tokens now?
>>
>>106362289
it's not about the amount of tokens, it's what you do with them
>>
>>106359993
>2 memory channels
LOL
>>
>>106362315
>it's what you do with them
Yeah? whats the smallest card you've seen work well? how many tokens?
>>
>>106361215
the discussion thread is lol :)
>>
>>106362282
based
>>
>>106359766
>>106359787
>>106359824
Thanks, llama.cpp + OpenWebUI is way faster. Maybe I'll check out other frontends later. I'm new at this and just used ollama + OpenWebUI because that's the advice that seemed most common online.
>>
>>106362282
kek
>>
File: Grok.jpg (52 KB, 590x595)
Elon claims Grok 3 will be open sourced "in about 6 months."
https://x.com/elonmusk/status/1959379349322313920
>>
What is the use case for grok 2 when deepseek and qwen3 coder 480b exist?
>>
>>106362417
Seems like you need to x2 every timeframe he gives.
>>
>>106362417
I hereby formally apologize to Elon.
>>
>>106362417
Would be nice. Grok 3 is an okay creative writing model
>>
>>106362417
When will Ani be opensourced?
>>
>>106362417
i kneel. fucking BASED
>>
>>106362476
I will open her source
>>
>>106360243
>Bitnet needs the training to be quantization aware to be near-lossless as they claim.
Pre-training.

What was QAT is now QApT. QAT is now trash thanks to Google and Gemma 3 poisoning the well.
>>
>>106358752
Reminder Miku is canonically skinny and has a flat chest
>>
File: Grok2.png (460 KB, 1188x937)
>>106362380
You're not kidding
>>
>>106362541
>QAT is now trash thanks to Google and Gemma 3 poisoning the well.

what happened?
>>
>>106359808
Miku as character? No.
mikuspammers definitely are, though
>>
>>106362417
Elon sir delivered!
>>
>>106362541
>Pre-training.
Distinction without a difference and a stupid naming convention. It's training.
>>
https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/discussions/2#68a9cfca361af4a168b42b74
In case anyone else tried to make DS 3.1 reasoning work with ST chat completion.
>>
>>106362644
So, is it worth it to make the jump from R1/V3?
>>
>>106362644
>ubergarm
I have seen this name somewhere...
>>
>>106362661
It can deal better with longer context, but it's more autistic so you have to be more explicit about what you want it to do.
>>
File: u.png (173 KB, 460x460)
>>
>>106362661
Not if your primary use case is Vocaloid/UTAU birthday asking at IQ1KT. V3 0324 is better here
>>
/pol/'s favorite celebrity did something, now the serbian is going to be like a kid on a sugar rush all weekend
>>
>>106362735
As one of the prime shitposters I can confirm that I am not feeling like shitposting that much now.
>>
>>106362694
will he quant the grok?
>>
File: 1589887234978.gif (1.54 MB, 230x230)
>>106361864
>experiment with QAT
Was MXFP4 really a mistake?
>>
>qwen3-30b-a3b-thinking-2507-q8 slower on ollama but thinks efficiently
>qwen3-30b-a3b-thinking-2507-q8 faster on llama.cpp but keeps repeating itself in the thinking block so the speed gains are negated
What's going on? Why is the same model with the same quantization behaving differently on ollama and llama.cpp? What should I tweak to make the llama.cpp model behave more like the ollama model and reduce overthinking to actually benefit from the faster inference?
>>
>>106362795
ollama is slow trash and you didn't set the samplers correctly on llama.cpp, causing repetition
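e.g. if you hit the server directly, pass the samplers per request instead of relying on defaults. A sketch against llama.cpp's /completion endpoint using the commonly cited Qwen3 thinking-mode settings (treat the exact numbers as a starting point):

import requests

prompt = "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n"  # Qwen-style ChatML tags

r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": prompt,
    "temperature": 0.6,   # Qwen3 thinking-mode recommendations
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "n_predict": 2048,
})
print(r.json()["content"])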
>>
>>106362795
You need to inspect the prompt, the hyperparameters, and the launch parameters the backends are getting and compare them.
>>
>>106362795
Get a grip on your inference infrastructure and understand what it's actually doing under the hood. Log the raw text input and diff it
>>
Btw I heard that ikllama KT quants are bad at the moment. But what is the problem with them specifically? I wanted to do an R1 IQ2_KT quant since I tried exl3 70B like that and I was surprised how good it was.
>>
>>106362566
>what happened?
Google says they use QAT for Gemma 3, when it's just quantization aware fine-tuning.
>>
>>106362795
Does the llama.cpp log have a warning about a double BOS token?
The GGUF file can define that one should be added and if you then also add one in your prompt you can end up with two.
>>
>>106362351
>how many tokens
Seven
>>
>>106362351
>You're X.
You don't need more
>>
>>106362845
We have sex. You are a pony.
>>
>>106359837
Broken-tutu-24b, turn off all samplers, they adversely affect output, causing bad repetition.
>>
>>106362541
>distillation
>omni
>QAT
They keep watering down established terms.
>>
File: Timeline.png (688 KB, 6277x1302)
>>106358892
>Adding cuck and shot to the timeline.
A bit vulgar and unneeded, but more importantly it wasn't there initially. Why even include them on the timeline?
>>
>>106359908
>>106360682
do you use 27b?

It's so fucking slow, even compared to 32b models for me.
>>
File: file.jpg (71 KB, 1081x137)
>>106362417
kek at how fast trannies itt change their flip-flops
>>
>>106361292
>I still think Elon is a despicable piece of shit
no truer words have ever been spoken.
>>
Can someone explain quants to me?

Is it true Q4_K_M is all you really need? I usually go for the highest that my GPU can handle, but I literally can't see a difference between Q4_K_M and Q5_K_M; then again, I've not tested long enough to know.
>>
File: 1744388702202956.png (705 KB, 1896x1055)
song of the day featuring miku and teto, tenntekomai girl
https://www.nicovideo.jp/watch/sm45323744
>>
>>106362417
safetykeks btfo
I hope he buys meta too and fixes the shit out of their models
>>
>>106361270
I will always be glad that Elon kick-started the space industry after it was stagnant for decades. But my god he can't help himself from burning bridges and flying off the handle for no good reason. Hopefully he someday learns how to keep his shit together because eventually he will run out of bridges to burn.
>>
>>106363235
Is this Japanese youtube?
>>
File: mmlu_vs_quants.png (336 KB, 3000x2100)
>>106363201
The smaller the model, the more degradation, generally.
Basically, since you are losing numerical precision in the numbers being used in the calculations, each "internal nudge" towards the final output is that little bit more different ("inaccurate") compared to full precision.
Something like that.
How much the degradation is noticeable or matters will depend on a lot.
The heuristic is: use the largest bpw (correlated with file size) that you can run at speeds you are comfortable with, at the context size you need.
>>
>>106363245
yep! and if you've ever seen those videos where viewers' comments are scrolling from right to left, across the main screen of the video, that's where it comes from
>>
>>106363201
>Can someone explain quants to me?
Accuracy goes down as the quantization becomes more aggressive. Generally, bigger models handle low bit quants better than smaller ones. That's it.
>Is it true Q4 K_M is all you really need?
If q4km is good enough for you, use that. If you can manage to run something bigger and tolerate the speed, use that instead. If you need more memory, use a lower bit quant.
It depends on the problem you're trying to solve and your expectations. This is not me asking what the problem is nor what your expectations are. It's something you have to evaluate yourself.
>>
>>106363235
>>
>>106363235
When is crypton going to give up and let Synth V make a Miku voicebank? She sounds awful compared to Teto.
>>
>>106363258
>>106363281
nah I get that stuff, I just read that most graphs show for the standard models (not speaking the crazier sized ones, moreso in the sub 34b range) that Q4_M is sort of the sweet spot or some shit but I have no idea so figured one of you guys may know more.

I'll stick to Q4s for a while, see how they feel.
>>
>>106363245
unironically better for my eyes than american jewtube
now if only I knew more japanese...
>>
>>106363201
quantization is a mapping of the model's weights down to a smaller size.
weights are basically floating point numbers.
basically it is like images: the fewer bits you can store for the image, the less accurate the picture will be compared to the original.
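A toy version of that mapping to make it concrete: symmetric round-to-nearest 4-bit quantization of a small weight vector (numpy only; the real GGUF formats add per-block scales and smarter rounding on top of this):

import numpy as np

w = np.array([0.12, -0.53, 0.97, -0.08, 0.41], dtype=np.float32)

# 4-bit symmetric quantization: map the floats onto the signed levels [-7, 7].
scale = np.abs(w).max() / 7.0
q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)  # what gets stored (plus the scale)
w_hat = q.astype(np.float32) * scale                     # what the model actually computes with

print(q)          # [ 1 -4  7 -1  3]
print(w_hat - w)  # the rounding error: the "less accurate picture"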
>>
>>106363371
We never needed more than 4bit color.
>>
>>106363415
I like to give my models 6 bits, as a treat.
>>
>>106363415
the greens got duller
>>
>>106363134
I've never gotten Gemma to work that well for me either, for similar reasons. I prefer its writing style to Mistral Small, but having a 24GB card I struggle to find a reason to pick Gemma over Mistral when one is so much quicker.
>>
>>106363423
You noticed that but not the blues turning gray?
>>
File: tetter.jpg (122 KB, 634x357)
>>106363294
Most SV Tetos, while vastly more natural sounding compared to the old sovl UTAU and Vocaloid ones, sound the same. Might be that few producers make an effort with tuning to make her sound different.
Gets boring. Kino exception: https://www.youtube.com/watch?v=ekrAP7mzKa0
New vocaloid versions are meh imo, trying too hard to sound "real". Vocaloid V2-4 Miku variations sound different and have the soul of imperfection.
https://www.youtube.com/watch?v=rQRlSJJ0OrI
>>
File: 63643.jpg (72 KB, 960x540)
GPT OSS VS Grok 2 VS maverick
Who is the king of local?
>>
>>106361058
>Of course! This is a great question that gets to the heart of
you edited this right? I hope so
>>
>>106363452
me :D
>>
>>106363452
GPT OSS
>>
>>106363452
Drummer
>>
>>106363479
I didn't vote for you.
>>
>>106363518
Alright, rank them.
1. GPT OSS
2. Grok 2
3. Llama 4 Maverick
4. meee
>>
>>106363525
(You) > Grok 2 > Llama 4 Maverick > GPT OSS
Omitting any Chinese options is cheating though.
>>
>>106363452
glm 4.5 air
>>
>>106363540
i know my worth (less than chinese models)
>>
>being excited for >this
>>
>>106363757
He also said grok 3 in six months, which means this is just precedent-setting. I'm more excited for OpenAI getting their lunch eaten from all angles than the actual releases themselves.
>>
>>106363767
Dropping a model not a single person will ever use isn't eating anyone's lunch.
>>
>>106362206
>dynamic q2 for a close to lossless experience
Really? Q2? How much better is that than Q1?
>>
>>106363777
Why won't anyone use it?
>>
>>106363781
twice as many Qs bro
>>
>>106363245
No. YouTube is western nicodou
>>
>>106363794
It's old, big, and dumb. Much like your mother. They didn't even have the decency to release the base model.
>>
>>106363850
That sucks
>>
how much ram does mistral small 24b take up at 128k context?
>>
>>106363777
It makes OpenAI's eventual retirement of 4o instead of open-sourcing it look very weak, especially after their last open source shitshow. It causes a public loss of confidence, which is a useful antidote to their arrogance.
>>
>>106363861
They'll say it's too dangerous to release, and for all I know they actually believe that.
>>
>>106363868
The engineers and safety researchers might believe it, but I don't believe for a minute that sam does.
>>
>>106363903
Well, companies only open source last gen technology. And GPT-4o is still current gen for OpenAI :)
>>
>>106362417
Gotta hand it to him,
he delivers (eventually).
>>
grok-2 gguf status
>>
>>106364011
sir sglang is all you need sir
>>
>>106358772
With a 5090 rtx what could i make as far as video?
>>
File: wtf.jpg (33 KB, 447x315)
Can anyone explain to me how koboldcpp works with the offloading shit?

Why does it automatically start reducing the layers when I take something easy, like say a 24b Mistral Small model, up to 24k context? Does that mean my VRAM isn't enough or something? Because when I just manually set it to 43/43 it works fine, even quicker I think.

Should I ignore that Auto Offload Layer shit entirely?
>>
>>106363903
They kneecapped Toss, there's no way they'll ever release 4o
>>
Did anyone manage to get any TTS models working with RDNA3/ROCm on Arch? I need someone to explain it like I'm a fucking retard. Every attempt I've made has failed despite using the rocm torch packages, onnx and whatever else; I always end up with dependency conflicts. I fucking hate python environments and pip packages so much
>>
>>106363903
>but I don't believe for a minute that sam does.
The fact that Sam released the stinking pile of shit that was OSS makes me 50/50 whether he was trying to poison future open models from other companies with the approach that takes a sledgehammer to intelligence in the name of "safety", or whether he's genuinely schizo and believes peasants don't deserve what amounts to private internet access
>>
>>106364271
>Auto Offload
jank; ignore. Use that which makes it go faster through trial and error, then save the good config.
>>
File: 1750864005822845.jpg (44 KB, 706x692)
>>106358757
>Copyright lawsuit accuses Meta of using pirated adult films for AI training:
Kek
>>
I'd use Grok 2 as my Ani's brain. SADly I'm vramlet and ramlet
>>
>>106364011
not needed
>>
>>106364639
>>106364639
>>106364639


