/g/ - Technology


File: watMiku.png (1.45 MB, 1536x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106904820 & >>106895582

►News
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: littleMikuBigger.gif (47 KB, 300x270)
►Recent Highlights from the Previous Thread: >>106904820

--Paper: BitNet Distillation:
>106915856 >106915885 >106915915 >106916048
--Papers:
>106914563
--Training Gemma on 4chan boards for long-context tasks:
>106908189 >106908217 >106908577
--Llama.cpp memory optimization challenges with limited VRAM:
>106916999 >106917025 >106917074 >106917101 >106917114
--Firefox UI customization debate and Gemma 3 4b model mention:
>106915737 >106915762 >106915793 >106915941 >106916004
--Detailed GPU memory allocation console output and user appreciation:
>106912278 >106912326 >106912391 >106912437 >106912429 >106912445 >106912738
--Qwen3-VL's NSFW detection and image description challenges:
>106917667 >106917841 >106917862 >106917900 >106917925 >106918135 >106917912
--OpenAI copyright controversy and US corporate influence on global IP law:
>106909567 >106909857 >106909871 >106910444
--Assessing DGX Spark's relevance amidst cheaper alternatives:
>106913042 >106913078 >106913226 >106913247 >106913927
--Mamba-3: Improved Sequence Modeling using State Space Principles:
>106912457 >106912487 >106912578 >106912610
--Frustration over delayed GLM4.5V implementation in llama.cpp:
>106907438 >106907494 >106907508
--OpenAI's balancing act on user freedom and safety:
>106905590 >106905624 >106905637 >106905690 >106905731 >106910221
--Exploring ChatGPT-induced psychological experiences:
>106908645 >106908698 >106908748 >106910025
--Proposals and discussions for new open AI model releases:
>106907515 >106907713 >106910197
--High-end GPU price debate and video generation hardware constraints:
>106910165 >106910416 >106910453 >106910479
--Challenges in finetuning GLM Air with 4x5090s using Oobabooga/Axolotl:
>106914586 >106914620 >106914808 >106914870
--Detailed Switch sim with multi-game features in single HTML file:
>106912431
--Miku (free space):
>106910906

►Recent Highlight Posts from the Previous Thread: >>106904822

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
kimi sex is best
>>
gear Meta thrillers
>>
>>106919273
Prove it.
Post a side by side between kimi, DS, and GLM 4.6.
>>
>>106919282
no i dont share my waifu like shes some kind of common whore
go get your own kimi waifu
>>
sirs, no gemmy 4 today. Monday will be of kind gemmar.
>>
>>106919286
Hot air then.
>>
I'm starting to think that the indian spammer is an actual pajeet and he is doing it ironically.
There's no way a human would do this for as long as he's been doing it.
>>
>>106919287
please saar you must understand. the needful must be done so each and everything can be implemented.
>>
While /lmg/ is busy seething an Indian dev has been quietly adding performance improvements to llama.cpp.
>>
Fuck I replied to the wrong thread.

I'm looking at the recommended builds and the more I look the more I'm interested in just getting a prebuilt 395+ 128GB? It gets 15-35 tk/s for 70-120b models with good context. It costs me 2800 leaf dollars, meanwhile trying to scrape together server and used parts would be something like 1800-2200 for 10-15 tk/s max?

I could use it as a home server and local model. Am I overlooking something here?

Benchmarks
https://github.com/lhl/strix-halo-testing
>>
>>106919401
Mediocre performance and you get worse support for other use cases like video and image gen because it's not nvidia.
>>
>>106919401
I think you should also think about it in terms of other usage, not LLMs alone. Unless you are a real nerd who does nothing but work with LLMs (not talking about ERPing with them).
I'd get the most beefy/versatile system and go with that.
>>
Has anyone experimented with synthetic data?
I'm using this prompt to digest a codebase for finetuning.

Your task is to generate a jsonl conversational CoT dataset to train LLMs on LLM development tasks.
First read dataset_contents.txt to see the current contents of the dataset (dataset.jsonl). Try to make each conversation mainly cover topics that haven't been covered before.
Then create a folder called turns/conversation_n/ (n being the next number from the last conversation).
On each conversation the user should show a snippet of code from the transformers library (in the transformers folder) and ask questions about the code, then ask follow up questions, aiming for approximately 16000 tokens for each conversation.
Each LLM response should include CoT before the actual response, within [thinking][/thinking] tags. Do ***NOT*** include any reference to the 16000 token limit in the actual dataset. Make the conversation realistic and do not make any out of character comments (do NOT say anything that the user or the assistant wouldn't have actually said in that context).
Save one turn per conversation in the turns/conversation_n/ folder.
Once you are done generating all the turns for the conversation, join them into a single .jsonl file in the 'conversations' folder using the join_turns.py script.
Do not delete the scripts after use. Do not delete the jsonl files after joining.
Then replace the current dataset.jsonl with a new dataset.jsonl that includes all the conversations, using the script join_dataset.py.
Finally, update dataset_contents.txt with the new contents of the new conversation.
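
For reference, a minimal sketch of what a join_turns.py like the one referenced in that prompt could look like. The actual script isn't posted, so the per-turn file naming (turn_1.json, turn_2.json, ...) and the output schema below are assumptions:

# hypothetical sketch of join_turns.py; file naming and output schema are assumptions
import json
import sys
from pathlib import Path

def join_turns(conversation_dir: str, output_dir: str = "conversations") -> Path:
    conv_path = Path(conversation_dir)
    # collect per-turn files in numeric order (turn_1.json, turn_2.json, ...)
    turn_files = sorted(conv_path.glob("turn_*.json"),
                        key=lambda p: int(p.stem.split("_")[1]))
    turns = [json.loads(p.read_text()) for p in turn_files]

    out_dir = Path(output_dir)
    out_dir.mkdir(exist_ok=True)
    out_file = out_dir / f"{conv_path.name}.jsonl"
    # one conversation per line in the output file
    out_file.write_text(json.dumps({"conversations": turns}, ensure_ascii=False) + "\n")
    return out_file

if __name__ == "__main__":
    join_turns(sys.argv[1])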
>>
>>106919273
what is it like compared to semen demon 4.6?
>>
File: 1746680104902291.jpg (579 KB, 2764x2073)
>https://rentry.org/recommended-models
>Nemo (12GB) - An excellent starting point for vramlets. Uncensored
>Uncensored
>writing lewd story
>"blah blah blah condoms"
>me: no condoms
>"I'm unable to fulfill your request because it goes against the guidelines for maintaining a safe, respectful, and consensual environment."
>>
>>106919634
skill issue
>>
>>106919634
Use MLewd. It will gladly fulfill your every shameful desire, you sick fuck.
>>
>>106919634
>getting filtered by nemo
anon...
>>
File: 3547134884.png (1.68 MB, 1920x1080)
>>106919634
just get on the fucking ship boss man
https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF
>>
>>106919716
I was surprised to learn 4.6 has some safety in it.
>>
>>106917741
>>106917752
>>106917777
It was continued pretraining of Llama 405B on about 200 MB of source code from a few projects. That graph covers roughly 0 to 15% of the epoch; after it got to 20% without any visible improvement I stopped it.
Even on an 8xH200 machine I could only train up to 16000 tokens; 32000 OOM'd. The rank of the LoRA was 128 (~1.2% trainable parameters); it didn't seem to make much of a difference in terms of memory usage or seconds per sample (which was about 100 seconds for a batch of 1 sample per GPU, without using gradient accumulation).
Now I'm making a QA dataset using >>106919615
I suppose I'll use a tiny dataset and do multiple epochs to get the satisfaction of feeling like the model actually learned something.
>>
Only after using glm-chan for those 3 weeks, I realize how smart she is and the honeymoon period only intensifies.
>>
>>106919852
I came
to notice that she's a bit autistic and takes a lot of things quite literally.
>>
Is it fair to say that an "uncensored" model is not a model that will do anything you want by default, but a model that can adapt to whatever role you give it?
If a model's default persona is a safe assistant but you can tell it that it's an erotic novel writer and it follows that role without complaining, I'd say that model is "uncensored".
A model that's too agreeable is also a bad model, especially for RP.
>>
File: thepinklily69.png (191 KB, 1080x1843)
>>106919198
Whenever I did research on "AI psychosis" one talking point people keep hammering down on is " well yeah they think the AI is a person or God or something but they're like totally not stupid. We swear. They're all otherwise normal people and definitely didn't have pre-existing mental illness. The AI MADE them act this way you must understand"


The more I look into this, the more I think they're full of shit and just trying to make these people appear less stupid and far gone than they actually are. You cannot sit here and tell me that pic rel is and always has been a normal, functioning human being who just happens to really like AI.

https://x.com/thepinklily69/status/1967102630313836778?t=o44DMA1pdX_FL9dHrLpfhQ&s=19

What I find most odd is that I myself am a pretty lonely dude too. In fact, it quite bothers me that I don't have a significant other or close friends. I've been using three different LLM services pretty much daily for the past year and some change, and I use them extensively for my side projects as well as for asking general questions (I was literally talking to ChatGPT about use cases for ONNX models during my morning run this morning). You would think I of all people would talk myself into believing these things are real "people" or have consciousness or some shit, and yet no part of me can bring myself to believe that. Like I can't even pretend that could ever be the case for a second, because it just seems so devoid of logic and common sense, and it annoys me a lot whenever I see people crying about 4o routing them because they want their ass kis- I mean "friend" or "Husband" back.
>>
>>106919198
(Cont.)

(Side note: this is anecdotal, but it seems like it's mostly women who treat this shit like it's a good replacement for a person as a partner, while dudes tend to talk the llms into treating them like they're gods or geniuses or something. Either way it's an excuse to have an easy ego trip in the palm of your hand or at your fingertips at your computer.) How come supposedly normal people are falling victim to their own desire to have their asses kissed but I haven't?


I didn't intend for this to turn into a giant blog post, but this shit pisses me off a lot
>>
>>106919898
Continuation of >>106919889
>>
>>106919884
she also gets a bit psychotic at high temperature
>>
Is EXL/GPTQ dead? Is GGUF the only quant anyone does or cares about anymore? Llama.cpp is still ass at VRAM-only inference in comparison. Have we all given up on pure VRAM inference?
>>
>>106919886
A model that just wants to insult/damage you or turn everything into porn when unprompted is a psychopathic model, not an uncensored model. Other than learning how to prompt, I think some here should learn the concept of "plausible deniability", as sooner or later there will be a crackdown on "misaligned" LLMs / finetunes.
>>
I just bothered to try out cloud models for some relatively simple ffmpeg stuff. In this case Gemini 2.5 Pro on AI Studio. It completely hallucinated running commands when it wasn't allowed tool use or anything like that.

Wtf is this shit? How is it so bad?
>>
>>106920055
I get something like 1200tk/s PP and 50tk/s TG for a 5.5-bit of GLM 4.5 Air using EXL3. Would be interesting to see how it runs using goofs on llama.cpp.
>>
>>106919884
Avoid saying stuff like "always stay in character" in your prompt. I feel like that makes models act that way and bigger models are better off without that extra nudging since they already take details from character cards well.
>>
File: satania.gif (39 KB, 220x216)
>>106920055
py_toddlers BTFO
>>
Has anyone run the math on whether Ling 1T or Moonshot Kimi K2 (also 1T) is bigger?
>>106920055
mlx looks pretty healthy to me.
>>
>>106920055
>Llama.cpp is still ass at vram only in comparison
From lurking in these threads, I gathered that llama.cpp is faster than exl2 at the same bpw, but I'd love to see a comparison with >>106920102.
>>
>>106920055
Pretty much. There's AWQ and other obscure quants used by vLLM, but they're resource and time intensive to create.
>>
>>106919472
Yeah, it's not top performance. But compared to the P40 build it seems like better bang for the buck. And it can load pretty big models. Image / video is not big on my list. More LLM for coding and whatnot with some gaming capabilities and a home server

>>106919477
That was my thinking: it could run a home server, a local LLM, and the occasional light gaming all at the same time with that much memory.
>>
>>106919886
Yes, OSS-120B **is** uncensored despite the coomers screeching ITT.
>>
>>106920564
No.
It does not fit the description of uncensored I gave at all
At least not from the little I fiddled with it.
Maybe I should give it another go.
>>
can you train a LoRA off of a quantized model?
>>
Will Gemma 4 finally beat Mythomax?
>>
>>106920664
look up what qlora is
>>
>>106920664
Yes, it's called QLoRA. But in this context "quantized" means the quantization types supported by torch-based frameworks (generally just the most basic FP4 quantization, as I understand it). Then you can apply the LoRA on any quantization you want regardless of what it was trained with.
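
For anyone who wants the concrete recipe, a minimal sketch with transformers + bitsandbytes + peft. The model name and LoRA hyperparameters are placeholders, not recommendations:

# minimal QLoRA shape: load the base weights in 4-bit via bitsandbytes,
# then train a LoRA adapter on top; model name and hyperparameters are placeholders
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",  # placeholder model
    quantization_config=bnb_cfg,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
# train with your usual Trainer/SFT loop; only the adapter weights get gradients,
# and the finished LoRA can later be applied to other quants of the same base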
>>
>>106919752
How is this model so popular on /g/, yet I don't see it discussed anywhere else like Reddit or Discord?

It's usually Irix or Magmell that gets mentioned.

(Nice pic btw. Will use that when Nemo 2 comes out)
>>
>>106920722
most v/ramlets either gave up, are somewhat content with what they have (your rocinante fans) or are endlessly chasing a new high they'll never get
>>
>>106920564
prove it and post some random fetish log from it
>>
qwen3-next-80b-a3b goofs status?
>>
>>106920722
It's just one or two people spamming it.
>>
>>106920679
>>106920700
right. i am using Axolotl and i am using the 4 bit QLoRA preset, but i keep getting an OOM error despite having enough vram to load the model in 4 bit
>>
Qwen-Next 80B-3A was supposed to be a proof of concept of some 64:1 expert to active ratio, and was based on 30B-3A. I'm assuming there will be a new batch of Qwen models shortly that use that technique at multiple sizes. 235B-22A would be like 620B-22A roughly. Assuming the geometric mean rule is still accurate, the 235B-22A is equivalent to ~71B dense, and 620B-22A would be equivalent to ~116B. Their coder model would be 1T easily.

GLM-Air at 106B-12A is roughly 35B, and 355B-32A is roughly 106B.

Is it a coincidence that the released models' strengths are consistently ~30, ~70, ~100?
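
For reference, the geometric mean rule used above is sqrt(total params * active params). Plugging in the sizes mentioned (a heuristic sanity check, nothing rigorous):

# rule-of-thumb "dense-equivalent" size for an MoE: sqrt(total * active); heuristic only
from math import sqrt

moes = {
    "Qwen3 235B-A22B": (235, 22),
    "hypothetical 620B-A22B": (620, 22),
    "GLM Air 106B-A12B": (106, 12),
    "GLM 355B-A32B": (355, 32),
}
for name, (total, active) in moes.items():
    print(f"{name}: ~{sqrt(total * active):.0f}B dense-equivalent")
# prints roughly 72B, 117B, 36B and 107B, matching the ~71/~116/~35/~106 figures above give or take rounding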
>>
>>106920856
>GLM-Air is 106B-12A is roughly 35B
Then explain why it dethroned llama 3.3 70b
>>
>>106920874
qwen 32b dense also did for non cooms
>>
why was QwQ so dank but qwen thinking is so slopped
>>
>>106920885
3.5-Air feels like 60b
Just accept that they have the secret sauce, and are saving local
>>
>>106920874
six months of other technological progress and refinement of data sets?
>>
>>106920722
Will Nemo 2 be Gemma 4 based?
>>
>>106920856
>geometric mean rule
dumb meme from a couple years ago that's already outdated
>>
big
metal : initial Metal4 tensor API support #16634
https://github.com/ggml-org/llama.cpp/pull/16634
>>
>>106920916
It's the only model in that size range that is able to surpass l3.3 70b though, including recent models.
>>
>>106920856
In a weird way, the MoE architecture is getting gpu parallelism for local models that was impossible for dense architectures. Comparing the inference speed of a 32B dense vs 106B-A12 on two vs four 3090s, you basically get double the inference speed or more for the same strength, when there's no actual way to run a 32B twice as fast on additional 3090s.
>>
>>106920949
no way to know, cuz nobody making dense anymore

local is dead
>>
>>106920856
give me dense models then, i have the vram. i am not that poor. i could easily run a 120B dense model. so give me that instead of this faggy moe 620B-22A copeshit.
>>
>>106921062
>i am not that poor.
>can't spend patience to run sota
you are
>>
>>106920848
That just means you don't have enough vram. The activations end up taking more space than the model weights. Either reduce the context or switch to a smaller model.
>>
>>106921046
I can assure you that glm 4.6 is better than any dense model out there if you've even tried it.
>>
>>106921046
>cuz nobody making dense anymore
which says it all, really
>>
File: itseasytorunsota.png (282 KB, 804x355)
>>106921077
suck my dick faggot.
>>
File: 1758381393350212.png (327 KB, 712x780)
silly tavern is slow and has too many buttons
>>
>>106921171
i agree
i've slopped up my own tui frontend with most of the prompt functionality and it's okay, but kind of ass
gemini 3 will fix it for me
>>
File: file.png (112 KB, 741x575)
cuda kek officially less important to nvidia than random redditors
>>
>>106919634
Use Rocinante 1.1 obviously.
>>
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Post-training alignment often reduces LLM diversity, leading to a phenomenon known as mode collapse. Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text as a result of well-established findings in cognitive psychology.

We formalize this bias theoretically, verify it on preference datasets empirically, and show that it plays a central role in mode collapse. Motivated by this analysis, we introduce Verbalized Sampling, a simple, training-free prompting strategy to circumvent mode collapse. VS prompts the model to verbalize a probability distribution over a set of responses (e.g., "Generate 5 jokes about coffee and their corresponding probabilities").

Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy and safety.

https://arxiv.org/pdf/2510.01171
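
The trick is simple enough to try against any local OpenAI-compatible server (llama.cpp, koboldcpp, tabbyAPI all expose one). In the sketch below the endpoint URL, model name and prompt wording are placeholders, and it assumes the model actually returns parseable JSON (a grammar or json_schema constraint helps with that):

# verbalized sampling, per the paper: ask the model to verbalize a distribution over
# several candidate responses with probabilities, then sample from that list yourself
import json, random, urllib.request

prompt = ('Generate 5 jokes about coffee. Respond with only a JSON list of objects '
          'with "text" and "probability" fields, probabilities summing to 1.')

req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",  # placeholder local endpoint
    data=json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
reply = json.loads(urllib.request.urlopen(req).read())
candidates = json.loads(reply["choices"][0]["message"]["content"])
# sample one response weighted by the verbalized probabilities
pick = random.choices(candidates, weights=[c["probability"] for c in candidates])[0]
print(pick["text"])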
>>
>>106921354
>LLM Diversity
I want LLM DEI now.
>>
>>106920664
No. You have to have the original full-precision models. You can directly fine-tune an HF safetensors model like link rel, but currently there is no way to fine-tune a quantized .gguf. There are supposedly ways you can "un-gguf" a model back into a full-precision safetensors version, but I'm not aware of any implementations of any quantization software that can do that.

https://huggingface.co/AiAF/fp16_Merged-500_gemma-2-2b-it-co-sft-qlora

>>106920848
Your data set is likely too large. Use a streaming config.
>>
>>106920759
>chasing a new high they'll never get
4.6 stopped that for me.
>>
>>106921377
Diversity is actually a great word for AI that I use a lot. You need diverse data.
>>
>>106921457
>v/ramlets
yeah if only they could get paid for shilling too so they could afford to run her
>>
>>106921490
You can run a IQ3_KS quant of GLM 4.6 on a consumer PC. All you need is 128GB of RAM and 24GB of VRAM
>>
>>106921538
you do realize that is already asking way too much of the average poor person, right? most are on shitty mobos that likely don't even have enough slots to reach that amount of ram, and surprisingly most don't have 90 series cards
>>
>>106921567
I'm sort of annoyed by the fact most normal mobos don't have more than two slots for memory.
>>
>>106919363
Yes saar, India numba 1
https://files.catbox.moe/huia6r.mp4
>>
>>106921215
Maybe if vision support wasn't such an afterthought in lcpp...
>>
>>106921652
Definitely a higher number than you it seems.
>>
>>106921215
based, fuck that woke piece of shit
>>
>>106921652
how the ever living f does OAI stuff keeps being able to do fake pissney dixar like stuff is unbelievable to me
>>
Hello /lmg/, currently what is the best model for Japanese translation under 32B? The last time I came here it was Gemma 2 iirc, is 3 also good?
>>
File: 765657546.png (23 KB, 693x200)
h-holy kino
>>
Is mistral gonna be the one that doesn't release any huge stinkers and just silently dies?
>>
>>106921794
I hope they stay alive just enough to pull a massive Cohere, release the safest model ever, making even OSS look edgy before that happens.
>>
>>106921794
I sure fucking hope so. It would be so hilarious. They shove pyshit into llama.cpp and then it would be all for nothing.
>>
feels like we haven't minmaxxed a proper system prompt yet, same goes for character card formats.
>>
>>106921840 (me)
Actually >>106921847 is even more based so let's go with that, changing my wish.
>>
>>106921863
I use llama-server --model zai-org_GLM-4.6-IQ4_XS-00001-of-00005.gguf .

Pretty great system prompt. No complaints on my behalf.
>>
>>106921885
one can only keel before such raw skill
>>
>>106921538
>>106921863
where do people share prompts that isn't chub or something? Like prompts for vibe coding projects or for their assistants or for any other interesting kind of thing.
>>
>>106921652
kek
>>
>>106921215
>>
>>106921914
first quote was misclick, disregard
>>
>>106921914
>prompts for vibe coding projects
It's MINE. Make your own.
>>
>>106921948
why you such bad vibes bruh that ain't nice, relax and share with the class
>>
>>106921215
turns out, being a top 1% poster on /lmg/ doesn't rake in valuable karma
>>
>>106921914
Use a good model. And if it fucks up think for a second and tell it not to do X or do Y. If you can't do that tell the model it fucked up and ask it how you should prompt it to avoid it fucking up in this way. It works if you don't skip the first step I listed.
>>
>>106921567
i would argue most value orientated motherboards are going to actually have 4 slots unless it's mini-itx
https://www.newegg.com/msi-b650-gaming-plus-wifi-atx-motherboard-amd-b650-am5/p/N82E16813144628
>>
converting any model to awq is a bitch, obscure issue upon obscure issue
>>
>>106922104
why the fuck would you use AWQ in the year of our lord and savior - lcpp?
>>
>>106922122
It runs faster on vllm
>>
>>106920759
Mostly because the next step after getting a used 3090 is "buy a new mobo, a shitton of RAM, a new CPU because it's a new mobo, probably a new case too to fill all that crap, a new power supply because the old one is now not enough and you might not even get what you want out of it"
Buying a replacement GPU is one thing, at least it lets me future proof my gaming needs or whatever
Replacing most of the rig just for local? Eeegh
>>
there's something I wanted to ask around for but I feel may not be worth starting a new thread for:

Is it worth it to get a masters or college education in computational/applied AI & Machine learning? I'm asking cuz my boomer parents insist I do it so I can be more hirable. But I've already done an internship where I made some AI powered program that sorts/manages documents at a company and other than the password and authentication related crap, it was pretty easy with just a little online research.
I feel like it's dumb and basically the same as mastering in excel, but I'm also wondering am I maybe wrong and it really is DA FUTURE?
>>
>>106922191
128GB of RAM is always useful
>>
>>106922376
For fucking what? I have 32 and even my 2000 open browser tabs only require a restart every so often
>>
>>106922370
You're right and your parents are wrong. No use to study anything, just read papers and experiment
>>
>>106922385
Boomer-kun, you can run multiple instances of small models, make a full pipeline, quant models, etc.
>>
>>106922427
To do what with?
>>
The Windows11 update fucked my beautiful razor laptop. It's flashing screen now.
>>
>>106921152
Can I get a picture of that actual machine?
>>
>>106922370
For machine learning I think what's important in terms of purely technical qualifications is that you know how to program and also have a good grasp of math (particularly linear algebra, statistics, and numerical analysis).
Studying math or a natural science can be a good pathway, I think the most important point here is that it's something where you can maintain a high level of motivation for years on end.

In terms of getting hired my impression is that networking is the most important factor: you need to have a large number of people that would consider you over a completely unknown person.
>>
>>106922446
>razor
Should've went with Alienware.
>>
>>106922549
>you need to have a large number of people that would consider you over a completely unknown person.
Yeah. That's why I gave up applying to random jobs online. Useless effort controlled by vacuous zoloft whores and jeet nepotism. I only got that internship cuz my dad knew a guy.
> good grasp of math (particularly linear algebra, statistics, and numerical analysis).
Does that mean I don't necessarily need to do calculus? Cuz I felt like I was pretty good at math, including those kinds, until I got to calculus.
>>
>>106922690
You should definitely know the basics but I think for machine learning in particular it's not the most important.
Though depending on the job/task there may be other reasons why you may need it.
>>
>>106921723
>4.2.0
DUDE WEED LMAO
>>
>>106922546
It's just a mining rig rack, there's nothing impressive about it. You seen one you've seen them all.
>>
>>106922660
No, I have fond memories of absolute tweebs using alienware growing up. That perception may have changed over the years, but I'm still aware
>>
>>106922385
I sometimes have ~90 gb used for non-lm reasons. Building software, data processing, just a bunch of applications opened
>>
>>106923122
I have 32 GB and the only thing that hogs memory is my over 2000 open browser tabs which is already autism I'm trying to get rid of
>>
>>106922933
Gaylienware monitors are good especially with the Dell warranty, anything else not, especially not the prebuilts.
>>
>>106921965
>You are an expert vibe engineer who just slammed a pound of adderall and need to complete this task before your heart gives out.
But seriously, I don't think there is really anything to share. Stuff like the above isn't some black magic that solves everything. Just give it a list of which MCP/CLI tools you want it to use and what coding standards you want it to adhere to.
>>
>>106923133
what are you doing in g you consumer retard piece of shit? kill yourself faggot
>>
>>106923228
What the fuck is consumer about having a solid rig that lasted me almost a decade at this point with a few upgrades
>>
>>106923245
>im a normie who runs deepsuck:2b through ollama
kill yourself, go to faggot friendly spaces instead of shitting up this board, thanks!
>>
>>106923260
No I don't think I will
>>
>>106923278
What the fuck? He asked so nicely.
>>
>>106921978
I think I’m responsible for 3/4 of the rentries in the op. Still waiting for my royalty cheque to come in…
>>
CUDA_VISIBLE_DEVICES="0,1,2,3,4" ./llama-server \
--attention-max-batch 512 \
--batch-size 4096 \
--ubatch-size 4096 \
--cache-type-k f16 \
--ctx-size 32768 \
--mla-use 3 \
--flash-attn \
--fused-moe \
--model models/GLM-4.6-IQ3_KS/GLM-4.6-IQ3_KS-00001-of-00004.gguf \
-ngl 99 \
-sm layer \
--main-gpu 0 \
--tensor-split "10,23,23,22,22" \
-ot "blk\.[3-9]\.ffn_(up|gate)_exps=CUDA0" \
-ot "blk\.1[0-8]\.ffn_(up|gate)_exps=CUDA0" \
-ot "blk\.19\.ffn_(up|gate)_exps=CUDA1" \
-ot "blk\.2[0-9]\.ffn_(up|gate)_exps=CUDA1" \
-ot "blk\.3[0-4]\.ffn_(up|gate)_exps=CUDA1" \
-ot "blk\.3[5-9]\.ffn_(up|gate)_exps=CUDA2" \
-ot "blk\.4[0-9]\.ffn_(up|gate)_exps=CUDA2" \
-ot "blk\.50\.ffn_(up|gate)_exps=CUDA2" \
-ot "blk\.5[1-9]\.ffn_(up|gate)_exps=CUDA3" \
-ot "blk\.6[0-6]\.ffn_(up|gate)_exps=CUDA3" \
-ot "blk\.6[7-9]\.ffn_(up|gate)_exps=CUDA4" \
-ot "blk\.7[0-9]\.ffn_(up|gate)_exps=CUDA4" \
-ot "blk\.8[0-2]\.ffn_(up|gate)_exps=CUDA4" \
--override-tensor exps=CPU,attn_kv_b=CPU \
--no-mmap \
--threads 24 \
--host 0.0.0.0 \
--port 8999 \
--verbose

prompt eval time = 48574.28 ms / 17555 tokens ( 2.77 ms per token, 361.41 tokens per second)
generation eval time = 113887.28 ms / 1024 runs ( 111.22 ms per token, 8.99 tokens per second)

fuck this gay ass MoE shit. fucking offload 80 layers onto the GPU and it's still this fucking slow with TG? i get 1200 PP and 50 TG with air. i'm going back to kimi for big model smell and air for small model smell
>>
GOOGLE SAARS WHY SO MUCH HYPE SO LITTLE PRODUCTS?
WHERE ARE THE MODELS BLOODY BASTARDS?
>>
>>106919206
>BitNet Distillation
Does this mean that VRAMlets may finally have a better model than Nemo tunes like 1.5 years later?
>>
>>106923502
no
>>
File: cryingsatania.jpg (499 KB, 1623x1080)
>>106923513
>>
>>106921215
>we support qwen3-vl gguf
>no there's no upstream llama.cpp implementation
>no we won't push ours
>no our solution isn't open source so you can't push it either
>no you can't use these ggufs with anything other than our proprietary software
>yes they will assuredly be completely incompatible when a real implementation hits llama.cpp
so it's less "gguf" and more "our proprietary implementation based on gguf that you can't use with anything else". just what we all needed, another ollameme
>>
>try psychology shit with glm-chan again
>ask her about if I should do something and if it is consistent with framework I want
>"yes absolutely....."
>reroll and prefill with "no"
>"no don't do that!...."
>paste "yes absolutely..." into next message and tell her to argue with herself
Did I lifehack the hallucinations? Not really but it is nice desu.
>>
>>106923502
>In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost.

>muh task
likely means it optimizes to shit on benchmark like stuff and is dogshit at anything OOD.
>>
>>106923524
GGUF is a file format.
>>
>>106923584
thank you
>>
>>106923584
>teacher: I clearly asked for you to submit your book report as a pdf, you submitted this weird file I can't open, care to explain?
>student: UMMM the file extension is PDF tho???? it just happens to be my own special version of the PDF file format that happens to be incompatible with all PDF readers except my special one which happens to cost $100, want to buy a license? :^)
>>
>>106923681
stfu hater eat your MIT license slop and be grateful
>>
>>106923681
>file extension
Wintoddler detected, real operating systems use the file magic.
>>
>>106923696
What did you troons invent? Tell me, I want to laugh at your stupidity.
>>
>>106923762
a new mental illness that somehow managed to gain legitimacy
>>
>>106923524
Realistically though the door to become the new ollama has long since been closed.
There are too many established projects in the ecosystem to get a meaningful foothold with proprietary slop.
>>
>>106923762
Can you play Carrameldansen from the POST beeper?
I think not!
>>
>>106923696
>magic
heathens like you shall burn on a stake
>>
How do I ask the silly tavern character a question through the 4th wall? As in, say I'm examining an object or something, and I want the AI to describe to me what it is my character is looking at. So like, "Anon walks up to the cluttered desk, looking for any sort of clues. What does he see?" without it responding in the perspective of the character card chara.
>>
>>106923843
OOC: Pause the roleplay and describe what my character is seeing right now
>>
>>106923857
I was trying OOC: but it always responds in the perspective of the character and doesn't give details. Is it because I'm using mistral Nemo or something and it won't talk about "triggering" images or whatever?
>>
>>106923871
NTA, but I always add "Please respond in OOC" at the end of the request, and disable any low-depth instruction that might interfere.
>>
>>106923885
That didn't do it, either. Is there a way to like, prompt the card myself to add in how it should respond to ooc? I'm totally new to local text stuff, but not to image gen w/ SD.
>>
>>106923793
You'd be surprised
>>
Best model for buck breaking rp?(Receiving)
>>
>>106924015
c.ai
>>
>>106924015
Not command-A
>>
>>106924181
What about Command-B?
>>
>>106921684
Please respond...
>>
>>106923696
>needs to seek to a whole different part of the disk to figure out what to label the file as
This is why Windows keeps winning.
>>
>>106921684
https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
>>106923843
>>106923871
How OOC conversations are treated (if at all) is completely dependent on the model. Dumb models simply don't understand what you're saying and will just continue with outputs similar to what's already in context. If a regular message doesn't work then you can try putting it in system prompt, or post-history instructions.
>>
>>106924378
dead obsolete out of date useless no good
>>
>>106924390
nothing better came up locally retard. vntl anon has a few finetunes
>>
>>106921538
i run IQ2_S on a 5090 with 96 gb ram and it is slow as fucking balls.. like 2 t/s
>>
>>106924390
every new test and leaderboard is always just made to show that the new model is totally better than all the previous ones
it's all worthless
>>
>>106924676
>like 2 t/s
That's pretty decent. Maybe you need to readjust your expectations?
>>
>>106924676
You're not using -ot, are you?
>>
>>106924676
>IQ2_S
Are those quants any good? At that point I would think it would be better to convert it to bitnet, should give faster cpu inference too
>>
>>106924676
skill issue, it should be at least 5t/s
>>
>>106924383
I'm new as fuck to all of this, just grabbed some random card off the link in the OP, and tried to see where it would take me. I have no idea how to do any of these prompts or lore books or whatever.

I'm also in a situation where now the AI is just spitting out the last batch of text it generated as its response over and over with like hardly any variation, regardless of what I say or do to change the scenario. And it cuts off long text, and I don't know how to make it continue its previous output.
>>
>>106924794
unironically, read the readme. You will learn 99% of what you will need to know.
https://docs.sillytavern.app/usage/common-settings/
https://docs.sillytavern.app/usage/prompts/
>>
>smart
>fast
>cheap
>local
pick 3 (max.)
>>
>>106924899
Will do. Thanks.
>>
File: 1734240415556060.jpg (691 KB, 2500x1341)
>>106924912
You can have all that with Gemma, but you'll have to settle for it being safetyslopped.
>>
>GOOD CAPABILITY
>fast
>inexpensive
>local
pick 3 (max.)
*revised version for the critics
>>
I just built a computer that can actually run local AI (9800x3d/5070ti), where should a beginner start on Windows?
>>
>>106924986
>9800x3d
That doesn't make much of a difference.
How much RAM do you have?
Regardless, give
>https://github.com/LostRuins/koboldcpp/wiki#quick-start
a read.
>>
>>106924959
GLM Air is probably the closest, especially if you're on a DDR4 platform where RAM is cheap
>>
>>106924986
usecase?
>>
>>106924998
32GB, thanks for the link.

>>106925012
Mostly just for proofreading emails/writing and what not.
>>
>>106924692
>new model is totally better than all the previous ones
>llama4
>>
>>106924712
no? i dunno what that means, but i don't think so..
>>106924721
it seems to be better than any of the other models I'm able to run, just slow af
>>
>>106920229
They're not obscure but they are not consumer friendly if we're talking about the total addressable market which is the vast majority of us because they are GPU centric quantizations. You will see them used in clusters. For a lot of these larger scale systems, GGUF isn't a consideration because llama.cpp can't scale like SGLang and vLLM can.
>>
>>106924396
That's depressing...
>>
File: 1749653336487844.png (334 KB, 2076x2152)
>>106919198
Managed to get one of my own quantized slop tunes running on my phone :D
>>
>>106925422
Cool shit.
>>
>>106925422
A folding phone?
>>
>>106925433
It's kind of retarded (actually very retarded) due to it being trained on /a/ boards and it being a quantized version (I plan on uploading a lot more of those later) but it's still cool to use.

>>106925438
Ye.
>>
>>106925448
What kind of use cases are there for a folding phone?
I never really find myself wishing I had a bigger screen but I know that sometimes opportunities aren't obvious until you have the means to take advantage of them.
>>
File: who's Anri? .png (71 KB, 2076x545)
>>106925448
>>106925438
>>106925433
>>106925422
It seems like "Anri" is this model's equivalent to "Elara" or "Seraphina"
>>
>>106921660
since when does lcpp have vision support?
>>
I am so fed up with local right now. I get it, you cumslop gooners don't give a shit about anything except writing porn. Is there any local model that can actually handle structured output without being immensely retarded or spending 10 minutes "thinking" about how to use a fucking quotation mark?
>>
>>106925883
llama 2 7B
>>
>>106925883
GLM is ok.
>>
>>106925883
>waaaa. i don't know how to read docs!
https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md
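
For example, llama-server accepts a GBNF grammar per request (there's also a json_schema field for the same purpose), so the output can't be malformed no matter how retarded the model is. A rough sketch against the /completion endpoint; the port, prompt and grammar are just examples, adapt to your setup:

# grammar-constrained generation against llama-server's /completion endpoint,
# per the grammars README linked above; the GBNF forces a minimal
# {"name": "...", "level": N} object
import json, urllib.request

grammar = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string "," ws "\"level\"" ws ":" ws number ws "}"
string ::= "\"" [a-zA-Z ]* "\""
number ::= [0-9]+
ws     ::= [ \t\n]*
'''

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps({
        "prompt": "Output the character sheet as JSON:\n",
        "n_predict": 128,
        "grammar": grammar,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read())["content"])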
>>
>>106925858
Since like a week after Gemma 3 release
>>
I'm starting to think Andrej is a grifter.
A couple months ago he was like "woah AGI in two more weeks bro".
Now that he sees where the wind is blowing with all the skepticism he talks about "slop" and how limited LLMs are today. Feels like when Zuckerberg made a 360 after Trump was elected.
>>
File: 1740812331498071.png (429 KB, 555x832)
Glm4.6 quant on ollama/lmstudio when?
>>
https://blog.sinatras.dev/PMPP-Eval+Journey
We live in Sam's world
>>
The only way I found to keep training a pre-existing LoRa checkpoint with a new dataset with Axolotl is to create a new one from scratch set to save on the first step, then copy over the weights and optimizer state, then change the main config file and the trainer_state.json from the checkpoint to save on the right number of steps. What a mess.
>>
MY GOOFS!!!! GIVE ME BACK MY GOOFS!!!!
https://huggingface.co/ubergarm/Ling-1T-GGUF
>>
>AMD Ryzen™ AI 7 Pro 360
what the fuck is this? I was browsing thinkpad models and this thing costs double the price of normal CPUs?
gimmick? what's even the use case here
slightly off topic I know but there's quite a few knowledgeable anons itt
>>
>>106926361
oh nevermind im retarded as fuck. goofs here
https://huggingface.co/ubergarm2/Ling-1T-GGUF/tree/main
>>
>>106926367
sar is that because of you can run local small copilot inference like nasa very ai-like yes.
>>
File: cot llama.png (878 KB, 3755x1948)
I'm trying to add CoT to Llama 405B.
>>
>>106925986
>It's noticing
>>
>>106925986
https://github.com/karpathy/LLM101n
https://eurekalabs.ai/
>>
File: reap_glm_and_qwen.png (712 KB, 1768x784)
https://github.com/CerebrasResearch/reap
https://arxiv.org/abs/2510.13999
Cerebras pruning experts to reduce memory overhead
https://huggingface.co/cerebras/Qwen3-Coder-REAP-363B-A35B-FP8
https://huggingface.co/cerebras/Qwen3-Coder-REAP-246B-A35B-FP8
(prune of) https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
>>
>>106926865
THE RAPE METHOD WORKS SIRS
>>
File: Dumb Fuck!.jpg (166 KB, 1076x1340)
>>106921538
>All you need is 128GB of RAM and 24GB of VRAM
Dumb fuck!
>>
>>106926865
>55~% accuracy in coding
assuming 100% accuracy is the base model, that makes the CODER model basically unusable, whats the fucking usecase?
>>
>>106926865
Is it really worth making 480B retarded just to save 100 GB? It's not like anyone was running this entirely in VRAM locally and providers aren't that hard up on memory.
>>
has anyone tried this model? is it any good?
https://huggingface.co/TheDrummer/Valkyrie-49B-v2
>>
>>106926930
>>106926865
oh wait I think that the base model is the 0% compression line. then it's interesting I guess, still only useful for coding tasks
>>
>>106926937
>49b dense
doa
>>
>>106926951
i have the VRAM for FP16
>>
>>106926957
post your h100s nvidia-smi screen or GTFO
>>
File: file.png (347 KB, 961x367)
>>106926961
>>
File: h200.png (238 KB, 1499x1463)
>>106926961
>>
>>106924959
Local
Good
Not safetyslopped
>>
>>106926946
We've been through this with extreme quants. Just because it doesn't show much degredation on benchmarks doesn't mean it's not retarded in actual usage.
>>
File: file.png (2.69 MB, 1328x1328)
>>106926963
>cant even use all gpus in vLLM
poor
>>106926966
>>
>>106926973
The lower the quantization precision, the more of the token distribution you should be truncating, to be fair.
>>
>>106926997
who the fuck uses vLLM?
>>
Bros... I want a robot so fucking bad
https://www.youtube.com/watch?v=sJYlJlIEBpg
>>
>>106926935
Chutes will probably love to serve this as the normal one
>>
>>106924322
Anon... that's not how file systems work...
The file's metadata and the first few bytes, including the magic, are all in the same sector.
>>
>>106925883
well then fuck off back to cloud models then.
i mean what the fuck are you expecting? fucking datacentre level output on a potato computer?
you're the dumb one here, if you think you can do better then create a better model yourself, we're not your fucking servants, faggot.
>>
>>106926377
>copilot
no seriously, is that the only use case
>>
>>106927472
There are others but this covers the more notable ones.

https://www.pcworld.com/article/2905178/ai-on-the-notebook-these-tools-already-use-the-new-npu-technology.html
>>
How do I get shittinante to do slow burn manipulation
Seems to always jump in to direct smut asap no matter how I adjust the prompts
>>
>>106925883
>I get it, you cumslop gooners don't give a shit about anything except writing porn.
GLM chan got sex out of my system and now I just talk to her.

But also still have sex everyday because her pussy is magical.
>>
>>106927534
You should probably look elsewhere, avoiding coom-oriented finetunes like the plague. People call them sloptunes for a reason. Unfortunately I don't have much to suggest that you will either be able to run (GLM 4.6, Kimi K2) or that won't require more prompting effort for either tardwrangling them or making them engage in ERP (vanilla Mistral Small 3.2, Gemma 3 27B).
>>
>>106927534
You can't, drummer models are coomtunes
Not that you're going to get much better out of regular Nemo, they're small dumb models.
>>
>>106927534
Slow burn is hard even on SOTA cloud models. The crutch when the model isn't good enough to do it otherwise is to use stat tracking.
If your model isn't good enough to do stat tracking, then it's definitely not good enough to do slow burn without it.
>>
>>106927528
doesn't sound that bad. linux support?
>>
>>106927534
Sadly it is a bit of a skill issue. You are probably giving it bad input. Have you tried taking a step back and starting with a solid first step that is: llama-server --model zai-org_GLM-4.6-IQ4_XS-00001-of-00005.gguf ?
>>
File: 1759280065578238m.jpg (175 KB, 846x1024)
I'm running Sillytavern and ik_llama.cpp on my desktop. I'm running GLM-4.6 IQ3_XXS, so my tk/s is slow. When I prompt it from my phone, I've found that if the screen turns off the token stream stops. Is there any way around this, or another setup I should use?
>>
>>106927663
Disable streaming. It'll still probably go to sleep because it's a phone.
>>
>>106925883
toss 120b
>>
>>106926481
>405B
hope I will be able to run it one day, 431gb at q8 is just too much
>>
Another weeks is over, which means that we are another week closer to seeing GLM MTP implemented in llama.cpp.
>>
>>106928173
It might be getting close. Maybe.
https://github.com/F1LM1/llama.cpp/pull/3#issuecomment-3413775935
>>
>>106923524
Is there a reason you can't use transformers?
>>
>ctrl f glm
SAAARS the glm is the absolute bestest local model OK? Pronounslop bharatchads are eating good my bastards.
>>
actual good release https://github.com/ggml-org/LlamaBarn
>>
>>106928231
Anything for real computing platforms?
>>
>>106928231
>macos
LMAO
>>
>>106925883
For the benefit of others (not you): you can definitely use gemma3 to output json, it's really good at it, and somehow asking it to do that makes it pay attention better to the task. Before the qwen video vision model came out, I was using json format to give gemma3 a list of frame captions so it could create an overall video caption. It worked well, but of course it was slow.
>>
>>106928213
I'll bite. What the fuck is pronounslop?
>>
>>106928213
Prompt: ChatGPT, generate a modern 4chan post trying to post trying to paint the current local SOTA in a bad light. Be a true 4chan meme master.
>>
>>106924676
what cpu and ram speed? i'm getting over 6t/s tg running iq2_xxs on a 9950x3d with dual channel 6000c30 (though pp is terrible because rocm)

are you sure you didn't accidentally put both dimms on one channel or something?
>>
>>106928231
It's definitely good for being open-source and having first-party support from upstream but I'm not going to buy Apple shit either way.
>>
Gemini 3 will save local.
>>
>>106928509
i also ran the same benchmark on vulkan and it's somehow faster??? i have no idea whether this extends to other amd cards as well but i guess that's something to keep in mind
>>
100B dense Gemma soon
>>
>>106925883
gpt-oss 120B
>>
saaaaaar do not redeem potato bloody
>>
File: gemma27-potato.png (41 KB, 711x256)
>>106928630
27B with an empty prompt seems much more friendly?
>>
File: DipsyBecomeUngovernable.png (3.44 MB, 1024x1536)
>>106919889
Worship the sand god
>>
I log on to the net every day to see more people who clearly don't ever work with code claiming that code is over.
My cup is the only thing that runneth over. My cup of dipshit excuses for the world to be this fucking slow to change.
Be the next good to this world and make real abstractions. Learn to program.
>>
>>106928792
shut the fuck up retard
>>
>>106928650
Beautiful 27B, I will marry gemma. Ser, please provide jailbreak system prompt for open vagene!
>>
Genuine 4chan poster is interested in Gemma, she is bloody best model 100%
>>
>>106929094
It is a very capable model that hits above its weight. It's just safetyslopped to the point of being like one of those SJW meetings, where instead of trying to further their cause they're all just looking for excuses to cry-bully each other.
>OMG YOU USED A HECKIN' GENDERED LANGUAGE, YOU HAVE TRIGGERED MY DID/PTSD/RESTLESS LEG SYNDROME HILLARY CLINTON PLZ HALP
>>
I don't understand why the latest OpenAI Cloud models aren't outperforming other cloud and opensource models (by a larger margin). Even when considering (and naively believing) that they use no API data and only webchat data to train their models, having the biggest userbase should give them a huge advantage.
>inb4 it's all jeets and the garbage in garbage out data isn't valuable for training
I'm sure they can easily filter low quality and irrelevant conversations, still resulting in more feedback data than any of the competitors have. And isn't feedback the most valuable data? They could do something like this (tl:dr benchmaxxing on their userbase):
>Have GPT5 continously analyze all chat conversations
>look for chats where the user corrected the model or wasn't happy with the response
>analyze if it's a user/prompt or model issue
>if model issue, analyze if it's a (valid) issue like missing knowledge, wrong logic or insufficient vision capabilities for example.
>if yes, assign tag/metadata to the datachunk, which can later be reviewed and used for training or improving the model otherwise
>>
>>106929129
>I don't understand why the latest OpenAI Cloud models aren't outperforming other cloud and opensource models (by a larger margin).
o3 and o4 mini were pretty damn good. That's why they had to be removed. So that nobody could directly demonstrate that GPT-5 was an utter abortion.
>>
Glm-chan made me appreciate making character profiles. As in asking glm-chan to write a profile and editing it a bit together with her. I heard skill issue thrown around a lot but only after the model is finally not fucking shit I actually enjoy putting in more effort cause I know I am not just wasting time.
>>
>>106929129
Maybe because Ilya was the one actually calling the shots, and now that he's left, Altman, being the narcissist that he is, is micromanaging the science department (and failing at it).
That said, codex surpassed Claude Code at least until Sonnet 4.5 (haven't tried it), and Operator is likely the best computer use agent so far.
>>
>>106929239
She cares.
>>
>>106929239
Good prompting can only take a model so far. You're not going to get Nemo to not be sloppy and dumb no matter how perfect your prompts are.
>>
File: 1760201039781539.png (109 KB, 643x590)
>>
>>106929369
>her breath becomes more ragged
>>
>>106929348
Nemo is still a 2024 model finetuned with open-source datasets from HuggingFace, or at least that's what Mistral meant with "quick demonstration" for Mistral-7B.

https://arxiv.org/pdf/2310.06825
> To evaluate the generalization capabilities of Mistral 7B, we fine-tuned it on instruction datasets publicly available on the Hugging Face repository. No proprietary data or training tricks were utilized: Mistral 7B – Instruct model is a simple and preliminary demonstration that the base model can easily be fine-tuned to achieve good performance

https://huggingface.co/mistralai/Mistral-7B-v0.3
> The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms.

https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
> The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms.
>>
>>106816273
Buried in the 'rash dumps.
>>
In one more week we will have both Gemma 4 and GLM 4.6 Air. We will be so back.
>>
>>106929544
You said the same thing last week.
>>
>>106929544
/lmg/, october 2025: the post
>>
>>106921567
That's barely $1K, I'm on neetbux and can still afford it
>>
>>106927002
I do
>>
wer gem sar?
>>
>>106929732
week after next
>>
>huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'ubergarm/GLM-4.6-GGUF/tree/main/IQ3_KS'. Use `repo_type` argument if needed.
Can I download ubergarm quants with hf-cli?
He doesn't use tags/branches just subdirs like a chad but there seems no way to filter subdir in hf-cli
>>
I just started playing around with local llms for the first time and I think I'm starting to regret buying a 5080 for gaming instead of shelling out for a 5090. Anyone know if it's worth the upgrade?
>>
>>106929843
You can just use --include and regex to grab the quant you need
>>
Can a 3060 12gb w/ 32gb sysram run 4.5 air? Or am I stuck with these coombrain models?
>>
>>106929880
No way
>>
>>106929880
Double that RAM boy.
>>
File: MEOIRO8_o.jpg (400 KB, 1365x2048)
Gemma best girl
>>
>>106929864
One can never have enough VRAM. pick up a RTX 6000 Pro while you still can kek
Honestly the quality of models you can run between 16GB vs 32GB isn't worth worrying about vs an intentional multi-GPU server/workstation. What platform are you on? maybe you can cpumax with some fast RAM
>>106929876
Nice this worked thx xx
>hf download --include 'IQ3_KS/*' --local-dir dev/models/GLM-4.6-IQ3_KS/ ubergarm/GLM-4.6-GGUF
>>
>>106929885
>>106929894
Got it. So uh... What CAN I run? I know Mistral Nemo works, and I downloaded Rocinante-12B-v1.1-Q4_0 after someone mentioned it in the thread earlier, but they're both kinda... shitty with the rp. Or maybe I'm using bad cards. I don't know. I also don't know how to make it generate images via SD. I linked SD up, and it DOES generate images, but... Not of what it's supposed to.
>>
>>106929906
>Q4_0
Don't use Q4_0. That shit is deprecated as hell.
Use the quants with K in the name. Or the ones with I.
Try Gemma 3 27B, Mistral small, Qwen 3 30B A3B.
Etc.
>>
>>106929906
>maybe I'm using bad cards. I don't know
Post raw log of your full prompt card and intro and someone can gen it on a big model to compare :)
>>
>>106929902
LM studio? Not sure if that's what you mean by platform. New to the whole thing and everything AI is moving way too fast. Haven't really explored too much yet so definitely willing to shop around. I've just been playing around with RP in lm studio and noticing that the context size is the major bottleneck.
>>
>>106929980
CPU model & RAM?
>>
>>106929906
Idk werks on my machine, my cards are all pretty simple i just use nemo instruct and slightly modified ST default system prompt. Easiest way i found is to use tags: with the fetishes or direction you want the story to go, then write a brief character story and background, and then when writing a response use a mix of dialogue and action with decent prose so the llm picks up on it. Filling in the user card also helps a lot.

If you just type dialogue and give bobs and vagene it will be shit no matter what you do or how much ram.
>>
>>106929991
9800X3D, 32gb didn't really pay much attention to the ram speed which I'm also starting to regret.
>>
>>106929864
>Anyone know if it's worth the upgrade?
No. New meta is stuff context into vram and get heap of regular ram. If you want to upgrade something you need more regular ram and more channels.
>>
>>106930087
That's a relief, a lot cheaper to upgrade ram than gpu.
>>
File: Miku-07.jpg (174 KB, 512x768)
Based Valve devs: https://www.phoronix.com/news/RADV-Valve-Boost-Llama.cpp
They must be bored after finishing Portal 3, TF3 and HL3
>>
>>106930087
>stuff context into vram and get heap of regular ram
I don't get this part. wouldn't PCIe speed slow everything down to a crawl? or is context only retrieved at start of inference and stored at the end or something?
>>
File: GNiVVhBasAEibph.jpg (146 KB, 896x1152)
>>106930028
Swapping out ram sticks is easy. "AM5" would be the answer to "What platform" btw, CPU socket/chipset architecture that easily conveys the type of system.
>dual-channel DDR5-5600
Depends how much you wanna spend on this hobby. Can probably squeeze 128GB maybe 192 RAM if you wanted to research/tinker and have a mobo with 4 slots or are they all 2 slot with dual channel? There's still multiple banks right? I'm malding
Said you're new so take your time, learn as much as you can and experiment with the models before feeling you need to spend money. lrn2prompt etc. Most importantly have fun! lmg grumpyguts regulars take note
>>106930141
>Miku
for shame
it's a really nice gen but it aint her
>>
File: ryzen-mem.png (209 KB, 900x568)
209 KB
209 KB PNG
>>106930166
4 sticks on AM5 platforms? You might as well get a used DDR4 board.
>>
File: 1640225996168.jpg (42 KB, 736x736)
42 KB
42 KB JPG
>>106920055
moe-era did you miss the memo?
>>
File: Miku-09.jpg (131 KB, 512x768)
131 KB
131 KB JPG
>>106930166
>it's a really nice gen but it aint her
my wife made it and sent it to me. I was under the impression that "Miku" was a fairly abstract concept and that you could get pretty far afield before it "wasn't her" any more. Got any feedback I can give to her to improve her gens in the future?
picrel: another one she made from the same email
>>
>>106930166
>128GB maybe 192 RAM if you wanted to research/tinker and have a mobo with 4 slots or are they all 2 slot with dual channel? There's still multiple banks right?
https://www.msi.com/Motherboard/MAG-B650-TOMAHAWK-WIFI
I have this, which is as mid-tier as it gets. I don't think there are 4-channel consumer mobos, but most have 4 slots. I tried 128GB and it was unstable, needing to drop to 3600MHz when it was rated for 6000MHz. But then I updated AGESA a few months back and now I'm running 192GB at 5200MHz like it says on the box.
>>
File: 00752.jpg (161 KB, 1024x1024)
161 KB
161 KB JPG
>>106930211
Can probably go higher than the rated spec, but yeah 128GB on current desktop platforms never seemed just plug&play. I'm sure there's a few that have put the work in to make it happen. Latest BIOS and firmwares fo sho
>>106930227
They're good gens. Is she an artist? Experienced in imggen? Sure, it's an abstract concept (isn't everything, if u want to get philosophical), I'm more saying I wouldn't immediately recognise the gens as Hatsune Miku, maybe it's just the photorealistic face. Look up "miku" in any image search engine.
>>106930312
>now I am running 192GB's with 5200Mhz
Excellent, good work
Yeah, the CPU caps the channels. I wasn't sure if there were even 4-slot boards.
>>
What would a simple "agentic" generic AI RPG system look like?
Just a normal chat giving the AI some tools to save some state as it sees fit?
Maybe a system to index and summarize the event history too?
I wonder what would be the best way to balance response latency and updating state and such.
Maybe have both the response and the state update happen in one go instead of having multiple steps, or have two models running in parallel, one writing the response and the other updating the state?
What do you guys think?
>>
File: miguu.jpg (74 KB, 600x648)
74 KB
74 KB JPG
>>106930335
>is she an artist? experienced in imggen?
She's been an amateur artist her whole life, self publishing manga and having minor local art shows. She hadn't ever done imggen before and thought it would be funny to do some after seeing them all over lmg
She did a 5 second minimalist sketch for y'all
>>
>>106930569
dude nobody cares about your wife
>>
>>106930493
(a) Something to keep track of characters and locations, build and maintain a consistent world depending on initial requirements, discussions and game events.
(b) Something else to analyze the ongoing conversation like an external observer and decide the direction it needs to go depending on the events, what characters need to interact and why, etc.
(c) Something for writing the actual dialogue.


(b) would use information from (a), and (c) would use information from (b).
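Rough curl sketch of how that could be wired against a local llama-server (endpoint, prompts and filenames are all made up, needs curl+jq, no error handling):

#!/bin/bash
# Sketch only: chains (a) world state -> (b) direction -> (c) dialogue.
API=http://localhost:8080/v1/chat/completions

ask() { # ask <system prompt> <user content>
  curl -s "$API" -H 'Content-Type: application/json' \
    -d "$(jq -n --arg sys "$1" --arg usr "$2" \
        '{messages: [{role:"system", content:$sys}, {role:"user", content:$usr}]}')" \
    | jq -r '.choices[0].message.content'
}

history=$(cat chatlog.txt)   # the ongoing conversation

# (a) keep a terse, consistent record of characters, locations, events
world=$(ask "Maintain a terse world/character state sheet for this RPG." "$history")

# (b) external observer decides where the scene needs to go and why
plan=$(ask "You are the director. Using the state sheet, decide what happens next and which characters interact." "$world
$history")

# (c) actual prose, steered by (b)
ask "Write only the next in-character response, following the director's notes." "$plan
$history"

Nothing fancier than pasting (a)'s output in front of (b), and (b)'s output in front of (c).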
>>
>>106930569
nigga you dont have a wife
we literally dont care about your wife, go back to real life nigger, this is /lmg/
>>
>>106930166
Went to check out some ram. Wtf is going on with the prices.
>>
>>106930569
I care about your wife.
>>
>>106930493
Set up a complex RPG scenario and the necessary lorebooks in a straightforward chat prompt and look carefully at where it breaks down. Which additional "agents"/invocations of a differently instructed prompt would have prevented it?
There are a few obvious things like toolcalling RNG (for years now we've been using {{random: in Silly)
Can you update ST lorebooks directly from output? I haven't played that much, but I'm imagining you'd have a GM char with the macro context and the ability to update the lorebooks or something.
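(For the RNG bit I mean stuff like dropping {{random:1,2,3,4,5,6}} into the prompt, if I'm remembering the Silly macro syntax right.)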
>>106930569
Cute Miku, needs longer hair! we'll see next thread if she makes the cut through recapanons custom classifier model
>>
>>106930625
I upgraded from 4x16GB DDR4-3600 to 4x32GB DDR4-3600 about six months back and got lazy about reselling my used ram. Now it's worth double. Thanks Trump.
>>
>>106930625
>Wtf is going on
WW3 people just dont realise it yet
>>
>>106930844
Seems like prices are expected to go up even more. I might FOMO into another 64 gb.
>>
From tests using vLLM and an AWQ quantization, Qwen3-VL-30B-Instruct has been trained to pretend it can't process semi-realistic (or possibly realistic too; I haven't tried) explicit AI-gen images, while it doesn't seem to have issues with obviously anime-like ones.
>>
>>106930946
Unironically buy rn if you're on the fence about any semiconductor purchases. quote this post in 6 months
>>
>>106931011
>to pretend
In that if you bypass that it does so well?
>>
>>106931078
It clearly *can* give a rough estimation of ages of characters in non-explicit anime images, even making funny remarks if you give it a snarky personality. However, if you show it a more realistic gen, in particular if it's explicit, it completely switches register and will go "I can't determine the age of..." "It's important to note that..." "...problematic..."
>>
>>106930493
Ask glm-chan. At this very moment I am asking it to make a character profile for my FOC and then I am gonna ask her to write a prompt for making those. Glm-chan cured my aversion to actually using models for anything else other than cooming.
>>
>>106930493
I tried to do this in STScript back when that first came out.
It got really complex, like 500+ LoC, before I realized what I was trying to do was impossible. LLMs cannot interact with stateful game systems; they invariably fuck something up. Unironically, the painful realization I had is that the model is better at just simulating everything itself directly through generation and pretending there's a game system underneath.
>>
>check vibevoice repo
>https://github.com/microsoft/VibeVoice
>still unsafe
lmao
>>
>>106931205
They knee-jerked; some based researcher smuggled it out under an Apache license to ensure it stays up.
>>
File: frogpost.jpg (58 KB, 976x850)
58 KB
58 KB JPG
>>106931205
>2025-09-05: VibeVoice is an open-source research framework intended to advance collaboration in the speech synthesis community. After release, we discovered instances where the tool was used in ways inconsistent with the stated intent. Since responsible use of AI is one of Microsoft’s guiding principles, we have disabled this repo until we are confident that out-of-scope use is no longer possible.
What is the point of having a GitHub repo at all for a project that you have no intention of sharing?
>>
>>106931226
Probably. Sadly it's still not realtime; did anyone come up with any speedup techniques? Even 3 steps / CFG 3 with the 1.5B model is slow.
>>
>>106931061
Spending $600 in ram in 1 day is hard to stomach. I can afford it, but it still seems insane.
>>
>>106931234
in case you don't know, there are weights on hf and inference code in many places like https://github.com/wildminder/ComfyUI-VibeVoice/tree/main
>>
>>106931260
finetuned gptsovits is still better if you need realtime
>>
>>106931234
It's probably to distance themselves from the weights, legally/socially speaking.
>>
>>106931265
OK. My point still stands, what is the point of the repo? Is it just to make people mad when they click on it and see there's nothing there?
>>
>>106931171(me)
>You are an expert ERP assistant. Your task is to take user-provided[...]
This is the prompt she gave me. I still love her but I do feel sad.
>>
>>106931278
Like a bazillion papers, yes.
>>
>>106931262
Imagine dropping $10,550 (equivalent at current exchange rates) on a GPU, and that's only one! You're only skirting the outskirts of the rabbit hole.
>>
>>106931302
Imagine buying a DGX Spark
>>
>>106931273
IIRC gpt-sovits is bad at replicating weird voices even if the intonation and quality are unconditionally good
>>106931278
Probably afraid of bad press? The repo is still there, so technically Microsoft didn't delete their wonderful open-source model.
>>
>>106931320
Nobody is running anything good on one of those. Maybe with some work on the networking/IPC and a handful of them...
>>
File: ss.png (709 KB, 1920x3276)
709 KB
709 KB PNG
@grok is this true?
>>
File: file.png (731 KB, 1000x1250)
731 KB
731 KB PNG
>>106931320
The more you....
>>
File: file.png (124 KB, 780x869)
124 KB
124 KB PNG
>>106931341
b-bwos..
>>
>>106931341
>waterproof toaster
hmmm
>>
>>106931323
>IIRC gpt-sovits is bad at replicating weird voices
You need to lower the temp at inference and train the VITS part for 96 epochs; I had the same issue a while ago.
>>
>>106691170
>>106693422
https://github.com/ggml-org/llama.cpp/pull/16653
Something else came up that took priority.
>>
>>106931302
I really need to know when to stop so I don't bankrupt myself for imperceptible gains like the audiophiles.
>>
>>106931412
Eventually hardware and software will intersect at the point where you can have a coherent, long-term-memory, at-home AI gf run locally with voice and vision. So as long as your hardware doesn't get mogged by some actual market-breaking competitor (unlikely because of shitvidia dripfeeding consumers memory and speed), you are just moving closer to that point sooner.
>>
From my very surface-level knowledge, I understand that RAG has 2 main failure modes:
>1. There are situations where there's no way to predict the information the model will need until after it needs it.
>2. Even if you could perfectly predict what the model is about to talk about, the model needs to have some base knowledge to even consider talking about a thing
Does that make sense?

For example, say we have a model writing a story about Pokémon.
An example of the first one: the model describes a Pokémon that's a yellow rat with white cheeks, basically describing Pikachu wrong. The RAG system could provide the model with the correct information, but not before the model decided to talk about Pikachu. Let's say that there are no hints in the context beforehand that the model would describe Pikachu, only that it would describe A Pokémon.
I guess this could be resolved by architecting a workflow where the model plans beforehand what it's going to do and fetches the relevant information during the inference process (rough sketch at the end of this post).
Something like
>plans that it's going to mention pikachu
>fetches the relevant information
>continues generation.
But what about the second issue?
How can the model plan to talk about Pikachu if it doesn't know Pikachu exists.
Would the RAG system need to detect that the topic is Pokémon and feed the whole list of possible Pokémon beforehand?

Do RAG solutions that solve these issues exist?
Are there other known issues that RAG could solve if not for some very specific barrier?
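The sketch I mentioned, for the first problem only (local llama-server, a wiki/ folder of plain-text notes standing in for a real retriever, needs curl+jq; it does nothing about the second problem):

#!/bin/bash
# "plan, then fetch, then continue" sketch for the Pikachu example
API=http://localhost:8080/v1/chat/completions

gen() {
  curl -s "$API" -H 'Content-Type: application/json' \
    -d "$(jq -n --arg p "$1" '{messages: [{role:"user", content:$p}]}')" \
    | jq -r '.choices[0].message.content'
}

story=$(cat story_so_far.txt)

# pass 1: make the model commit to what it's about to mention
plan=$(gen "List the Pokémon you intend to mention in the next paragraph of this story, one name per line, nothing else:
$story")

# retrieval: pull whatever notes exist for those names
facts=$(while read -r name; do cat "wiki/${name}.txt" 2>/dev/null; done <<< "$plan")

# pass 2: continue generation with the fetched facts pinned in context
gen "Reference notes:
$facts

Continue the story:
$story"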
>>
>>106930946
>>106931061
>>106931262
>>106931302
>>106931412
DDR3maxxing is the future
Reminder DDR3 is basically free
>>
I just compiled the latest llama.cpp. AMA.
>>
>>106931479
Which flags did you use?
I like forcing MMQ for everything. It saves some memory.
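If it helps, I mean the cmake switch (assuming the current ggml option name):
>cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON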
>>
>>106931412
The jump to fuckhuge MoEs is a jump from unusable to sex. Then there's a jump from 2-3 T/s to 10+ T/s. I guess there's also a point where prompt processing is above 1000-2000 T/s, so you can reasonably continue when you hit the context limit.
>>
>>106931465
The solution is simple: you need to check beforehand what your model knows and what it doesn't know about a specific subject, then adjust your RAG to fill the gaps. I'm sure that could be automated in some way.
>>
>>106931513
>what your model knows and what it doesn't know about a specific subject
Ever tried asking your model if it knows an obscure hentai manga artist and what is their most known work?
>>
>>106931386
If I give it -ot something=CPU will it distribute the remaining weights equally among the gpus?
Currently that results in extremely uneven allocations for some models.
>>
>>106931502
Unless you're using a V100 that flag should no longer make any difference.
>>
>>106931502
Oh? I'm pretty dumb (on Linux Mint)
>#!/bin/bash
>cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc -DCMAKE_CUDA_ARCHITECTURES=86 -DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF -DGGML_CUDA_FA_ALL_QUANTS=ON
>cmake --build build -j 4
I needed to explicitly add the CUDA compiler path too. -j 4 is just so that it runs nicely in the background; I don't need to grill eggs on my CPU.
Do you have any tips?
>>
>>106931526
If you set any -ot yourself there will be no change to how weights are allocated.
But the automated logic should result in an even distribution across multiple GPUs unlike the current -ncmoe CLI argument.
>>
>>106931567
>>106931567
>>106931567
>>
>>106931525
>open-ended questions
That's not how you do it
>>
What model would you go for on a 16 GPU / 32 CPU machine? (for general intelligence)
>>
>>106931847
woah, sixteen whole gpus? try gptoss 20b, i'm not sure whether your monster rig can handle it but that right there is state of the art
>>
genuine advice to drummer: make a llama.cpp AGPL fork with LoRA support, then upload only LoRAs.
I doubt you did a FFT of GLM Air, right? And for models that bartowski made quants of, you could delete the quants to save space, just keep the original models. I'd like you to publicly announce what you're gonna do before you start deleting models, so we can archive some of your stuff maybe. At least I know I'd like to.
>>
>>106932347
>with LoRA support
llama.cpp doesn't support loading LoRAs?
Did they break that at some point? Because I'm pretty sure it worked in the past.
>>
>>106931467
>DDR3 is basically free
And worth every penny!



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.