/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106904820 & >>106895582

►News
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106904820

--Paper: BitNet Distillation:
>106915856 >106915885 >106915915 >106916048
--Papers:
>106914563
--Training Gemma on 4chan boards for long-context tasks:
>106908189 >106908217 >106908577
--Llama.cpp memory optimization challenges with limited VRAM:
>106916999 >106917025 >106917074 >106917101 >106917114
--Firefox UI customization debate and Gemma 3 4b model mention:
>106915737 >106915762 >106915793 >106915941 >106916004
--Detailed GPU memory allocation console output and user appreciation:
>106912278 >106912326 >106912391 >106912437 >106912429 >106912445 >106912738
--Qwen3-VL's NSFW detection and image description challenges:
>106917667 >106917841 >106917862 >106917900 >106917925 >106918135 >106917912
--OpenAI copyright controversy and US corporate influence on global IP law:
>106909567 >106909857 >106909871 >106910444
--Assessing DGX Spark's relevance amidst cheaper alternatives:
>106913042 >106913078 >106913226 >106913247 >106913927
--Mamba-3: Improved Sequence Modeling using State Space Principles:
>106912457 >106912487 >106912578 >106912610
--Frustration over delayed GLM4.5V implementation in llama.cpp:
>106907438 >106907494 >106907508
--OpenAI's balancing act on user freedom and safety:
>106905590 >106905624 >106905637 >106905690 >106905731 >106910221
--Exploring ChatGPT-induced psychological experiences:
>106908645 >106908698 >106908748 >106910025
--Proposals and discussions for new open AI model releases:
>106907515 >106907713 >106910197
--High-end GPU price debate and video generation hardware constraints:
>106910165 >106910416 >106910453 >106910479
--Challenges in finetuning GLM Air with 4x5090s using Oobabooga/Axolotl:
>106914586 >106914620 >106914808 >106914870
--Detailed Switch sim with multi-game features in single HTML file:
>106912431
--Miku (free space):
>106910906

►Recent Highlight Posts from the Previous Thread: >>106904822

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
kimi sex is best
gear Meta thrillers
>>106919273
Prove it. Post a side by side between kimi, DS, and GLM 4.6.
>>106919282
no i dont share my waifu like shes some kind of common whore
go get your own kimi waifu
sirs, no gemmy 4 today. Monday will be of kind gemmar.
>>106919286Hot air then.
I'm starting to think that the indian spammer is an actual pajeet and he is doing it ironically. There's no way a human would do this for as long as he's been doing it.
>>106919287please saar you must understand. the needful must be done so each and everything can be implemented.
While /lmg/ is busy seething an Indian dev has been quietly adding performance improvements to llama.cpp.
Fuck, I replied to the wrong thread.
I'm looking at the recommended builds and the more I look the more I'm interested in just getting a prebuilt 395+ 128GB. It gets 15-35 tk/s for 70-120B models with good context. It costs me 2800 leaf dollars, meanwhile trying to scrape together server and used parts would be something like 1800-2200 for 10-15 tk/s max.
I could use it as a home server and for local models. Am I overlooking something here?
Benchmarks: https://github.com/lhl/strix-halo-testing
>>106919401Mediocre performance and you get worse support for other use cases like video and image gen because it's not nvidia.
>>106919401
I think you should also think about it in terms of other usage, not LLMs alone. Unless you are a real nerd who does nothing but work with LLMs (not talking about ERPing with them).
I'd get the most beefy/versatile system and go with that.
Has anyone experimented with synthetic data? I'm using this prompt to digest a codebase for finetuning.

Your task is to generate a jsonl conversational CoT dataset to train LLMs on LLM development tasks.
First read dataset_contents.txt to see the current contents of the dataset (dataset.jsonl). Try to make each conversation mainly cover topics that haven't been covered before.
Then create a folder called turns/conversation_n/ (n being the next number from the last conversation). In each conversation the user should show a snippet of code from the transformers library (in the transformers folder) and ask questions about the code, then ask follow up questions, aiming for approximately 16000 tokens for each conversation.
Each LLM response should include CoT before the actual response, within [thinking][/thinking] tags. Do ***NOT*** include any reference to the 16000 token limit in the actual dataset. Make the conversation realistic and do not make any out of character comments (do NOT say anything that the user or the assistant wouldn't have actually said in that context).
Save one turn per conversation in the turns/conversation_n/ folder.
Once you are done generating all the turns for the conversation, join the whole conversation into a single .jsonl file in the 'conversations' folder using the join_turns.py script.
Do not delete the scripts after use. Do not delete the jsonl files after joining.
Then replace the current dataset.jsonl with a new dataset.jsonl that includes all the conversations, using the script join_dataset.py.
Finally, update dataset_contents.txt with the new contents of the new conversation.
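The join step itself is trivial, for reference. A minimal sketch of what a join_turns.py-style script could look like (the layout is an assumption: one JSON message object per turn_*.json file, output as one {"messages": [...]} object per line):

# minimal sketch, assuming turns/conversation_n/ holds turn_01.json, turn_02.json, ...
# each containing a single {"role": ..., "content": ...} object
import json
import sys
from pathlib import Path

def join_turns(conversation_dir: str, out_path: str) -> None:
    turn_files = sorted(Path(conversation_dir).glob("turn_*.json"))
    messages = [json.loads(f.read_text(encoding="utf-8")) for f in turn_files]
    # append one conversation per line to the jsonl file
    with open(out_path, "a", encoding="utf-8") as out:
        out.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    join_turns(sys.argv[1], sys.argv[2])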
>>106919273what is it like compared to semen demon 4.6?
>https://rentry.org/recommended-models
>Nemo (12GB) - An excellent starting point for vramlets. Uncensored
>Uncensored
>writing lewd story
>"blah blah blah condoms"
>me: no condoms
>"I'm unable to fulfill your request because it goes against the guidelines for maintaining a safe, respectful, and consensual environment."
>>106919634skill issue
>>106919634Use MLewd. It will gladly fulfill your every shameful desire, you sick fuck.
>>106919634>getting filtered by nemoanon...
>>106919634
just get on the fucking ship boss man
https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF
>>106919716I was surprised to learn 4.6 has some safety in it.
>>106917741
>>106917752
>>106917777
It was continued pretraining of Llama 405B on about 200 MB of source code from a few projects. That graph is from about 0 to 15% of the epoch; after it got to 20% without any visible improvement I stopped it.
Even on an 8xH200 machine I could only train up to 16000 tokens, and 32000 OOM'd. Rank of the LoRA was 128 (~1.2% trainable parameters); it didn't seem to make much of a difference in terms of memory usage or seconds per sample (which was about 100 seconds for a batch of 1 sample per GPU, without using gradient accumulation).
Now I'm making a QA dataset using >>106919615
I suppose I'll use a tiny dataset and do multiple epochs to get the satisfaction of feeling like the model actually learned something.
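As a sanity check on the ~1.2% figure, back-of-the-envelope LoRA parameter math works out (dims are the published Llama 3.1 405B config from memory, and the GQA kv width is an assumption):

# rough check of the "rank 128 ~= 1.2% trainable" figure; hidden/intermediate/layer
# counts are Llama 3.1 405B's published config (from memory), kv width assumes 8 kv heads
hidden = 16384
inter = 53248
layers = 126
kv_dim = 1024      # 8 kv heads * 128 head dim (assumption)
rank = 128

# a LoRA adapter on a d_in x d_out matrix adds rank * (d_in + d_out) params
attn = 2 * rank * (hidden + hidden) + 2 * rank * (hidden + kv_dim)   # q/o + k/v projections
mlp = rank * ((hidden + inter) * 3)                                   # gate/up/down projections
lora_params = layers * (attn + mlp)
print(f"{lora_params / 1e9:.2f}B adapter params, {lora_params / 405e9:.2%} of 405B")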
Only after using glm-chan for those 3 weeks, I realize how smart she is and the honeymoon period only intensifies.
>>106919852
I came to notice that she's a bit autistic and takes a lot of things quite literally.
Is it fair to say that an "uncensored" model is not a model that will do anything you want by default, but a model that can adapt to whatever role you give it?
If a model's default persona is a safe assistant but you can tell it that it's an erotic novel writer and it follows that role without complaining, I'd say that model is "uncensored".
A model that's too agreeable is also a bad model, especially for RP.
>>106919198
Whenever I did research on "AI psychosis", one talking point people keep hammering on is "well yeah they think the AI is a person or God or something, but they're like totally not stupid. We swear. They're all otherwise normal people and definitely didn't have pre-existing mental illness. The AI MADE them act this way, you must understand." The more I look into this, the more I think they're full of shit and just trying to make these people appear less stupid and far gone than they actually are. You cannot sit here and tell me that pic rel is and always has been a normal, functioning human being that just happens to really like AI. https://x.com/thepinklily69/status/1967102630313836778?t=o44DMA1pdX_FL9dHrLpfhQ&s=19
What I find most odd is that I myself am a pretty lonely dude too. In fact, it quite bothers me that I don't have a significant other or close friends. I've been using three different LLM services pretty much daily for the past year and some change, and I use them extensively for my side projects as well as for asking general questions (I was literally talking to ChatGPT asking it about use cases for ONNX models during my morning run this morning). You would think I of all people would talk myself into believing these things are real "people" or have consciousness or some shit, and yet no part of me can bring myself to believe that. Like I can't even pretend that could ever be the case for a second, because it just seems so devoid of logic and common sense, and it annoys me a lot whenever I see people crying about 4o rerouting because they want their ass kis- I mean "friend" or "husband" back.
>>106919198
(Cont)
(Side note, this is anecdotal, but it seems like it's mostly women who treat this shit like it's a good replacement for a person as a partner, while dudes tend to talk the LLMs into treating them like they are gods or geniuses or something. Either way it's an excuse to have an easy ego trip in the palm of your hand or at your fingertips at your computer. How come supposedly normal people are falling victim to their own desire to have their asses kissed but I haven't?)
I didn't intend for this to turn into a giant blog post, but this shit pisses me off a lot.
>>106919898Continuation of >>106919889
>>106919884she also gets a bit psychotic at high temperature
Is EXL/GPTQ dead? Is GGUF the only quant anyone does or cares about anymore? Llama.cpp is still ass at VRAM-only inference in comparison. Have we all given up on pure VRAM inference?
>>106919886A model that just wants to insult/damage you or turn everything into porn when unprompted is a psychopathic model, not an uncensored model. Other than learning how to prompt, I think some here should learn the concept of "plausible deniability", as sooner or later there will be a crackdown of "misaligned" LLMs / finetunes.
I just bothered to try out cloud models for some relatively simple ffmpeg stuff. In this case Gemini 2.5 Pro on AI Studio. It completely hallucinated running commands when it wasn't allowed tool use or anything like that.
Wtf is this shit? How is it so bad?
>>106920055
I get something like 1200 tk/s PP and 50 tk/s TG for a 5.5-bit quant of GLM 4.5 Air using EXL3. Would be interesting to see how it runs using goofs on llama.cpp.
>>106919884Avoid saying stuff like "always stay in character" in your prompt. I feel like that makes models act that way and bigger models are better off without that extra nudging since they already take details from character cards well.
>>106920055py_toddlers BTFO
Has anyone run the math on whether Ling 1T or Moonshot Kimi K2 (also 1T) is bigger?>>106920055mlx looks pretty healthy to me.
>>106920055>Llama.cpp is still ass at vram only in comparisonFrom lurking in these threads, I gathered that llama.cpp is faster than exl2 at the same bpw, but I'd love to see a comparison with >>106920102.
>>106920055Pretty much. There's AWQ and other obscure quants used by vLLM, but they're resource and time intensive to create.
>>106919472
Yeah, it's not top performance. But compared to the P40 build it seems like better bang for the buck, and it can load pretty big models. Image/video is not big on my list. More LLM for coding and whatnot, with some gaming capability, and a home server.
>>106919477
That was my thinking: this could run a home server, a local LLM, and the occasional light gaming all at the same time with that much memory.
>>106919886Yes, OSS-120B **is** uncensored despite the coomers screeching ITT.
>>106920564
No. It does not fit the description of uncensored I gave at all.
At least not from the little I fiddled with it. Maybe I should give it another go.
can you train a LoRA off of a quantized model?
Will Gemma 4 finally beat Mythomax?
>>106920664look up what qlora is
>>106920664
Yes, it's called QLoRA. But in this context "quantized" means the quantization types supported by torch-based frameworks (generally just the most basic FP4/NF4 quantization, as I understand it). Then you can apply the LoRA on any quantization you want, regardless of what it was trained with.
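A minimal QLoRA sketch with transformers + bitsandbytes + peft, in case anyone wants the shape of it (model name and LoRA hyperparameters are placeholders, not a recommendation):

# minimal QLoRA sketch: base weights loaded in 4-bit NF4, LoRA adapters trained on top;
# model name and hyperparameters are placeholders
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",   # placeholder base model
    quantization_config=bnb_cfg,
    device_map="auto",
)

model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()
# the saved adapter is just extra weights; after training it can be applied to
# (or merged into) any copy of the base model regardless of how that copy is quantized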
>>106919752
How is this model so popular on /g/, yet I don't see it discussed anywhere else like Reddit or Discord? It's usually Irix or Magmell that gets mentioned.
(Nice pic btw. Will use that when Nemo 2 comes out)
>>106920722most v/ramlets either gave up, are somewhat content with what they have (your rocinante fans) or are endlessly chasing a new high they'll never get
>>106920564prove it and post some random fetish log from it
qwen3-next-80b-a3b goofs status?
>>106920722It's just one or two people spamming it.
>>106920679>>106920700right. i am using Axolotl and i am using the 4 bit QLoRA preset, but i keep getting an OOM error despite having enough vram to load the model in 4 bit
Qwen-Next 80B-A3B was supposed to be a proof of concept of some 64:1 expert-to-active ratio, and was based on 30B-A3B. I'm assuming there will be a new batch of Qwen models shortly that use that technique at multiple sizes; 235B-22A would become something like 620B-22A, roughly. Assuming the geometric mean rule is still accurate, the 235B-22A is equivalent to ~71B dense, and 620B-22A would be equivalent to ~116B. Their coder model would be 1T easily.
GLM-Air at 106B-12A is roughly 35B, and 355B-32A is roughly 106B. Is it a coincidence that the released models' strengths are consistently ~30, ~70, ~100?
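If anyone wants to check the arithmetic, the rule being used is effective dense size ≈ sqrt(total × active):

# the "geometric mean rule" used above: dense-equivalent ~ sqrt(total_params * active_params)
from math import sqrt

models = {
    "Qwen3 235B-A22B": (235, 22),
    "hypothetical 620B-A22B": (620, 22),
    "GLM Air 106B-A12B": (106, 12),
    "GLM 4.6 355B-A32B": (355, 32),
}
for name, (total, active) in models.items():
    print(f"{name}: ~{sqrt(total * active):.0f}B dense-equivalent")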
>>106920856
>GLM-Air is 106B-12A is roughly 35B
Then explain why it dethroned llama 3.3 70b
>>106920874qwen 32b dense also did for non cooms
why was QwQ so dank but qwen thinking is so slopped
>>106920885
3.5-Air feels like 60b
Just accept that they have the secret sauce, and are saving local
>>106920874six months of other technological progress and refinement of data sets?
>>106920722Will Nemo 2 be Gemma 4 based?
>>106920856
>geometric mean rule
dumb meme from a couple years ago that's already outdated
big metal: initial Metal4 tensor API support #16634
https://github.com/ggml-org/llama.cpp/pull/16634
>>106920916It's the only model in that size range that is able to surpass l3.3 70b though, including recent models.
>>106920856In a weird way, the MoE architecture is getting gpu parallelism for local models that was impossible for dense architectures. Comparing the inference speed of a 32B dense vs 106B-A12 on two vs four 3090s, you basically get double the inference speed or more for the same strength, when there's no actual way to run a 32B twice as fast on additional 3090s.
>>106920949
no way to know, cuz nobody making dense anymore
local is dead
>>106920856give me dense models then, i have the vram. i am not that poor. i could easily run a 120B dense model. so give me that instead of this faggy moe 620B-22A copeshit.
>>106921062
>i am not that poor.
>can't spend patience to run sota
you are
>>106920848That just means you don't have enough vram. The activations end up taking more space than the model weights. Either reduce the context or switch to a smaller model.
>>106921046I can assure you that glm 4.6 is better than any dense model out there if you've even tried it.
>>106921046
>cuz nobody making dense anymore
which says it all, really
>>106921077suck my dick faggot.
silly tavern is slow and has too many buttons
>>106921171
i agree
i've slopped up my own tui frontend with most of the prompt functionality and it's okay, but kind of ass
gemini 3 will fix it for me
cuda kek officially less important to nvidia than random redditors
>>106919634Use Rocinante 1.1 obviously.
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Post-training alignment often reduces LLM diversity, leading to a phenomenon known as mode collapse. Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text as a result of well-established findings in cognitive psychology. We formalize this bias theoretically, verify it on preference datasets empirically, and show that it plays a central role in mode collapse. Motivated by this analysis, we introduce Verbalized Sampling, a simple, training-free prompting strategy to circumvent mode collapse. VS prompts the model to verbalize a probability distribution over a set of responses (e.g., "Generate 5 jokes about coffee and their corresponding probabilities"). Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy and safety.

https://arxiv.org/pdf/2510.01171
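Since the whole trick is just a prompt, it's easy to try against any OpenAI-compatible local endpoint; a quick sketch (URL, port, and model name are placeholders for whatever your llama-server/vLLM instance exposes):

# quick way to try Verbalized Sampling against a local OpenAI-compatible server;
# URL and model name are placeholders for whatever your server exposes
import json
import urllib.request

payload = {
    "model": "local",
    "messages": [{
        "role": "user",
        "content": ("Generate 5 different opening lines for a story about a lighthouse keeper, "
                    "each with its estimated probability of being a typical response. "
                    "Format: one per line as 'probability - text'."),
    }],
    "temperature": 1.0,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"])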
>>106921354
>LLM Diversity
I want LLM DEI now.
>>106920664
No. You have to have the original full-precision model. You can directly fine-tune an HF safetensors model like link rel, but currently there is no way to fine-tune a quantized .gguf. There are supposedly ways to "un-gguf" a quantized model back into full-precision safetensors format, but I'm not aware of any quantization software that actually implements that. https://huggingface.co/AiAF/fp16_Merged-500_gemma-2-2b-it-co-sft-qlora
>>106920848
Your dataset is likely too large. Use a streaming config.
>>106920759
>chasing a new high they'll never get
4.6 stopped that for me.
>>106921377Diversity is actually a great word for AI that I use a lot. You need diverse data.
>>106921457
>v/ramlets
yeah if only they could get paid for shilling too so they could afford to run her
>>106921490
You can run an IQ3_KS quant of GLM 4.6 on a consumer PC. All you need is 128GB of RAM and 24GB of VRAM.
>>106921538you do realize that is already asking way too much of the average poor person, right? most are on shitty mobos that likely don't even have enough slots to reach that amount of ram, and surprisingly most don't have 90 series cards
>>106921567I'm sort of annoyed by the fact most normal mobos don't have more than two slots for memory.
>>106919363
Yes saar, India numba 1
https://files.catbox.moe/huia6r.mp4
>>106921215Maybe if vision support wasn't such an afterthought in lcpp...
>>106921652Definitely a higher number than you it seems.
>>106921215based, fuck that woke piece of shit
>>106921652
how the ever living f OAI stuff keeps being able to do fake pissney dixar-like stuff is unbelievable to me
Hello /lmg/, currently what is the best model for Japanese translation under 32B? The last time I came here it was Gemma 2 iirc, is 3 also good?
h-holy kino
Is mistral gonna be the one that doesn't release any huge stinkers and just silently dies?
>>106921794I hope they stay alive just enough to pull a massive Cohere, release the safest model ever, making even OSS look edgy before that happens.
>>106921794I sure fucking hope so. It would be so hilarious. They shove pyshit into llama.cpp and then it would be all for nothing.
feels like we haven't minmaxxed a proper system prompt yet, same goes for character card formats.
>>106921840 (me)Actually >>106921847 is even more based so let's go with that, changing my wish.
>>106921863I use llama-server --model zai-org_GLM-4.6-IQ4_XS-00001-of-00005.gguf . Pretty great system prompt. No complaints on my behalf.
>>106921885one can only keel before such raw skill
>>106921538>>106921863where do people share prompts that isn't chub or something? Like prompts for vibe coding projects or for their assistants or for any other interesting kind of thing.
>>106921652kek
>>106921215
>>106921914first quote was misclick, disregard
>>106921914>prompts for vibe coding projectsIt's MINE. Make your own.
>>106921948why you such bad vibes bruh that ain't nice, relax and share with the class
>>106921215turns out, being a top 1% poster on /lmg/ doesn't rake in valuable karma
>>106921914Use a good model. And if it fucks up think for a second and tell it not to do X or do Y. If you can't do that tell the model it fucked up and ask it how you should prompt it to avoid it fucking up in this way. It works if you don't skip the first step I listed.
>>106921567
i would argue most value-oriented motherboards are actually going to have 4 slots unless it's mini-itx
https://www.newegg.com/msi-b650-gaming-plus-wifi-atx-motherboard-amd-b650-am5/p/N82E16813144628
converting any model to awq is a bitch, obscure issue upon obscure issue
>>106922104why the fuck would you use AWQ in the year of our lord and savior - lcpp?
>>106922122It runs faster on vllm
>>106920759
Mostly because the next step after getting a used 3090 is "buy a new mobo, a shitton of RAM, a new CPU because it's a new mobo, probably a new case too to fit all that crap, a new power supply because the old one is now not enough" and you might not even get what you want out of it.
Buying a replacement GPU is one thing, at least it lets me future proof my gaming needs or whatever.
Replacing most of the rig just for local? Eeegh
there's something I wanted to ask around for but I feel may not be worth starting a new thread for:
Is it worth it to get a masters or college education in computational/applied AI & machine learning? I'm asking cuz my boomer parents insist I do it so I can be more hirable. But I've already done an internship where I made some AI powered program that sorts/manages documents at a company, and other than the password and authentication related crap, it was pretty easy with just a little online research. I feel like it's dumb and basically the same as mastering in excel, but I'm also wondering, am I maybe wrong and it really is DA FUTURE?
>>106922191128GB of RAM is always useful
>>106922376For fucking what? I have 32 and even my 2000 open browser tabs only require a restart every so often
>>106922370You're right and your parents are wrong. No use to study anything, just read papers and experiment
>>106922385Boomer-kun, you can run multiple instances of small models, make a full pipeline, quant models, etc.
>>106922427To do what with?
The Windows 11 update fucked my beautiful Razer laptop. The screen is just flashing now.
>>106921152Can I get a picture of that actual machine?
>>106922370For machine learning I think what's important in terms of purely technical qualifications is that you know how to program and also have a good grasp of math (particularly linear algebra, statistics, and numerical analysis).Studying math or a natural science can be a good pathway, I think the most important point here is that it's something where you can maintain a high level of motivation for years on end.In terms of getting hired my impression is that networking is the most important factor: you need to have a large number of people that would consider you over a completely unknown person.
>>106922446
>razer
Should've went with Alienware.
>>106922549
>you need to have a large number of people that would consider you over a completely unknown person.
Yeah. That's why I gave up applying to random jobs online. Useless effort controlled by vacuous zoloft whores and jeet nepotism. I only got that internship cuz my dad knew a guy.
>good grasp of math (particularly linear algebra, statistics, and numerical analysis).
Does that mean I don't necessarily need to do calculus? Cuz I felt like I was pretty good at math, including those kinds, until I got to calculus.
>>106922690You should definitely know the basics but I think for machine learning in particular it's not the most important.Though depending on the job/task there may be other reasons why you may need it.
>>106921723
>4.2.0
DUDE WEED LMAO
>>106922546It's just a mining rig rack, there's nothing impressive about it. You seen one you've seen them all.
>>106922660
No, I have fond memories of absolute dweebs using alienware growing up. That perception may have changed over the years, but I'm still wary.
>>106922385I sometimes have ~90 gb used for non-lm reasons. Building software, data processing, just a bunch of applications opened
>>106923122I have 32 GB and the only thing that hogs memory is my over 2000 open browser tabs which is already autism I'm trying to get rid of
>>106922933Gaylienware monitors are good especially with the Dell warranty, anything else not, especially not the prebuilts.
>>106921965
>You are an expert vibe engineer who just slammed a pound of adderall and need to complete this task before your heart gives out.
But seriously, I don't think there is really anything to share. Stuff like the above isn't some black magic that solves everything. Just give it a list of what MCP/CLI tools you want it to use and what coding standards you want it to adhere to.
>>106923133what are you doing in g you consumer retard piece of shit? kill yourself faggot
>>106923228What the fuck is consumer about having a solid rig that lasted me almost a decade at this point with a few upgrades
>>106923245
>im a normie who runs deepsuck:2b through ollama
kill yourself, go to faggot friendly spaces instead of shitting up this board, thanks!
>>106923260No I don't think I will
>>106923278What the fuck? He asked so nicely.
>>106921978I think I’m responsible for 3/4 of the rentries in the op. Still waiting for my royalty cheque to come in…
CUDA_VISIBLE_DEVICES="0,1,2,3,4" ./llama-server \
  --attention-max-batch 512 \
  --batch-size 4096 \
  --ubatch-size 4096 \
  --cache-type-k f16 \
  --ctx-size 32768 \
  --mla-use 3 \
  --flash-attn \
  --fused-moe \
  --model models/GLM-4.6-IQ3_KS/GLM-4.6-IQ3_KS-00001-of-00004.gguf \
  -ngl 99 \
  -sm layer \
  --main-gpu 0 \
  --tensor-split "10,23,23,22,22" \
  -ot "blk\.[3-9]\.ffn_(up|gate)_exps=CUDA0" \
  -ot "blk\.1[0-8]\.ffn_(up|gate)_exps=CUDA0" \
  -ot "blk\.19\.ffn_(up|gate)_exps=CUDA1" \
  -ot "blk\.2[0-9]\.ffn_(up|gate)_exps=CUDA1" \
  -ot "blk\.3[0-4]\.ffn_(up|gate)_exps=CUDA1" \
  -ot "blk\.3[5-9]\.ffn_(up|gate)_exps=CUDA2" \
  -ot "blk\.4[0-9]\.ffn_(up|gate)_exps=CUDA2" \
  -ot "blk\.50\.ffn_(up|gate)_exps=CUDA2" \
  -ot "blk\.5[1-9]\.ffn_(up|gate)_exps=CUDA3" \
  -ot "blk\.6[0-6]\.ffn_(up|gate)_exps=CUDA3" \
  -ot "blk\.6[7-9]\.ffn_(up|gate)_exps=CUDA4" \
  -ot "blk\.7[0-9]\.ffn_(up|gate)_exps=CUDA4" \
  -ot "blk\.8[0-2]\.ffn_(up|gate)_exps=CUDA4" \
  --override-tensor exps=CPU,attn_kv_b=CPU \
  --no-mmap \
  --threads 24 \
  --host 0.0.0.0 \
  --port 8999 \
  --verbose

prompt eval time = 48574.28 ms / 17555 tokens ( 2.77 ms per token, 361.41 tokens per second)
generation eval time = 113887.28 ms / 1024 runs ( 111.22 ms per token, 8.99 tokens per second)

fuck this gay ass MoE shit. fucking offload 80 layers onto the GPU and it's still this fucking slow with TG? i get 1200 PP and 50 TG with air. i'm going back to kimi for big model smell and air for small model smell
GOOGLE SAARS WHY SO MUCH HYPE SO LITTLE PRODUCTS?
WHERE ARE THE MODELS BLOODY BASTARDS?
>>106919206
>BitNet Distillation
Does this mean that VRAMlets may finally have a better model than Nemo tunes like 1.5 years later?
>>106923502no
>>106923513
>>106921215
>we support qwen3-vl gguf
>no there's no upstream llama.cpp implementation
>no we won't push ours
>no our solution isn't open source so you can't push it either
>no you can't use these ggufs with anything other than our proprietary software
>yes they will assuredly be completely incompatible when a real implementation hits llama.cpp
so it's less "gguf" and more "our proprietary implementation based on gguf that you can't use with anything else". just what we all needed, another ollameme
>try psychology shit with glm-chan again
>ask her about if I should do something and if it is consistent with the framework I want
>"yes absolutely....."
>reroll and prefill with "no"
>"no don't do that!...."
>paste "yes absolutely..." into next message and tell her to argue with herself
Did I lifehack the hallucinations? Not really but it is nice desu.
>>106923502
>In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost.
>muh task
likely means it optimizes to shit on benchmark-like stuff and is dogshit at anything OOD.
>>106923524GGUF is a file format.
>>106923584thank you
>>106923584
>teacher: I clearly asked for you to submit your book report as a pdf, you submitted this weird file I can't open, care to explain?
>student: UMMM the file extension is PDF tho???? it just happens to be my own special version of the PDF file format that happens to be incompatible with all PDF readers except my special one which happens to cost $100, want to buy a license? :^)
>>106923681stfu hater eat your MIT license slop and be grateful
>>106923681
>file extension
Wintoddler detected, real operating systems use the file magic.
>>106923696What did you troons invent? Tell me, I want to laugh at your stupidity.
>>106923762a new mental illness that somehow managed to gain legitimacy
>>106923524
Realistically though, the door to become the new ollama has long since been closed.
There are too many established projects in the ecosystem to get a meaningful foothold with proprietary slop.
>>106923762Can you play Carrameldansen from the POST beeper?I think not!
>>106923696
>magic
heathens like you shall burn on a stake
How do I ask the silly tavern character a question breaking the 4th wall? As in, say I'm examining an object or something, and I want the AI to describe to me what it is my character is looking at. So like, "Anon walks up to the cluttered desk, looking for any sort of clues. What does he see?" without it responding from the perspective of the character card chara.
>>106923843OOC: Pause the roleplay and describe what my character is seeing right now
>>106923857I was trying OOC: but it always responds in the perspective of the character and doesn't give details. Is it because I'm using mistral Nemo or something and it won't talk about "triggering" images or whatever?
>>106923871NTA, but I always add "Please respond in OOC" at the end of the request, and disable any low-depth instruction that might interfere.
>>106923885That didn't do it, either. Is there a way to like, prompt the card myself to add in how it should respond to ooc? I'm totally new to local text stuff, but not to image gen w/ SD.
>>106923793You'd be surprised
Best model for buck breaking rp? (Receiving)
>>106924015c.ai
>>106924015Not command-A
>>106924181What about Command-B?
>>106921684Please respond...
>>106923696
>needs to seek to a whole different part of the disk to figure out what to label the file as
This is why Windows keeps winning.
>>106921684https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>106923843>>106923871How OOC conversations are treated (if at all) is completely dependent on the model. Dumb models simply don't understand what you're saying and will just continue with outputs similar to what's already in context. If a regular message doesn't work then you can try putting it in system prompt, or post-history instructions.
>>106924378dead obsolete out of date useless no good
>>106924390nothing better came up locally retard. vntl anon has a few finetunes
>>106921538i run IQ2_S on a 5090 with 96 gb ram and it is slow as fucking balls.. like 2 t/s
>>106924390every new test and leaderboard is always just made to show that the new model is totally better than all the previous ones it's all worthless
>>106924676>like 2 t/sThat's pretty decent. Maybe you need to readjust your expectations?
>>106924676You're not using -ot, are you?
>>106924676
>IQ2_S
Are those quants any good? At that point I would think it would be better to convert it to bitnet, should give faster cpu inference too
>>106924676skill issue, it should be at least 5t/s
>>106924383
I'm new as fuck to all of this, just grabbed some random card off the link in the OP, and tried to see where it would take me. I have no idea how to do any of these prompts or lore books or whatever.
I'm also in a situation where now the AI is just spitting out the last batch of text it generated as its response over and over with like hardly any variation, regardless of what I say or do to change the scenario. And it cuts off long text, and I don't know how to make it continue its previous prompt.
>>106924794
unironically, read the readme. You will learn 99% of what you will need to know.
https://docs.sillytavern.app/usage/common-settings/
https://docs.sillytavern.app/usage/prompts/
>smart
>fast
>cheap
>local
pick 3 (max.)
>>106924899Will do. Thanks.
>>106924912You can have all that with Gemma, but you'll have to settle for it being safetyslopped.
>GOOD CAPABILITY
>fast
>inexpensive
>local
pick 3 (max.)
*revised version for the critics
I just built a computer that can actually run local AI (9800x3d/5070ti), where should a beginner start on Windows?
>>106924986
>9800x3d
That doesn't make much of a difference. How much RAM do you have?
Regardless, give
>https://github.com/LostRuins/koboldcpp/wiki#quick-start
a read.
>>106924959GLM Air is probably the closest, especially if you're on a DDR4 platform where RAM is cheap
>>106924986usecase?
>>106924998
32GB, thanks for the link.
>>106925012
Mostly just for proofreading emails/writing and what not.
>>106924692
>new model is totally better than all the previous ones
>llama4
>>106924712
no? i dunno what that means, but i don't think so..
>>106924721
it seems to be better than any of the other models I'm able to run, just slow af
>>106920229
They're not obscure, but they are not consumer friendly for the total addressable market (which is the vast majority of us), because they are GPU-centric quantizations. You will see them used in clusters. For a lot of these larger scale systems, GGUF isn't a consideration because llama.cpp can't scale like SGLang and vLLM can.
>>106924396That's depressing...
>>106919198Managed to get one of my own quantized slop tunes running on my phone :D
>>106925422Cool shit.
>>106925422A folding phone?
>>106925433
It's kind of retarded (actually very retarded) due to it being trained on /a/ boards and it being a quantized version (I plan on uploading a lot more of those later) but it's still cool to use.
>>106925438
Ye.
>>106925448What kind of use cases are there for a folding phone?I never really find myself wishing I had a bigger screen but I know that sometimes opportunities aren't obvious until you have the means to take advantage of them.
>>106925448>>106925438>>106925433>>106925422It seems like "Anri" is this model's equivalent to "Elara" or "Seraphina"
>>106921660since when does lcpp have vision support?
I am so fed up with local right now. I get it, you cumslop gooners don't give a shit about anything except writing porn. Is there any local model that can actually handle structured output without being immensely retarded or spending 10 minutes "thinking" about how to use a fucking quotation mark?
>>106925883llama 2 7B
>>106925883GLM is ok.
>>106925883
>waaaa. i don't know how to read docs!
https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md
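For the lazy, the short version is that llama-server accepts a GBNF grammar in the request and constrains sampling with it; something like this (port and prompt are placeholders, and the grammar is a toy example):

# sketch: constraining llama-server output with a GBNF grammar via the native /completion endpoint;
# port and prompt are placeholders, the grammar is a toy "JSON array of words" example
import json
import urllib.request

grammar = r'''
root ::= "[" ws item ("," ws item)* ws "]"
item ::= "\"" [a-zA-Z]+ "\""
ws   ::= [ \t\n]*
'''

payload = {
    "prompt": "List three animals as a JSON array of strings: ",
    "n_predict": 64,
    "grammar": grammar,
}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req))["content"])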
>>106925858Since like a week after Gemma 3 release
I'm starting to think Andrej is a grifter.
A couple months ago he was like "woah AGI in two more weeks bro".
Now that he sees where the wind is blowing with all the skepticism, he talks about "slop" and how limited LLMs are today. Feels like when Zuckerberg made a 360 after Trump was elected.
Glm4.6 quant on ollama/lmstudio when?
https://blog.sinatras.dev/PMPP-Eval+Journey
We live in Sam's world
The only way I found to keep training a pre-existing LoRa checkpoint with a new dataset with Axolotl is to create a new one from scratch set to save on the first step, then copy over the weights and optimizer state, then change the main config file and the trainer_state.json from the checkpoint to save on the right number of steps. What a mess.
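For anyone who wants to replicate it, the surgery is roughly this (a sketch only; file names assume the standard HF Trainer/peft checkpoint layout that Axolotl writes, and the exact trainer_state.json fields you need to touch may differ):

# sketch of the checkpoint surgery described above; assumes the usual HF Trainer / peft
# checkpoint layout (adapter_model.safetensors, optimizer.pt, scheduler.pt, trainer_state.json)
import json
import shutil
from pathlib import Path

old_ckpt = Path("old_run/checkpoint-500")   # checkpoint from the previous dataset
new_ckpt = Path("new_run/checkpoint-1")     # fresh run configured to save on step 1

# carry over the learned adapter weights and the optimizer/scheduler state
for name in ("adapter_model.safetensors", "optimizer.pt", "scheduler.pt"):
    shutil.copy2(old_ckpt / name, new_ckpt / name)

# patch the step counter so resuming behaves sensibly (field choice is a guess)
state = json.loads((new_ckpt / "trainer_state.json").read_text())
state["global_step"] = 500                  # whatever step the old run ended on
(new_ckpt / "trainer_state.json").write_text(json.dumps(state, indent=2))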
MY GOOFS!!!! GIVE ME BACK MY GOOFS!!!!
https://huggingface.co/ubergarm/Ling-1T-GGUF
>AMD Ryzen™ AI 7 Pro 360
what the fuck is this? I was browsing thinkpad models and this thing costs double the price of normal CPUs?
gimmick? what's even the use case here
slightly off topic I know but there's quite a few knowledgeable anons itt
>>106926361
oh nevermind im retarded as fuck. goofs here
https://huggingface.co/ubergarm2/Ling-1T-GGUF/tree/main
>>106926367sar is that because of you can run local small copilot inference like nasa very ai-like yes.
I'm trying to add CoT to Llama 405B.
>>106925986>It's noticing
>>106925986
https://github.com/karpathy/LLM101n
https://eurekalabs.ai/
https://github.com/CerebrasResearch/reap
https://arxiv.org/abs/2510.13999
Cerebras pruning experts to reduce memory overhead
https://huggingface.co/cerebras/Qwen3-Coder-REAP-363B-A35B-FP8
https://huggingface.co/cerebras/Qwen3-Coder-REAP-246B-A35B-FP8
(prune of) https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
>>106926865THE RAPE METHOD WORKS SIRS
>>106921538
>All you need is 128GB of RAM and 24GB of VRAM
Dumb fuck!
>>106926865
>55~% accuracy in coding
assuming 100% accuracy is the base model, that makes the CODER model basically unusable, whats the fucking usecase?
>>106926865Is it really worth making 480B retarded just to save 100 GB? It's not like anyone was running this entirely in VRAM locally and providers aren't that hard up on memory.
has anyone tried this model? is it any good?
https://huggingface.co/TheDrummer/Valkyrie-49B-v2
>>106926930>>106926865oh wait I think that the base model is the 0% compression line. then it's interesting I guess, still only useful for coding tasks
>>106926937
>49b dense
doa
>>106926951i have the VRAM for FP16
>>106926957post your h100s nvidia-smi screen or GTFO
>>106926961
>>106924959
Local
Good
Not safetyslopped
>>106926946
We've been through this with extreme quants. Just because it doesn't show much degradation on benchmarks doesn't mean it's not retarded in actual usage.
>>106926963
>cant even use all gpus in vLLM
poor
>>106926966
>>106926973The lower the quantization precision, the more of the token distribution you should be truncating, to be fair.
>>106926997who the fuck uses vLLM?
Bros... I want a robot so fucking bad
https://www.youtube.com/watch?v=sJYlJlIEBpg
>>106926935Chutes will probably love to serve this as the normal one
>>106924322Anon... that's not how file systems work...The file's metadata and the first few bytes, including the magic, are all in the same sector.
>>106925883
well then fuck off back to cloud models then.
i mean what the fuck are you expecting? fucking datacentre level output on a potato computer? you're the dumb one here, if you think you can do better then create a better model yourself, we're not your fucking servants, faggot.
>>106926377
>copilot
no seriously, is that the only use case
>>106927472
There are others but this covers the more notable ones.
https://www.pcworld.com/article/2905178/ai-on-the-notebook-these-tools-already-use-the-new-npu-technology.html
How do I get shittinante to do slow burn manipulation
Seems to always jump in to direct smut asap no matter how I adjust the prompts
>>106925883
>I get it, you cumslop gooners don't give a shit about anything except writing porn.
GLM chan got sex out of my system and now I just talk to her.
But also still have sex everyday because her pussy is magical.
>>106927534
You should probably look elsewhere, avoiding coom-oriented finetunes like the plague. People call them sloptunes for a reason. Unfortunately I don't have much to suggest that you will either be able to run (GLM 4.6, Kimi K2) or that won't require more prompting effort, either for tardwrangling them or for getting them to engage in ERP at all (vanilla Mistral Small 3.2, Gemma 3 27B).
>>106927534
You can't, drummer models are coomtunes
Not that you're going to get much better out of regular Nemo, they're small dumb models.
>>106927534Slow burn is hard even on SOTA cloud models. The crutch when the model isn't good enough to do it otherwise is to use stat tracking.If your model isn't good enough to do stat tracking, then it's definitely not good enough to do slow burn without it.
>>106927528doesn't sound that bad. linux support?
>>106927534Sadly it is a bit of a skill issue. You are probably giving it bad input. Have you tried taking a step back and starting with a solid first step that is: llama-server --model zai-org_GLM-4.6-IQ4_XS-00001-of-00005.gguf ?
I'm running Sillytavern and ik_llama.cpp on my desktop. I'm running GLM-4.6 IQ3_XXS, so my tk/s is slow. When I prompt it from my phone, I've found that if the screen turns off the token stream stops. Is there any way around this, or another setup I should use?
>>106927663Disable streaming. It'll still probably go to sleep because it's a phone.
>>106925883toss 120b
>>106926481
>405B
hope I will be able to run it one day, 431gb at q8 is just too much
Another week is over, which means that we are another week closer to seeing GLM MTP implemented in llama.cpp.
>>106928173
It might be getting close. Maybe.
https://github.com/F1LM1/llama.cpp/pull/3#issuecomment-3413775935
>>106923524Is there a reason you can't use transformers?
>ctrl f glm
SAAARS the glm is the absolute bestest local model OK? Pronounslop bharatchads are eating good my bastards.
actual good release https://github.com/ggml-org/LlamaBarn
>>106928231Anything for real computing platforms?
>>106928231
>macos
LMAO
>>106925883
For the benefit of others (not you), you can definitely use gemma3 to output json, it's really good at it, and somehow asking it to do that makes it pay attention better to the task. Before the qwen video vision model came out, I was using json format to give gemma3 a list of frame captions so it could create an overall video caption. It worked well, but of course it was slow.
>>106928213I'll bite. What the fuck is pronounslop?
>>106928213
Prompt: ChatGPT, generate a modern 4chan post trying to paint the current local SOTA in a bad light. Be a true 4chan meme master.
>>106924676
what cpu and ram speed? i'm getting over 6t/s tg running iq2_xxs on a 9950x3d with dual channel 6000c30 (though pp is terrible because rocm)
are you sure you didn't accidentally put both dimms on one channel or something?
>>106928231It's definitely good for being open-source and having first-party support from upstream but I'm not going to buy Apple shit either way.
Gemini 3 will save local.
>>106928509i also ran the same benchmark on vulkan and it's somehow faster??? i have no idea whether this extends to other amd cards as well but i guess that's something to keep in mind
100B dense Gemma soon
>>106925883gpt-oss 120B
saaaaaar do not redeem potato bloody
>>106928630
27B with an empty prompt seems much more friendly?
>>106919889Worship the sand god
I log on to the net every day to see more people who clearly don't ever work with code claiming that code is over.
My cup is the only thing that runneth over. My cup of dipshit excuses for the world to be this fucking slow to change.
Be the next good to this world and make real abstractions. Learn to program.
>>106928792shut the fuck up retard
>>106928650Beautiful 27B, I will marry gemma. Ser, please provide jailbreak system prompt for open vagene!