/g/ - Technology


Thread archived.
You cannot reply anymore.




File: Miku-09.jpg (131 KB, 512x768)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106919198 & >>106904820

►News
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: miguu.jpg (74 KB, 600x648)
►Recent Highlights from the Previous Thread: >>106919198

--Paper: Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity:
>106921354 >106921377 >106921488
--Paper: REAP the Experts: Why Pruning Prevails for One-Shot MoE compression:
>106926865 >106926930 >106926935 >106926946 >106926973 >106926999
--Cost-performance analysis of AMD 3950X 128GB vs custom server for LLM/home server/gaming:
>106919401 >106919472 >106919477 >106920242
--Synthetic data and conversational CoT dataset generation for LLM training:
>106919615 >106919833
--glm-chan model behavior and prompt optimization challenges:
>106919852 >106919884 >106919974 >106920107
--Defining uncensored models through role adaptability vs unpredictable behavior:
>106919886 >106920057 >106920564 >106920631 >106920777
--Limitations and workarounds for training LoRA on quantized models:
>106920664 >106920700 >106920848 >106921079 >106921407
--Sparse model scaling advantages over dense architectures:
>106920856 >106920874 >106920885 >106920916 >106920998 >106921046 >106921100 >106921142 >106921007
--Adding Metal4 tensor support to llama.cpp:
>106920993
--Proprietary GGUF format criticisms:
>106921215 >106923524 >106923584 >106923681 >106923793
--Struggles with AWQ model conversion and vLLM optimization:
>106922104 >106922122 >106922147
--AI/ML education vs practical skills and networking for job prospects:
>106922370 >106922549 >106922690 >106922736
--Valve devs improve Vulkan for llama.cpp AI:
>106930141
--LlamaBarn project announcement and real platform inquiry:
>106928231 >106928236
--Designing a multi-agent AI RPG with state management and narrative consistency:
>106930493 >106930613 >106931198 >106930663
--Challenges in RAG systems for base knowledge integration:
>106931465 >106931513
--Miku (free space):
>106924924 >106930166 >106930227 >106930335 >106930569

►Recent Highlight Posts from the Previous Thread: >>106919206

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>106931370
Thanks but I usually use other people's checkpoints and with them I don't see much difference between temperatures, but the quality is better than I remember
>>
>>106931562
>const static std::string pattern_moe_all = "blk\\.\\d+\\.ffn_(up|down|gate)_(ch|)exps";
Okay I shouldn't need to do set -ot myself at all then.
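For reference, doing it by hand would look roughly like this (a sketch, assuming mainline llama-server's -ot/--override-tensor flag and the usual expert tensor naming):

llama-server -m model.gguf -ngl 99 -ot "blk\.\d+\.ffn_(up|down|gate)_exps=CPU"

The regex pins every MoE expert tensor to the CPU buffer while -ngl 99 sends the rest of the layers to the GPU; the pattern quoted above is basically that logic baked into the code.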
>>
https://github.com/ggml-org/llama.cpp/pull/16653
posting again since it got posted just at the end of the thread, but proper auto gpu memory allocation is finally coming
>>
Elsa Hitler Margret
>>
Instead of using zram, zswap is the more performant option these days. Feels snappier. Even if you have shit tons of RAM, zswap is still useful because it stabilizes system paging.
>>
File: rip.png (87 KB, 556x512)
bros, hf is after /ourguy/
>>
>>106931969
good riddance
>>
File: file.png (30 KB, 519x134)
>>106931980
>t.
>>
>>106931969
always were
>>
>>106931969
I lost like 40GB of private storage lately. Seems like they're trying to free as much space as they can
>>
>>106932013
b-but when it was discussed lately anons said it was le nothingburger and nothing would change?
>>
>>106931969
what about the guy that had like 20k merges
>>
>>106932026
Well, SaaS is always trash, nothing new here. I'm saving for a 20+TB drive now
>>
File: file.png (11 KB, 211x148)
>>106932061
>*26K quants
thank you very much, and he seems fine, though he uploads to the team mradermacher account,
I wonder how much space they use.
in comparison to picrel davidau has ~300 models and drummer is only approaching 200
>>
>>106931969
Scammer lost. People won.
>>
If I was him I would only keep the few latest tunes in HF and deposit older stuff to somewhere else.
>>
>>106932168
but you're not him, and never will be
>>
Thankfully the userbase has developed enough to realize slop tunes are all placebo and it is entirely a skill issue or being too lazy to prefill a prude model
>>
>>106932185
What do you mean?
>>
>>106932201
>skill issue
Only if earning money is the skill we are talking about.
>>
>>106932201
true, just use Gemma with the response you want pre-written by you in the sys prompt, it works 99% of the time better than nemo!
>>
>>106932210
despite richfags constantly dunking on vramlets in the thread, they never post side by sides of the supposed retard vramlet model and their patrician richfag model, because they know in their heart of hearts that for the purpose of ERP you really don't need that many parameters.
>>
>>106932235
4B ought to be enough, if only they stopped trying to shove the entire internet in there.
>>
I am Drummer.
>>
File: imatrix.png (970 KB, 3110x1315)
>These quants provide best in class perplexity for the given memory footprint.
What am I missing?
IQ quants seem to be the meme I suspected them to be. All other inference params identical min_p=0.04 sampler only
>>
File: file.png (1.05 MB, 871x796)
>>106932235
>for the purpose of ERP you really don't need that many parameters
>>
>>106931969
Open source work?
>>
>>106932258
No you're not.
>>
>>106932258
You only suck penises like me. But you aren't me.
>>
>>106932264
Well anon come on then, post your favorite card with vramlet nemo or gemma and with GLM i'm sure we'll be able to see 3000$ worth of prose improvement.
>>
>>106932290
Nemo and Gemma gave me ED. Glm-chan gave me PE.
>>
>>106932264
Air when? They're scammier than the drummer at this point.
>>
David won?
>>
>>106932106
damn, would really like to see the exact storage usage numbers
>>
>>106932340
schizotunes bros.... WE WON!!!
>>
genuine advice to drummer: make llamacpp agpl fork with lora support, then upload loras only
i doubt u did FFT of glm air right? and for models that bartowski made quants of u could delete the quants to save space. just keep original models.. id like you to publicly announce wat ur gonna do before u start deleting models so we can archive some of ur stuff maybe. at least i know id like to
goodluck drumdrum, i still like trying your sloptunes no matter what anons say.
also instead of paying 200$/month u could rent seedbox and host models there or something..
>>
>>106932373
>schizobabble
Try running that through an LLM next time zoomie
>>
>>106932373
Question : >>106932363
>>
>>106931969
>open source work
The only thing he ever did was fill up their hard drives with shit models and there wasn't even anything open source about it.
>>
>>106931969
drummer, start an OF. I'll support you. show off that bussy while you do those 'toons baby
>>
>>106932373
Great advice. He should totally do that.
>>
>>106932395
He's retarded. LoRAs have always worked, but Drummer and the mouthbreathers that use his models probably wouldn't know how to load a LoRA. He also can't simply take llama.cpp and relicense it.
>>
>>106932433
loras are a pain in the ass to use with quanted shit
>>
>>106932433
I do remember LoRA not working in some specific circumstances (multi GPU?), but yeah.
As far as I know, people could release their LoRAs instead of just the final merged model.
I don't know how LoRA interacts with quantization however, if there's something specific you need to do for a specific quant and such, or if it only works with the unquanted model in GGUF format, etc.
>>
What's a lora?
>>
File: muah.jpg (1.08 MB, 3840x2160)
>>106931969
What does El Drummer actually do?
My assumption was the model merging/raw fucking around with the tensor data, for no good reason. But if he's actually tuning model weights and people enjoy them then respect.
>>
>106932201
this is who is now pushing for lora bs by the by
>>
>>106932492
He does tune, but it's all pretty half assed and not very interesting. Most attempts are big flops that contribute absolutely nothing.
>>
>>106932459
It's like qlora but without quantization
>>
>>106932492
he does indeed do tunes, david is the one that's mainly merges
>>
>>106932501
>>106932509
We've moved beyond entertaining the concept of somehow merging trained model weights, right? huzzah
>>
>>106932509
I love that David exists.
Where else would you get
>DavidAU/Qwen3-MOE-2x8B-TNG-Deckard-Beta-16B
>>
>>106932577
Did anyone actually try any of these turds? Does David actually do anything to the weights, or does he just slap that shit together in mergekit and call it a day?
>>
>>106932577
I mean, look at this shit
>This is MOE model config of TWO "DND" (double neuron density) 8B models.
>The first model is trained on the TNG/Star Trek Universe (2 datasets) via Unsloth.
>The second model is trained on the Deckard/PK Dick Universe (5 datasets) via Unsloth.
>Both models use a BASE of Jan V1 4B + Brainstorm 40x (4B+ 40x => 8B parameters.)
>The MOE - mixture of experts - config is 2x8B - 16B parameters. With compression this creates a model of 13B - all the power of 16B in 13B package.
>This MOE drastically upscales the BOTH expert models in this MOE model.
>This model can also be used for Role play, Star Treking, Science Fiction, writing, story generation etc etc.
>The BASE model is (a 4B model + Brainstorm 40x adapter):
This is amazing.
>>
>>106932589
that's not the point, it's drummer that needs to be stopped, david is a wholesome bean.
>>
>>106932589
I've tried a couple and they were all, without fail, schizo out of the box.
Or just exactly like llama 3 8b with high temp but taking double to 4x the memory.
>>
>>106932601
They're both retards wasting space and compute.
>>
>>106932601
if beans could be schizophrenic....
>>
File: file.png (149 KB, 603x920)
>>106932603
what class of model did you try doe?
>>
>>106932601
drummer bought an ad on 4chan. He is /ourguy/ regardless of any other factor.
>>
>>106931969
>scammer is no longer able to waste bandwidth advertising recycled toys over and over again
based
>>
>>106932616
I love the schizo ass model cards.
Seriously, it's pure ML voodoo.
It's great.
>>
>>106932631
Forgot the image.
>>
>>106932631
I assume you did consult the required reading material, right? https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
>>
>>106932642
Of course.
Although you also have to keep the caveats of the individual models in mind, such as
>https://huggingface.co/DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
>>
>>106932623
fuck off
>>
>>106932623
I forgot about that. Maybe drummer isn't a terrorist, he's just misguided.
>>
>>106932681
it'll be okay Icy-Helicopter-kyun do not worries
>>
Getting the MI50 to work on windows was such a pain in the ass.
>>
>>106932749
Do tell.
Just drivers? Some sort of incompatibility?
RocM issues?
>>
>>106932759
nta but amd never bothered with official windows support for the instinct cards
>>
File: cursed-nagatoro.png (3.43 MB, 1920x1782)
>>106932749
On Linux it worked out of the box.
>>
>>106932749
im flabbergasted you even got it to work on it
>>
>>106932759
>>106932987
I had to flash the Radeon Pro VII vbios and use the bootcamp drivers because the normal drivers wouldn't work for some reason, even though they are available on the AMD page.
Also for some reason the flashing tool refused to work on windows so I had to flash it on linux.
For anyone interested in flashing the original vbios for a better one here is the page I used: https://gist.github.com/evilJazz/14a4c82a67f2c52a6bb5f9cea02f5e13
>>
>>106933044
that's a lot of fucking around for having it gimped by windows anyway
>>
>>106931969
Why make this public so soon? I don't think "Thank you HF for giving my popular coomer tunes special treatment!" is good PR for HF.
>>
>>106933044
Cool shit.
Thank you for sharing the link too.
>>
>>106932492
>What does El Drummer actually do?
His biggest success was making a model's thinking process push the model into safe rejection mode. I think it was Nemo, so he made an unsafe model safe.
>>
>>106932749
What is this mindset? You are buying some used server gpus and are like "NONONO IT MUST WORK ON MY W10 MACHINE".
>>
>>106932593
>double neuron density
Wow. I think he actually isn't a shyster like drummer. He is just that sovlful.
>>
>>106932623
suck his dick and get HIV
>>
Is it david_AU cause he is golden?
>>
>NEVER
[4031, 3848]
>never
[37593]
>EXCLUSIVELY
[3337, 38953, 3166, 50309]
>exclusively
[327, 4256, 3210]
fuck you zuck
>>
>>106931567
Okay, I know there is stuff you can do to make text models send prompts to SD based on the contents of the chat.
How do you do that?

I've got Forge SD WebUI up and running.
I haven't done anything with locally hosting LLMs yet, only messed around with Janitor and Venus, but I can probably pretty easily get KoboldCPP+SillyTavern+Mistral up and running.
>>
>>106933185
get tokened, faggot
>>
>>106932428
I'm honestly surprised more people don't have OFs starring AI starlets.

The image technology is there.
The video technology is there.
The voice technology is there if you want to go that far.
The text generation technology is there.

Seems like Digital Pimp is a major career possibility.
>>
>>106933141
I'm traveling for some months so I'm lending my pc to my normie cousin because he wants to play some racing games and other stuff.
>>
Can any anons that use ik_llama.cpp sanity check me on my llama-bench.exe setup?

I can run llama-server without issue, but I can't seem to get bench to load. Using a IQ2 GLM-4.6 on 128+24GB.

I'm mainly getting "main: error: failed to load model 'model_path'"

Here is the PS script I'm using to start the server - I've got -ngl 1 just to test, as my issues started when I tried to load any of the model into GPU.


# Change to the directory this script is in
Set-Location -Path $PSScriptRoot

# === Full path to your GLM-4.6 model ===
$MODEL = "G:\LLM\Models\GLM-4.6-IQ2_KL\GLM-4.6-IQ2_KL-00001-of-00003.gguf"

# === Launch llama-server with recommended GLM-4.6 settings ===
& .\llama-bench.exe `
-m "$MODEL" `
-mmp 0 `
-ngl 1 `
-fa 1 `
-fmoe 1 `
-ctk q4_0 -ctv q4_0 `
-ot exps=CPU `
-t 20

Pause

>>
>>106933415
>-m "$MODEL" `
lol, lmao even. use your brain
>>
>>106933437
It's not very big ok

I have "$MODEL" on my scripts for loading it in llama-server.exe and it doesn't give me a hard time.
>>
>>106931969
The only decent model he put out was Unslopnemo 3.0. (4.0 is braindead, 4.1 is okay but less fun with writing style) Everything else I tried is either mega slop or just bad.
>>
downloading ling to see if it can replace kimi sex
>>
>>106933415
I've got it working now.

For some reason, only ~13 of my 24GB of VRAM is used during these benches. Is that normal, or should I be looking to fully saturate that?

$MODEL = "G:\LLM\Models\GLM-4.6-smol-IQ2_KS\GLM-4.6-smol-IQ2_KS-00001-of-00003.gguf"

# === Launch llama-server with recommended GLM-4.6 settings ===
& .\llama-bench.exe `
-m $MODEL `
-mmp 0 `
-ngl 999 `
-p 128,512 `
-n 128,512 `
-b 4096 `
-ub 4096 `
-fa 1 `
-fmoe 1 `
-ctk q8_0 -ctv q8_0 `
-ot exps=CPU `
-t 20

Pause
>>
>>106933766
>For some reason, only ~13 of my 24GB of VRAM is used
> -ot exps=CPU `
You can write an enormous -ot expression or you can wait for >>106931647
>>
Why didn't drummer do his own mememark? Big corpos lie about mememarks all the time. Why not make some fake bars himself?
>>
>>106933766
Do all of those arguments work on llama-bench?
I can't remember what it was exactly, but some stuff that worked in llama-server wasn't implemented in llama-bench IIRC.
Maybe it's the cache quantization, I dunno.
>>
>>106933790
Stop bullying Drummer... He might be bit simple but doesn't mean any harm to anyone.
>>
>>106933802
if you run llama-bench.exe -h it'll give you a list of what it accepts.

>>106933782
What would the enormous -ot expression look like? All I've ever seen is exps=CPU and ".ffn_.*_exps.=CPU"
>>
Why isn't there LFM2-VL 2.6B yet? Also abliterated / uncensored?

Need to caption 300k images and all the LLMs suck and I have to stick with Florence-2 still...
>>
>>106933822
That is Undi and Davidau.
>>
>>106933185
>token banning in 2025
we have string bans now gramps
>>
File: screenshot0240.png (1.99 MB, 2202x1238)
>>106932264
i agree with this image
>>
>>106934099
I can't read those bent letters
>>
>>106934183
>t. qwen vl
>>
File: niggers cant read this.jpg (66 KB, 1080x817)
>>106934183
are you black?
>>
>>106934190
>t. moniqwen
>>
>>106934211
They can't?
>>
>>106934216
They can barely read normal words, and let's not talk about reading comprehension...
>>
Minimum specs to run GLM 4.6?
>>
>>106934263
24GB VRAM + 128GB RAM (maybe 96?)
>>
>>106934263
128gb ram + 24gb vram
>>
>>106932235
Because it's not about the quality of the output, it's about how much you need to reroll and tard wrangle a small model until you get what you want, whereas a large model just gets it. The quality of a small model's output may even be better in a side by side comparison; the difference is that with a small model, you struggle to make it output what you have in mind, while with a large model, you're balls deep in actual rp
>>
>>106934269
>>106934271
>32GB 5090 with 64GB RAM
So close and yet so far.
>>
>>106934283
ram is cheap though
>>
>>106934277
You can reroll a small model a hundred times by the time your offloaded large model finish its first gen
>>
File: file.png (250 KB, 1303x760)
>>106934288
was*
>>
GLM 4.6/Deepsex for the first 20k tokens or so followed by Qwen 235B/22A thinking up to 60k context is the KINO setup for long-form lorebook RP, prove me wrong.
>>
>>106934288
I'm unfortunately a 2 slotkek.
>>
>>106934316
2x64gb sticks exist
>>
>>106934305
Does it actually get better and not worse with 20k prefill?
...
...
I actually never tried Qwen above 10k tokens and I was using it for like 2 months at least. With glmchan I am hitting 10k almost every day. Can't be just the tokenizer right? Weird.
>>
What are the differences between GLM 4.6 and 4.5 for practical use? I don't give a shit about benchmark faggotry.
>>
>>106934288
NTA but my cpu/mobo doesn't boot if I try to use more than 2 32gb ram sticks
>>
>>106934295
Waiting is fine, reading garbage output ruins immersion
>>
>>106934341
From what I noticed, the 2507 thinking version at Q8 does keep the same syntax as the previous context. I also use a user prefill regarding the paragraph formatting so it doesn't devolve into one word sentences and it seems to hold together.
I'm using it because it has the best high context performance out of all local models outside of Deepsex v3.2 at the moment (which isn't even implemented yet), while still being pretty damn fast even at high context. Again, it's meh when used at 0 context due to its quirks and lack of world knowledge, but with 20k+ context filled in, it's acceptable. Give it a try if you have the RAM.
>>
>>106934381
Have you tried updating your bios?
>>
MTP support soon inshallah
>>
>>106934635
And Qwen 80b!
>>
File: 1704599385920673.png (354 KB, 488x651)
can any anons recommend the current best general knowledge model? something encyclopedic on science, medicine, history, coding.
I dont care about roleplay or artistic output, just something to answer my inane comments, I am currently using gemini 3 27b
>>
>>106934656
In general, the largest the model, the more knowledge.
So something like kimi I guess.
Of course, Gemma models for example know a lot more than any other model in their weight range, but they are relatively small.
>>
>>106934635
Be the change you want to see bro
>>
>>106934635
Vibe coders will save us.
>>
>>106934381
Pretty sure for AMD there were a bunch of BIOS updates in June or earlier that enabled 64GB DIMM support for many vendors.
>>
>>106934381
>>106934707
Disregard that I thought you were trying to put 64GB sticks in there
>>
>>106931969
Based. /lmg/ shills BTFO.
>>
File: 1521254484420.jpg (625 KB, 2048x1365)
>>106934669
thanks anon, sorry I made typo with gemini, I was indeed using gemma 3 27b. I'll try kimi
>>
>>106934721
>cheetah
They want to be our pets SO BAD.
>>
>>106933658
the fuck kind of rig do you have?
>>
>>106934635
https://voca.ro/1915MlAOFtMx
>>
https://www.tomshardware.com/tech-industry/jensen-huang-says-nvidia-china-market-share-has-fallen-to-zero
>Jensen says Nvidia’s China AI GPU market share has plummeted from 95% to zero
lol, lmao even?
>>
File: 1751552205845476.png (210 KB, 498x529)
>>106931567
Question for anons who rp with LLMs: do you typically set a specific max output tokens setting? Or do you usually stick with whatever default your inference engine/webui uses? Sometimes I'll enter a prompt and the output from the "person" I'm role-playing with (I typically have a system prompt that tells the LLM to act as a specific person or persona) is only like a sentence or two, and other times it outputs an entire paragraph. Which output length I get seems to be completely random. Sometimes I'll do a particular prompt and it shits out a paragraph or two worth of text, then I'll restart the engine, input that exact prompt, and this time it'll only be a couple sentences. Is it better to set your own max token output? I don't really RP with it that often so I'm not really sure what counts as "too much", "too little", "good", or "bad"
>>
>>106934774
>Which output length it does seems to be completely random
If the model's output ends before the token limit, then it's done saying what it meant to say. If it reaches the token limit, the reply will get truncated.
>Is it better to set your own Max token output?
It may truncate the reply. But you can.
>I'm not really sure what counts as "too much", "too little", "good", or " bad"
Whatever you prefer. You can nudge the model by just instructing it to give short or long replies. Results may vary.

The model has no idea what the token limit is, nor does it know how many tokens it generated already. It generates tokens until "it's done" (by generating an EOS token). The token limit is just a setting for the inference program or the client, not the model.
>>
>>106934774
It isn't what you think it is. All it does is reduce the model’s context size by max output tokens, so the response will fit within the context
>>
>>106934851
>All it does is reduce the model’s context size by max output tokens
No. It stops generating once the output limit is reached in the current gen request. It's equivalent to the n_predict setting in a gen request to llama-server.
>so the response will fit within the context
No. It's to prevent run-away generation or to just generate in chunks with [continue]. The reply will not necessarily fit in the context as it can get truncated.
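If you want to see it at the request level, a minimal sketch of a raw gen call to llama-server (assuming the default port and the /completion endpoint) looks like:

curl http://localhost:8080/completion -d '{"prompt": "Once upon a time", "n_predict": 128}'

n_predict there is exactly that per-request cutoff; the model never sees it and will happily get truncated mid-sentence if it wanted to keep going.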
>>
Llama.cpp is refusing to load a finetune converted to gguf from an axolotl checkpoint, which according to Grok is because the rank of the lora_a is 256 while the rank of the lora_b is 128. The rank of the lora was supposed to be 128. Any ideas?
>>
>>106934886
>The rank of the lora was supposed to be 128.
Well? Is it 128? 100% sure?
>>
>>106934774
I set the output length to the maximum supported length when using instruct mode.
The model will generate as much text as it deems necessary. There's no point in cutting it short, especially when reasoning is enabled.

For text completion though I'll put the gen limit at 512 tokens, since text completion will just keep generating text until you stop it.
512 tokens is enough to write a paragraph or two, and gives me a chance to make edits or steer the model before continuing.
>>
>>106934774
for me, 550 is about as much as I allow for non-thinking models, for a more book-like experience with 3 paragraphs.
reasoning models you have to set higher because the reasoning process uses those tokens.
are you using Silly Tavern?
>>
File: 2025 dram market.png (88 KB, 1280x720)
dam
>>
>>106934875
Yes. It's what happens in any practical situation
https://github.com/SillyTavern/SillyTavern/blob/74c158bd2e98b8b4dc54d2bb0d088c5a5e918826/public/script.js#L5084
If you set a 4K max response with 16K context, you are only getting 12K tokens for your prompt and chat history
>The reply will not necessarily fit in the context
Wrong. Prompt + response can't be longer than the context
>to just generate in chunks with [continue]
The sole reason for generating in chunks is to provide the model with as much context as possible at the start of a reply
>>
>>106934774
You need to tell the model to keep its replies under 200 tokens unless asked to provide a long answer, for example.
I'll keep output length at infinite, it doesn't do that much.
>>
>>106935020
>The reply will not necessarily fit in the context
>Prompt + response can't be longer than the context
Yes. Meant to say "The reply will not necessarily fit in the token limit".
>All it does is reduce the model’s context size
It does not reduce the context size. It's set at launch. But it does reduce the gen limit so that it doesn't go over the context size.
>so the response will fit within the context
The *generated tokens* will fit in the context, not necessarily the entire reply. That cannot be guaranteed.
>>
>>106934898
When I downloaded another finetune from HF it had the same error so I think there must be some issue with the trainer or Grok was just wrong.
>>
>>106935099
Show your llama-server output where you get the error. Post the fucking models you tried, at least. Can you load other models?
>>
>>106935091
It reduces the available context size for the prompt and chat history. At this point, I refuse to participate in the nitpicking contest
>>
>>106935162
It cannot reduce the context size. That's set at launch. It reduces the gen limit so that it doesn't go over the context size.
>>
>>106935129
Ok, gimme a minute.
>>
>>106935187
You're either retarded or can't read
>>
>>106935187
Ring Attention exists.
>>
File: G3jKisDWsAAHAWW.jpg (506 KB, 2048x1536)
>>
>>106931969
this guy is such an e-begging piece of shit. his troon tunes, all of them, are worse than the originals
>>106931997
i mean, the drummer sucks, but this is the kind of corporate bootlicking you only see on r*ddit. literally the worst humans on the planet
>>
>>106932264
this
>>
>>106935564
>when she talks while deepthroating your dick as you fuck her ass
>>
>>106933196

You don't need Forge, Koboldcpp has image generation support too and the same models work in it. It even supports models forge doesn't like WAN, qwen image and kontext
>>
>>106934211
to be fair, that's very bad cursive.

also, it's anyone under the age of 30.
>>
>>106934945
>lpddr
>ai
>>
I wish there was a balance between GLM-4.6 and K2-0905. It feels like GLM-4.6 is a bit too clean and K2-0905 has too much of a slop tendency. If they were blended together it would be the perfect model.
>>
File: file.png (1 KB, 190x28)
>>106935980
projections from 2024?
>>
File: 68er.png (607 KB, 1007x496)
Huh....apparently I can't run GLM 4.5 Air Q4_K_M on a fucking 5090? something feels odd here, maybe my settings are just wrong when running llama-server?
Feels like this shouldn't be an issue but idk?

helppp
>>
>>106936224
Did you move the expert tensors to RAM?
Are you trying to use the full 128k context?
>>
>>106936224
> failed to open GGUF file
00001 file looks corrupted, looks like you need to download it again.
>>
>>106936236
>Did you move the expert tensors to RAM?
How do I do that exactly? I run it using this exact command:

"llama-server -m GLM-4.5-Air-Q4_K_M-00001-of-00002 -c 32768"

>Are you trying to use the full 128k context?
No, I use 32768. I also tried just not setting a context at all (which I think defaults to some piss low 4096 amount) and that also did not work
>>
>>106936242
So this isn't an "out of VRAM" error? that's what I figured was happening. The file is corrupt?
>>
>>106936252
yeah you're not out of VRAM, it can't open the file
>>
>>106936250
>--n-cpu-moe 99 (47 would work the same)
That will probably leave a ton of free vram, then you can lower that to 46, 45, 44, etc, until you fir as much of the model in VRAM as possible for the fastest possible generation.

>>106936252
>that's what I figured was happening
It is.
>>
>>106936255
>>106936256
>it's a VRAM error
>It's not a VRAM error
anons you're killing me here
>>
>>106936260
Test it with --n-cpu-moe 99 and you'll see.
>>
>>106936265
>>106936260
Oh no, actually. I think the other anon is right.
Do you have both .gguf files in the same folder?
>>
>>106936272
I'm dumb as shit. I had the 00002 in a different folder under that one (idk why) I'll try it with them both in the same folder and report back later.
>>
>>106936281
also thanks, anons
>>
>>106936281
Once you get that working, go read about the -ngl and --n-cpu-moe arguments/parameters, otherwise you'll be running the model completely on your CPU.
You can launch llama-server with the -h argument to get a help explaining what each parameter does.
>>
>>106936295
no problemo bro
>>
>>106936298
>otherwise you'll be running the model completely on your CPU.
That's really bad. I want to use my 5090.
What do you recommend I try for a 5090 running GLM air 4.5 Q4?

"llama-server -m GLM-4.5-Air-Q4_K_M-00001-of-00002 -c 32768 -ngl 99"

How about that? also I assume if I'm using -ngl I don't also want to be using -n-cpu-moe?
>>
File: 1739602150295048.png (749 KB, 904x702)
nani.. (maybe someone less lazy than me can capture gif/vid)
https://xcancel.com/kiriTNS_mk/status/1979538163833221607
>>
File: 1732065050824882.jpg (690 KB, 2048x1152)
@grok is this real
>>
>>106936322
I don't think that's currently road legal.
>>
File: 1730547060590878.jpg (707 KB, 2048x1152)
>>106936322
one more
>>
>>106936318
>I assume if I'm using -ngl I don't also want to be using -n-cpu-moe?
Using just -ngl, you'll be trying to load the whole model into your VRAM, and that just won't fit.
What you do is use -ngl 99 to tell llama.cpp to load all layers of the model to your VRAM, then you use --n-cpu-moe 99 to tell llama.cpp to exclude the expert tensors from that, which are the bulk of the weights of the model.
Of course, with -ngl 99 + --n-cpu-moe 99, you'll have a lot of free VRAM, so you can lower --n-cpu-moe to put more and more of the model in your VRAM.
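As a starting point, something like this should work (a sketch, the --n-cpu-moe number is just a guess, tune it down until you're near full VRAM):

llama-server -m GLM-4.5-Air-Q4_K_M-00001-of-00002.gguf -c 32768 -ngl 99 --n-cpu-moe 45

Drop it a few layers at a time and watch VRAM usage between runs.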
>>
>5 million downloads and still no goofs
AIIIEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
>>
>>106936333
Miku is above the law.
>>
>>106936345
I'll give that a shot, thanks
>>
>>106931567
Good morning /lmg/. What have you been up to?
>>
>>106936432
preparing to wipe my win10 install with arch
just making sure to back up everything 1st
>>
>>106936443
I've always wondered what distro I would use if I ever had to main Linux. What's a good one for a relative beginner that only has experience with the Linux CLI but otherwise mostly uses Windows?
>>
>>106936449
i mean ubuntu with Dash to Panel is basically this
>>
File: 1746933648372488.png (1009 KB, 2359x1749)
>>106936449
huh. Hard for me to answer since I was a beginner 25 years ago. Maybe just dive into the deep end with arch, or try ubuntu for 3 to 6 months, then after reinstall with arch with full knowledge that ubuntu is gay and is just an easy stepping stone.
>>
>>106936098
>clean
>slop
could you explain again but in proper english this time
>>
>>106936476
proper english is obsolete
>>
File: 1751501012438816.png (10 KB, 1083x140)
>pay for a year of claude opus
>they change the deal and fuck over your convo limits so even a few casual convos set you over
OWARI DA...
>>
>>106936512
get local
>>
>>106936512
>>>/aicg/
>>
File: 1737045152592995.png (104 KB, 271x238)
>>106936515
>>106936516
there's nowhere to talk about cloud models and /aicg/ are a bunch of sex crazed degenerates
>>
File: 1734479315727818.png (367 KB, 1080x2364)
>>106936432
>>
>>106936523
so then dont talk about cloud models. they suck.
>>
File: 1758474429804075.png (376 KB, 1080x2364)
>>106936523
You could always try the /vg/ general. What are you talking about? So much in depth that you're reaching the limits so quickly anyway?

>>106936528
>>
>>106936523
stay in your lane lil bro
>>
>>106936531
The models are better than anything we have locally, it's everything else about their usage that sucks.
>>
>>106936541
GLM 4.6 local mogs everything
>>
>>106936544
>mogs
opinion disregarded
>>
File: no.gif (2.06 MB, 638x266)
>>106936547
no u
>>
>>106936538
the prob is that "so much in depth" was drastically limited because anthropic wants to serve sonnet, not opus. they fucked over opus users. i guess this could be an argument for local since you're not at a random corpo's whims but in honesty local just doesn't measure up
>>
>>106936523
>and /aicg/ are a bunch of sex crazed degenerates
Where do you think you are?
>>
>>106936578
we here at lmg only utilize LLMs for productivity adjacent purposes
>>
Honest question, why do people bother with all the command line fucking around with llama when kobold is far simpler and does the same thing with a GUI?
Just use fucking kobold?
>>
>>106936607
To confuse dumb anons.
>>
>>106936603
productivity of semen
>>
>>106936607
it really depends on whether you want advanced features or not, like with vllm you can get parallelism, which can lead to much higher cumulative tokens per second if you are doing things like coding autocomplete.
Also, creating a script to run an LLM instead of opening a GUI can be easier for some people, and also helpful for developers when creating an application which uses an LLM.
>>
>>106936443
>>106936449
You can do gpu pass through from a Linux host to a Windows VM if you actually want windows
>>
File: 1731004385497993.png (254 KB, 900x806)
>>106936540
>>106936523
>>106936512
seriously though, if i want to complain about new claude limits is that just impossible?
>>
>>106936715
yea
>>
>>106936715
the bitching about anthropic limits is unbearable. unsubscribe and pay for it via API and you can use it as much as you like. oh, too expensive? now you understand
>>
File: 1742623164412708.png (142 KB, 934x876)
>>106936749
i mean.. can't i have venture capitalists subsidize my nothing convos about anime? that was comfy
>>
File: b8.png (203 KB, 1631x1718)
>>106936761
Well, there's an alternative. I can tell you how to get banned and get a full refund. Would you prefer that?
>>
>>106935639
What you perceive as size-related intelligence in smut scenes is the result of larger models knowing more _even after training data filtering_. A smaller model mid-trained/continually-pretrained on lightly filtered RP-relevant data instead of math reasoning and coding would perform better and not get confused like that.
>>
>>106934669
I swear Gemma-3-27B becomes a 13B model when it engages "RP mode", and probably around a 6B model during ERP.
>>
>>106936858
models all fall off past a certain context, probably even more so when in certain tasks it was not very well trained in
>>
>>106936607
Because I only need to type the command once and then I can run it whenever I want? The real reason to use kobold is for features that llama.cpp doesn't have, like phrase banning
>>
>>106936432
Thinking about all the fun to be had using local models, without using local models.
>>
>>106932433
>He also can't simply take llama.cpp and relicense it.
look at koboldcpp, it's AGPL
original code is still MIT, and an MIT project can be relicensed under a more restrictive license. it's GNU compatible
>>106932395
i know there were tons of issues, not all too sure about support. but if loras were properly supported, we'd definitely see more loras being uploaded instead of merges
>>
I did some analysis of DeepSeek V3.1 Terminus on artificial test cases to try to tease out ideal sampler settings, and I found the official suggestion of top-p=0.95 isn't an arbitrary "IDK lol this sounds good" choice but actually was a pretty good cutoff point in the cases where I added up token probabilities. Since then I've been running with top-p 0.95 and temperature 0.8, the temperature choice being more arbitrary -- what settings have others found enjoyable?
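For llama.cpp that's just (a sketch, the filename is a placeholder):

llama-server -m DeepSeek-V3.1-Terminus-Q4_K_M.gguf --temp 0.8 --top-p 0.95

Same values go straight into the sampler fields if you front it with SillyTavern instead.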
>>
>>106936782
Based
>>106936749
This is the correct answer
>>
glm 4.6 at q8 or deepseek 3.1 at q4?
>>
>>106937152
glm duh
>>
>>106935129
Ok, here it is, sorry for the delay.
The LoRa was trained with (more or less) the axolotl example config for this model except using a rank of 128.
The lora to gguf script works fine.
I then run the llama server with this command:
./llama.cpp/build/bin/llama-server -c 60000 --port 8001 -m ./data/huggingface-cache/hub/models--MaziyarPanahi--Meta-Llama-3.1-405B-Instruct-GGUF/snapshots/85b9bd67025a43
37e9694ec0edaf46437fe6283b/Meta-Llama-3.1-405B-Instruct.Q3_K_S.gguf-00001-of-00009.gguf --lora /workspace/llama405b-outputs/outputs/out/qlora-llama3_1-405b/checkpoint-60/checkpoint-60-F16-LoRA.gguf

This is the error it throws:

llama_adapter_lora_init_impl: - kv 11: general.quantization_version u32 = 2
llama_adapter_lora_init: failed to apply lora adapter: tensor 'blk.0.attn_k.weight' has incorrect shape (hint: maybe wrong base model?)
common_init_from_params: failed to apply lora adapter '/workspace/llama405b-outputs/outputs/out/qlora-llama3_1-405b/checkpoint-60/checkpoint-60-F16-LoRA.gguf'
srv load_model: failed to load model, './data/huggingface-cache/hub/models--MaziyarPanahi--Meta-Llama-3.1-405B-Instruct-GGUF/snapshots/85b9bd67025a4337e9694ec0edaf46437fe6283b/Meta-Llama-3.1-405B-Instruct.Q3_K_S.gguf-00001-of-00009.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error


>>106936985
It's a shame because merging the weights is a pain in the ass and takes a ton of disk space.
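If anyone wants to check the adapter themselves, the gguf Python package from llama.cpp's gguf-py should be able to dump the tensor list (assuming it's installed and exposes the gguf-dump script):

gguf-dump /workspace/llama405b-outputs/outputs/out/qlora-llama3_1-405b/checkpoint-60/checkpoint-60-F16-LoRA.gguf

That prints every tensor name with its shape, so the lora_a/lora_b ranks are visible directly.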
>>
I have never understood the furor around drummer's slops, the only thing he did that was decent was Unslop Nemo
>>
>>106937178
They started as low-tier, disgustingly-named models memed to popularity just enough that the guy in question saw a money-making opportunity.
>>
>>106937231
money-making how, what's his endgame? put the goofs behind a paywall?
or is the idea to just get a big following on twitter and become an Accomplished AI Evangelist and one day get a job
>>
>>106937292
He literally did this weird model that spits all sorts of random brand names at you, what the fuck was that about. You can really tell the guy is thinking really hard how to sell this crap
>>
File: open4work.png (16 KB, 472x119)
>>106937292
Like I said in the past before he even set up a Patreon / Ko-fi account: to saturate the space with his shit, no matter how good or bad, to get recognition and eventually hired for work in an actual company.

That's pajeet-tier behavior that retards keep rewarding and so we'll keep seeing more of it until he's accomplished his goals.
>>
>>106937169
This is the shape of the tensors in the LoRa GGUF:
Tensor Name: blk.0.attn_k.weight.lora_a
Dimensions: [16384 128]
Data Type: 0
Tensor Name: blk.0.attn_k.weight.lora_b
Dimensions: [ 128 1024]

But the model's blk.0.attn_k has the shape
blk.0.attn_k.weight     [16384, 2048]

So I'm not sure why lora_b shape doesn't match the shape of the attn_k matrix, but that seems to be the problem.
The curious thing is that the model merged seemingly without errors when using axolotl (haven't checked the generation quality yet).
>>
I just can't stop thinking about TheDrummer.
Like. I was just sitting at my computer, right? And all of a sudden. BAM... TheDrummer. So I had to post it. I had to tell someone.
I just can't stop thinking about him.
>>
On the other hand a large LoRa like that would consume a large amount of memory, so maybe it's a blessing in disguise.
>>
talking about drummer is now obvious bait
don't know how you fucks keep falling for this shit
>>
Should you finetune model on SDK code and maybe some examples you are using or will it just mess it up?
>>
>>106937313
Are you kidding? That's my favorite of his models and probably the one I've wasted the most time chatting to. I show it to everyone IRL and host it locally for people when they ask "How are the AI companies gonna make money when VC dries up?". It was inspired by a Black Mirror episode by the same name.
>>
File: file.png (2 KB, 106x150)
hello

first time trying MoE model, do i need to set up something specific in openwebui model settings to make it work or it just works?

i have 96gb vram across 6 cards, bigger models will get spread out on cards and part of a model gets used accordingly
>>
Also, --lora-scaled works perfectly in llama.cpp + Cuda, I use it all the time. I had issues with it a while back using AMD/Vulkan.
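For anyone who hasn't used it, the invocation is roughly (a sketch, scale value picked arbitrarily):

llama-server -m base-model.gguf --lora-scaled adapter.gguf 0.8

i.e. the adapter gets applied at 80% strength on top of the base GGUF instead of the full 1.0 you get with plain --lora.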
>>
>>106937519
Please stop wasting your time on this rock doing this shit.
We have a planet to save.
>>
File: file.png (3 KB, 145x151)
>>106937527
i have house to heat and the energy this machine uses is converted to heat, which is not going to waste, the hardware i'm using is second hand and as cheap as it can be
>>
>>106937513
Should I do a 24B version of Rivermind?

>>106937231
Yeah, good times. I think it was Smegmma 9B that made me realize it wasn't going to work in the long-run. Lots of people opted out because of the name.

Also, Lemmy (creator of Celeste) told me to stop fucking around if I wanted providers to host my models. That's when I decided to go with sci-fi, starting with Expanse ships, and released Rocinante as my first serious take.

>>106937392
>>106937404
I don't see how I'm causing harm to anyone, so I can't take them seriously. No idea why I get to live in their heads rent-free.
>>
>>106937610
>Should I do a 24B version of Rivermind?
yes
>>
>>106930625
DDR6 is coming out next year or early 2027. DDR4 is no longer being made, and as it stands right now the only RAM really being sold on the consumer market is DDR5, rather than both DDR4 and DDR5.
>>
>>106937610
Drummer, please figure out why your models write for {{user}} so often before making any more
Your cydonias are especially bad in this regard, every one that came after v2g.
>>
>>106937877
I think it might have to do with it being a small model that is being hit by a grift hammer for no good reason.
>>
>>106937888
Older models like Rocinante and Unslop were fine, and no worse than the original model when it comes to imitation
>>
i tried ling sex but i think i fucked up the template. ill try again later
>>
>>106937877
>why your models write for {{user}} so often before making any more
I don't use his finetunes/merges but this tends to happen when there's a fuckfest of merges in one model.
>>
>>106937950
>no worse
no better either
>>
>>106937951
If a 1T model can get fucked up by a template then it is bad. Maybe not as a whole, but that probably means that the sex stuff is a very small domain of the training and it can't generalize it at all.
>>
File: 111.jpg (62 KB, 772x448)
Holy shit, Chinese regulations are cuckolded like crazy. No wonder quality is going down so much.
>>
>>106938060
Why do they care about IP infringements and woke shit? I don't understand.
>>
>>106938060
China is a socialist country, and wokeness is a socialist concept. It's just that LLMs were a new thing, and the CCP didn't catch up fast enough.
>>
>>106938079
They're copypasting burgers regulation since they're trying to market their models there
>>
>>106937989
less censored though
>>
>>106937610
>Should I do a 24B version of Rivermind?

I mean I'd like one (not the Lux version) yeah! But I'm not sure if it'd be popular given what the guy I was replying to said.

I like how it chooses relevant brands/products to shill based on the topic.

> good times

I remember laughing at the Moistral model card, I can't remember what you wrote but it was something like "turn this into a moistral masterpiece". And I saw a HN comment like "Moistral? There's no way that can be real...".

Good times indeed!
>>
gmma soon
>>
>>106938113
>less censored
>than nemo
What the fuck are you on about?
>>
>>106936320
Not good for aerodynamics but I like the fluttering hair
>>
>>106931969
>suddenly a lot of (you)'s
hmm
>>
>>106937102
Just the default of top-p 0.95 and temp 0.6. The responses I get on rerolls are varied enough that I've never felt the urge to mess with the temperature.
>>
>mmmmm we can't just send http requests, we need 5 wrapper libraries and a 2 typing libraries
>all libraries are incompatible with each other two releases later
what's this programming paradigm called?
>>
>>106938305
"What in tard-ation are ya doin' stupid vibe coders?"
>>
>>106938305
>things that didn't happen
leave the hallucinations to the llm bro
>>
Does llama.cpp support Deepseek V3.2 and its meme sparse attention yet? I remember trying it when it came out over the API, going 'yeah, this is slightly better than V3.1(-Terminus) before forgetting about it the moment GLM4.6 released like two days later.
>>
>>106938899
>this is slightly better than V3.1(-Terminus) before forgetting about it the moment GLM4.6 released like two days later
The CohereLabs/c4ai-command-a-03-2025 moment of the whale...
>>
anything better than rocinante yet?

my 8ball from walmart says: no, not for local
>>
>>106938986
Nemo (not the drummer shittune) will probably remain the king of the era of undertrained > safe unusable LLMs. Luckily we are in a new era started by glmsex.
>>
>>106931969
>Offering free models is a scam
This general is so fucking retarded
>>
>>106939081
If they were so good, he wouldn't need to release a new one every few days, piggybacking off new models from legitimate AI labs.
>>
There's no way the haters don't know they're contributing to keeping TheDrummer in everyone's mind.
They secretly like him. They want to hug him and give him kisses and walk hand in hand with him for the whole world to see.
They got TheDrummy issues.
>>
>>106931567
#NotMyMiku
>>
>>106939149
they can't get the taste of his drummies out of their mouth
>>
>>106939149
HuggingFace is fed up with him too.
>>
>>106939149
Put drummer derangement syndrome in your next card. It is a collective activity of calling a faggot a faggot. If a newfag enters a thread right now they aren't going to download a tune of someone who is being called a faggot by multiple posters.
>>
File: disgusting.jpg (335 KB, 3840x2160)
In civilized and developed Japan, when a daughter brings her chosen one to meet her parents, they give him a test. They go through his models and check the quants. If the boy doesn't have at least q4, it's clear that he comes from a dysfunctional family. The test is 100% accurate, and even the WHO and the UN have acknowledged that families dominated by alcoholism, drug addiction, and incest always show a preference to q3 and lower quants. That is why I'm not surprised that most of you poorfags think that output from these retarded quants is acceptable. You were eating shit the whole life and I pity you, but have some honor and human dignity to not consume fecals when we are filling our palates with higher quants, it's a pleasure only for sophisticated gourmets. At the sight of f16 precision output, you would probably jump and scream like monkeys in a zoo. You dirty, muddy swine.
>>
Seems like this thread has stagnated BADLY.
>>
>>106939313
It's a thread about LLMs after all
>>
>>106939292
>q4
quantlet projection
>>
>>106939292
erm, according to this copetest there is barely any difference between q6, q8 and f16
>>
>>106939384
Why, don't you use your models with greedy decoding?
>>
f32
fa off
ngl 999
ncmoe 0
swa-full
ctx-size 0
>>
oh no no no no
https://xcancel.com/ylecun/status/1979595060447416733
>>
>>106939323
Does this mean we have Cold AI Winter #2?
>>
>>106939477
another one
https://xcancel.com/ylecun/status/1979596956277289353#m
yann buried llms
>>
>>106939313
zuck will develop agi and save it bigly
>>
File: 638er.png (486 KB, 995x103)
>>106936345
>>106936299
>>106936298
Update on this. I have it working now correctly using the following parameters and it uses up 100% of my 5090's VRAM. So that's good, however I feel this can be tweaked further for better performance? Any suggestions on how to modify this command to get more out of it?

Like I see others running shit like "--ctx-size 40000 --flash-attn --temp 0.6 --top-p 0.95 --n-cpu-moe 41 --n-gpu-layers 999 --alias llama --no-mmap --jinja --chat-template-file GLM-4.5.jinja --verbose-prompt"
(just as a random example)
>>
File: file.png (102 KB, 668x501)
threadly reminder this faggot browses /lmg/
>>
File: settingstavern.png (606 KB, 262x1879)
Gave sillytavern a shot with llama (also tried kobold because why not) and I have the same recurring issue with it. It keeps giving me massive blog post replies.
I can literally say something as simple as "I look around the room for a light" and it gives me a massive essay reply with many actions and events happening rather than a shorter reply that relates to what I said.

I assume this has to do with my settings as I have not tweaked them much. What should I change these settings to? I assume the tokens are wrong as well
>>
>>106939535
Flash attention is probably already on by default now, but you can add -fa 1 if you want to make sure that it's on.

>Like I see others running shit like
Read the llama-server help output and you'll understand that most of those don't really apply for your case.
>>
>>106939598
Usually the reply length is a function of two things :
1. The model's training.
2. Tour promot (system prompt + character card + examples + first message + etc etc).
Usually the second one will have the most impact.
Try tweaking the first message to be brief, see how much that impact things.
>>
>>106939599
I did read through the llama-server -h last night when it was suggested to me, and while most of it made sense I wasn't 100% sure what I could add that would help improve the parameters for the 5090.
Also yeah flash attention is on by default that's why I didn't add -fa to it.

So just stick with: "llama-server -m GLM-4.5-Air-Q4_K_M-00001-of-00002.gguf -c 32768 --n-cpu-moe 33 --n-gpu-layers 99" ?
does that seem fine or anything I should be adding?
>>
>>106939630
I think you are good yeah.
What speeds are you getting?
>>
>>106939645
>What speeds are you getting?
where does it show the speed? I looked through command prompt when it genned and didn't see anything
>>
>>106939607
what about the tokens?
>>
How much does the structure of your character cards matter for any given model?
If it does, what's the optimal format for GLM?
>>
>>106939607
>Tour promot
Holy fuck. I'm not even mobile posting.
What the hell.
>Your prompt*

>>106939656
See pic.
The part where it says
>prompt eval time = 1158.66 ms / 103 tokens ( 11.25 ms per token, 88.90 tokens per second)
> eval time = 152954.13 ms / 2859 tokens ( 53.50 ms per token, 18.69 tokens per second)
> total time = 154112.79 ms / 2962 tokens

>>106939667
You mean Tokens (Response)?
That doesn't control what the model wants to write, just where the generation will cutoff.
So if you set it to 120 tokens and the model wants to spew 1024 tokens, it'll just cut the text before it's done.

>>106939710
Now that's one hell of a question.
>>
File: this stuff.png (52 KB, 960x716)
>>106939712
And of course I forgot the image.
>pic related
>>
>>106939712
>>106939716
Do note that you want to get these values on a larger prompt, otherwise the prompt eval time will be meaningless, like in my pic.
>>
>>106939712
wats rest of ur rig's specs?
glad u stopped namefagging btw
>>
>>106939733
I never namefagged in my 17 years of 4chan.
Ever.
It's a gaymer notebook with 8gb of vram.
That screenshot is of qwen 30B A3B.
>>
>>106939755
wtf dont u have a 5090 and arent u running glm air???
glad you never namefagged in your 17 years of 4can
>>
>>106939712
>See pic.
>The part where it says
I'll check when i get home from work. Thanks for the advice.

Let's say the gen speed is a bit slower than I'd like, what would I do to the parameters to help speed it up? lower the 32k tokens?
>>
>>106939774
Different guy.
I'm the one giving advice to the dude with a 5090 (>>106939775).
>>
>>106939712
>Holy fuck. I'm not even mobile posting.
Maybe I'm just paranoid and should proofread better, but sometimes I swear the site changes or deletes words after I hit post.
>>
>>106939710
The vocab choices have more impact than structure.
Call your npc a tsundere and you need zero else on personality for example.
>>
>>106939826
that might be true, you're on windows
>>
>>106939598
>Always respond in 1-2 short paragraphs. Limit {{char}}'s response to less than 200 tokens unless specifically asked to provide a long answer.
If it doesn't understand instructions then it's a bad model.
>>
>>106939846
>Limit {{char}}'s response to less than 200 tokens unless specifically asked to provide a long answer.
I don't think models are able to count their own tokens, nor are they trained to correlate tokens to sentence length or whatever, but I guess it can serve as a heuristic to "short responses".
>>
What causes the AI to get stuck in 'a loop'? I'm using TheDrummer_Cydonia-R1-24B-v4-Q4_K_M and it was fine for a bit, but now the AI just keeps repeating it's last message ad verbatim, and then parrots word for word the last dialogue it gave.
>>
>>106939981
Bad model + bad settings, usually.
First thing is to check if your samplers aren't truncating the token pool too much or making the top tokens too likely.
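As a rough first pass (sampler names as they appear in SillyTavern, values are just a conservative guess, not gospel): Temperature 1.0, Min P 0.05, Repetition Penalty 1.05-1.1, everything else neutralized. If the looping stops, tighten things back up from there.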
>>
What's the chuddiest model without prompt massaging it?
>>
>>106940021
'toss
>>
>>106940021
Llama 2 with the wrong chat template.
>>
>>106940020
I'm using the kobold (godlike) preset, and I made sure to click the button at the bottom to load default order. I'm real stupid and new when it comes to this, I have no idea what I'm doing. The character card worked fine before, and I had way longer chats with it that didn't break down and loop like this.
>>
>>106940061
>I'm using the kobold (godlike) preset,
Yeah, don't do that.
I don't know how that one looks specifically, but those presets were always voodoo and were created a long, long time ago.
Try resetting your samplers and using something mild and more "default" like Temp 0.75, TopK 100, TopP 0.95.
See what that does.
Also, if you are quanting the cache, try disabling that, see if that helps at all.
>>
>>106939920
You sound retarded, retard.
>>
>>106940094
I would if I knew what the fuck most of that even meant, lol. But I'll try messing with the settings and try it again. Should I ignore the sillytavern thing about it working better on text completion with the koboldAPI and go back to using the KoboldAPI Classic? Or is there a place to get premade settings, or is this on a per-system kind of thing?
>>
File: 1749471529130802.png (94 KB, 1898x259)
>>106931567
Alright. Finally quantized and uploaded one of my side projects.

https://huggingface.co/AiAF/rp-sft-merged_1000_GGUF

What kind of prompts should I use to test it?
>>
>>106940264
you should consider licensing it under AGPLv3 or cc-by-nc-4.0 so kikes cant steal your models
downloading it rn, can you tell us what instruct template you used for this? or is this just a completion model liek llama 1
>>
>>106940264
>What kind of prompts should I use to test it?
Nala according to the instructions in
>https://justpaste dot it/GreedyNalaTests
>>
>>106940288
<start_of_turn>user [your prompt goes here] <end_of_turn> <start_of_turn>model [model response]


Idk why anyone would bother tuning a model to be a completion model only lol. The only decent use for base / completion models that I know of is if you have a RAG setup using a model that is good at doing tasks based on the prompt it received

>>106940307
Damn, I forgot that page existed. I'll test it with that
>>
File: topkek.png (327 KB, 1309x1120)
>>106940159
>>106940061
>>106939981
So I broke it free of its loop by fucking with its settings some and then having my character forcefully exit the scene, but then the AI went full batshit and fucking ended the RP on me, lmao.
>>
>>106940400
>"REDMPTION"
lol
>>
File: mysides.png (156 KB, 1271x480)
>>106940439
>>106940400
Oh my god I somehow made it worse. I loaded up a mistral 7b tekken preset or whatever, since apparently that would work for the model, changed the XTC settings to .1 and .5, and then the fucking thing started giving me OOC thoughts and comments, even though it never did that before. I had previously used 'ooc: whatever' before just fine without this happening. Pretty fucking funny, though.
>>
File: freshair.mp4 (951 KB, 1280x720)
>>106936320
https://files.catbox.moe/bjswrg.mp4
>>
>>106940307
user
"ahhh ahhh mistress"

model
*She rubs her paws along your thighs and stomach as she starts to undo your pants.* "That's right, little one, tell your mistress how much you want it." [end of text]

>>
File: offwithyerdick.png (188 KB, 1299x599)
>>106940674
Welp. I guess this is what happens when you force an AI to continue after having a meltdown and trying to end the RP. Nice to know the model isn't instant horny, but RIP to the character's dick.
>>
>>106940742
Hey.
Paw, called anon little one, ran with the mistress thing in context correctly.
You know what? I've seen worse.
How does it respond if you turn the
>"ahhh ahhh mistress"
into a whole ass descriptive paragraph?
Does it still respond with a short sentence or does it follow your input?
>>
File: rp-2b.png (21 KB, 975x277)
>>106940377
with the spaces? what about sysprompt?
>>
>>106940821
>>106940821
>>106940821
>>
>>106940768
I used link rel as a completion test:

https://files.catbox.moe/q768fb.txt

And used the following command via llama.cpp

./build/bin/llama-cli -m ./rp-sft-merged_1000-f16.gguf -f Nala-Test_Gemma2.txt
>>
>>106940820
I forgot to mention the chat template is Gemma. I have the same issue where if I forget to include the
--chat-template gemma
flag then the model would immediately start talking about random shit ad infinitum, because llama-cli by default expects your prompts to be in the prompt format the model expects. Using that flag fixed the issue. So maybe you need to tell your web UI / inference engine to use that prompt template
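So something along these lines (a sketch, same files as above):

./build/bin/llama-cli -m ./rp-sft-merged_1000-f16.gguf --chat-template gemma -cnv

-cnv puts llama-cli in conversation mode so your input gets wrapped in the Gemma turn tokens instead of being treated as raw completion text.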
>>
>>106941440
>the chat template is Gemma
>>106940377
>
<start_of_turn>user [your prompt goes here] <end_of_turn> <start_of_turn>model [model response]

And then we have this
>https://ai.google.dev/gemma/docs/core/prompt-structure
Every.... Fucking... time...
>>
>>106941492
I don't see what the issue is.
>>
>>106941503
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>106941503
<start_of_turn>user
knock knock<end_of_turn>
<start_of_turn>model
who is there<end_of_turn>
<start_of_turn>user
Gemma<end_of_turn>
<start_of_turn>model
Gemma who?<end_of_turn>

He used spaces instead of newlines.
>>
>>106941521
Numb nuts, explain what the issue is instead of looking for an excuse to argue with people. You aren't even using the model. So what are you complaining about?
>>
>>106941537
Did you read the text file it uploaded?

https://files.catbox.moe/q768fb.txt

The example written here >>106940864 was just written my me on the fly as a rough explanation as to how you're supposed to format your prompts
>>
>>106941553
>explain what the issue is
You really cannot see the issue? Like all the other times?
>>
>>106941588
See >>106941580

Even if that prompt template was formatted as badly as you say it is (it isn't, you didn't bother to read the file), that would not be causing the engine to spam text indefinitely.
>>
>>106939493
>GPTards
Yann nooooo!!!! I am the autist here who has no friends and even I know not to say that cause it can be misconstrued! YAANNN!!!!


