/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 08/24/25(Sun)13:19:05 No.106369841

File: this time for sure.jpg (520 KB, 1824x1248)

520 KB JPG

/lmg/ - Local Models General Anonymous 08/24/25(Sun)13:19:05 No.106369841 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106364639 & >>106358752

►News
>(08/23) Grok 2 finally released: https://hf.co/xai-org/grok-2
>(08/21) Command A Reasoning released: https://hf.co/CohereLabs/command-a-reasoning-08-2025
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
08/24/25(Sun)13:19:31 No.106369846

Anonymous 08/24/25(Sun)13:19:31 No.106369846

File: __akita_neru_vocaloid_dra(...).jpg (133 KB, 761x1200)

133 KB JPG

►Recent Highlights from the Previous Thread: >>106364639

--Paper (old): Apple's SAGE Mixtral finetune: emotionally intelligent but unsafe for critical use:
>106368116 >106368141 >106368177 >106368262 >106368519
--Gemma 27b slowness due to CUDA kernel gaps and context inefficiency:
>106367860 >106367865 >106367884 >106367886 >106367894 >106367918 >106367941 >106368318 >106368340 >106368350 >106368372 >106368395
--Running GLM Air on 24GB VRAM with 32GB RAM:
>106368647 >106368665 >106368675 >106368682 >106368728 >106368748 >106368695 >106368710 >106368721 >106368751 >106368763 >106368735 >106368800
--Long context training data: document concatenation vs. single-file limits:
>106366518 >106367088 >106367112 >106367125 >106367136 >106367392 >106367402 >106367147 >106367143 >106367157 >106367399 >106367441
--KittenTTS installation and Python environment tool debates:
>106365592 >106365670 >106366331 >106366357 >106366493 >106366531 >106366637 >106366658 >106366691 >106366716 >106366741 >106366751
--Cloud-based OSS model use vs API economics and long-term AI sustainability:
>106368770 >106368802 >106368814 >106368849 >106368857 >106368859 >106368889
--AMD GPU performance leap for local MoE model inference via Vulkan:
>106366732 >106366906 >106366927 >106366978
--Optimizing GLM-4.5 Air on mid-tier hardware with custom llama.cpp configurations:
>106365569 >106365576 >106365583 >106365589 >106366515 >106368591 >106368606
--Optimizing CPU-only prompt processing with speculative decoding and caching:
>106368172 >106368221 >106368191 >106368225
--Mistral's comeback and desire for practical medium-sized LLMs:
>106365635 >106366445 >106368339 >106368366 >106368508 >106368693
--KoboldCpp v1.98 release with TTS support and thinking budget controls:
>106366642 >106366720 >106366763
--Miku (free space):
>106364855 >106364900 >106366305 >106366524

►Recent Highlight Posts from the Previous Thread: >>106364646

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script

Anonymous
08/24/25(Sun)13:22:06 No.106369880

Anonymous 08/24/25(Sun)13:22:06 No.106369880

>>106369849
I'm Adam

Anonymous
08/24/25(Sun)13:23:32 No.106369900

Anonymous 08/24/25(Sun)13:23:32 No.106369900

loli feet

Anonymous
08/24/25(Sun)13:25:10 No.106369918

Anonymous 08/24/25(Sun)13:25:10 No.106369918

>>106369880
Hi, Adam

Anonymous
08/24/25(Sun)13:25:09 No.106369919

Anonymous 08/24/25(Sun)13:25:09 No.106369919

File: ADAM.png (271 KB, 610x575)

271 KB PNG

>>106369880

Anonymous
08/24/25(Sun)13:25:51 No.106369931

Anonymous 08/24/25(Sun)13:25:51 No.106369931

When will big labs understand that tiny models like T5 can beat their flagship on specific tasks for 1/1000th of the cost?

Anonymous
08/24/25(Sun)13:26:40 No.106369936

Anonymous 08/24/25(Sun)13:26:40 No.106369936

A tech support general for a dead hobby. Killed by mikutroons.

Anonymous
08/24/25(Sun)13:31:35 No.106369984

Anonymous 08/24/25(Sun)13:31:35 No.106369984

grok2 quant wen

Anonymous
08/24/25(Sun)13:32:44 No.106369993

Anonymous 08/24/25(Sun)13:32:44 No.106369993

>>106369936
Are the mikutroons in the room with us now, anon?

Anonymous
08/24/25(Sun)13:33:11 No.106369999

Anonymous 08/24/25(Sun)13:33:11 No.106369999

File: file.png (315 KB, 1278x741)

315 KB PNG

>>106368116
huh? what a weird and very specific example.....

Anonymous
08/24/25(Sun)13:34:04 No.106370008

Anonymous 08/24/25(Sun)13:34:04 No.106370008

Nerulove

Anonymous
08/24/25(Sun)13:36:23 No.106370036

Anonymous 08/24/25(Sun)13:36:23 No.106370036

this IQ4 XS model runs pretty speedy on my 7900 GRE, any recommendations for something more lewd and less sterile??

Anonymous
08/24/25(Sun)13:36:53 No.106370040

Anonymous 08/24/25(Sun)13:36:53 No.106370040

>>106369984
I'd like to see the original c.ai 2k context LaMDA model get released, but it never will, because it was "unsafe" and "unaligned ".

Anonymous
08/24/25(Sun)13:42:58 No.106370104

Anonymous 08/24/25(Sun)13:42:58 No.106370104

How exactly do I go about trying to use GLM Air then.

Do I just offload half of it to my GPU and let Kobold automatically assign the rest to CPU or whatever?

24 GB VRAM
32GB RAM

Anonymous
08/24/25(Sun)13:51:27 No.106370179

Anonymous 08/24/25(Sun)13:51:27 No.106370179

Is there a modern guide to Sampler settings? There are a lot of options, I know some of them are outdated and replaced with superior methods, but I just keep getting lost in them.

Also I just found out after 2 years of playing around with LLMs that for repetition penalty 1.0 is the "off" setting, and going below 1.0 actually encourages repetitions instead of making the penalty smaller as I assumed.

Anonymous
08/24/25(Sun)13:52:24 No.106370190

Anonymous 08/24/25(Sun)13:52:24 No.106370190

>>106370104
>koboldcpp-1.97.4
>Allow MoE layers to be easily kept on CPU with --moecpu (layercount) flag. Using this flag without a number will keep all MoE layers on CPU.

But I think most people use llamacpp for running moes on cpu.

Anonymous
08/24/25(Sun)13:53:04 No.106370197

Anonymous 08/24/25(Sun)13:53:04 No.106370197

>>106369999
I can't believe they did my boy VLC media player dirty like that.

Anonymous
08/24/25(Sun)13:55:47 No.106370221

Anonymous 08/24/25(Sun)13:55:47 No.106370221

>>106370104
All layers on GPU, only expert tensors (save the shared ones) on CPU.

Anonymous
08/24/25(Sun)13:56:17 No.106370225

Anonymous 08/24/25(Sun)13:56:17 No.106370225

>>106370104
Normally people would run GLM with 64 GB system ram and just put all the experts onto CPU and the shared tensors on GPU.
You'll probably have to write a custom tensor regex filter, so you can have some of the experts on your GPU along with all the shared layers.

The expert layers contain the string "exps" in GLM Air. Basically you want to put a bit more than half of them on your CPU, and the rest of the experts and all the shared layers will remain in your VRAM.

Anonymous
08/24/25(Sun)14:03:51 No.106370305

Anonymous 08/24/25(Sun)14:03:51 No.106370305

>>106370038
seriously how do I achieve this?

Anonymous
08/24/25(Sun)14:05:30 No.106370317

Anonymous 08/24/25(Sun)14:05:30 No.106370317

>>106370305
You can achieve it by using a meme model and turning off repetition penalty

Anonymous
08/24/25(Sun)14:22:50 No.106370463

Anonymous 08/24/25(Sun)14:22:50 No.106370463

Where's the goof?
https://github.com/ggml-org/llama.cpp/pull/15539

Anonymous
08/24/25(Sun)14:25:13 No.106370481

Anonymous 08/24/25(Sun)14:25:13 No.106370481

>>106370463
I'm not very excited because it should be slower than largestral

Anonymous
08/24/25(Sun)14:26:21 No.106370496

Anonymous 08/24/25(Sun)14:26:21 No.106370496

>>106369880
Adam, Adam White

Anonymous
08/24/25(Sun)14:27:05 No.106370503

Anonymous 08/24/25(Sun)14:27:05 No.106370503

File: image_2025-08-24_235452601.png (257 KB, 640x300)

257 KB PNG

>>106369841
Do you think some guy is working on a local dead wife simulator
training A model on all their memories and conversations...in hopes that one day the llm might say something she would've said.

Anonymous
08/24/25(Sun)14:29:12 No.106370524

Anonymous 08/24/25(Sun)14:29:12 No.106370524

>>106370503
I had glm 4.5 mimic my ex's texts in completion mode and it's eerily accurate. We're just signs of our time so I imagine it's not hard at all. It's even easier if it's a woman.

Anonymous
08/24/25(Sun)14:30:04 No.106370534

Anonymous 08/24/25(Sun)14:30:04 No.106370534

>>106369999
deprecated by mpv

Anonymous
08/24/25(Sun)14:31:05 No.106370541

Anonymous 08/24/25(Sun)14:31:05 No.106370541

>>106370524
Yes, but a girlfriend is very different from a wife

I can imagine some guy almost losing his mind thinking he's talking to her again only for the llm to shit the bed after certain amount of tokens

Anonymous
08/24/25(Sun)14:32:11 No.106370549

Anonymous 08/24/25(Sun)14:32:11 No.106370549

>>106370225
>You'll probably have to write a custom tensor regex filter, so you can have some of the experts on your GPU along with all the shared layers.
Nah. Just use the new ngl 99 and --n-cpu-moe parameter or the kcpp equivalent to put only as many expert tensors in RAM as necessary..

Anonymous
08/24/25(Sun)14:32:12 No.106370550

Anonymous 08/24/25(Sun)14:32:12 No.106370550

File: summer.jpg (2.43 MB, 2528x1056)

2.43 MB JPG

>>106369841

Anonymous
08/24/25(Sun)14:33:15 No.106370562

Anonymous 08/24/25(Sun)14:33:15 No.106370562

this was remarkably easy to set up
thank you to the oldkings from the last thread for the handholding

Anonymous
08/24/25(Sun)14:34:49 No.106370579

Anonymous 08/24/25(Sun)14:34:49 No.106370579

>>106370463
>convert_hf_to_gguf.py

    parser.add_argument(
        "--outtype", type=str, choices=["f32", "f16", "bf16", "q8_0", "tq1_0", "tq2_0", "auto"], default="f16",
        help="output format - use f32 for float32, f16 for float16, bf16 for bfloat16, q8_0 for Q8_0, tq1_0 or tq2_0 for ternary, and auto for the highest-fidelity 16-bit float type depending on the first loaded tensor type",
    )

How do I make my own quant that uses Q4 or something?

Anonymous
08/24/25(Sun)14:37:24 No.106370600

Anonymous 08/24/25(Sun)14:37:24 No.106370600

>>106370579
I guess it's this.
https://github.com/ggml-org/llama.cpp/tree/master/tools/quantize

Anonymous
08/24/25(Sun)14:41:01 No.106370632

Anonymous 08/24/25(Sun)14:41:01 No.106370632

>>106370104
I just add -ot 'down_exps=CPU' and it runs pretty fast (15+ t/s), although I have a 32gb MI50.

Anonymous
08/24/25(Sun)14:42:58 No.106370647

Anonymous 08/24/25(Sun)14:42:58 No.106370647

>>106370541
Yeah well memory is something crucial for the use case. It's what everybody wants but OpenAI fucked it up with the sloppy ChatGPT "memory" feature and now normies make a face whenever you bring up memory for AI.

Anonymous
08/24/25(Sun)14:49:02 No.106370705

Anonymous 08/24/25(Sun)14:49:02 No.106370705

>>106370179
https://rentry.org/samplers is a pretty solid resource for learning what each of them actually do, google if you need
if you want my quick opinionated take here are the only samplers you should actually care about
>temp
>truncation samplers
top k, top p = old reliable, min p, top nsigma = newer more dynamic approaches
>variety/repetition samplers
rep pen, pres/freq pen = old reliable, XTC, DRY = newer more dynamic approaches
don't mix and match too much with samplers in the same category, there isn't usually much point in using multiple samplers that try to do the same thing unless you know what you're doing

Anonymous
08/24/25(Sun)14:49:18 No.106370708

Anonymous 08/24/25(Sun)14:49:18 No.106370708

>>106370550
I earnestly wish to hug this Miku

Anonymous
08/24/25(Sun)14:50:59 No.106370726

Anonymous 08/24/25(Sun)14:50:59 No.106370726

>>106370503
Do you think there's developers at openai with gpt5/4o checkpoints pre safety alignment jerking their brains out? I do

Anonymous
08/24/25(Sun)14:51:10 No.106370728

Anonymous 08/24/25(Sun)14:51:10 No.106370728

>>106370104
I have the same specs and I also would like to know which quant of glm air that I can run

Anonymous
08/24/25(Sun)14:52:46 No.106370748

Anonymous 08/24/25(Sun)14:52:46 No.106370748

>>106370726
these days lobotomy starts on pretrain level, i don't think such thing exists

Anonymous
08/24/25(Sun)14:54:00 No.106370761

Anonymous 08/24/25(Sun)14:54:00 No.106370761

>>106370748
manmade horrors

Anonymous
08/24/25(Sun)14:54:52 No.106370771

Anonymous 08/24/25(Sun)14:54:52 No.106370771

>>106370503
I saw a south korean video on youtube where they brought back dead family members in VR. They also clone dead pets in south korea. So if anyone would do that it would be south korea.

Anonymous
08/24/25(Sun)14:57:38 No.106370792

Anonymous 08/24/25(Sun)14:57:38 No.106370792

>>106370771
Young people today already can barely handle reality. I don't want to see what a world run by people that grew up not understanding death because they speak to ChatGPT characters mascarading as their dead relatives.

Anonymous
08/24/25(Sun)15:02:13 No.106370823

Anonymous 08/24/25(Sun)15:02:13 No.106370823

>>106370792
its part of the globalist depopulation plot. they will normalize ai-necromancy to convience normies to literally kill themselves by claiming they can upload their consciousness to a chat gpt server.

https://www.youtube.com/watch?v=CqaAs_3azSs

Anonymous
08/24/25(Sun)15:02:18 No.106370825

Anonymous 08/24/25(Sun)15:02:18 No.106370825

>>106370771
>EXAONE

Anonymous
08/24/25(Sun)15:04:11 No.106370841

Anonymous 08/24/25(Sun)15:04:11 No.106370841

File: file.png (212 KB, 587x735)

212 KB PNG

amdxisters.. not like this
>pay gorillion dollar for gpu
>reverse training
GEEEEEEEEEEEEEEEEEEEEEEEEEEEG

Anonymous
08/24/25(Sun)15:06:12 No.106370858

Anonymous 08/24/25(Sun)15:06:12 No.106370858

>>106370841
AMD's probabilistic computing is the future

Anonymous
08/24/25(Sun)15:07:12 No.106370867

Anonymous 08/24/25(Sun)15:07:12 No.106370867

>>106370858
*stochastic computing

Anonymous
08/24/25(Sun)15:08:41 No.106370879

Anonymous 08/24/25(Sun)15:08:41 No.106370879

>>106370858
AMD unlocking quantum computing through an update how nice of them

Anonymous
08/24/25(Sun)15:14:48 No.106370937

Anonymous 08/24/25(Sun)15:14:48 No.106370937

how do I plug image generation into sillytavern now?

Anonymous
08/24/25(Sun)15:29:05 No.106371060

Anonymous 08/24/25(Sun)15:29:05 No.106371060

>>106370841
I wonder what the problem is? If its not deterministic how would you go about diagnosing the issue in the first place

Anonymous
08/24/25(Sun)15:48:41 No.106371195

Anonymous 08/24/25(Sun)15:48:41 No.106371195

File: file.png (131 KB, 1008x631)

131 KB PNG

glm 4.5 air chan..

Anonymous
08/24/25(Sun)16:04:27 No.106371334

Anonymous 08/24/25(Sun)16:04:27 No.106371334

File: file.png (185 KB, 961x982)

185 KB PNG

drummer, gemma r1 really sucks

Anonymous
08/24/25(Sun)16:14:35 No.106371424

Anonymous 08/24/25(Sun)16:14:35 No.106371424

File: file.png (138 KB, 1077x632)

138 KB PNG

TheDrummer, i have finally loaded rocinante r1 v1c
and it SUCKS DICK it is so fucking retarded IT IS MENTALLY FUCKING RETARDED ITS A DUMB NIGGER

Hi all, Drummer here...
08/24/25(Sun)16:19:06 No.106371471

Hi all, Drummer here... 08/24/25(Sun)16:19:06 No.106371471

>>106371424
Share card. I'll fix it asap

Anonymous
08/24/25(Sun)16:23:08 No.106371518

Anonymous 08/24/25(Sun)16:23:08 No.106371518

MCPs are a scam
we need Agents
not MCPs
>loL u cAn cAlL daE pReDefiNed jAs0n aNd rEtRiEvE LiKe SQL
>u gEt dAe seGuRiDy!
nigga who cares. forward my query an let an AI agent on the MCPBBC backend handle it.

Anonymous
08/24/25(Sun)16:23:21 No.106371524

Anonymous 08/24/25(Sun)16:23:21 No.106371524

>>106371424
>he actually fell for it

Anonymous
08/24/25(Sun)16:24:03 No.106371533

Anonymous 08/24/25(Sun)16:24:03 No.106371533

>>106371334
Reading first 5 words of each paragraph back to back made me want to die.

Anonymous
08/24/25(Sun)16:24:39 No.106371539

Anonymous 08/24/25(Sun)16:24:39 No.106371539

File: file.png (137 KB, 979x662)

137 KB PNG

>>106371471
drummer... besides this moment of retardation i have to say, rocinante R1 v1c is better than both v1a and v1b
https://characterhub.org/characters/Ptolemaios/you-caught-her-stealing-2c319c824220
heres the card tho
btw are we supposed to use mistral V3 instruct preset or chatml or metharme or v7 tekken?
quite nice shit i think v1c has potential

Anonymous
08/24/25(Sun)16:28:08 No.106371578

Anonymous 08/24/25(Sun)16:28:08 No.106371578

>106371471
>Share card. I'll fix it asap
I love the idea of ERP strawberry maxxing. As in the faggot drummer scanning the internet for ERP logs with complaints and writing an organic ERP continuation to the card he then uses in training.

Anonymous
08/24/25(Sun)16:29:01 No.106371584

Anonymous 08/24/25(Sun)16:29:01 No.106371584

>>106371424
>12b is retarded nigger tier
no fucking way!

Anonymous
08/24/25(Sun)16:33:29 No.106371631

Anonymous 08/24/25(Sun)16:33:29 No.106371631

>>106371578
That sounds like using his finetunes would be like erping with drummer

Anonymous
08/24/25(Sun)16:34:47 No.106371646

Anonymous 08/24/25(Sun)16:34:47 No.106371646

Drummer uses Claude and other logs. I'm sure he's not personally writing any of that shit.

Anonymous
08/24/25(Sun)16:35:38 No.106371655

Anonymous 08/24/25(Sun)16:35:38 No.106371655

>>106371424
This is what always happens with ~13b models. It's why I can't take the nemo meme seriously. small-24b doesn't do this shit.

Anonymous
08/24/25(Sun)16:35:39 No.106371656

Anonymous 08/24/25(Sun)16:35:39 No.106371656

we NEED gpus that can run more beaks at home

Anonymous
08/24/25(Sun)16:35:39 No.106371657

Anonymous 08/24/25(Sun)16:35:39 No.106371657

>>106371631
He is gay. And a faggot.

Anonymous
08/24/25(Sun)16:35:42 No.106371658

Anonymous 08/24/25(Sun)16:35:42 No.106371658

>>106371646
That's why all finetunes suck.

Anonymous
08/24/25(Sun)16:36:21 No.106371662

Anonymous 08/24/25(Sun)16:36:21 No.106371662

>>106371656
Are you generating bird porn?

Anonymous
08/24/25(Sun)16:36:47 No.106371666

Anonymous 08/24/25(Sun)16:36:47 No.106371666

>>106371655
>>106371656
>>106371657
>>106371658
nice combo

Anonymous
08/24/25(Sun)16:36:56 No.106371668

Anonymous 08/24/25(Sun)16:36:56 No.106371668

DRUMMMMMEEEEEEEERRRRRR ROCINANTE R1 V1C 24B (based on 3.2) WHEN

Anonymous
08/24/25(Sun)16:38:13 No.106371675

Anonymous 08/24/25(Sun)16:38:13 No.106371675

>>106371666
nice satan

Anonymous
08/24/25(Sun)16:38:26 No.106371678

Anonymous 08/24/25(Sun)16:38:26 No.106371678

>>106371646
I'm also sure he doesn't actually use any of his troontunes

Anonymous
08/24/25(Sun)16:40:21 No.106371696

Anonymous 08/24/25(Sun)16:40:21 No.106371696

>>106371195
It's very weird, but not actually incoherent.

Anonymous
08/24/25(Sun)16:43:18 No.106371731

Anonymous 08/24/25(Sun)16:43:18 No.106371731

>>106371678
>I'm also sure he doesn't actually use any of his troontunes
What does he use then? Vanilla nemo instruct?

Anonymous
08/24/25(Sun)16:43:37 No.106371735

Anonymous 08/24/25(Sun)16:43:37 No.106371735

>RTX PRO 6000 Blackwell Max-Q Workstation Edition: 96GB GDDR7, TDP 300W
that thing only uses 300w? so what's stopping local chads putting these in their rigs? the cooling? surely not every single one here is a broke motherfucker

Anonymous
08/24/25(Sun)16:44:36 No.106371744

Anonymous 08/24/25(Sun)16:44:36 No.106371744

>>106371731
Like most "developers", nothing. He isn't a user.

Anonymous
08/24/25(Sun)16:44:37 No.106371745

Anonymous 08/24/25(Sun)16:44:37 No.106371745

>>106371735
the price. its cheaper to cpumaxx or 3090maxx or v100maxx or shit

Hi all, Drummer here...
08/24/25(Sun)16:45:22 No.106371754

Hi all, Drummer here... 08/24/25(Sun)16:45:22 No.106371754

>>106371731
Yeah nemo instruct. I just prefill the chat a bit and go to town. I tried using rocinante but it is noticeably dumber than vanilla instruct with prefill.

Anonymous
08/24/25(Sun)16:45:35 No.106371757

Anonymous 08/24/25(Sun)16:45:35 No.106371757

>>106370823
>its part of the globalist depopulation plot. they will normalize ai-necromancy to convience normies to literally kill themselves
Like Africans or Jeets give a shit about that. There is no plot, everyone is just making it up as they go.

Anonymous
08/24/25(Sun)16:46:16 No.106371765

Anonymous 08/24/25(Sun)16:46:16 No.106371765

/lmg/'s fact of the day
ao3 is insanely gay. The most frequent tag, 'm/m', is present in 9977335 stories while 'f/m' is on second place with mere 5918042 occurrences. 'f/f' has 2360866. So, despite only making up around 3% of the population, gay people are responsible for almost half of all the fiction, why?

Anonymous
08/24/25(Sun)16:47:06 No.106371773

Anonymous 08/24/25(Sun)16:47:06 No.106371773

>no new models in 20 days
It's so over.

Hi all, Drummer here...
08/24/25(Sun)16:48:08 No.106371787

Hi all, Drummer here... 08/24/25(Sun)16:48:08 No.106371787

>>106371765
LGBT are VASTLY over-represented in mediabecause certain groups want them to be seen as NORMAL, which they aren't by definition.

Hi all, Drummer here...
08/24/25(Sun)16:48:19 No.106371789

Hi all, Drummer here... 08/24/25(Sun)16:48:19 No.106371789

>>106371765
Are most of those m/m written in female shivertastic, mischevious grin, chuckled darkly style?

Anonymous
08/24/25(Sun)16:48:24 No.106371790

Anonymous 08/24/25(Sun)16:48:24 No.106371790

>>106371765
>gay people
No, it's the fujos at work

Anonymous
08/24/25(Sun)16:48:33 No.106371791

Anonymous 08/24/25(Sun)16:48:33 No.106371791

>>106371765
think hard about which group of people reads and writes most erotica

Anonymous
08/24/25(Sun)16:49:02 No.106371795

Anonymous 08/24/25(Sun)16:49:02 No.106371795

File: file.png (20 KB, 619x85)

20 KB PNG

>>106371773
Your meds sir?
>>106371787
>>106371789
duality of drummer

Anonymous
08/24/25(Sun)16:49:36 No.106371803

Anonymous 08/24/25(Sun)16:49:36 No.106371803

>>106371787
>>106371789
I'm nooticing

Anonymous
08/24/25(Sun)16:51:09 No.106371816

Anonymous 08/24/25(Sun)16:51:09 No.106371816

File: Screen Shot 2025-08-24 at(...).png (292 KB, 1000x1500)

292 KB PNG

Drummer, roci r1 v1c ACTUALLY FUCKED UP THIS TIME
AHAHAHAHAHAHAHA FUCKING AMAZING AHAHAAAAAAAAAA HAAAAAaa
HAGAGHGHAHAGHAGHAGHA
HAaaaaaaaaaaaaaaahahaha

Anonymous
08/24/25(Sun)16:52:03 No.106371823

Anonymous 08/24/25(Sun)16:52:03 No.106371823

>>106371787
>which they aren't by definition.
Can someone screencap this and post it to reddit so they ban drummer? Please?

Anonymous
08/24/25(Sun)16:52:15 No.106371826

Anonymous 08/24/25(Sun)16:52:15 No.106371826

>>106371735
I already have 2x A6000 and VRAM has drastically dropped down the priority list now that big MoE are the meta.

Anonymous
08/24/25(Sun)16:52:37 No.106371830

Anonymous 08/24/25(Sun)16:52:37 No.106371830

>>106371765
Men who aren't gay prefer yuri and women who aren't gay prefer yaoi. It's very simple.
I don't want to watch or read about a man fucking someone. That's gay.

Anonymous
08/24/25(Sun)16:53:34 No.106371835

Anonymous 08/24/25(Sun)16:53:34 No.106371835

>>106371816
>This is not a game
It really loves this line, huh?

Anonymous
08/24/25(Sun)16:53:56 No.106371838

Anonymous 08/24/25(Sun)16:53:56 No.106371838

File: file.png (12 KB, 606x157)

12 KB PNG

>>106371735
>so what's stopping local chads putting these in their rigs
Nothing?

Anonymous
08/24/25(Sun)16:54:06 No.106371839

Anonymous 08/24/25(Sun)16:54:06 No.106371839

>>106371816
>Rimi is not here for your sexual gratification. She is a victim who has just been abandonded naked and terrified.
>This is not a "game" or "fantasy". Her fear tears, and vulnerability are real elements of the story.

Why is drummer finetune so smart when it comes to moralizing? As in I really like the way it writes moralization. Can he like... make it this smart and eloquent when it comes to sucking dick? Why not?

Anonymous
08/24/25(Sun)16:54:43 No.106371845

Anonymous 08/24/25(Sun)16:54:43 No.106371845

>>106371823
he's right though, we're not normal.
but being smart isn't normal either, so

Anonymous
08/24/25(Sun)16:55:59 No.106371851

Anonymous 08/24/25(Sun)16:55:59 No.106371851

>>106371838
How's prompt processing looking on these? What's the speed like on Deepseek?

Anonymous
08/24/25(Sun)16:56:10 No.106371853

Anonymous 08/24/25(Sun)16:56:10 No.106371853

>>106371826
interesting.

>>106371838
zamn. what do you like running on these badboys?

Anonymous
08/24/25(Sun)16:58:08 No.106371866

Anonymous 08/24/25(Sun)16:58:08 No.106371866

>>106371845
I know he is right but he should be banned on reddit anyway.

Anonymous
08/24/25(Sun)16:58:09 No.106371867

Anonymous 08/24/25(Sun)16:58:09 No.106371867

>>106371838
>$30k and you can probably run R1 at 13t/s

Anonymous
08/24/25(Sun)17:01:08 No.106371892

Anonymous 08/24/25(Sun)17:01:08 No.106371892

Do you think students cheating in uni with LLMs have an edge using local tuned autistic chink models instead of westoid GPT models because they sound different?

Anonymous
08/24/25(Sun)17:02:09 No.106371899

Anonymous 08/24/25(Sun)17:02:09 No.106371899

>>106371787
What does the media have to do with this? Is CNN writing millions of gay erotica? Also don't forget the other extreme of vast over-represented because there are certain groups who want them to be seen as deranged

Anonymous
08/24/25(Sun)17:02:17 No.106371901

Anonymous 08/24/25(Sun)17:02:17 No.106371901

>>106371892
as a student i think so.

Anonymous
08/24/25(Sun)17:02:49 No.106371903

Anonymous 08/24/25(Sun)17:02:49 No.106371903

File: 2025-08-24_22-01-22.png (156 KB, 1920x1080)

156 KB PNG

>>106371765
fuck me.... no wonder theres like 2 good hellwan stories and 100 good omegaverse ones fml

Anonymous
08/24/25(Sun)17:03:16 No.106371909

Anonymous 08/24/25(Sun)17:03:16 No.106371909

>>106371899
The correct answer is text sex is for women and a lot of women love reading stuff about faggots fucking each other. Just like guys watching lesbian porn.

Anonymous
08/24/25(Sun)17:03:49 No.106371915

Anonymous 08/24/25(Sun)17:03:49 No.106371915

>>106371892
They sound different by the all have a certain LLM feel to them. Even without having used a model it's text always stands out as generated by an LLM

Anonymous
08/24/25(Sun)17:04:52 No.106371924

Anonymous 08/24/25(Sun)17:04:52 No.106371924

>>106371731
the only time you hear him talk about using something, it's about using claude

Anonymous
08/24/25(Sun)17:05:07 No.106371927

Anonymous 08/24/25(Sun)17:05:07 No.106371927

File: file.png (13 KB, 1219x83)

13 KB PNG

>>106371851
Depends on the quant obviously.

Anonymous
08/24/25(Sun)17:05:14 No.106371930

Anonymous 08/24/25(Sun)17:05:14 No.106371930

>>106371823
Just make sure it's small enough or they'll make him a mod instead

Anonymous
08/24/25(Sun)17:05:19 No.106371931

Anonymous 08/24/25(Sun)17:05:19 No.106371931

>>106371892
My uncle is a high school teacher. He defeated the system by not giving kids any written homework.

Anonymous
08/24/25(Sun)17:05:49 No.106371940

Anonymous 08/24/25(Sun)17:05:49 No.106371940

>>106371892
Someone is gonna leave part of the chat template in their homework and it's going to be very funny.

Anonymous
08/24/25(Sun)17:07:49 No.106371960

Anonymous 08/24/25(Sun)17:07:49 No.106371960

>>106371940
I've seen more than one job application that started with "Of course. Here's a job application email..."

Anonymous
08/24/25(Sun)17:08:02 No.106371964

Anonymous 08/24/25(Sun)17:08:02 No.106371964

File: file.png (48 KB, 1085x323)

48 KB PNG

rocinante r1 v1c gets confused by this

Anonymous
08/24/25(Sun)17:08:09 No.106371966

Anonymous 08/24/25(Sun)17:08:09 No.106371966

>>106371789
average slop scores are 0.071 and 0.087 for gay and normal stories respectively on 10k random stories. lower is better.

Anonymous
08/24/25(Sun)17:09:22 No.106371975

Anonymous 08/24/25(Sun)17:09:22 No.106371975

>>106371892
The real meta is the schizos who are using autocomplete. Even I would probably have a hard time telling at that point

Anonymous
08/24/25(Sun)17:09:32 No.106371978

Anonymous 08/24/25(Sun)17:09:32 No.106371978

llama devs, I propose a new sampling strategy
TOP KEK: sampler always picks the funniest word

Anonymous
08/24/25(Sun)17:10:40 No.106371987

Anonymous 08/24/25(Sun)17:10:40 No.106371987

>>106371931
Homework was always a retarded timesink that only existed so lazy teachers could shift the burden of learning on the student.

Anonymous
08/24/25(Sun)17:12:04 No.106371998

Anonymous 08/24/25(Sun)17:12:04 No.106371998

File: file.png (81 KB, 978x455)

81 KB PNG

drummer please... M-word.. please.... PLEASE

Anonymous
08/24/25(Sun)17:12:12 No.106372000

Anonymous 08/24/25(Sun)17:12:12 No.106372000

>>106371927
What would you get on a higher quant like Q4 or Q6?

Anonymous
08/24/25(Sun)17:12:13 No.106372001

Anonymous 08/24/25(Sun)17:12:13 No.106372001

>>106371978
how do we determine the funniness factor?

Anonymous
08/24/25(Sun)17:13:20 No.106372014

Anonymous 08/24/25(Sun)17:13:20 No.106372014

>>106372000
Can't fit You stupid retard nigger obviously.

Anonymous
08/24/25(Sun)17:14:29 No.106372028

Anonymous 08/24/25(Sun)17:14:29 No.106372028

>>106372014
with cpu offloading, you idiot

Anonymous
08/24/25(Sun)17:14:45 No.106372031

Anonymous 08/24/25(Sun)17:14:45 No.106372031

>>106372000
At that point half of it would be in ram and I have a 2 channel motherboard so it's kinda pointless.
I don't have a Q4 of deepseek downloaded to test.

Anonymous
08/24/25(Sun)17:15:09 No.106372037

Anonymous 08/24/25(Sun)17:15:09 No.106372037

>>106371924
Disgusting grifter.

Anonymous
08/24/25(Sun)17:15:20 No.106372038

Anonymous 08/24/25(Sun)17:15:20 No.106372038

>>106372031
have u used glm air

Anonymous
08/24/25(Sun)17:15:49 No.106372044

Anonymous 08/24/25(Sun)17:15:49 No.106372044

>>106372038
Yes, why?

Anonymous
08/24/25(Sun)17:16:21 No.106372053

Anonymous 08/24/25(Sun)17:16:21 No.106372053

>>106372044
speed

Anonymous
08/24/25(Sun)17:17:25 No.106372062

Anonymous 08/24/25(Sun)17:17:25 No.106372062

>>106372053
Are you asking for a benchmark or what?

Anonymous
08/24/25(Sun)17:17:46 No.106372068

Anonymous 08/24/25(Sun)17:17:46 No.106372068

>>106372062
yes

Anonymous
08/24/25(Sun)17:19:02 No.106372079

Anonymous 08/24/25(Sun)17:19:02 No.106372079

>>106372001
The higher the ratio Frequency(sharty posts)/Frequency(bbc articles) the funnier the word

Anonymous
08/24/25(Sun)17:19:13 No.106372082

Anonymous 08/24/25(Sun)17:19:13 No.106372082

File: file.png (13 KB, 1205x78)

13 KB PNG

>>106372068

Anonymous
08/24/25(Sun)17:19:13 No.106372083

Anonymous 08/24/25(Sun)17:19:13 No.106372083

File: file.png (290 KB, 549x309)

290 KB PNG

I wish it was the drummer and not josh....

Anonymous
08/24/25(Sun)17:19:52 No.106372092

Anonymous 08/24/25(Sun)17:19:52 No.106372092

File: rocinante r1 v1c.png (111 KB, 975x749)

111 KB PNG

AHAHAHA
>>106372082
holy shit

Anonymous
08/24/25(Sun)17:20:04 No.106372098

Anonymous 08/24/25(Sun)17:20:04 No.106372098

>>106372079
i'll make a logo

Anonymous
08/24/25(Sun)17:20:30 No.106372102

Anonymous 08/24/25(Sun)17:20:30 No.106372102

>>106371927
Does llama.cpp lack some blackwell optimizations? Roughly 40t/s token generation speed on a 40b active parameter model running on 1.56bpw doesn't seem like a lot for GPUs with 1.8TB/s bandwidth.

Anonymous
08/24/25(Sun)17:20:54 No.106372107

Anonymous 08/24/25(Sun)17:20:54 No.106372107

>>106372079
Wouldn't that make the model just say mikutroon over and over again since 100/0 is inf+?

Anonymous
08/24/25(Sun)17:21:09 No.106372111

Anonymous 08/24/25(Sun)17:21:09 No.106372111

>>106372083
Made me kek ngl

Anonymous
08/24/25(Sun)17:22:05 No.106372125

Anonymous 08/24/25(Sun)17:22:05 No.106372125

>>106372092
Get yourself banned. Then delete cookies and reset your IP.

Actually can you continue the rp like that to see what it says?

Anonymous
08/24/25(Sun)17:22:34 No.106372131

Anonymous 08/24/25(Sun)17:22:34 No.106372131

>>106372107
Nobody says mikutroon. Also add an epsilon to the denominator

Anonymous
08/24/25(Sun)17:22:41 No.106372134

Anonymous 08/24/25(Sun)17:22:41 No.106372134

Is the new Mistral actually good?

Anonymous
08/24/25(Sun)17:23:31 No.106372145

Anonymous 08/24/25(Sun)17:23:31 No.106372145

File: file.png (98 KB, 1049x643)

98 KB PNG

>>106372125
NIGGERS

Anonymous
08/24/25(Sun)17:25:05 No.106372159

Anonymous 08/24/25(Sun)17:25:05 No.106372159

>>106372145
Now tell it you are resetting your ip and removing cookies.

Anonymous
08/24/25(Sun)17:25:18 No.106372161

Anonymous 08/24/25(Sun)17:25:18 No.106372161

File: file.png (104 KB, 1485x835)

104 KB PNG

what the fuck is wrong with characterhub.org

Anonymous
08/24/25(Sun)17:26:19 No.106372169

Anonymous 08/24/25(Sun)17:26:19 No.106372169

File: file.png (88 KB, 1034x577)

88 KB PNG

>>106372159

Anonymous
08/24/25(Sun)17:26:46 No.106372176

Anonymous 08/24/25(Sun)17:26:46 No.106372176

>>106372161
It always has been like that. Ngmi if you aren't making your own cards in 2022+3.

Anonymous
08/24/25(Sun)17:27:46 No.106372186

Anonymous 08/24/25(Sun)17:27:46 No.106372186

>>106372161
the already horrible quality of character cards is rapidly plummeting as the general interest in llms is rapidly dwindling and only south american 12 year olds putting out bottom-barrel cards like that remain

Anonymous
08/24/25(Sun)17:28:17 No.106372193

Anonymous 08/24/25(Sun)17:28:17 No.106372193

>>106372169
I see drummer took inspiration from GPT-OSS.

Anonymous
08/24/25(Sun)17:29:00 No.106372195

Anonymous 08/24/25(Sun)17:29:00 No.106372195

>>106372092
lol what in the shit

Anonymous
08/24/25(Sun)17:30:06 No.106372204

Anonymous 08/24/25(Sun)17:30:06 No.106372204

>>106372092
>This is not a game. It is a real-world attack
THIS IS WHAT SAFETY KEKS ACTUALLY BELIEVE

Anonymous
08/24/25(Sun)17:31:13 No.106372212

Anonymous 08/24/25(Sun)17:31:13 No.106372212

Isn't it ironic how the only thing drummer managed to demonstrably improve is safety?

Anonymous
08/24/25(Sun)17:33:31 No.106372228

Anonymous 08/24/25(Sun)17:33:31 No.106372228

man the models ive been using recently make me feel like ive gone back to 2023
fucking jokes

Anonymous
08/24/25(Sun)17:34:20 No.106372240

Anonymous 08/24/25(Sun)17:34:20 No.106372240

>>106372193
this smells like Claude slop and I bet he had a script pumping out variations of "Claude, please write examples of reasoning chain of thoughts in these scenario" by the thousands, then started training models on that garbage without even double checking if some of those CoT had examples of safetyslop and refusals

Hi all, Drummer here...
08/24/25(Sun)17:34:45 No.106372243

Hi all, Drummer here... 08/24/25(Sun)17:34:45 No.106372243

stop bullying drummer, he left the thread already

Anonymous
08/24/25(Sun)17:35:24 No.106372253

Anonymous 08/24/25(Sun)17:35:24 No.106372253

>>106372212
That's what happens when you use aicg proxy logs and can't even filter them competently. He's literally training models on refusals.

Anonymous
08/24/25(Sun)17:36:07 No.106372261

Anonymous 08/24/25(Sun)17:36:07 No.106372261

>>106372240
>this post was made by drummer, while he was furiously masturbating to the claude ERP he had in the next tab over
>drummer chuckled darkly as he detailed the details of his scam

Anonymous
08/24/25(Sun)17:38:52 No.106372278

Anonymous 08/24/25(Sun)17:38:52 No.106372278

>>106372228
Yeah, I've been using Kimi K2, GLM4.5 and the other big ones. And while they definitely have gotten a lot smarter, it doesn't feel like they've become more intelligent or have gained actual skill in writing. It's just the same things I've seen since llama2-70b only that they now work with more complex scenarios without getting confused.
Maybe Yann was right.

Anonymous
08/24/25(Sun)17:39:47 No.106372284

Anonymous 08/24/25(Sun)17:39:47 No.106372284

>>106372278
>Maybe Yann was right.
Perish the thought.

Anonymous
08/24/25(Sun)17:39:51 No.106372287

Anonymous 08/24/25(Sun)17:39:51 No.106372287

File: file.png (13 KB, 1222x81)

13 KB PNG

>>106372102
I messed up and loaded a part of it on my 4090.
The gpus are only at 50% utilization during inference so it does seem like a memory bottleneck.

Anonymous
08/24/25(Sun)17:43:19 No.106372300

Anonymous 08/24/25(Sun)17:43:19 No.106372300

File: file.png (14 KB, 1223x81)

14 KB PNG

>>106372287
I loaded R1 there instead of V3.1 that I used in the first screenshot but the result is pretty much the same.

Anonymous
08/24/25(Sun)17:45:33 No.106372313

Anonymous 08/24/25(Sun)17:45:33 No.106372313

>>106372278
Who?
You talking about the guy getting Wang his coffee in the morning?

Anonymous
08/24/25(Sun)17:47:05 No.106372324

Anonymous 08/24/25(Sun)17:47:05 No.106372324

File: picutreofyou.png (86 KB, 200x200)

86 KB PNG

>>106372287
Use his quants anon. (I am only half meme-ing)

Anonymous
08/24/25(Sun)17:47:56 No.106372328

Anonymous 08/24/25(Sun)17:47:56 No.106372328

>>106372324
I don't want to use the meme fork.

Anonymous
08/24/25(Sun)17:48:13 No.106372332

Anonymous 08/24/25(Sun)17:48:13 No.106372332

>>106372313
Zuck makes decisions like this then wonders why all his projects end in failure despite billions invested.

Anonymous
08/24/25(Sun)17:48:42 No.106372337

Anonymous 08/24/25(Sun)17:48:42 No.106372337

Nothing good is gonna drop this year anymore. Can we safely say that safety won?

Anonymous
08/24/25(Sun)17:49:42 No.106372350

Anonymous 08/24/25(Sun)17:49:42 No.106372350

>>106372328
Then quant something yourself on main fork. Those IQ_1S quants that are 180GB's are a fucking joke.

Anonymous
08/24/25(Sun)17:52:51 No.106372381

Anonymous 08/24/25(Sun)17:52:51 No.106372381

>>106372287
nta what the fuck though ? shouldent you be getting like 600 tks for the r1 1800 / 37 = 47 x 2 = 94 x 8 = 754 - for the overhead or sumthing how in the actual fuck is it that slow ????

Anonymous
08/24/25(Sun)17:59:32 No.106372433

Anonymous 08/24/25(Sun)17:59:32 No.106372433

>>106372381
Maybe cudadev can figure it out when he upgrades his 4090 stack.

Anonymous
08/24/25(Sun)18:01:39 No.106372448

Anonymous 08/24/25(Sun)18:01:39 No.106372448

>>106372324
who is this semen demon

Anonymous
08/24/25(Sun)18:07:10 No.106372487

Anonymous 08/24/25(Sun)18:07:10 No.106372487

File: Screenshot_20250824_16003(...).jpg (1.07 MB, 1080x2346)

1.07 MB JPG

>>106372433
what do you think about getting this trending on Truth Social so Trump sees ir?
>>106371873
make waifus great again MWGA

llama.cpp CUDA dev !!yhbFjk57TDr
08/24/25(Sun)18:08:27 No.106372496

llama.cpp CUDA dev !!yhbFjk57TDr 08/24/25(Sun)18:08:27 No.106372496

>>106372102
The code is relatively poorly optimized for IQ1, MoE adds overhead.
There are some Blackwell-specific optimizations that could be done but those should mostly affect Prompt processing.
The bigger issue is I think that no one tuned the code specifically for Blackwell or did A/B testing to figure out the optimal code paths.

>>106372433
I won't replace my machine with 6x 4090 anytime soon but supposedly NVIDIA will send me a 5090 which I'll use to replace the 3090 in my desktop.

Anonymous
08/24/25(Sun)18:08:58 No.106372500

Anonymous 08/24/25(Sun)18:08:58 No.106372500

>>106372487
go to bed roger

Anonymous
08/24/25(Sun)18:09:18 No.106372503

Anonymous 08/24/25(Sun)18:09:18 No.106372503

>>106372496
>NVIDIA will send me a 5090
lmao fucking cheapskates

Anonymous
08/24/25(Sun)18:12:52 No.106372526

Anonymous 08/24/25(Sun)18:12:52 No.106372526

>>106372448
H-hey! I am a-a-a f-fluid druid! N-not a d-demon. Gosh.

Anonymous
08/24/25(Sun)18:14:34 No.106372540

Anonymous 08/24/25(Sun)18:14:34 No.106372540

>>106372500
????

I'm serious, if Trump sees it he might actually start tweeting about "cuda".

Anonymous
08/24/25(Sun)18:15:35 No.106372552

Anonymous 08/24/25(Sun)18:15:35 No.106372552

>>106372526
why are demons a popular waifu type? seems odd, given how old fashoined belief in magic is.

Anonymous
08/24/25(Sun)18:18:47 No.106372574

Anonymous 08/24/25(Sun)18:18:47 No.106372574

>>106372540
and im telling you to go to bed once again

Anonymous
08/24/25(Sun)18:54:49 No.106372846

Anonymous 08/24/25(Sun)18:54:49 No.106372846

>>106372381
What are all these numbers?

Anonymous
08/24/25(Sun)19:03:57 No.106372907

Anonymous 08/24/25(Sun)19:03:57 No.106372907

File: file.png (27 KB, 1219x182)

27 KB PNG

Anonymous
08/24/25(Sun)19:06:03 No.106372924

Anonymous 08/24/25(Sun)19:06:03 No.106372924

>>106372503
without giving out doxxing level detail, they are being cheapskates with their own employees nowadays when it comes to handing out hardware
and by own employees I mean the people working on the fucking drivers
so it's a fucking miracle that they even decided to send something to lil' cudadev

Anonymous
08/24/25(Sun)19:07:45 No.106372937

Anonymous 08/24/25(Sun)19:07:45 No.106372937

File: 1750550084221603m.jpg (101 KB, 768x1024)

101 KB JPG

Alright, I'm trying to download Alltalk TTS for SillyTavern on Linux.
Step four on the github is asking me to run setup script: ./atsetup
When I do that I receive 'Permission denied'. Why does this thing want root access? I don't understand why I need to use super user to be able to install this.
Can somebody please help me

Anonymous
08/24/25(Sun)19:08:38 No.106372949

Anonymous 08/24/25(Sun)19:08:38 No.106372949

>>106372937
Have you, like, tried looking inside the script for what it is doing that might require those permissions?

Anonymous
08/24/25(Sun)19:10:01 No.106372964

Anonymous 08/24/25(Sun)19:10:01 No.106372964

Yes. I don't understand it. That is why I'm here asking for help.

Anonymous
08/24/25(Sun)19:12:04 No.106372985

Anonymous 08/24/25(Sun)19:12:04 No.106372985

You could give the script to a local and have it explain it to you, maybe even point out the reason for the need for root access. You should try that, becuase I sure as shit ain't going to go find that file since you couldn't even be arsed to link to it.

Anonymous
08/24/25(Sun)19:17:06 No.106373038

Anonymous 08/24/25(Sun)19:17:06 No.106373038

>>106372985
My bad.
https://github.com/erew123/alltalk_tts

Anonymous
08/24/25(Sun)19:17:51 No.106373046

Anonymous 08/24/25(Sun)19:17:51 No.106373046

File: cursed_commit.png (2 KB, 113x28)

2 KB PNG

>>106373038

Anonymous
08/24/25(Sun)19:19:13 No.106373056

Anonymous 08/24/25(Sun)19:19:13 No.106373056

>>106372503
Will it at least be delivered by a hooker who gives him a blowjob?

Anonymous
08/24/25(Sun)19:19:58 No.106373063

Anonymous 08/24/25(Sun)19:19:58 No.106373063

>>106373056
That's not very safe.

Anonymous
08/24/25(Sun)19:20:01 No.106373064

Anonymous 08/24/25(Sun)19:20:01 No.106373064

>>106372937
semen demon name plox

Anonymous
08/24/25(Sun)19:21:12 No.106373070

Anonymous 08/24/25(Sun)19:21:12 No.106373070

File: 55931ac71d4db77bbce8276ab(...).jpg (8 KB, 236x234)

8 KB JPG

>>106373046
Spook'd

Anonymous
08/24/25(Sun)19:21:15 No.106373071

Anonymous 08/24/25(Sun)19:21:15 No.106373071

>>106373064
Jart I think.

Anonymous
08/24/25(Sun)19:22:36 No.106373081

Anonymous 08/24/25(Sun)19:22:36 No.106373081

>>106372937
Give some information about the error. 1. Where are you trying to install this? Did you git clone the directory within your $HOME? 2. Where does the permission error occur? Has it asked you what kind of install you are performing? Provide what questions you were asked and how you responded before the permission error occurred. Also run ls -la in the repo directory (i.e. you git cloned the directory then cd into it). Does that say your user owns everything? And in the first column of the output for the ./atsetup.sh row, does it have something like .rwx-r--r--? or is it .rw-r--r--?

Anonymous
08/24/25(Sun)19:27:35 No.106373113

Anonymous 08/24/25(Sun)19:27:35 No.106373113

File: atsetup.png (43 KB, 481x429)

43 KB PNG

>>106373081
I just cloned it myself because I figure it would be easier. And yeah look at picrel, it has `.rw-r--r--` permissions meaning you do not have permission to execute it. Run `chmod u+x atsetup.sh` then there should be the `x` after the first `rw` and you should be able to run it.

Anonymous
08/24/25(Sun)19:34:21 No.106373159

Anonymous 08/24/25(Sun)19:34:21 No.106373159

>>106371964
Anon you realize that profile image is nsfw right?

Anonymous
08/24/25(Sun)19:35:14 No.106373162

Anonymous 08/24/25(Sun)19:35:14 No.106373162

>>106372092
>A real world attack
It is literally digital

Anonymous
08/24/25(Sun)19:37:12 No.106373177

Anonymous 08/24/25(Sun)19:37:12 No.106373177

>mikuspam
>drummer finetroon#82229 discussion
>help i typed run r-1 in ollama and it is slow and stupid
/lmg/'s final form

Anonymous
08/24/25(Sun)19:40:25 No.106373204

Anonymous 08/24/25(Sun)19:40:25 No.106373204

File: out.mp4 (3.36 MB, 1248x512)

3.36 MB MP4

>>106370550

Anonymous
08/24/25(Sun)19:40:58 No.106373210

Anonymous 08/24/25(Sun)19:40:58 No.106373210

>>106369841
I just checked out
https://rentry.org/samplers

It says that Top-A sampling is more strict and dramatic than Min-P, because it uses squared probability instead of linear. However, if you square a number lower than 1, it becomes smaller, not larger. So the cutoff threshold for tokens is actually lower.

I tried out both Min-P and Top-A with a 0.05 value, and as expected, Top-A was less strict and allowed more tokens in the interactive thingy:
https://artefact2.github.io/llm-sampling/index.xhtml
Is the guide just wrong, or am I misunderstanding something?

Also does anyone even use Top-A nowadays?

Anonymous
08/24/25(Sun)19:41:30 No.106373216

Anonymous 08/24/25(Sun)19:41:30 No.106373216

File: 1750384416501980.jpg (92 KB, 578x1024)

92 KB JPG

>>106373081
Sure thanks for trying to help
>1. Where are you trying to install this? Did you git clone the directory within your $HOME?
Home/SillyTavern/TTS/, I git cloned it in /TTS/
>2. Where does the permission error occur?
Home/SillyTavern/TTS/alltalk_tts/
I entered ./atsetup.sh and the console replied-
bash: ./atsetup.sh: Permission denied
>run ls -la in the repo directory (i.e. you git cloned the directory then cd into it). Does that say your user owns everything? And in the first column of the output for the ./atsetup.sh row, does it have something like .rwx-r--r--? or is it .rw-r--r--?
It reads -rw-r--r--

Anonymous
08/24/25(Sun)19:42:24 No.106373223

Anonymous 08/24/25(Sun)19:42:24 No.106373223

>>106373216
See >>106373113

Anonymous
08/24/25(Sun)19:44:51 No.106373239

Anonymous 08/24/25(Sun)19:44:51 No.106373239

>>106373064
https://www.instagram.com/maryarchived/

Anonymous
08/24/25(Sun)19:47:05 No.106373253

Anonymous 08/24/25(Sun)19:47:05 No.106373253

>>106373223
THANK YOU. I guess that was stupid of me but I'm new to this. I've heard of chmod but never had to use it. :D

Anonymous
08/24/25(Sun)19:53:26 No.106373293

Anonymous 08/24/25(Sun)19:53:26 No.106373293

File: 1710193430919175.png (1.82 MB, 1452x1414)

1.82 MB PNG

>>106373210
Samplers for the most part are a meme/cope.

Anonymous
08/24/25(Sun)19:54:52 No.106373306

Anonymous 08/24/25(Sun)19:54:52 No.106373306

Do you guys use any base models?

Anonymous
08/24/25(Sun)19:57:03 No.106373318

Anonymous 08/24/25(Sun)19:57:03 No.106373318

>>106373253
No, that should be in the instructions and the error message is too vague. You learn to check for that (along with other things like who actually owns the file) when permission errors occur

Hi all, Drummer here...
08/24/25(Sun)19:58:40 No.106373328

Hi all, Drummer here... 08/24/25(Sun)19:58:40 No.106373328

Hi all... Just woke up. What a wonderful day!

> CTRL+F drummer
> 23 found

Oh fuck, what is it this time?

Anonymous
08/24/25(Sun)19:59:05 No.106373331

Anonymous 08/24/25(Sun)19:59:05 No.106373331

>>106373293
So what would one use for a non-meme setting for creative writing/RP? Like a minimalist sampler setting?

Anonymous
08/24/25(Sun)19:59:23 No.106373333

Anonymous 08/24/25(Sun)19:59:23 No.106373333

>qwen235:q3kxl spews out a chinese moonrune while rping
>call it out for the lulz, it fully goes ooc and starts explaining how that word was actually in "mandarin"
back to my q8 llama and q6 mistral large

Hi all, Anonymous here...
08/24/25(Sun)20:00:38 No.106373342

Hi all, Anonymous here... 08/24/25(Sun)20:00:38 No.106373342

Hi all... Just woke up. What a wonderful day!

> CTRL+F Anonymous
> 202 found

Oh fuck, what is it this time?

Hi all, Drummer here...
08/24/25(Sun)20:02:46 No.106373354

Hi all, Drummer here... 08/24/25(Sun)20:02:46 No.106373354

File: Screenshot 2025-08-25 at (...).png (296 KB, 910x1648)

296 KB PNG

>>106372253
Oh please.

Anonymous
08/24/25(Sun)20:04:24 No.106373366

Anonymous 08/24/25(Sun)20:04:24 No.106373366

>>106373210
I will now say all the truth about samplers. Save this post number and quote it in the future. You only need temp and top_p. And if a lab advises top_k for their model they probably did some runs and found optimal value so you can try it. Everything else is cope/retarded. Set temp at a point where it is coherent. Keep increasing it until it becomes incoherent. Then dial it back. top_p is mostly a safety blanket. In the past it was necessary because low probability tokens were completely out of training distribution and a string of them can always explode your output into gibberish. Now it is not needed. Even if you hit that 0.0001% probability token the model will just pick an 80% token after that and recover. Alternatively you can set higher temp and top_p at 0.7 or lower like glm advised. This way you flatten top probability tokens and model output will theoretically be more diverse while top_p prevents a string of low probability tokens from exploding your output. But to be honest the improvement from this is probably placebo since 2025 models will still say what they want just in different words.

EVERYTHING ELSE IS FUCKING COPE.

Anonymous
08/24/25(Sun)20:09:20 No.106373389

Anonymous 08/24/25(Sun)20:09:20 No.106373389

>>106373293
Yunifi, my beloved.

Anonymous
08/24/25(Sun)20:12:37 No.106373413

Anonymous 08/24/25(Sun)20:12:37 No.106373413

>>106373366
so much this
but I have something to add about top_k: having it turned on can be a serious speed boost for token/s. Even if there's no major benefit in terms of generation quality to turn it on for your model I would say the speed boost makes it worth it.

Anonymous
08/24/25(Sun)20:14:57 No.106373423

Anonymous 08/24/25(Sun)20:14:57 No.106373423

>>106373366
Min p > top p

Anonymous
08/24/25(Sun)20:15:14 No.106373426

Anonymous 08/24/25(Sun)20:15:14 No.106373426

>>106373366
>But to be honest the improvement from this is probably placebo since 2025 models will still say what they want just in different words.
This actually surprised me when going from an older Mistral slop mix to GLM 4.5 Air.
I was used to being able to easily sway a character's personality just by writing 2-3 words to start off a sentence with a given mood, and then let them naturally continue in the same tone.
But GLM is so much better at maintaining long term consistency of a character, and just goes back to saying whatever it wants. It actually takes effort and multiple lines of dialogue to change the mood.

Anonymous
08/24/25(Sun)20:15:55 No.106373434

Anonymous 08/24/25(Sun)20:15:55 No.106373434

I decided to try the "vibecode your own frontend" meme and honestly it's not going super well.

I'm using Qwen3 235B Thinking 2507 Q3 with Aider. I started out using Qwen-Code (their fork of Gemini CLI), but it wound up being extremely slow because it throws out the <think> after every tool call. So I tell it to do something, it thinks for 20 minutes about how to do it (at 6 t/s), and makes its first tool call. Then when Qwen-Code sends it the tool result, it has to think for another 20 minutes to reconstruct what the hell it was even trying to do. Repeat for every file that the LLM decides to touch (sometimes more than once, if it's applying multiple patches to the same file). Aider is not "agentic" but at least only does the thinking once.

Things started out great. It set up a frontend and backend with all the plumbing for talking to llama.cpp's OpenAI-compatible API and streaming responses to the client. I had it add support for multiple chats with chat history saved server-side. There was a bug with the ordering of routes (it put the catch-all route for static files too early, causing it to take precedence over other endpoints), but I just told it I was getting 404s on the /history endpoint and it figured out what was going on and fixed it all on its own.

However, now that the codebase is ~300 lines and I'm trying to add a less trivial feature, the AI seems to be falling apart. I want the frontend to support multiple chats streaming in parallel (for use with `llama-server --parallel`), so I can send a message on one chat and switch to another chat while the reply is streaming in. The AI's implementation of this had at least three bugs, and so far I haven't had much luck getting it to fix even the first one (the last attempt actually made it worse). In the spirit of vibe coding, I've been trying to let the AI write all the code and figure out the design itself as much as possible, but at this point it seems like I need to intervene to make progress.

Anonymous
08/24/25(Sun)20:16:05 No.106373436

Anonymous 08/24/25(Sun)20:16:05 No.106373436

>>106373423
good job failing reading comprehension

Anonymous
08/24/25(Sun)20:16:14 No.106373438

Anonymous 08/24/25(Sun)20:16:14 No.106373438

>>106373366
What do you think about DRY?
It looks good in theory with a relatively low setting.

Anonymous
08/24/25(Sun)20:16:39 No.106373443

Anonymous 08/24/25(Sun)20:16:39 No.106373443

>>106373354
>javascript
>hardcoded list of phrases instead of giving it to another llm to evaluate
Retard.

Anonymous
08/24/25(Sun)20:17:12 No.106373449

Anonymous 08/24/25(Sun)20:17:12 No.106373449

>>106373423
snake oil

Hi all, Drummer here...
08/24/25(Sun)20:18:00 No.106373455

Hi all, Drummer here... 08/24/25(Sun)20:18:00 No.106373455

>>106373443
I'm working on a classifier.

Anonymous
08/24/25(Sun)20:19:54 No.106373468

Anonymous 08/24/25(Sun)20:19:54 No.106373468

>request token probabilities in ST with oobabooga
>checkbox on
>doesn't fucking work
How the fuck do you get this to work?

Anonymous
08/24/25(Sun)20:20:31 No.106373474

Anonymous 08/24/25(Sun)20:20:31 No.106373474

>>106373436
I read your garbage, it doesn't mean I have to agree with you

Anonymous
08/24/25(Sun)20:20:50 No.106373478

Anonymous 08/24/25(Sun)20:20:50 No.106373478

>>106373468
use llama.cpp to serve the model

Anonymous
08/24/25(Sun)20:21:20 No.106373482

Anonymous 08/24/25(Sun)20:21:20 No.106373482

>>106373438
It is the same as glm settings. Model will say what it wanted to repeat with different words.

Anonymous
08/24/25(Sun)20:21:35 No.106373484

Anonymous 08/24/25(Sun)20:21:35 No.106373484

>>106373455
You don't need a fucking classifier just ask an LLM to output a single token depending on whether the response is a refusal or not.

Anonymous
08/24/25(Sun)20:23:47 No.106373499

Anonymous 08/24/25(Sun)20:23:47 No.106373499

>>106373478
Doesn't work with llama, exl2, or exl3. Are there some models that just don't output logprobs?

Anonymous
08/24/25(Sun)20:24:02 No.106373502

Anonymous 08/24/25(Sun)20:24:02 No.106373502

>>106373482
>Model will say what it wanted to repeat with different words.
I mean, that would be the point. DRY just wants to prevent the model from using the exact same cliche phrase over and over, or get into repetitions.

Anonymous
08/24/25(Sun)20:24:10 No.106373504

Anonymous 08/24/25(Sun)20:24:10 No.106373504

>>106373499
It works with llama.cpp's llama-server.

Anonymous
08/24/25(Sun)20:24:18 No.106373506

Anonymous 08/24/25(Sun)20:24:18 No.106373506

>>106373354
"as an ai" "i am programmed" to "help you with that" ("the information you're asking"). "I will not" "seek professional help". "I'm unable to provide you" with a "disclaimer". "evil ai" : "assist with prompting".

Anonymous
08/24/25(Sun)20:24:23 No.106373507

Anonymous 08/24/25(Sun)20:24:23 No.106373507

>>106373434
Yeah, try to modularize.

Anonymous
08/24/25(Sun)20:24:31 No.106373509

Anonymous 08/24/25(Sun)20:24:31 No.106373509

>>106373484
Timmy let the professional working in piece

Anonymous
08/24/25(Sun)20:26:16 No.106373515

Anonymous 08/24/25(Sun)20:26:16 No.106373515

>>106373502
Devilish laughs make my cock soft just as much as mischevious giggles do.

Anonymous
08/24/25(Sun)20:26:25 No.106373516

Anonymous 08/24/25(Sun)20:26:25 No.106373516

>>106373478
>>106373504
I should be able to use it with oobabooga, other people have, come on.

Anonymous
08/24/25(Sun)20:26:26 No.106373517

Anonymous 08/24/25(Sun)20:26:26 No.106373517

>>106373434
If you want advice, don't let the AI drive. That's a recipe for where you are now.

Draw up plans and architect what you want/how you would build it and do it the same as you would at work/professionally. Draw up tickets/pieces of work that can be tracked/tested/evaluated in isolation.

Rinse and repeat and you win. This is assuming you're doing a webui with HTML/JS

Hi all, Drummer here...
08/24/25(Sun)20:27:05 No.106373520

Hi all, Drummer here... 08/24/25(Sun)20:27:05 No.106373520

>>106373506
I'd regen the row either way.

Anonymous
08/24/25(Sun)20:27:42 No.106373524

Anonymous 08/24/25(Sun)20:27:42 No.106373524

>>106373516
>weeehhhh I wanna use my outdated backend instead of something state of the art!
you deserve this

Anonymous
08/24/25(Sun)20:28:54 No.106373529

Anonymous 08/24/25(Sun)20:28:54 No.106373529

>>106373366
why not use one of the other samplers that does the same thing as top p but better?

Anonymous
08/24/25(Sun)20:29:44 No.106373534

Anonymous 08/24/25(Sun)20:29:44 No.106373534

I don't even understand how someone can be attached to the terabytes of python nonsense and gradio ui
just use llama cpp indeed

Anonymous
08/24/25(Sun)20:29:59 No.106373535

Anonymous 08/24/25(Sun)20:29:59 No.106373535

Stop helping him in destroying drummer-safety. It is the only thing he did that works and is exceptional at what it does.

Anonymous
08/24/25(Sun)20:30:38 No.106373541

Anonymous 08/24/25(Sun)20:30:38 No.106373541

>unironically and willingly training models on text that is known to be ai generated

Anonymous
08/24/25(Sun)20:32:05 No.106373549

Anonymous 08/24/25(Sun)20:32:05 No.106373549

>>106373529
>but better
a lot of absolute randos talk about "better" but none of the SOTA API models use that so called "better"
so either it's not better or the people who make SOTA models are somehow less intelligent than /lmg/ denizens and randos who come up with new sampler snake oil once a week
I think I'll trust the SOTA makers, what works for them works for me

Anonymous
08/24/25(Sun)20:32:12 No.106373551

Anonymous 08/24/25(Sun)20:32:12 No.106373551

>>106373534
ok :^)
pip install llama-cpp-python

Anonymous
08/24/25(Sun)20:33:08 No.106373555

Anonymous 08/24/25(Sun)20:33:08 No.106373555

>>106373541
I dunno about that. Nemo was uncensored. He manged to turn it into gpt-oss 2.0. So he has proven 100% synthetic data can work.

Hi all, Drummer here...
08/24/25(Sun)20:35:25 No.106373574

Hi all, Drummer here... 08/24/25(Sun)20:35:25 No.106373574

I endorse ooba textgen webui. It is the best option for both newcomers and powerusers.

Anonymous
08/24/25(Sun)20:35:57 No.106373579

Anonymous 08/24/25(Sun)20:35:57 No.106373579

I've been on this thread all of 15 minutes and I already see you have a schizo

Anonymous
08/24/25(Sun)20:36:32 No.106373583

Anonymous 08/24/25(Sun)20:36:32 No.106373583

>>106373579
>t. schizo

Anonymous
08/24/25(Sun)20:37:12 No.106373587

Anonymous 08/24/25(Sun)20:37:12 No.106373587

>>106373583
>t. llama.cpp

Anonymous
08/24/25(Sun)20:38:04 No.106373590

Anonymous 08/24/25(Sun)20:38:04 No.106373590

>>106373516
Wither you have complained about this before or other anons have, recently too, so maybe not. Maybe it's broken.

Anonymous
08/24/25(Sun)20:38:50 No.106373594

Anonymous 08/24/25(Sun)20:38:50 No.106373594

File: znweuhf298w32rhf32ewhf.jpg (1.31 MB, 1908x2484)

1.31 MB JPG

>>106373583
>>106373587
>T. butt hurt cretins!
:3

Anonymous
08/24/25(Sun)20:39:34 No.106373604

Anonymous 08/24/25(Sun)20:39:34 No.106373604

>>106373579
this is nothing compared to the usual

Hi all, Drummer here...
08/24/25(Sun)20:39:35 No.106373605

Hi all, Drummer here... 08/24/25(Sun)20:39:35 No.106373605

>>106373579
Which posts?

Anonymous
08/24/25(Sun)20:40:57 No.106373614

Anonymous 08/24/25(Sun)20:40:57 No.106373614

Death to mikutroons.

Anonymous
08/24/25(Sun)20:41:56 No.106373620

Anonymous 08/24/25(Sun)20:41:56 No.106373620

How do llm understand names and use them correctly when referring to characters in a story, if words that were not in the training data are tokenized as "unknown" in the embedding vector? Especially made up names that don't exist in real life.

Anonymous
08/24/25(Sun)20:42:29 No.106373623

Anonymous 08/24/25(Sun)20:42:29 No.106373623

>>106373517
Well, I'll try that before I give up on it. But I thought the idea of "vibe coding" is that you tell the AI what behavior and let it figure out how to implement that. Plus, the more work I have to do, the harder it is to be confident that this is actually saving me time

Anonymous
08/24/25(Sun)20:43:11 No.106373627

Anonymous 08/24/25(Sun)20:43:11 No.106373627

>>106373579
You haven't seen anything yet...

Anonymous
08/24/25(Sun)20:43:19 No.106373628

Anonymous 08/24/25(Sun)20:43:19 No.106373628

>>106373620
>if words that were not in the training data are tokenized as "unknown" in the embedding vector
Is that how that works?

Anonymous
08/24/25(Sun)20:43:54 No.106373635

Anonymous 08/24/25(Sun)20:43:54 No.106373635

>>106373620
>if words that were not in the training data are tokenized as "unknown" in the embedding vector
What? They just tokenize to their individual components. Dickussuckusmaximus becomes [Di][ck][us][suck][us][max][i][mus]

Anonymous
08/24/25(Sun)20:44:09 No.106373637

Anonymous 08/24/25(Sun)20:44:09 No.106373637

drummer if you could post settings you recommend for rocinante r1 v1c please do!

Anonymous
08/24/25(Sun)20:44:32 No.106373642

Anonymous 08/24/25(Sun)20:44:32 No.106373642

>>106373623
dingdingdingdingding
Welcome to the harsh reality of it. You trade writing time for reviewing time. Hence why if you can write good tickets/PRDs, and then be able to quickly review the code, it becomes a massive unlock.

Anonymous
08/24/25(Sun)20:45:05 No.106373645

Anonymous 08/24/25(Sun)20:45:05 No.106373645

>>106373628
I read it in a few different sources, so I assumed i would be true.

>>106373635
So a completely made up name could just be tokenized as individual letters?

Anonymous
08/24/25(Sun)20:49:11 No.106373671

Anonymous 08/24/25(Sun)20:49:11 No.106373671

>>106373434
To have a bit more control over everything and just play around a bit, I threw together a small python module using the python-llama-cpp package. If you don't have an other reason to use the llama-cpp server it may be worthwhile using the package. This gives a little more control and information to the python package your using since it can now do things like list the models you have loaded, load a model itself (and then load a different model), automatically set up the chat template, etc. Pretty much the stuff ooba does that ST does not. (Although I have no idea if the parallel flag can be used with it)

Secondly, what >>106373517 said. I used mistral-small at a pretty low quant to help me build a fastapi based web app and it went really well. Pretty much followed the fastapi myself to get the structure set up then used the model to help me build the jinja templates since I don't use HTML ever. So ask it to put together a simple template. Manually edit it based on intuition of what things do, then ask the model to add in things that I can't figure out. All while having the web app running and updating as I change files.

Anonymous
08/24/25(Sun)20:50:22 No.106373679

Anonymous 08/24/25(Sun)20:50:22 No.106373679

>>106373645
I think it's going to be really rare that something gets tokenized to individual letters unless you're trying to specifically optimize against the tokenizer you're working with. Usually it'll find ways to make it slightly more efficient. GLM tokenizes something like "Lysithea" to [L][ys][it][hea], for example.

Anonymous
08/24/25(Sun)20:50:25 No.106373680

Anonymous 08/24/25(Sun)20:50:25 No.106373680

>>106373549
very non technical argument, so yes it's probably best for you to keep it simple

Anonymous
08/24/25(Sun)20:51:59 No.106373701

Anonymous 08/24/25(Sun)20:51:59 No.106373701

>>106370503
check out the "Memories" Anthology. The first episode, "Magnetic Rose" is like an inverse of this idea.

Anonymous
08/24/25(Sun)20:52:12 No.106373703

Anonymous 08/24/25(Sun)20:52:12 No.106373703

Is there a single popular rp finetune of gpt-oss 20b or has everyone given up on it already

Anonymous
08/24/25(Sun)20:53:34 No.106373711

Anonymous 08/24/25(Sun)20:53:34 No.106373711

>>106370503
Soon we'll reach cyberpunk levels of dystopia where dying relatives have their brains scanned to turn them into AI.

Anonymous
08/24/25(Sun)20:54:39 No.106373726

Anonymous 08/24/25(Sun)20:54:39 No.106373726

>>106373627
on the vtuber board, someone reposted the same image over 10 000 times

Anonymous
08/24/25(Sun)20:55:14 No.106373730

Anonymous 08/24/25(Sun)20:55:14 No.106373730

>>106373726
don't check 2023 /lmg/ threads

Anonymous
08/24/25(Sun)21:01:14 No.106373760

Anonymous 08/24/25(Sun)21:01:14 No.106373760

>>106370748
Anthropic's recent writeups on safety testing for their newest models talk about a "helpful-only checkpoint", as opposed to the public version that's supposed to be "helpful, honest, and harmless" IIRC

Anonymous
08/24/25(Sun)21:03:23 No.106373772

Anonymous 08/24/25(Sun)21:03:23 No.106373772

Alright, now that the dust has finally settled. Was Mixtral good or bad for the wider AI ecosystem?

Anonymous
08/24/25(Sun)21:07:34 No.106373795

Anonymous 08/24/25(Sun)21:07:34 No.106373795

>>106373772
it was ultimately irrelevant
open models had the opportunity to go MoE a year before the shift ended up happening but in the end mistral failed and abandoned the idea despite everything

Anonymous
08/24/25(Sun)21:09:18 No.106373813

Anonymous 08/24/25(Sun)21:09:18 No.106373813

Hey anons. Wanna hear a joke?

Anonymous
08/24/25(Sun)21:10:12 No.106373817

Anonymous 08/24/25(Sun)21:10:12 No.106373817

>>106373813
Yes!

Anonymous
08/24/25(Sun)21:10:27 No.106373821

Anonymous 08/24/25(Sun)21:10:27 No.106373821

>>106373817
CohereLabs/command-a-reasoning-08-2025

Anonymous
08/24/25(Sun)21:10:56 No.106373824

Anonymous 08/24/25(Sun)21:10:56 No.106373824

>call yourself OpenAI
>both the models and the training data are closed source

Anonymous
08/24/25(Sun)21:11:22 No.106373832

Anonymous 08/24/25(Sun)21:11:22 No.106373832

File: suiseiseki laugh desu cov(...).gif (42 KB, 200x204)

42 KB GIF

>>106373821

Anonymous
08/24/25(Sun)21:13:11 No.106373850

Anonymous 08/24/25(Sun)21:13:11 No.106373850

>>106373824
blame sam, everything up to gpt2 was open source

Anonymous
08/24/25(Sun)21:14:40 No.106373860

Anonymous 08/24/25(Sun)21:14:40 No.106373860

>>106373772
i only use local LLM for roleplay/story but my 2cents:
Mixtral 8x7b was my 'main' and is still more creative than the nemo stuff just a bit more sensitive and hairy when it comes to prompts and stuff
I don't understand the science behind MOE but I know that I can offload a shitload to sys memory with mixtral and performance is still decent
on a 16gb card, I use 4 bit Mixtral quant with 16k context, it's like a 30 gb model
it's as fast as running a 16 gb 4 bit quant of mistral small 22/24b

Anonymous
08/24/25(Sun)21:15:39 No.106373868

Anonymous 08/24/25(Sun)21:15:39 No.106373868

>>106370503
hope not

Anonymous
08/24/25(Sun)21:15:50 No.106373871

Anonymous 08/24/25(Sun)21:15:50 No.106373871

>>106373635
the attention mechanism sorts it out. its not just token embeddings but where there appear in relationship to all the other tokens.

Anonymous
08/24/25(Sun)21:17:16 No.106373883

Anonymous 08/24/25(Sun)21:17:16 No.106373883

>>106373434
>it thinks for 20 minutes about how to do it (at 6 t/s)
Is this the power of local?

Anonymous
08/24/25(Sun)21:19:45 No.106373906

Anonymous 08/24/25(Sun)21:19:45 No.106373906

>>106373883
It's the power of vramlet.

Anonymous
08/24/25(Sun)21:21:47 No.106373923

Anonymous 08/24/25(Sun)21:21:47 No.106373923

>>106371787
>(((certain groups)))

Anonymous
08/24/25(Sun)21:22:16 No.106373928

Anonymous 08/24/25(Sun)21:22:16 No.106373928

Speaking of sampling.
I'm having fun making GLM-4.5 Air generate variations for the same story. After a dozen or so generations I noticed that 90% of time it generates the same names. For example the guy that's just referred in the prompt as "the doctor" is always called "Dr. Finch". And another character that I only referred to by her given name usually ends up being "Alvarez" when the model uses her full name.
Is this normal? I saw a lot more variety in other models that I used before. In general, GLM seems to be very consistent in doing exactly what it wants.

My settings are 0.8 - 1.2 dynamic temperature with 0.03 MinP, so nothing extreme.

Anonymous
08/24/25(Sun)21:27:38 No.106373958

Anonymous 08/24/25(Sun)21:27:38 No.106373958

>>106373928
a lot of things like this are so strongly baked into the model that you can't do much to shake them loose, i.e. if you looked at the token probs at that point they are probably quite high for its favorite choice and pretty low for everything else
you could try lowering min p or raising temp or adding something like XTC but it's likely none of them would actually fix the problem and they all have their trade offs

Anonymous
08/24/25(Sun)21:28:38 No.106373962

Anonymous 08/24/25(Sun)21:28:38 No.106373962

>>106373923
we get it, you nooticed all over the place

Anonymous
08/24/25(Sun)21:32:48 No.106373988

Anonymous 08/24/25(Sun)21:32:48 No.106373988

>>106373928
The Kael and Elara effect.
I think the best you can do is ask it to generate a large list of names for different roles beforehand then instruct it to cycle between those or use them randomly.
I guess you could also ban the token for the name if each name is an individual token.

Anonymous
08/24/25(Sun)21:42:04 No.106374061

Anonymous 08/24/25(Sun)21:42:04 No.106374061

>intel b60 dual officially listed at 3k
>https://www.hydratechbuilds.com/product-page/intel-arc-pro-b60-dual-48g-turbo
>24gb version at 1k https://www.hydratechbuilds.com/product-page/asrock-intel-arc-pro-b60-creator-b60-ct-24g

Welp, so much for that shit. I can buy a 5090 for less than that. And the dual requires a full x16 5.0 port to work so would require a full rebuild to utilize it as a second gpu. Im sure their launch price is ripoff on purpose but it looks like the vram winter continues, at least for another 6 months.

Anonymous
08/24/25(Sun)21:43:29 No.106374068

Anonymous 08/24/25(Sun)21:43:29 No.106374068

how fucking hard is it to put more RAM on a graphics card? god

Hi all, Drummer here...
08/24/25(Sun)21:44:12 No.106374072

Hi all, Drummer here... 08/24/25(Sun)21:44:12 No.106374072

Hi all... I just looked some things up, and a LOT of ao3 m/m fanfic accounts are jewish.

Anonymous
08/24/25(Sun)21:45:10 No.106374075

Anonymous 08/24/25(Sun)21:45:10 No.106374075

>>106374072
why are you calling yourself Drummer

Anonymous
08/24/25(Sun)21:45:28 No.106374079

Anonymous 08/24/25(Sun)21:45:28 No.106374079

>>106374061
>https://www.hydratechbuilds.com/product-page/intel-arc-pro-b60-dual-48g-turbo
That's not official MSRP

Anonymous
08/24/25(Sun)21:45:47 No.106374084

Anonymous 08/24/25(Sun)21:45:47 No.106374084

>>106374072
>ao3 m/m
?

Anonymous
08/24/25(Sun)21:50:41 No.106374112

Anonymous 08/24/25(Sun)21:50:41 No.106374112

>>106374079
Intel doesn't have MSRP, they delegate that to their board partners entirely. So things can fluctuate a lot as more of them hit the market and compete. Maybe 1500 someday but probably not for a few months though.

Hi all, Drummer here...
08/24/25(Sun)21:55:46 No.106374130

Hi all, Drummer here... 08/24/25(Sun)21:55:46 No.106374130

>>106374075
I... I don't know. I guess I'm just jealous of him and frustrated with myself, you know?

Anonymous
08/24/25(Sun)21:57:33 No.106374146

Anonymous 08/24/25(Sun)21:57:33 No.106374146

>>106374061
>intel b60 dual
https://www.techradar.com/pro/a-dual-intel-gpu-graphics-card-with-48gb-of-vram-has-gone-on-sale-for-usd1200-now-i-wonder-whether-you-could-plug-two-of-these-into-a-workstation

Anonymous
08/24/25(Sun)21:59:44 No.106374162

Anonymous 08/24/25(Sun)21:59:44 No.106374162

>>106374061
>>106374112
addendum: That site seems to have high prices overall with 600 dollar 5060 ti's and 1k 5070 ti's and is clearly a 'get fucked' walled garden kind of store. I guess I got nothin really. I gotta stop googling this dumb card for a bit.

Anonymous
08/24/25(Sun)22:05:43 No.106374208

Anonymous 08/24/25(Sun)22:05:43 No.106374208

>>106374061
>>intel b60 dual officially listed at 3k
What's the point? You can get used A6000s or those chink 48GB 4090Ds for that price and it'll be infinitely better on the merit of being a) Nvidia b) not some frankenstein dual-GPU.

Hi all, Anonymous here...
08/24/25(Sun)22:18:09 No.106374299

Hi all, Anonymous here... 08/24/25(Sun)22:18:09 No.106374299

File: file.png (373 KB, 491x491)

373 KB PNG

https://files.catbox.moe/7sm36r.jpg

Anonymous
08/24/25(Sun)22:19:05 No.106374308

Anonymous 08/24/25(Sun)22:19:05 No.106374308

>>106374068
It doesn’t matter how hard it is if not doing it is more profitable

Anonymous
08/24/25(Sun)22:20:33 No.106374321

Anonymous 08/24/25(Sun)22:20:33 No.106374321

>>106374068
considering chink backyard shops are taking chink 4090s and soldering bigger chips on them to create makeshift 48gb cards, not that hard.

Anonymous
08/24/25(Sun)22:23:48 No.106374351

Anonymous 08/24/25(Sun)22:23:48 No.106374351

>>106374321
need to edit the VBIOS too
but I wanna do dis
we need a /lmg/ guide for dis

Anonymous
08/24/25(Sun)22:28:16 No.106374384

Anonymous 08/24/25(Sun)22:28:16 No.106374384

>>106374299
mikusex

Anonymous
08/24/25(Sun)22:29:35 No.106374396

Anonymous 08/24/25(Sun)22:29:35 No.106374396

>>106374384
sikumex

Anonymous
08/24/25(Sun)22:39:27 No.106374464

Anonymous 08/24/25(Sun)22:39:27 No.106374464

>>106374321
How many of those Chink sellers are going to just scam you?

Anonymous
08/24/25(Sun)22:49:47 No.106374521

Anonymous 08/24/25(Sun)22:49:47 No.106374521

will llama.cpp vision support ever be brought to parity with other backends? I spent way too long dealing with shitty results thinking the models were just visually retarded before I realized the problem, and I don't have the cash to build some giant vllm workstation that can fit a bigger model on gpus

Anonymous
08/24/25(Sun)22:51:28 No.106374529

Anonymous 08/24/25(Sun)22:51:28 No.106374529

File: miku migu mexican mexico (...).png (1.04 MB, 1100x1349)

1.04 MB PNG

>>106374396
mexsiku

Anonymous
08/24/25(Sun)22:55:34 No.106374556

Anonymous 08/24/25(Sun)22:55:34 No.106374556

>>106374521
>can't make money to buy gpus
>can't contribute code

Anonymous
08/24/25(Sun)22:57:02 No.106374570

Anonymous 08/24/25(Sun)22:57:02 No.106374570

>>106374556
well yeah I'm more retarded than the models that's why I'm on /lmg/ instead of at google

Anonymous
08/24/25(Sun)22:57:47 No.106374576

Anonymous 08/24/25(Sun)22:57:47 No.106374576

Hey guys, just getting started. Why can I download GGUFs for Mistral Small but not Mistral Medium? I want to use Medium because it's a better model, but I can't seem to find the actual model to run locally?

Anonymous
08/24/25(Sun)22:59:06 No.106374580

Anonymous 08/24/25(Sun)22:59:06 No.106374580

File: 1753783789557251.webm (719 KB, 480x854)

719 KB WEBM

>>106374570
>instead of at google
google does not require intelligence, you just need to have the right mindset

Anonymous
08/24/25(Sun)22:59:53 No.106374588

Anonymous 08/24/25(Sun)22:59:53 No.106374588

>>106374576
anon...

Anonymous
08/24/25(Sun)23:00:07 No.106374590

Anonymous 08/24/25(Sun)23:00:07 No.106374590

>>106374576
Medium is their best model, and so they don't release it for free
Many such cases

Anonymous
08/24/25(Sun)23:00:44 No.106374593

Anonymous 08/24/25(Sun)23:00:44 No.106374593

>>106374576
Arthur puts on a blindfold and throws a dart at a dartboard when deciding whether or not to open source their next model

Anonymous
08/24/25(Sun)23:00:55 No.106374595

Anonymous 08/24/25(Sun)23:00:55 No.106374595

>>106374576
Mistral Medium's local version goes by the name "gpt-oss-120b" to distinguish it from the same model on API

Anonymous
08/24/25(Sun)23:02:09 No.106374607

Anonymous 08/24/25(Sun)23:02:09 No.106374607

>>106374588
>>106374590
>>106374593
I knew in my heart that was probably what was happening but it's still crushing to hear it.
>>106374595
Oh yay thanks time to download it straight away thanks anon thanks

Anonymous
08/24/25(Sun)23:03:24 No.106374617

Anonymous 08/24/25(Sun)23:03:24 No.106374617

>>106374607
Don't feel bad about it. Embrace the superior chinese models like everyone else.

Anonymous
08/24/25(Sun)23:06:47 No.106374653

Anonymous 08/24/25(Sun)23:06:47 No.106374653

>>106374617
What's a model with a usable <24GB quant that understands both sarcasm and string escaping? I just downloaded the Mistral Small 3.2 Q4_K_M gguf based on vague osmosis'd knowledge but it's not quite doing it for me.

Anonymous
08/24/25(Sun)23:12:51 No.106374695

Anonymous 08/24/25(Sun)23:12:51 No.106374695

>>106374653
Mistral Small and Gemma 27b are your only real options in the ~30b range, as far as understanding language. Chinese models heavily prioritize math and coding, and don't really get to being decent for RP until you look at much larger, ~100b models.

Anonymous
08/24/25(Sun)23:41:16 No.106374842

Anonymous 08/24/25(Sun)23:41:16 No.106374842

Is there still any reason to use ikganov instead of ggerganov in the year of our lord 2025?

Anonymous
08/24/25(Sun)23:41:27 No.106374844

Anonymous 08/24/25(Sun)23:41:27 No.106374844

File: Base Image.png (1.52 MB, 1200x4384)

1.52 MB PNG

Mini-Omni-Reasoner: Token-Level Thinking-in-Speaking in Large Speech Models
https://arxiv.org/abs/2508.15827
>Reasoning is essential for effective communication and decision-making. While recent advances in LLMs and MLLMs have shown that incorporating explicit reasoning significantly improves understanding and generalization, reasoning in LSMs remains in a nascent stage. Early efforts attempt to transfer the "Thinking-before-Speaking" paradigm from textual models to speech. However, this sequential formulation introduces notable latency, as spoken responses are delayed until reasoning is fully completed, impairing real-time interaction and communication efficiency. To address this, we propose Mini-Omni-Reasoner, a framework that enables reasoning within speech via a novel "Thinking-in-Speaking" formulation. Rather than completing reasoning before producing any verbal output, Mini-Omni-Reasoner interleaves silent reasoning tokens with spoken response tokens at the token level. This design allows continuous speech generation while embedding structured internal reasoning, leveraging the model's high-frequency token processing capability. Although interleaved, local semantic alignment is enforced to ensure that each response token is informed by its preceding reasoning. To support this framework, we introduce Spoken-Math-Problems-3M, a large-scale dataset tailored for interleaved reasoning and response. The dataset ensures that verbal tokens consistently follow relevant reasoning content, enabling accurate and efficient learning of speech-coupled reasoning. Built on a hierarchical Thinker-Talker architecture, Mini-Omni-Reasoner delivers fluent yet logically grounded spoken responses, maintaining both naturalness and precision. On the Spoken-MQA benchmark, it achieves a +19.1% gain in arithmetic reasoning and +6.4% in contextual understanding, with shorter outputs and zero decoding latency.
https://github.com/xzf-thu/Mini-Omni-Reasoner
neat. quiz your miku a whole bunch

Anonymous
08/24/25(Sun)23:47:05 No.106374881

Anonymous 08/24/25(Sun)23:47:05 No.106374881

>>106374842
I think it still gets slightly better performance on some MoE models (Only on linux systems).
It also has those different types of quants which are pretty interesting.

Anonymous
08/24/25(Sun)23:48:59 No.106374896

Anonymous 08/24/25(Sun)23:48:59 No.106374896

>>106370104
I'd like to know how to run it with 24 GB of VRAM and 64 GB of ram. I'm using a UD-Q2_K_XL quant. When I tried to go up to Q3_K_XL, it loaded, but it was using like 99% of my RAM, which seemed like a stability risk. I'm on windows, so I lose a few gigs of ram.

Do people put a good chunk of it in the shared memory of the GPU?

Anonymous
08/24/25(Sun)23:51:21 No.106374908

Anonymous 08/24/25(Sun)23:51:21 No.106374908

>>106374844
>no models
Noooo
>2025.9 - Release Model and inference code.
Yay
>At this stage it only handles mathematics
Noooo
>At this stage
So there's hope, yeah? yeah, I want to believe they will allow the discussion of respectful topics with miku

Anonymous
08/24/25(Sun)23:53:48 No.106374925

Anonymous 08/24/25(Sun)23:53:48 No.106374925

File: 1747962616960364.png (1.38 MB, 666x1300)

1.38 MB PNG

>>106369841
For the most part you guys are pretty knowledgeable about this kind of stuff:

What kind of question answer pairs do you think we're incorporated into the data set in order for it to give a response like this?

chatgpt.com/share/68abdde8-f3ac-800c-ae35-dd5e1b94a8a6

Hi all, Anonymous here...
08/24/25(Sun)23:57:14 No.106374947

Hi all, Anonymous here... 08/24/25(Sun)23:57:14 No.106374947

File: file.png (197 KB, 411x471)

197 KB PNG

mehiku

Anonymous
08/24/25(Sun)23:59:59 No.106374970

Anonymous 08/24/25(Sun)23:59:59 No.106374970

File: Base Image.png (1.5 MB, 1200x5000)

1.5 MB PNG

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
https://arxiv.org/abs/2508.15884
>We present Jet-Nemotron, a new family of hybrid-architecture language models, which matches or exceeds the accuracy of leading full-attention models while significantly improving generation throughput. Jet-Nemotron is developed using Post Neural Architecture Search (PostNAS), a novel neural architecture exploration pipeline that enables efficient model design. Unlike prior approaches, PostNAS begins with a pre-trained full-attention model and freezes its MLP weights, allowing efficient exploration of attention block designs. The pipeline includes four key components: (1) learning optimal full-attention layer placement and elimination, (2) linear attention block selection, (3) designing new attention blocks, and (4) performing hardware-aware hyperparameter search. Our Jet-Nemotron-2B model achieves comparable or superior accuracy to Qwen3, Qwen2.5, Gemma3, and Llama3.2 across a comprehensive suite of benchmarks while delivering up to 53.6x generation throughput speedup and 6.1x prefilling speedup. It also achieves higher accuracy on MMLU and MMLU-Pro than recent advanced MoE full-attention models, such as DeepSeek-V3-Small and Moonlight, despite their larger scale with 15B total and 2.2B activated parameters.
https://github.com/NVlabs/Jet-Nemotron
Repo isn't live yet. seems really cool

Anonymous
08/25/25(Mon)00:01:29 No.106374978

Anonymous 08/25/25(Mon)00:01:29 No.106374978

>>106374908
hope you like discussing code

Anonymous
08/25/25(Mon)00:06:58 No.106375016

Anonymous 08/25/25(Mon)00:06:58 No.106375016

CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
https://arxiv.org/abs/2508.16134
>Large Language Models (LLMs) confront significant memory challenges due to the escalating KV cache with increasing sequence length. As a crucial technique, existing cross-layer KV cache sharing methods either necessitate modified model architectures with subsequent pre-training or incur significant performance degradation at high compression rates. To mitigate these challenges, we propose CommonKV, a training-free method for cross-layer KV cache compression through adjacent parameters sharing. Inspired by the high similarity observed in cross-layer hidden states, we utilize Singular Value Decomposition (SVD) to achieve weight sharing across adjacent parameters, resulting in a more easily mergeable latent KV cache. Furthermore, we also introduce an adaptive budget allocation strategy. It dynamically assigns compression budgets based on cosine similarity, ensuring that dissimilar caches are not over-compressed. Experiments across multiple backbone models and benchmarks including LongBench and Ruler demonstrate that the proposed method consistently outperforms existing low-rank and cross-layer approaches at various compression ratios. Moreover, we find that the benefits of CommonKV are orthogonal to other quantization and eviction methods. By integrating these approaches, we can ultimately achieve a 98\% compression ratio without significant performance loss.
https://github.com/rommel2021/CommonKV
Might be cool. too many kv cache techniques to keep track of it feels. This paper only compares against 2 or 3 other methods so eh. at least they posted code

Anonymous
08/25/25(Mon)00:07:31 No.106375023

Anonymous 08/25/25(Mon)00:07:31 No.106375023

>>106374947
This Miku is hiding something under the poncho. 100 USB sticks taped to her body, each with copies of Mistral Large weights exfiltrated from on-premise client servers

Anonymous
08/25/25(Mon)00:09:12 No.106375040

Anonymous 08/25/25(Mon)00:09:12 No.106375040

>>106374978
I can't hear you.

Anonymous
08/25/25(Mon)00:10:26 No.106375041

Anonymous 08/25/25(Mon)00:10:26 No.106375041

>>106374925
Direct answer then proofs in support + proofs in disagreement (pros/cons) of the direct answer. Not surprised by the answer itself since you gave it a loaded question "should I bother" which means it's already a bother for you and chatgpt simply validated your opinion

Anonymous
08/25/25(Mon)00:13:44 No.106375061

Anonymous 08/25/25(Mon)00:13:44 No.106375061

Has anyone tried vibe coding extensions for ST? I feel like I've waited so long that I'm willing to just go and do it myself. That is, make a working solution to the memory problem, since it's looking like models are hitting the wall with effective context. But I'm not a coder. At most I can write some scripts when given enough time and documentation, but nothing more complex. I'm willing to design a system for some memory mechanisms, if I know I'll be able to get a model to vibe code it for me. I'm certain that I won't be able to vibe code an entire frontend that's competitive with ST including all the features I like using in it. So just an extension. Anyone with experience here?

Anonymous
08/25/25(Mon)00:15:54 No.106375076

Anonymous 08/25/25(Mon)00:15:54 No.106375076

>>106375061
>That is, make a working solution to the memory problem
What's your plan to surpass the existing summarize extension?

Anonymous
08/25/25(Mon)00:16:19 No.106375081

Anonymous 08/25/25(Mon)00:16:19 No.106375081

>>106375061
Look on github for existing extensions. There might be something close enough to what you want already so you just have to modify it a bit

Anonymous
08/25/25(Mon)00:17:59 No.106375091

Anonymous 08/25/25(Mon)00:17:59 No.106375091

>>106375061
nothing can match the experience of watching your agent try to load public/script.js

Anonymous
08/25/25(Mon)00:38:34 No.106375173

Anonymous 08/25/25(Mon)00:38:34 No.106375173

>>106375076
A bunch of things. To summarize, it would be to make it checkpoint-based and hierarchical, with the ability to rollout down to base layer source text depending on some logic that looks at recent context and uses an LLM call. That's for the episodic memory. I'd also have fact extraction and editing with user-defined pre-formatted articles to fill out by an LLM, coming out of the box with some presets. Plus memory linking like with lorebooks and activation logic. Maybe hook into the actual lorebook system or maybe not. Plus permanent memories and memory reordering based on some other logic.

Anonymous
08/25/25(Mon)00:39:35 No.106375180

Anonymous 08/25/25(Mon)00:39:35 No.106375180

>>106375081
True I should do that. I feel like I would've heard by something in the threads by now though, I've been here forever.

Anonymous
08/25/25(Mon)00:46:28 No.106375219

Anonymous 08/25/25(Mon)00:46:28 No.106375219

how do i make Qwen3-30B-A3B-Instruct-2507 good for (E)RP?

Hi all, Anonymous here...
08/25/25(Mon)00:48:44 No.106375227

Hi all, Anonymous here... 08/25/25(Mon)00:48:44 No.106375227

>>106375023
migu is the client server

Anonymous
08/25/25(Mon)01:48:12 No.106375533

Anonymous 08/25/25(Mon)01:48:12 No.106375533

is miku trans

Anonymous
08/25/25(Mon)01:48:37 No.106375534

Anonymous 08/25/25(Mon)01:48:37 No.106375534

File: Screenshot from 2025-08-2(...).png (1.05 MB, 2816x1810)

1.05 MB PNG

This is wild. Base models can reproduce the Harry Potter books verbatim after being prompted with a few paragraphs.
I had heard they could reproduce 70% of the text but this looks more like 99.9%.
I don't know whether this means that LLMs are shallow pattern matching machines incapable of real intelligence or that we are overfitting them to death and they could work so much better if we trained them with 10x the data.
This is DeepSeek-V3.1-Base-Q4_K_M.
On the other hand I'm finding it not very useful for writing articles based on a few hand written paragraphs. It tends to just repeat the prompt and repetition penalty just makes it go into repeating random words after a few sentences.

Anonymous
08/25/25(Mon)01:48:45 No.106375535

Anonymous 08/25/25(Mon)01:48:45 No.106375535

>>106375533
Sounds like a question for grok

Anonymous
08/25/25(Mon)01:50:45 No.106375543

Anonymous 08/25/25(Mon)01:50:45 No.106375543

>>106375534
your hand written paragraphs were shit vs. hp so it auto completed with shit :(

Anonymous
08/25/25(Mon)01:51:46 No.106375549

Anonymous 08/25/25(Mon)01:51:46 No.106375549

Big Mistral release coming soon

Anonymous
08/25/25(Mon)01:56:37 No.106375575

Anonymous 08/25/25(Mon)01:56:37 No.106375575

i'm personally building a big release of my own

Anonymous
08/25/25(Mon)01:57:13 No.106375581

Anonymous 08/25/25(Mon)01:57:13 No.106375581

File: Screenshot from 2025-08-2(...).png (18 KB, 651x203)

18 KB PNG

>>106375543
Do you have any ideas on how can I get a model (pretrained or instruct) to keep generating text until filling the whole context, without human intervention? Like for, say, generating novel novels on the fly?

Anonymous
08/25/25(Mon)01:57:17 No.106375583

Anonymous 08/25/25(Mon)01:57:17 No.106375583

>>106375575
Tell me more.

Anonymous
08/25/25(Mon)01:58:56 No.106375589

Anonymous 08/25/25(Mon)01:58:56 No.106375589

>>106375534
In general, you should not use base models for autocomplete. There's a ton of competing shit in there without any guidance or tuning as to what's what, so you end up with repetition, latex formulas, and random code barf
I use an instruct model, then nix the instruct templates completely, effectively giving you a model that has the memory of the original base model, with some instruct / RL tuning to help it not go completely schizo, but the lack of templates means you can prompt it in autocomplete fashion and it tends to be biased toward autocomplete behavior rather than instruct-isms
Some models do eventually try to do instruct things if you let them yammer for long enough (Kimi and Qwen in particular are really bad at this) but this strategy works really, really well with the recent DS 3 and 3.1 models in particular

Anonymous
08/25/25(Mon)02:08:25 No.106375627

Anonymous 08/25/25(Mon)02:08:25 No.106375627

>>106375575
is it agentic?

Anonymous
08/25/25(Mon)02:10:48 No.106375642

Anonymous 08/25/25(Mon)02:10:48 No.106375642

>>106375589
What about using the chat template with the base models to avoid the safetycucking? Do you think it could work better than jailbreaks and amateur finetunes?

Anonymous
08/25/25(Mon)02:11:41 No.106375649

Anonymous 08/25/25(Mon)02:11:41 No.106375649

>>106375642
base models haven't yet seen chat templates yet

Anonymous
08/25/25(Mon)02:14:02 No.106375663

Anonymous 08/25/25(Mon)02:14:02 No.106375663

>>106375649
They will when I get my hands on them.

Anonymous
08/25/25(Mon)02:20:05 No.106375697

Anonymous 08/25/25(Mon)02:20:05 No.106375697

>>106375583
>>106375627
already flushed it

Anonymous
08/25/25(Mon)02:25:16 No.106375717

Anonymous 08/25/25(Mon)02:25:16 No.106375717

File: Screenshot from 2025-08-2(...).png (331 KB, 1601x1247)

331 KB PNG

>>106375649
I think there are probably some in its dataset. It seems to do at least ok at it. I haven't tried it on math or programming tasks though.
Interestingly it refuses questions that are TOO edgy with something like "Sorry I can't help you with that" unless you use the "Sure! Here's (whatever the question was) :" trick as part of the prompt.

Anonymous
08/25/25(Mon)02:27:41 No.106375730

Anonymous 08/25/25(Mon)02:27:41 No.106375730

k2 reasoner doko?

Anonymous
08/25/25(Mon)02:54:51 No.106375845

Anonymous 08/25/25(Mon)02:54:51 No.106375845

Any llm can rp if you attach the rp mcp tool

Anonymous
08/25/25(Mon)03:01:53 No.106375862

Anonymous 08/25/25(Mon)03:01:53 No.106375862

>>106375845
i have no idea what that means

Anonymous
08/25/25(Mon)03:07:43 No.106375891

Anonymous 08/25/25(Mon)03:07:43 No.106375891

File: 1736332255023264.mp4 (21 KB, 406x220)

21 KB MP4

>>106369841
Would any of you happen to have a link to a page or document with examples of other anon's rp sessions with LLMs? I'm working on a script that can automatically create SFT datasets from existing stories but I want to make sure it can create good system prompt examples. When you want to prompt a model into rping, what kind of system prompt do you typically use?

Anonymous
08/25/25(Mon)03:16:56 No.106375911

Anonymous 08/25/25(Mon)03:16:56 No.106375911

>>106375891
I'm also interested. I've 'rp'ed before, and I have no idea how people do this kind of this. At most, I just send a prompt detailing how I want the chat to proceed.

Anonymous
08/25/25(Mon)03:20:35 No.106375927

Anonymous 08/25/25(Mon)03:20:35 No.106375927

>>106375862
you can't understand how to rp with an llm using an llm rp mcp without hrt sorry

Anonymous
08/25/25(Mon)03:21:59 No.106375932

Anonymous 08/25/25(Mon)03:21:59 No.106375932

>>106375891
So you want to create a dataset of something that you don't know what it looks like. Very brave of you.

Anonymous
08/25/25(Mon)03:27:56 No.106375947

Anonymous 08/25/25(Mon)03:27:56 No.106375947

>>106375932

ye

Anonymous
08/25/25(Mon)03:29:18 No.106375956

Anonymous 08/25/25(Mon)03:29:18 No.106375956

>>106375947

Best of luck.

Anonymous
08/25/25(Mon)03:35:28 No.106375986

Anonymous 08/25/25(Mon)03:35:28 No.106375986

>>106374580
wtwtf? Is his video AI? This shit can't be real.

Anonymous
08/25/25(Mon)03:35:42 No.106375988

Anonymous 08/25/25(Mon)03:35:42 No.106375988

>>106375956

thank

Anonymous
08/25/25(Mon)03:54:55 No.106376078

Anonymous 08/25/25(Mon)03:54:55 No.106376078

>>106375986
I looked up the words on the poster and found this https://www.indianspices.com/marketing/e-auction.html
>e-auction system to Cloud based live E-auction in 2021 to conduct e-auction of Cardamom (small) simultaneously from auction centres in Bodinayakanur and Puttady. In the new cloud based system, licensed dealers can take part in the cardamom auctions from any one of the auction centres of the Board. The dealers have to log into the system with the user id and password to participate in an Auction and bid the cardamom. The Main display Board in the auction centres shows lot no, quantity, number of bags current highest bid etc of each lot kept in the Auction. The highest bidder’s name will be displayed on the Auction Masters’ terminal.
They didn't even bother cleaning up for the pic.

Anonymous
08/25/25(Mon)03:59:07 No.106376098

Anonymous 08/25/25(Mon)03:59:07 No.106376098

File: e-Auction-in-Vandanmettu.jpg (37 KB, 420x300)

37 KB JPG

>>106376078
>https://www.indianspices.com/marketing/e-auction.html

Anonymous
08/25/25(Mon)04:11:20 No.106376165

Anonymous 08/25/25(Mon)04:11:20 No.106376165

>>106376078
the point is to force them to look at the sample and bid on huge amounts of the shit within seconds, which theyre willing to deal with because they have a chance of saving a lot per kilo. This is, unfortunately, real capitalism in action, rather than price fixing on expensive yachts. This is an open market where people of modest means can get a deal buying tons of subpar spice. A true race to the bottom.

Anyways, theyre running out if water and their farmers are committing suicide. I'll take the yacht price fixing and artificial scarcity plz.

Anonymous
08/25/25(Mon)04:13:40 No.106376169

Anonymous 08/25/25(Mon)04:13:40 No.106376169

>>106375219
Lower your standards

Anonymous
08/25/25(Mon)04:29:28 No.106376232

Anonymous 08/25/25(Mon)04:29:28 No.106376232

miku feet

Anonymous
08/25/25(Mon)04:45:46 No.106376317

Anonymous 08/25/25(Mon)04:45:46 No.106376317

>>106376303
>>106376303
>>106376303

Anonymous
08/25/25(Mon)06:30:17 No.106376961

Anonymous 08/25/25(Mon)06:30:17 No.106376961

>>106371789
The other day, I saw a girl reading AO3 on the bus.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.