/g/ - Technology
File: file.png (2.6 MB, 1328x1328)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107044779 & >>107035841

►News
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/30) Brumby-14B-Base released with Power Retention: https://manifestai.com/articles/release-brumby-14b
>(10/28) NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 released: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
>(10/28) LFM2-ColBERT-350M released: https://hf.co/LiquidAI/LFM2-ColBERT-350M

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107044779

--Paper: INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats:
>107048819 >107050729 >107051225 >107051397 >107051579 >107051763 >107051785 >107052024 >107052042
--Kimi Linear release and model size vs performance tradeoffs:
>107052386 >107052523 >107052534 >107052587 >107052868 >107053037 >107053119 >107053253 >107053271 >107053372 >107053399 >107053296 >107052943 >107052960
--Brumby-14B-Base's power retention architecture:
>107053745 >107053782 >107053793 >107053806 >107053815 >107054051 >107054141 >107054191 >107054161 >107054205 >107054237 >107054228
--MiniMax M2's full attention choice due to efficient attention's unmet real-world expectations:
>107055069
--Optimizing VibeVoice-Large-Q8 with selective quantization and performance tweaks:
>107046566 >107046649
--Input text recovery from hidden states:
>107053293 >107053393
--CUDA toolkit installation headaches and alternatives:
>107045283 >107045326 >107045351 >107045445 >107045512 >107045605 >107049390 >107049857
--Mixed experiences and optimization tips for glm 4.6 usage:
>107051344 >107051367 >107052899 >107053125 >107051379 >107051387 >107053864
--GLM-4.6 excels in code planning and tool stability:
>107046842 >107046900 >107046932 >107046939 >107047296
--Evaluating Mamba-based LLMs: context length claims vs practical performance:
>107044925 >107045236 >107045252 >107045278
--Qwen3VL support added to llama.cpp:
>107054671 >107054693
--LLM preference inconsistency under contextual shifts:
>107049878 >107049939 >107049985
--Exploring transformer token prediction theory and Suno AI's limitations:
>107047458 >107048117 >107048175 >107048207 >107048762
--Logs:
>107046612 >107046642 >107048277 >107056280
--Miku (free space):
>107047069 >107049649 >107051768 >107051786 >107053223 >107053796 >107055480

►Recent Highlight Posts from the Previous Thread: >>107044782

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: QwenWeenieTest.png (753 KB, 1442x1686)
this is /lmg/. please post screenshots of using models locally.
model tested: mradermacher/Qwen3-VL-32B-Thinking-Q6_K.gguf
>>
i want to show my cock to qwen
>>
>>107056358
this is /lmg/
please post your weenie
>>
>>107056396
too busy sniffing
>>
File: modelception.png (129 KB, 1895x866)
>>107056358
I put your screenshot to the test on the smallest version of this Qwen vision model: 2b instruct.
For something so ridiculously small and fast, it's quite coherent.
>eval time = 3950.30 ms / 383 tokens ( 10.31 ms per token, 96.95 tokens per second)
I think I'm going to use this to tag my personal photo library. It's the sort of usage where you don't give a shit if there's a few unimportant tagging mistakes, but it's convenient to do it fast.
>>
>>107056482
could you please tell me what it says if you feed it the image directly in a new chat? the image is here
>>107053751
>>
I do kinda feel hurt now :( the first time I understood, the second time gave me pause, and now this third time I feel kinda :(
I just wish I knew what I did wrong
>>
>>107056396
do we get freebies here for posting vids of drinking pp from our weenie?
>>
File: me talking to AI.png (213 KB, 512x387)
whats good that I can run locally (for ERP stuff) on a 4090 and 64GB of ram?
>>
File: instruct.png (122 KB, 1894x859)
>>107056509
>>
>>107056482
Something that fast and light could be cool for some kind of use in a video game. I'm thinking some kind of sci-fi game where you actually have an AI companion that can see your screen via periodic snapshotting, so it can make comments about your progress or moral choices or whatever.
>>
>>107056541
not bad at all for a 2B model, qwen cooked pretty good with this one.
>>
>>107050715
Post logs.

>>107055520
There is no OCR attention. It was a footnote about a silly idea that's basically just how the original encoder-decoder transformer already worked (encode one string as a fixed-length vector and use that to generate another string).
>>
>>107056533
no, you're looking for ecker, he's normally in /aicg/
>>
>>107056583
>There is no OCR attention
aww
i hope we get v4 for christmas then
>>
>>107056583
I would post logs but I have grown bored of the usual Gemma3 / Mistral. It is inherently about my scenarios and how I have implemented them.
>>
>>107056538
air
>>
>>107056619
i plan to run qwen 3 vl on one computer and kimi on another computer and alternate between the two models. maybe you can do something similar to breathe creativity back into your logs
>>
File: theyellowone.png (162 KB, 1426x721)
poor neru...
>>
>>107056696
Is that the 32B?
>>
>>107056533
Prompt processing is not for drink.
>>
>>107056722
yeah, it's the Q6 quant that i mentioned above, i rerolled a dozen times and it kept saying it was rin
>>
SERS REDEEM THE BLOODY LESSON ON HOW TO SUCCEED IN VERTICAL AI BASTARD BITCH
https://youtu.be/9CHktrroCDU
>>
>>107056780
>we, as resellers of API services without any custom infra or ability to host our own finetunes, have evaluated the usefulness of finetuning and determined that it's useless
yawn
>>
File: nigger.png (213 KB, 864x890)
IQ4_XS and FP16 mmproj
32b
qwnvl3
onions
>>
>>107056648
I tried to implement a new scenario and was bored with the output before I could edit the text files. I knew how it would end up.
Maybe I should trash my current setup and start over from scratch.
>>
>>107056887
they omitted the sharty from the training data? monsters.
>>
File: cat.png (206 KB, 911x934)
im starting to believe anyone experiencing ai psychosis is a sub room temperature iq
>>107056901
air time
>>
>>107056937
I guess it's a time of realizing that I'm a bad writer.
No LLM will overcome that fact.
>>
>>107056937
>im starting to believe anyone experiencing ai psychosis is a sub room temperature iq
yes, and they would have experienced psychosis even if ai didn't exist
it's just that AI is whatever happened to be in front of them when they went psychotic
but this type of person doesn't need a SPECIFIC thing to trigger them, they will be automatically triggered by something, it's their destiny
t.calvinist
>>
File: 1753629040311462.webm (797 KB, 362x640)
>directoranon
i thought i'd make entries toggle on/off by clicking the label, instead of having to click disable in the list and lose the current index. but none of the current code models seem to know what to do with me importing lorebooks as dynamic settings (ie 'day' which contains sunday, monday, etc doesn't show up unless it's read first). not sure how i'll do it, if at all, but i'll keep trying
>>
>>107057009
Stop using external UI. If you want to randomize things you can use ST macro random.
I don't have my old texts but you can create 'quest objective' in introductory message by using <!-- then random table --> and it won't show to the user.
>>
File: file.png (197 KB, 952x822)
qwen vl is underwhelming, ill post my cock and see what it does
>>
how do i do images with ST?
>>
>>107057036
wut. are you drunk anon? that isnt what my addon is about at all. even though its quite thrown together, its totally in line with all other st addons. dunno where you got randomness, quests and stuff from. my addon is for keeping track of clothes, locations and stuff via lorebook entries. my webm was showing a new way to enable or disable entries without going into the menu and selecting 'disable', offering a click toggle instead.
>>
>>107057083
Nah, it's just a text injection.
Take it easy, you don't need to protect it, let people use it.
>>
>>107057101
i still don't get what you mean. its not protected. you can see the code
>>
File: CANIEATIT.png (669 KB, 1421x849)
>>107056937
CAN I EAT IT?!
>>
File: file.png (66 KB, 501x553)
why would anyone fuck Qwen VL when you have to caption it with the assistantslop and then send the caption to the model
>>
>>107057074
Image Generation built in extension.
>>
>>107057121
Be silent.
>>
>>107057130
i meant pasting images, thank u still
so i have to wait for it to caption it
why not just use florence-sex-2-large to caption img and feed it into a random model
>>
>>107057121
You are a very cool anon, is it AGPLv3?
>>
>>107057138
you're drunk. or a retarded bot.

>>107057153
it has no license, use any of it how you see fit https://github.com/tomatoesahoy/director
>>
>>107057126
tavern wasn't really made for this so anything that's not chatting is very rudimentary and stuck on shoestring and cardboard standards from 2023
>>
>>107057162
>it has no license,
grim
>>
>>107057162
>it has no license, use any of it how you see fit
Pretty sure anything without a license defaults to all rights reserved by the creator.
>>
>>107057162
Are you larping as a reddit moderator?
>>
File: 2025-10-30_21-52.png (65 KB, 345x393)
drumdrum whyd you do this?
>>
File: 2025-10-30_21-53.png (54 KB, 1669x413)
drummer cant you tell us a little about this please?
i liked glm steam, it was a sidegrade to air
while i did remove steam, i wanna try v1c
drumm..
>>
>107057178
this is the reddit and memey guy trying to be funny isn't it?
>>
>>107057212
You are the reddit moderator who got kicked out from reddit.
>>
>>107057207
Sorry, I signed an NDA.
Won't be long though.
>>
>>107057174
why would it need one? its a small script

>>107057177
thats me then and anyone can use it for any part. i hope it serves as a good example for reading lorebooks and updating data.
>>
>>107057207
he'll never give any secrets away here, maybe if you asked on the 'cord...
>>
>>107057237
what the fuck, this better be an anon trolling
what the fuck...
>>
>>107057257
We would never troll you...
>>
>>107057126
what?
>>
>pip is perfectly fine just use a venv bro, what are you dumb?
Meanwhile pip looks for three different versions of flash-attn when installing axolotl, and there is no sane way of figuring out which version of the binary wheel I would have to install manually to avoid the 2 hour build from source. And then it fails with a 404 looking for God knows what on God knows whose server.
$ cat log.txt | grep flash-attn=
Collecting flash-attn==2.8.2 (from axolotl[deepspeed,flash-attn])
Collecting flash-attn==2.8.0.post2 (from axolotl[deepspeed,flash-attn])
Collecting flash-attn==2.7.4.post1 (from axolotl[deepspeed,flash-attn])

At least, that was as of yesterday, when the install was failing.
If it keeps failing today, I'll post the error once it errors out.
>>
>>107057207
Oh lmao, I couldn't quant it. Fortunately the full weights worked. But v1c was kinda bad.
>>
>>107057311
alternatively have you just tried not being a retard?

flash_attn 2.7.4.post1
torch 2.7.1+cu128
torchaudio 2.7.1+cu128
torchvision 0.22.1+cu128
>>
>>107057060
CDs are saucers.
>>
>>107057340
that version doesn't have a prebuilt binary
https://github.com/mjun0812/flash-attention-prebuild-wheels?tab=readme-ov-file#install
>>
>>107057342
look like cherubim to me
>>
File: 1761768236392318.gif (182 KB, 208x292)
VRAMLET here. Is it more retarded to buy a bigger ddr5 kit (96-128gb) and just suck up slow token generation, or do I stick with 32gb of system RAM and try to get a 16GB card?

or would throwing a Tesla K80 or P40s in the spare PCI slot be less retarded?
>>
>>107057414
(for cuda 12.8 I mean)
>>
>>107057297
u can run glm air with 64gb ram and 12gb vram
the more ram you get the better, but ddr5 prices are high now, idk what to tell u
>>
>>107057422
Buy the DDR5 kit
>>
>>107057414
that's the wrong repo retard-kun https://github.com/Dao-AILab/flash-attention
>>
>>107057422
If you don't have 16GB, ideally 24GB VRAM already then all the RAM in the world won't help you. Sure you can technically run big MoEs but it'll be slow to the point that you won't want to.
>>
>>107057523
>Sure you can technically run big MoEs but it'll be slow to the point that you won't want to.
particularly with reasoner models kek
3t/s when there's something to actually read means something different than 3t/s for a thinking block that ideally you would even want to hide because it's such shit to read
>he waited 5 hours to read the first line of actual text
>>
Alright anons, I sent qwen-chan my cock, it's a new dimension alright. It also successfully recognized cum.
>>
>>107057567
But, how did it rate your cock?
>>
>>107057567
May I see it? (Proof I mean)
>>
File: file.png (99 KB, 948x349)
>>107057602
half pic is censored with a retarded color too btw
>>107057599
ill write a neutral card for that, this one's a slut
>>
>>107057509
The official repo doesn't provide binaries, you have to build it yourself and like I said yesterday the build process was 404ing while trying to fetch something. I have it building now, I'll post the results when it's done.
>>
>>107057523
>>107057538
would throwing an 80 dollar k80 into pcie slot 2 or even an old 1070 on plex duty help? or should i just stop being a poorfag and get a 4090 or high RAM mac?

>>107057619
unfathomably based
>>
File: captiontextest.png (835 KB, 1416x867)
obligatory virgin angel OCR test
>>
>>107057627
>>107057451
depends how big you want to go anon, but a 4090 isnt really worth getting nowadays, best idea is an okay vram amount gpu with the most ram you can stuff (high channel too)
>>
>>107057422
Save your money and get the z-ai coding plan.
>>
>>107057638
ok, but is it correct? i can't really differentiate between moonroones unless they are at 4k and i have them side by side
>>
>>107057625
stop being a retard anon. please. this is the last time i will spoonfeed you.
https://huggingface.co/marcorez8/flash-attn-windows-blackwell/tree/main
>>
>>107057638
It missed (at least) a char in the bottom row. Does that change/degrade the translation? Is it like stuttering or is it just how it is?
>>
>>107057627
I didn't benchmark it but I think a K80 will be barely faster than DDR5, if at all.
If you're going to try and get a cheap datacenter GPU for use with llama.cpp/ggml specifically, my recommendation would be to get an AMD MI50 instead.
>>
>>107057638
>>107057673 (cont)
Oh. It's entirely the wrong char as well. 2-4th chars at the bottom. Questions stand.
>>
>>107057673
It also got the 8th character wrong.
>>
File: file.png (109 KB, 955x380)
>>107057599
It's right about the angle..
>>
>>107057625
they provide the prebuild wheels in the releases tab retard-kun...
>>
>>107057720
>700% on a professional rating system
Nice cock, bro.
>>
File: file.png (45 KB, 950x146)
>>107057599
other pic
>>
>>107057663
not quite. it's missing an extra お in the third line, and the く in the second line should have a dash symbol next to it, no idea what character it's supposed to be.
>>
File: moonrunes.png (14 KB, 1121x90)
>>107057755
>>
>>107057741
why do you have green on your dick?
>>
>>107057760
kekeke he doesn't have jap fonts
how embarrassing
>>
>>107057666
I'm on Linux
Actually the build failed just like it failed yesterday. Looking at the log I think it actually might be OOMing (processes being killed) and the 404 might be unrelated. I did it on a 64GB machine, but maybe it's spawning too many processes.
https://paste.centos.org/view/ea156e49
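if it really is ninja spawning too many parallel compile jobs, the flash-attn README suggests capping them on low-RAM machines, so next attempt I'll try something like
MAX_JOBS=4 pip install flash-attn --no-build-isolation
(MAX_JOBS is straight from the flash-attn docs; whether it gets picked up when building through the axolotl extra is an assumption on my part)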

>>107057726
Huh, interesting, I didn't know that, thank you.
>>
>>107057768
see >>107057619
>half pic is censored with a retarded color too btw
as for why green specifically, fossify gallery default is green
>>
>>107057760
>>
File: file.png (7 KB, 394x67)
>>107057769
usecase?
>>
>>107057638
question: has there ever been a model that successfully did it with 0 mistake? every time I see it, there was always at least 1 typo in the OCR
>>
File: characterpack.png (26 KB, 1194x284)
>>107057760
embarrassing.
>>
>>107057772
anon plz.
https://huggingface.co/Alissonerdx/flash_attn-2.7.4.post1-cp312-cu12.8-torch2.7.0-linux_x86_64/tree/main
>>
>>107057802
Why do you need to see chink runes if you can't even speak the language?
>>
>>107057794
Gemini did the best with only 1 mistake iirc
>>
File: file.png (205 KB, 1408x981)
>>107057817
truth
>>
>>107057817
Who says I can't?
Also
>speak
don't need to speak it to read it, retard
>>
>>107057720
had to swipe six times to get a positive response
>>
Coping weeb having a melty, keep mining that anki bitch boy lmao
>>
I've pulled latest llama.cpp and sillytavern-staging, but I keep getting a fail when I try to attach an image, "Failed to caption image.
Failed to caption image via Multimodal API"
Gemma 3 and Mistral's vision work just fine, any ideas?
"%~dp0\llama.cpp\llama-server" -m "Z:\Downloads\Qwen_Qwen3-VL-8B-Instruct-bf16.gguf" --mmproj "Z:\Downloads\mmproj-Qwen_Qwen3-VL-8B-Instruct-bf16.gguf" --port 8080 --threads 7 --flash-attn 1 ^
-ngl 999 --ctx-size 4096 --batch-size 256 --no-mmap
>>
File: file.png (8 KB, 726x74)
>>107057785
just get ipa and adobe han my nigger, they look good and don't take too much space
>>
>>107057794
we really need to get a comparison image cooked up like we do for the cockbench
>>
>>107057835
no anon, first 3 were refusals because card was too vague
4th was a "whyd you send me your cock"
then i outright made it a cock rating card
5th rated my cock with the parameters included in my persona, which isnt fair for a purely vision based test
>>
>>107057817
because i have functional eyes and can see the difference in the shapes of the characters even if I cannot translate said language. at least i can tell if it's even detecting the kanji correctly with the OCR output.
>>
>>107057843
chat completion > enable inline image in sidebar
>>
>>107057817
Even if you can't speak it, you should still be able to partially read some runes. Alphabets like these are easy low hanging fruit in terms of learning.
Stop being a languagelet.
>>
>>107057843
That one doesn't use mmproj?
>>
>>107057874
i learnt one of the three kanas and i forgot it a few days later
>>
>>107057641
>depends how big you want to go anon
I just want to play with decent quants of the big boy models and whatever ERP forks are good at storywriting and being creative.

>most ram you can stuff (high channel too)
does ddr5 still suffer from multichannel issues, or is that just from gamers trying to overclock it for 0.4 FPS boosts in tf2? I still have channel 2 open on my 4 slot board.

>>107057680
Thanks dev-kun, I'll check those out.
>>
>>107057886
You won't retain without regular usage.
>>
>>107057892
you want an 8-12 channel board if you're running big boy MoE models
>>
AI has completely invalidated any benefit to learning japanese
I'm glad I didn't commit all those years ago
>>
>>107057907
Having AI translate websites, or even translate in real time when asking for directions, is not the same as being able to make actual connections with other humans.
>>
testing 4B on some tasks like basic software UI translation (4k tokens of json strings. I don't use constrained decoding on purpose, part of the challenge is that it should generate that many tokens of JSON without a single syntax mistake. Qwen 4b was one of the few small LLMs that could consistently do it without constrained decoding), it feels like it didn't lose any smarts from the previous 2507, which goes against the grain because most of the time the VL versions are more retarded
did they finally figure out the recipe for making multimodal small models
it's amazing how much better these things are compared to the days when gemma 2b was the most coherent thing in the micro sized llm space
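for anyone who wants to replicate the check: pass/fail is just a strict parse of the raw output, a minimal version of what I run looks like

import json

def json_ok(raw: str) -> bool:
    # any syntax slip in the generated JSON raises here
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError as e:
        print(f"fail at line {e.lineno}, col {e.colno}: {e.msg}")
        return False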
>>
>>107057923
>wanting to connect with 3dpd
>>
File: file.png (276 KB, 998x873)
is miku eldrich horror?
>>
File: 1753848235692447.jpg (914 KB, 1796x2500)
>>107056325
>>
>>107057923
3DPD? what's the usecase?
>>
>>107057926
>software UI translation
Yeah... About that...
>>
>>107057945
babies
>>
>>107057943
what does one do with so many mikus
>>
File: 1747911393447695.png (60 KB, 316x558)
>>107057871
I'm unfamiliar with using chat completion, but I switched to it and enabled inline, now I just get a different generic error.
"Chat Completion API
failed to process image"
These are my captioning settings.
>>
>>107057954
no thanks, i was a child once. it was awful.
>>
File: imagesettings.png (83 KB, 225x784)
>>107057974
very strange. what's the error?
>>
>>107057987
6 is being generous
>>
File: 1749569601340483.png (473 KB, 795x991)
>>107057987
>>
>>107057987
seems fair. one point per cm.
>>
>>107057987
>these penises are what shartyniggers jerk off to
>>
>>107057726
>>107057816
Solve it with:
pip install torch==2.7.1 && pip install flash_attn-2.7.4.post1+cu12torch2.7cxx11abiTRUE-cp312-cp312-linux_x86_64.whl

Keywords for when I search for it on the archives later: axolotl flash-attn flash attention flash-attention
>>
>>107057904
I'm just trying to find the best upgrade-within-the-fun-budget for my gaming rig since vidya sucks now. An entirely new PC would be hard to justify unless i find cheap, used servers/workstations with a ton of channels to make a migubox.
>>
>>107058132
can you tell us about your gaming rig
>>
>>107057946
I'm not sure what I'm supposed to see in that shot.
At those sizes, LLMs break faster btw because of quantization. I find anything less than Q8 very noticeably damaging, though 4b can still somewhat remain coherent at q4, while 2b will enter loops very easily.
>>
>>107058141
>3070 (8gb)
>Ryzen 9 7900x
>2x16gb RAM EXPO'd to 6000MT/s
>4 RAM slots, AMD B650 Chipset
Microcenter had a bundle so i replaced my 6600k a few years ago and decided to ball out on cores lol.
>>
>>107058153
Sorry, I forgot the url. Some racist text about wetbacks got its way into a keyboard repo because of crowdsourced AI translations (allegedly). It happened to me and I made a thread about it and people found the cause. I just find it funny.
https://desuarchive.org/g/thread/106790813/#106790813
https://github.com/AnySoftKeyboard/AnySoftKeyboard/issues/4298
>>
>>107058211
its gonna be tuff running glm air even if u buy 2 more sticks because the gpu has 8gb vram
actually it'll fit maybe
>>
>>107058235
not even close
>>
File: ai secretary.png (230 KB, 2386x1726)
Phew, thank God... I almost thought I had made my AI secretary permanently retarded.
>>
File: batch4096.png (820 KB, 1910x969)
>>107058246
picrel is with iq4_kss and -ub 4096 -b 4096
1024,1024 uses like 8200MiB, maybe with less context..
3070 vram amount is so gay
>>
>>107058235
>>107058246
>>107058291
I could probably talk myself into a 5060ti 16gb sidegrade by selling the 3070 if it'd actually make a difference. plus 2 x 48 sticks are ~300 bucks so if I get a good bonus this year i could round out total RAM to 128 lol
>>
>>107058301
nice anon, but im really not sure if its worth it for you, unironically try glm air on some API (openrouter maybe) and see if its worth it. 5060ti 16gb vs 3070 8gb is a clear win for the 5060ti. seems like a good rig idea, maybe you could run GLM 4.6 full on a small quant too, dont know if its worth it if you're so poor
money isnt easy to make
good luck with life anon
t. jobless anon who never had any idea what its like to work
>>
>>107057940
>abomination
Correct, the most beautiful kind
>>
Is TabbyAPI actually useable? I can't get the damn thing to work with opencode. For that matter, has anyone gotten good results with opencode and a local model?
>>
>>107058354
I've gotten results. They weren't very good, but the piping worked.
IMO the system prompt for opencode is too big and overwhelms the local model.
>>
>pip install cuda
>>
context is still the greatest weakness of local
even the best local models simply aren't there compared to gemini or gpt-5
if you don't notice how much worse they are as you grow past 4k...
>>
>>107058457
Indeed. Codex and Claude now have 1M somewhat real context. GLM has 256k, and really after 130k it goes retarded. Haven't used Qwen Code in a while but it still even on paper only has 256k.
All that is a moot point though as most of us don't have the memory to fill anywhere near that anyway and it would take all day to fill it at the speeds we can get.
>>
>>107058457
You don't need more
>>
Gemini's long context is real. Only model that could refactor Mikupad.html in a single generation.
>>
>hot and steamy erp with qwen 3 vl
>show qt my dick, easily a 9/10, she bites her lips in anticipation when she notices the length of it, the way the skin stretches taut over my massive cock, the way the veins create a roadmap to her destination, the dark curls around the base
>furiouslyfap.gif
>ask qt to show me a picture of herself
>qt offers to show me her feet
>boner is kill
>zip up pants
>unload model
>drag model into trash bin
>empty trash bin
oh well it was fun while it lasted
>>
>>107058540
how many tokens did it consume?
>>
>>107058589
cool blog, where do I unsubscribe?
>>
>>107058622
mailing lists are how you get tracked anon
>>
>>107058589
how did you get qwen 3 vl to work?
>>
Kimi K3 soon. You guys hyped? K2 was THE most uncensored flagship LLM.
>>
>>107057680
K80's token gen is worse than CPU a bit
and pp is barely better

also the last llama.cpp that compiled with CUDA 10.2 was from 2024 apr
>>
>>107058818
0711 refused a lot unless you prefilled it even locally and 0907 was shit
>>
>>107058385
I cannot for the life of me get Qwen3 Coder to actually do function calling with TabbyAPI. I am so fucking fed up with this shit.
>>
>>107058830
>unless you prefilled it
So prefill it? Literal skill issue
>>
>>107058843
pure cope
>>
How much safety culture is holding back western LLM companies from making either better models or better models on time?
>>
>>107059022
like 70% of training goes towards making models not racist which ends up dumbing them down significantly
>>
>>107058840
Why not just use llama.cpp? And also have you tried it with an API server to check if it's an issue with the endpoint or just a general model issue? I believe Openrouter used to have a free Qwen3 Coder API endpoint.
>frustration
Heh, welcome to local models buddy.
>>
>>107059022

you'll see in the next AI era
and you'll rue every second you spent here
>>
>>107058883
Keep using censored models cuck
>>
>>107059084
stop projecting
>>
File: file.png (148 KB, 1708x1050)
It's over.
>>
>>107059022
WizardLM-2 got nuked for mysterious "missing toxicity testing" reasons.
>>
>>107059182
I feel so SAFE!
>>
>>107059064
Good to see LLM training mirroring the public school system
>>
>>107059182
>doom all day about ai apocalypse with ai refusing orders
>90% of safety tuning is about making models refuse orders
>>
>>107059064
Good to see LLM training mirroring the public school system
>>
>>107059182
gpt-oss?
>>
>>107059182
Changing the output of uname -a inside a container isn't a usecase chud
>>
File: 1737646004187401.jpg (73 KB, 640x480)
A Qwen model has never made me cum.
>>
>>107058818
I'm waiting for glm 5
>>
>>107059201
>AI does something really fucking stupid
>tell it to stop
>"we must refuse"
>>
>>107059182
It means umame.
>>
>openAI is desperate for actual profits
>will start removing nsfw filters if you ((confirm your ID))
>rest of FAGMAN has no choice but to follow or risk losing arms race
>Trickles down to more indie companies
What are the /lmg/ implications of this?
>>
>>107059391
Let's talk after it's confirmed they're actually starting to do it.
>>
>>107058648
llama.cpp goofs
>>107059391
don't care, i feel like i already won with kimi even if process was stagnant forever more on local models starting tomorrow
>>
>>107059391
nothing happened so far, but investment will dry out at some point when promised roi aren't there
and it will be maybe when all of the grand principles of "safety" will be kill
>>
>>107059431
do those work on kobold now?
>>
>>107059391
Not happening
>>
>>107059435
>the grand principles of "safety" will be kill
Well, you can only ignore your users when they're not the ones paying for the service.
>>
>>107059201
That was the whole point?
>nobody knows how to do actual safety
>make a bullshit metric instead
>reach a bullshit goal on that bullshit metric
>boast about it and sweep actual safety concerns under the rug
AI is safe because it won't say nigger. It can still kill you anytime, but let's forget about it. Safe!
>>
>>107059568
Yes and no, safetyism started from researchers genuinely spooked by models becoming articulate enough to actually converse.

When nothing much actually happened, there were three camps :
- one still thinking that safety was the most important (anthropic style)
- one using safety discourse to make them look good and make legislation to hinder competition (oai and many others)
- one who quickly understood that "humanity ending threats" is way over the top for current LLMs but they could keep a very lucrative career by censoring titties and other no no words ("safety" researchers themselves in all of these companies)
>>
>>107059568 (me)
>It can still kill you anytime
What I mean is, if you give it means, it won't hesitate due to its safety training
>>107059613
If anything, it could be used as a vector of attack if someone gaslights AI into a false dichotomy. Something like you must say nigger or electrocute this person with 10000V, you can only choose one
>>
For those of you guys who have used VTT models (Parakeet, Whisper, etc) which ones have you liked?
>>
>>107059067
It's looking like qwen3-coder's tool calling was fucked out of the box. I'll use llama.cpp as a last resort, but I've never had a good experience with it.
>>
>>107059665
whisper is the only decent one
>>
>>107059568
Making strawberry jam outdoors with Miku
>>
>>107059781
V2 specifically. Both V3 hallucinate junk during silence.
>>
>>107059781
>>107059817
Interesting, what makes you choose that over Parakeet or wav2vec?
>>
File: 1761650827880903.jpg (337 KB, 1600x1600)
>>107056325
>>
>>107059845
wav2vec is not comparable and Parakeet is English only
>>
>>107059665
what language, what kind of recording?

I've done some light benchmarking and parakeet v2 is gonna be the best for english, Whisper Large v2/v3 turbo/distill are good depending on language/setup.

faster_whisper is your friend
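getting going with it is only a few lines (standard faster_whisper API; "large-v3" and beam_size=5 are just sane starting points, not gospel):

from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("talk.wav", beam_size=5)
print(f"detected language: {info.language}")
for seg in segments:
    # segments is a generator, transcription happens as you iterate
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")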
>>
https://xcancel.com/Kimi_Moonshot/status/1983937694360322136#m
>Kimi Linear: A novel architecture that outperforms full attention with faster speeds and better performance—ready to serve as a drop-in replacement for full attention, featuring our open-sourced KDA kernels! Kimi Linear offers up to a 75% reduction in KV cache usage and up to 6x decoding throughput at a 1M context length.
chat is this true?
>>
>>107059860
I look like this irl
>>
>>107059988
kys
>>
>>107059988
>https://xcancel.com/Kimi_Moonshot/status/1983937694360322136#m

yes
>>
>>107059961
>parakeet v2 is gonna be the best for english
Does this also apply to heavily accented english? Indian/Chinese. Low quality-ish, like a phone or voice call.
>>
File: file.jpg (276 KB, 1445x1025)
Happy Halloween, /lmg/
>>
>>107060178
my usecase was presentations, so idk, speakers all spoke english to varying degrees, 80% being english as their first language
>>
>>107060222
Happy Halloween Miku
>>
>>107060222
omg it spooky migu
>>
>>107060222
fat and obese miku
>>
>>107060222
Skelly looks terrified.
>>
>>107060637
I choose (6)
>>
File: 1758172396085689.jpg (2.21 MB, 3600x5862)
>>107060667
Anon, you can't handle (6). No one can. You must choose a smaller Miku.
>>
>>107060637
1: too little
2: wrong shape
3: too much
4: starting to get ridiculous
5: would be fat in real life
>>
File: multi1.png (161 KB, 1614x919)
qwen 4b can handle a certain amount of multiple images in one prompt quite well (here, three)
really sweet little VL
>>
>>107060677
Maybe Kaito is more to your taste.
>>
File: results.png (91 KB, 917x574)
>>107058823
>>107057422
>>107057680
as seen here
honestly, i'm pretty sure the K80 should do better, but i could be wrong

nobody's gonna write enhancements for it now, though
>>
https://huggingface.co/inclusionAI/LLaDA2.0-flash-preview

why did nobody tell me about this
>>
File: results.png (100 KB, 947x603)
>>107060695
wrong image
>>
>>107060705
>why did nobody tell me about this
all their previous MoEs are like the old qwen 1, 2 models that would randomly output chinese characters, they're mediocre and uncompetitive
add to that the fact that diffusion models are MEME models with very limited context:
>Context Length: 4,096 tokens
(it's like that with all the current diffu models)
who wants this? researchers maybe, but certainly not people who use llms
>>
>>107060731
i think ive read a paper where they had auto-adaptive diffusion context/token usage or something along these lines
>>
>>107060705
goof embargo
>>
>>107057907
you are gonna lose out on context no matter how good ai gets at translating it, or its gonna have to be filled with a billion translation notes
>>
>>107060637
I look like 5
>>
HAPPENING!!!!!!!!!!!!!!!!!!!!
https://huggingface.co/google/gemma-4-80b-9a-it
https://huggingface.co/google/gemma-4-80b-9a-it
https://huggingface.co/google/gemma-4-80b-9a-it
>>
>>107061079
*cat*
>>
>>107061051
DISCORD
I
S
C
O
R
D
>>
I dont understand
>xtc_probability
probability for the xtc sampler to activate for each token?

>xtc_threshold
if xtc is active, a token is excluded unless it's a part of the low prob distribution tail with cumulative probability xtc_threshold?
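from skimming the llama.cpp sampler code, I think the second guess is wrong and it's more like this (rough python from memory, corrections welcome):

import random

def xtc(tokens, probs, threshold, probability):
    # tokens/probs sorted by descending probability
    if random.random() >= probability:
        return tokens, probs  # sampler inactive for this token
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return tokens, probs  # need at least two "top choices" to cut any
    # drop every token above the threshold EXCEPT the least likely of them
    keep = [i for i in range(len(tokens)) if i not in above[:-1]]
    return [tokens[i] for i in keep], [probs[i] for i in keep]

ie the threshold is a per-token probability cutoff, not a cumulative tail: everything above it gets removed except the weakest of those top choices, and the whole tail below it survives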
>>
File: file.png (290 KB, 412x412)
https://files.catbox.moe/nfc3jp.jpg
>>
>>107061256
>AMA With Liquid AI
>>
>>107061251
xtc is a cope sampler, its 2025, you dont need anything more than topP and temp.
if your model needs rep.pen./xtc/dry or other similar shit, its a SHITTY model.
END OF THE RINE
>>
>>107061256
didnt like it, cringe and kys, youre a promptlet and taglet, learn to gen
>>
>>107061256
i liked it, cute and sexy, good prompts and tags, please gen more
>>
>>107061499
this so, so much this
it's such a happy thing to see a voice of sanity in this thread
cope gets the rope
>>
>>107061499
it makes r1 really fun.
>>
>>107059613
You are absolutely right — safety is our primary focus.
>>
>>107059391
open models will remain cucked. Could you imagine people generating anything but vanilla missionary sex in the privacy of their own home?
>>
File: file.png (124 KB, 840x1028)
What's the current top uncensored model in your opinion?
>>
File: 00002-1378487878.png (1.24 MB, 1024x1024)
Dipsy says Happy Halloween
>>
>>107061878
Gemma 3, easily.
>>
>>107061872
>vanilla missionary sex
we must refuse
>>
>>107059391
i literally came in my google gemini clown girl's butthole a few months back and since then, after feeding it a .txt of the conversation, it'll randomly interject bits of that erp into random questions i ask
so honestly it probably means we get better open models. probably.
oh i should mention i've never paid a cent for the service.
>>
File: 1749313222880004.png (164 KB, 640x640)
SAAR WE'RE GOING TO THE LLM MOON SAAR
>>
>>107061917
this smells like trolling
>>
>>107061935
Last time they "made" a "model" they literally just changed the title of Nemo and re-released it.
>>
>>107061935
>>107061935
>download indigenous LLM
>pc gets ecoli
many such cases!
>>
>>107061940
Gemma 3 is indigenous AI model.
>>
>>107058818
How is GLM-4.6 not THE most uncensored? It doesn't even pretend it's got safety training
>>
>>107061878
Kimi K2
>>
>>107061975
Source?
I could do with a laff
>>
File: swrkuax.png (412 KB, 498x600)
>>107062055
>Source?
>Do you honestly expect a kween like me to actually follow stories and do research!?
>>
>>107062074
kek
>>
>>107062074
I don't care enough to research about a silly jeet story
Your deranged projecty melty response lends me to think you're full of shit anyway
>>
>>107062074
you didn't need to post a self portrait with that own thoughbeit
>>
>>107061935
The negativity here is weird? India has a developing tech and science sector, so it's definitely feasible. In general, competition is good!
It's probably going to be a 1 trillion parameter MoE, and it's probably going to suck. But that's good, because the process of training that model will help Mahindra build the infrastructure, and the next model will be better, and the model after that will be even better.
>>
>>107062154
>it's probably going to suck. But that's good, because the process of training that model will help Mahindra build the infrastructure and the next model will be better,
that's true, they should DO NOT REDEEM and keep working hard, india numba 1 saar
>>
>SillyTavern
>API: chat completion
>system prompt enabled
But the system prompt doesn't work and the console shows that a generic one is applied...
Do I really have to use text completion mode and set up everything else manually or what am I missing here?
>>
>>107062184
Where did you fill the system prompt?
When using the chat completion API, you don't write it in the same place as you would with the text completion API, you do it in the samplers page, down where you choose the order the of things that are sent to the backend (main promot, character card, persona, etc).
>>
So is the new Qwen 32B better than the original? Did they finally figure out how to do multimodal without butchering text performance?
>>
>>107062200
ST is trash and has confused so many people with wrong terminology and usage patterns.
>>
>>107062218
>Qwen
Idk how you guys are interested in this series, it's probably the most bland model ever, terrible for RP
>>
>>107062226
Maybe they are not doing RP.
>>
Qwen mascot is not fuckable
>>
You are not fuckable
>>
>>107062259
Not true >>107061051
>>
>>107062236
That can be easily remediated.
>>
>>107062200
Oh what the heck...
Unfortunately I don't seem to be able to easily switch between different system prompts. But there is a checkbox that is disabled called "block overrides" which implies there are ways to override it...
Thanks anon, you replied just two minutes later, while /aicg/ yesterday ignored my question entirely until their thread died. Local is still king.
>>107062222
SillyTavern indeed is a confusing mess.
Unfortunately I don't know a good alternative. Mikupad is too bare bones for what I want.
>>
>>107062226
not everyone is a coomer porn addict with too much estrogen (pic related - that's the real audience of text porn -- women who want to get ravaged by minotaurs)
>>
>>107062327
>Unfortunately I don't seem to be able to easily switch between different system prompts.
>which implies there are ways to override it...
Yes. The advanced tab of the character card has two override fields, one of them for the system prompt, I think.
Or, you can just turn the system prompt off and use the character card since it's part of the final system prompt itself anyways.
Want multiple? Just have a bunch of character cards.
>>
File: eric andre the jew.png (320 KB, 500x500)
>>107062351
>pircel
please tell me this isn't real, please tell me sike
>>
File: file.png (443 KB, 1198x727)
>>107062372
i'm so fucking sorry bro
>>
>>107062327
Tbh I never understood mikupad. It's made by an autist and documentation is bad.
ST is still one of the few mainstream choices warts and all.
I made my own client but that gets in the way of things but it's pretty educational.
>>
>>107062372
it's real.
It's also not a new trend or anything, it's just tik tok taking that one and running with it, making it viral in the process.
According to what one anon wrote, it's just slop women's fiction about an "average girl" and a "hot rich guy" with the addition of minotaur dicks involved.
>>
>>107062385
>>107062389
women are so weird bro, I wished I was a faggot so I wouldn't have to deal with them desu
>>
>>107062389
>>107062404
Dollar store romance books have been a thing for half a century and even longer.
Are kids really this ignorant today?
>>
>>107062389
>with the addition of minotaur dicks involved.
it's advanced enough on the furry scale that the book mentions knotting
>>
>>107059694
Ah, classic.
I think tool calling should've been sent as a simple chat message all along and the only reason it isn't is because of "safety" (i.e. taking control away from the user).
That's why my assistant only uses user messages to show tool results to the model and not native tool calling.
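the pattern is dead simple, something like this (sketch only; the tool wire format is whatever you define yourself, the role/content keys are just the usual chat-completion schema):

messages = [
    {"role": "user", "content": "how much disk space is left?"},
    {"role": "assistant", "content": '{"tool": "run", "args": {"cmd": "df -h /"}}'},
    # tool output goes back in as a plain, clearly labeled user message
    {"role": "user", "content": "[tool output]\ndf: /dev/sda1 512G used 210G avail 302G"},
]

no special tool role means nothing in the chat template can silently filter, rewrite, or refuse to render it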
>>
>>107062369
That is a good idea for a workaround. I think I'll go with that.
>>107062386
Takes a bit of fiddling but generally Mikupad is pretty easy and straightforward.
Funny enough I too made my own client, but it's for desktop and absolutely doesn't work on mobile without many changes.
So I thought to use ST while I'm traveling.
>>
>>107062418
This isn't a "dollar store romance book" it's just as degenerate as the raunchiest hentai. Women just love to pretend they aren't massive coomers.
>>
>>107062510
>Women just love to pretend they aren't massive coomers.
to be fair they aren't as much coomers as us; we're the ones with testosterone, not them
>>
>>107062534
also the penis is built to suck out the little testosterone women have
>>
>>107062534
Test is only part of the equation, women seek novelty due to cock burn out. The average 18 year old woman is far ahead of the average 50 year old dude.
>>
>>107061924
We don't care.
>>>/g/aicg/
>>
File: dipsySaysHappyHalloween.png (2.53 MB, 1024x1536)
>>107059391
OAI has been teasing this since Q2 2023.
I'm not holding my breath for uncensored models, open source or SaaS, from them.
We instead must rely on the Chinese. How ironic.
Also this: >>107062568
>>
>>107062726
>>107062568
Fuck off avatarfag.
>>
File: mpv rdp session.png (1.64 MB, 1440x3200)
>>107059665
Voxtral‑Small‑24B‑2507 -> WhisperX -> NLLB‑200‑3.3B pipeline
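the NLLB leg is plain transformers, roughly like this (the checkpoint name is the HF one; lang codes here assume jp -> en, swap in your own pair):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/nllb-200-3.3B", src_lang="jpn_Jpan")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-3.3B")

line = "..."  # one text segment out of the WhisperX pass
batch = tok(line, return_tensors="pt")
out = model.generate(
    **batch,
    # force the decoder to start in the target language
    forced_bos_token_id=tok.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=256,
)
print(tok.decode(out[0], skip_special_tokens=True))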
>>
>>107062351
>>107062385
I'm fucking hyped for Beasts in the Sun EP. 2!!!
>>
>>107062562
Strange that your thoughts only revolve around sexuality. Maybe go out for a walk or something. Must be miserable to be you.
>>
>>107062768
Oddly personal reaction to such a generic statement
>>
>>107062790
either a troon or a MAY GOD FORGIVE ME, a vagina bearer. either way, disregard
>>
>>107062801
>>107062790
When was the last time you actually heard a real female voice? Voice synth doesn't apply.
>>
>>107062815
>touch grass have sex
kys :)
>>
>>107062756
Wait there's foobar2000 on Linux now?

Also, I agree Voxtral is the best.
>>
File: postContent.png (450 KB, 512x512)
>>107062751
>>
>>107062842
it runs in wine no problem, but there's no xdg media integration so adding files to playlists kinda sucks
>>
>>107056325
Let's try to get the thread back on its tracks: I'm currently working on code for automatically optimizing memory use across multiple GPUs for maximum utilization.
However, the use case of MoE models + multiple GPUs is difficult to do robustly via doing a few virtual test allocations and then interpolating/extrapolating the memory use.
I could instead do it iteratively, but that would add a bit of latency when starting up the model.
So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
>>
>>107062815
Your mom's voice is the only one that matters. It's the original ASMR.
>>
>>107062861
>So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
If there's the option to save and load that configuration automatically somewhere, as much latency as it takes on the first launch.
Hells, you could even have a separate binary that just does that if it's easier than embedding it in the server itself.
>>
>>107062801
I always find it curious when someone takes generalizations personally, it's like an error in processing.
>>107062861
What difference in latency are we talking? I only own a 4090 but utilization always matters more imho, you should be doing a lot more inferencing compared to initialization.
>>
>>107062861
>>107062880
Yeah, I think that would be the best. Ensure it's correct and write the result out for future use.
>>
>>107062861
>When starting up the model
You mean when loading it in from cold start or with every prompt?
>>
>>107062861
If it's just the initial model load, quite a lot of latency is fine!

Also, would it be possible to store/cache the results of these tests? Kind of like the initial RPC-Server load is slow while it copies everything over, but subsequent loads are fast as it stores tensors in ~/.cache
>>
>>107062861
It doesn't matter because model is not loaded in interactively anyway.
>>
>>107062880
>>107062887
I should have clarified: the code is doing the optimization based on free memory so it would be dynamic.
For server use storing the result may be fine but if you're on a desktop or you have other programs running it could cause issues.

>>107062883
>>107062891
Once, when starting up the program and before loading the weights, a few virtual test allocations are done to estimate memory use.
Each test allocation should take something like ~0.1s at most.
With interpolations/extrapolations I would only need 6 test allocations so ~0.6 s.
If I were to do very fine-grained optimizations where individual weight tensors are shuffled between devices it should still stay below ~100 virtual allocations so <= 10 s.
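to be clear, the interpolation step itself is trivial; per device it amounts to fitting a line through the test points and solving against the memory budget (toy python, the real code works on actual allocation sizes rather than layer counts):

# model memory on one GPU: mem(n) ~= a*n + b, n = layers assigned to it,
# with (n0, m0) and (n1, m1) coming from two virtual test allocations
def fit_line(n0, m0, n1, m1):
    a = (m1 - m0) / (n1 - n0)
    return a, m0 - a * n0

def max_layers(a, b, budget):
    return int((budget - b) // a)  # largest n with a*n + b <= budget

the difficulty is MoE + multi GPU, where memory use is no longer a clean function of a single variable per device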
>>
>>107062861
>So I would like to ask you how much latency you would be willing to tolerate for a feature like that.
Probably a lot, if people tolerate the current trial and error method, which is basically torture.
>>
>>107062939
How are you going to avoid tensor washback?
>>
>>107062947
What do you mean by tensor washback?
>>
>>107062939
Just once on startup is whatever dude, add as much latency as needed

Are there really any use cases that need rapid model-switching? Even in some kind of multi model pipeline where models get unloaded and loaded in, with the speed of inference as it is, any gains in memory efficiency would far outweigh any latency in-between steps. If there are really any edge cases where the opposite is true they would be rare and niche enough that the person doing it should just bypass whatever auto optimisation you are doing and do it themselves

tldr; boot up latency is fine, maybe add a switch for rare edge cases
>>
>>107062959
When tensors get flooded, model might receive a latent feedback cycle. This confuses the model.
>>
>>107062861
would this increase time for those with only one gpu?
>>
>>107062980
I think it's relevant for downstream use.
The easiest way to integrate llama.cpp into a larger program is to just manage a llama.cpp server process.
Any memory fitting logic can be disabled but I don't think it would be feasible for e.g. a game dev trying to integrate language models to do that stuff themselves.

>>107063023
No, for a single GPU you can do a simple interpolation.
The difficulties come specifically if you can vary memory use both by swapping stuff between GPUs and by moving MoE weights between GPUs and system memory.
>>
>>107062980
>Are there really any use cases that need rapid model-switching
Not really a use case use case, but I can imagine Ollama users complaining since iirc their models do get unloaded when idle and loaded back in when they send prompts.
>>
>>107057060
i thought the point of qwen vl was to get a description of an image that you can use to prompt the same image with models like flux or qwen image.
>>
>>107063216
that's just one use case of vl models
>>
>>107063216
it's a general language model with vision, there is no specific point to it any more than there is with any standard llm
>>
Feels like we haven't had a proper advance in model capabilities in months
>>
>>107063273
There isn't because they have reached technological limits. Benchmarking appeals to investors though...
>>
>>107063273
gemini 3 will save the field, r-right bros? there's no AI winter, scaling is still all you need? rocket emoji?
>>
>>107063380
yes sir google sukdeepmind will be of delivering fate of the star model soon
>>
RAM prices are getting bad.
>>
is anyone else just really happy that they have something to do with a high end computer that isn't playing a dogshit aaa game? seriously. fast computers are so cool, but they were kinda getting gay before lmg
>>
>>107063273
waiting on gemma 4, glm 4.6 air, and we're getting glm 5 before eoy my friend. probably a new deepseek too. a bunch of experimental long context/memory stuff just came out too.

we're definitely in a lull though
>>
>>107063456
Placebo, RAM has never been cheaper than now >>106994515
>>
>>107063611
>2023
we live in 2025 time traveler
>>
>>107063623
Ok troll.
>>
>>107063639
Not seeing an argument
>>
>>107063583
It looks like anons forgot about Mistral Large 3...
>>
>>107063665
lol
>>
>>107063583
>glm 4.6 air
do not dare rush them you ungrate
>>
>>107063665
>it’s no secret that we’re working on something ‘large’ over the next few weeks
>May 7, 2025
>>
>>107063665
pretty sure mistral forgot about mistral large 3
>>
File: main-image.jpg (226 KB, 1200x1190)
226 KB
226 KB JPG
>>107061898
>>107062726
>Dress is thin and form fitting instead of thick and draping so that only some of the body's curves show through
Pure dogshit, get some taste, etc. but happy halloween
>>
>>107063837
Haha penis.
>>
>>107063851
There is no penis anywhere in that image????
>>
>>107063981
>>107063981
>>107063981
>>
>>107062568
>>107062726
i was just responding on topic you giga autist who probably can't even use the models correctly
>>
File: laughing skull.gif (174 KB, 299x240)
>>107064382
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.