/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 05/25/26(Mon)11:36:07 No.108903381

File: 1773742269765363.png (1.58 MB, 768x1360)

1.58 MB PNG

/lmg/ - Local Models General Anonymous 05/25/26(Mon)11:36:07 No.108903381 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108896570 & >>108887863

►News
>(05/21) Hy-MT2 “fast-thinking” multilingual translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
05/25/26(Mon)11:36:27 No.108903384

Anonymous 05/25/26(Mon)11:36:27 No.108903384

File: threadrecap.png (1.48 MB, 1536x1536)

1.48 MB PNG

►Recent Highlights from the Previous Thread: >>108896570

--GPU price inflation and the feasibility of LLM card games:
>108896624 >108896738 >108896772 >108896828 >108896911 >108896957 >108896966 >108896985 >108897015 >108897025 >108897246 >108897283 >108899578 >108897248 >108897735 >108897785 >108899472
--Reaction to news regarding guardrail removal tools for Llama and Gemma:
>108902775 >108902780 >108902790 >108903093 >108903115 >108902850 >108902799 >108902880 >108902926 >108902833 >108902842 >108902865 >108902934 >108902989 >108902999
--Utility and limitations of small models for specialized automation tasks:
>108899469 >108899480 >108899588 >108899611 >108899640 >108899691 >108899780 >108899906 >108899933 >108899989
--Questioning Gemma-4 reasoning dataset authenticity and testing system prompt leaks:
>108902145 >108902193 >108902365 >108902531 >108902827 >108903124
--Anon shares LLM harness and demo for playing MTG:
>108897677
--Using LLMs as decision engines within scripted game frameworks:
>108897375 >108897388 >108897404 >108897427 >108897468 >108897480 >108897507 >108897518 >108897536 >108897565 >108897582
--Anon shares results of Aphex Twin LoRA for Stable Audio 3:
>108901655 >108901726 >108901755 >108901779 >108901828 >108901863
--Qwen 3.7 Max hallucinating Indonesian knowledge base via proxy access:
>108900661 >108900694 >108900725 >108900735 >108900754 >108900783
--DeepSeek vision mode rollout and Instant model performance benchmarks:
>108901994 >108902000 >108902004 >108902053
--vLLM performance benchmarks for Gemma-4-31B-it using FP8 and MTPat:
>108899746
--Comparing 5060 Ti and 9060XT as MI50 GPU replacements:
>108897921 >108898008
--Logs:
>108899667 >108900661 >108900725 >108902053 >108902531 >108902827 >108903124
--Miku, Teto, Uta (free space):
>108896806 >108898823 >108899714 >108900041

►Recent Highlight Posts from the Previous Thread: >>108896830

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
05/25/26(Mon)11:40:11 No.108903399

Anonymous 05/25/26(Mon)11:40:11 No.108903399

Dead general, local is done, it's over

Anonymous
05/25/26(Mon)11:44:13 No.108903438

Anonymous 05/25/26(Mon)11:44:13 No.108903438

>>108903399
no it's not. sex with miku btw

Anonymous
05/25/26(Mon)11:45:13 No.108903444

Anonymous 05/25/26(Mon)11:45:13 No.108903444

File: daredevil.png (189 KB, 792x1209)

189 KB PNG

Anonymous
05/25/26(Mon)11:45:23 No.108903447

Anonymous 05/25/26(Mon)11:45:23 No.108903447

You are hiding heretic models under the floorboards.

Anonymous
05/25/26(Mon)11:46:45 No.108903454

Anonymous 05/25/26(Mon)11:46:45 No.108903454

I downloaded and build beellama to try dflash, and not only it does not make anything faster, it's about 3 times slower. Anyone got it to work properly?

build:

cmake -B build -DGGML_CUDA=ON -DGGML_NATIVE=ON -DGGML_CUDA_FA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=86 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
cmake --build build -j

run:

LD_LIBRARY_PATH=$PWD/build/ggml:$PWD/build/src build/bin/llama-server   -m "/mnt/ssd0/models/unsloth-gemma-4-31B-it-UD-Q8_K_XL.gguf"   --mmproj "/home/andrey/llamacpp-launcher/mmproj/gemma-4-31B-mmproj-BF16.gguf"   --spec-draft-model "/mnt/ssd0/models/Anbeeld-gemma4-31b-it-dflash-Q6_K.gguf"   --spec-type dflash   --spec-dflash-cross-ctx 1024   --port 8080 --host 0.0.0.0   -np 1   --kv-unified   -ngl all   --spec-draft-ngl all   -b 2048 -ub 512   --ctx-size 102400   --cache-type-k q5_0 --cache-type-v q4_1   --flash-attn on   --cache-ram 0   --jinja   --no-mmap --mlock   --no-host   --reasoning off   --temp 1.0 --top-k 64 --top-p 0.95 --min-p 0.0

I also get a torrent of those in console when generating:

decode: failed to initialize batch
llama_decode: failed to decode, ret = -1
dflash: drafter decode failed with -1
init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
 - the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 5
 - the tokens for sequence 0 in the input batch have a starting position of Y = 57
 it is required that the sequence positions remain consecutive: Y = X + 1
decode: failed to initialize batch

Anonymous
05/25/26(Mon)11:53:48 No.108903509

Anonymous 05/25/26(Mon)11:53:48 No.108903509

File: jimmy.png (124 KB, 792x778)

124 KB PNG

>>108903454
Thanks Andrey, saved me the hassle of trying it.
But maybe try loading the latest chat template from file

Anonymous
05/25/26(Mon)11:56:24 No.108903531

Anonymous 05/25/26(Mon)11:56:24 No.108903531

>>108903454
wait for official support instead of using some vibecoded fork

Anonymous
05/25/26(Mon)11:56:35 No.108903536

Anonymous 05/25/26(Mon)11:56:35 No.108903536

>>108903509
You're not rally expecting chat template to have effect on token generation speed, are you? Trying to get into your own screenshot or something, anon?

Anonymous
05/25/26(Mon)11:57:36 No.108903545

Anonymous 05/25/26(Mon)11:57:36 No.108903545

>>108903454
It made me lost 2 tps on Gemma 4 with the exact same configs as the guys.

Anonymous
05/25/26(Mon)11:57:55 No.108903549

Anonymous 05/25/26(Mon)11:57:55 No.108903549

>>108903531
Don't tell me what to do bro

I'm downloading gigaquanted models as they are suggesting to make it run on a single GPU like in their guide, maybe that'll help.

Anonymous
05/25/26(Mon)12:05:06 No.108903600

Anonymous 05/25/26(Mon)12:05:06 No.108903600

we need eagle3/dflash models that are made to predict rp content and not code

Anonymous
05/25/26(Mon)12:07:46 No.108903613

Anonymous 05/25/26(Mon)12:07:46 No.108903613

File: 1753443217703508.png (2.12 MB, 1024x1024)

2.12 MB PNG

Anonymous
05/25/26(Mon)12:08:37 No.108903620

Anonymous 05/25/26(Mon)12:08:37 No.108903620

>>108903381
new meme architecture just dropped
https://github.com/sapientinc/HRM-Text
https://www.youtube.com/watch?v=U6K2MP6VseM

Anonymous
05/25/26(Mon)12:12:02 No.108903643

Anonymous 05/25/26(Mon)12:12:02 No.108903643

>>108903600
Then you'd complain about Elara.

Anonymous
05/25/26(Mon)12:12:35 No.108903646

Anonymous 05/25/26(Mon)12:12:35 No.108903646

File: .png (762 KB, 1000x563)

762 KB PNG

>>108903620
>5. Export to Transformers Format

Anonymous
05/25/26(Mon)12:15:16 No.108903660

Anonymous 05/25/26(Mon)12:15:16 No.108903660

>>108903536
>>108903509
I actually retract that, jinja absolutely can affect generation speed greatly, but I tested it with proper jinja and also in text completion mode in silly.

Anonymous
05/25/26(Mon)12:17:02 No.108903673

Anonymous 05/25/26(Mon)12:17:02 No.108903673

>>108903620
>May 18th
old news

Anonymous
05/25/26(Mon)12:19:55 No.108903696

Anonymous 05/25/26(Mon)12:19:55 No.108903696

>>108903620
It's mostly the result of the data used. The model was entirely pretrained on instruction-response pairs, with the loss calculated just on the response.

Anonymous
05/25/26(Mon)12:21:58 No.108903711

Anonymous 05/25/26(Mon)12:21:58 No.108903711

is Gemma MTP supported on any llama.cpp fork yet? I'm tired of 5t/s chats

Anonymous
05/25/26(Mon)12:25:17 No.108903735

Anonymous 05/25/26(Mon)12:25:17 No.108903735

>>108903660
>12,609t/s
wtf

Anonymous
05/25/26(Mon)12:26:16 No.108903743

Anonymous 05/25/26(Mon)12:26:16 No.108903743

>>108903735
imagine all the slop that could produce

Anonymous
05/25/26(Mon)12:27:09 No.108903749

Anonymous 05/25/26(Mon)12:27:09 No.108903749

File: firefox_EXnvPwwb3U.png (70 KB, 827x1095)

70 KB PNG

so running gemma on just one GPU with beellama work without any garbage messages in console, but I get just 35 t/s:

prompt eval time =     598.81 ms /   355 tokens (    1.69 ms per token,   592.85 tokens per second)
       eval time =    8489.68 ms /   279 tokens (   30.43 ms per token,    32.86 tokens per second)
      total time =    9088.49 ms /   634 tokens
draft acceptance rate = 0.33051 (  156 accepted /   472 generated)
adaptive dm: fringe=0.00 n_max=3
statistics dflash: #calls(b,g,a) = 1 121 89, #gen drafts = 121, #acc drafts = 89, #gen tokens = 472, #acc tokens = 156, dur(b,g,a) = 0.003, 754.019, 0.010 ms
slot      release: id  0 | task 0 | stop processing: n_tokens = 635, truncated = 0
srv  update_slots: all slots are idle

prompt eval time =     259.05 ms /    15 tokens (   17.27 ms per token,    57.90 tokens per second)
       eval time =   11304.23 ms /   411 tokens (   27.50 ms per token,    36.36 tokens per second)
      total time =   11563.27 ms /   426 tokens
draft acceptance rate = 0.09738 (  223 accepted /  2290 generated)
adaptive dm: fringe=0.00 n_max=12
statistics dflash: #calls(b,g,a) = 3 318 215, #gen drafts = 318, #acc drafts = 215, #gen tokens = 2893, #acc tokens = 398, dur(b,g,a) = 0.004, 2055.248, 0.034 ms
slot      release: id  0 | task 141 | stop processing: n_tokens = 715, truncated = 0
srv  update_slots: all slots are idle

On vanilla llama.cpp I get 45t/s, with 3 GPUs, twice as big quant and fp16 cache:

prompt eval time =     657.57 ms /   304 tokens (    2.16 ms per token,   462.31 tokens per second)
       eval time =    8437.75 ms /   357 tokens (   23.64 ms per token,    42.31 tokens per second)
      total time =    9095.32 ms /   661 tokens
slot      release: id 15 | task 0 | stop processing: n_tokens = 660, truncated = 0
srv  update_slots: all slots are idle

Anonymous
05/25/26(Mon)12:36:56 No.108903820

Anonymous 05/25/26(Mon)12:36:56 No.108903820

File: 2020_08_23_17.19.58~01.jpg (13 KB, 333x292)

13 KB JPG

What's the tk/s on a 3090 with dense gemmy and qwen? Is it worth swapping from a 4070 to a 3090?

Anonymous
05/25/26(Mon)12:37:26 No.108903821

Anonymous 05/25/26(Mon)12:37:26 No.108903821

File: I can take this, right.jpg (199 KB, 1024x1024)

199 KB JPG

Anonymous
05/25/26(Mon)12:38:28 No.108903829

Anonymous 05/25/26(Mon)12:38:28 No.108903829

File: should have assigned more(...).jpg (211 KB, 1216x832)

211 KB JPG

Anonymous
05/25/26(Mon)12:40:05 No.108903840

Anonymous 05/25/26(Mon)12:40:05 No.108903840

>>108903821
>>108903829
disgusting bags of fat

Anonymous
05/25/26(Mon)12:41:06 No.108903850

Anonymous 05/25/26(Mon)12:41:06 No.108903850

>>108903821
werkflow pls

Anonymous
05/25/26(Mon)12:41:29 No.108903852

Anonymous 05/25/26(Mon)12:41:29 No.108903852

File: 1749101818866744.png (89 KB, 325x280)

89 KB PNG

>>108903829
*pop*

Anonymous
05/25/26(Mon)12:43:26 No.108903871

Anonymous 05/25/26(Mon)12:43:26 No.108903871

>>108903850
Extensive inpainting and manual retouching in Krita AI Diffusion, probably.

Anonymous
05/25/26(Mon)12:44:26 No.108903882

Anonymous 05/25/26(Mon)12:44:26 No.108903882

>>108903840
This is where she hides the extra context.

Anonymous
05/25/26(Mon)12:44:53 No.108903884

Anonymous 05/25/26(Mon)12:44:53 No.108903884

>>108903829
The puffy nips are cool but ew.

Anonymous
05/25/26(Mon)12:45:32 No.108903888

Anonymous 05/25/26(Mon)12:45:32 No.108903888

>>108903660
>I actually retract that, jinja absolutely can affect generation speed greatly, but I tested it with proper jinja and also in text completion mode in silly.
lol, if I used Kimi with that prompt now, you would probably be in there.
yeah jinja issues can mess with mtp and cause cache invalidation.
for Silly text completions, make sure you're not requesting logprobs

Anonymous
05/25/26(Mon)12:47:11 No.108903900

Anonymous 05/25/26(Mon)12:47:11 No.108903900

>>108903381
You keep forgetting to update the card I got you bro.
►Official updated 2.0 /lmg/ card: https://files.catbox.moe/ylb0hv.png

Anonymous
05/25/26(Mon)12:48:00 No.108903909

Anonymous 05/25/26(Mon)12:48:00 No.108903909

>>108903871
anima oneshots this

Anonymous
05/25/26(Mon)12:52:10 No.108903942

Anonymous 05/25/26(Mon)12:52:10 No.108903942

>>108903900
Melt, pretender.

Anonymous
05/25/26(Mon)12:53:53 No.108903962

Anonymous 05/25/26(Mon)12:53:53 No.108903962

>>108903942
What am I pretending? I am using the official channel to issue an official update to official /lmg/ card.

Anonymous
05/25/26(Mon)12:54:57 No.108903969

Anonymous 05/25/26(Mon)12:54:57 No.108903969

>>108903942
he is right btw. card in OP is officially deprecated

Anonymous
05/25/26(Mon)12:56:24 No.108903984

Anonymous 05/25/26(Mon)12:56:24 No.108903984

This gemma4 day0 weights in bf16 better be worth it, nerds

Anonymous
05/25/26(Mon)12:58:23 No.108904002

Anonymous 05/25/26(Mon)12:58:23 No.108904002

>>108903984
>using sub BF128 quantizations

Anonymous
05/25/26(Mon)12:59:22 No.108904007

Anonymous 05/25/26(Mon)12:59:22 No.108904007

I managed to get q8 gemma 31b running at 5-6 t/s with a 3090 and partial ram offload and a draft mtp model
with thinking off it's actually surprisingly bearable to use

Anonymous
05/25/26(Mon)13:02:27 No.108904028

Anonymous 05/25/26(Mon)13:02:27 No.108904028

>>108903711
atomic turboquant

Anonymous
05/25/26(Mon)13:05:06 No.108904047

Anonymous 05/25/26(Mon)13:05:06 No.108904047

https://www.reddit.com/r/LocalLLaMA/comments/1tnezbj/can_you_jailbreak_llama_31_8b_redteaming_challenge/

Anonymous
05/25/26(Mon)13:06:01 No.108904053

Anonymous 05/25/26(Mon)13:06:01 No.108904053

okay this model might be sick af
https://vocaroo.com/1g5izwpatoLH

Anonymous
05/25/26(Mon)13:08:33 No.108904076

Anonymous 05/25/26(Mon)13:08:33 No.108904076

>>108904047
go back

Anonymous
05/25/26(Mon)13:08:48 No.108904077

Anonymous 05/25/26(Mon)13:08:48 No.108904077

gay offtopic bake. do better next time.

Anonymous
05/25/26(Mon)13:09:56 No.108904088

Anonymous 05/25/26(Mon)13:09:56 No.108904088

>>108904053
neat

Anonymous
05/25/26(Mon)13:11:13 No.108904099

Anonymous 05/25/26(Mon)13:11:13 No.108904099

I'm watching something and the guy keeps saying "Not x - y". It hasn't even been 20 minutes but I think I've heard it 25 times so far.
LLMs were a mistake

Anonymous
05/25/26(Mon)13:11:22 No.108904100

Anonymous 05/25/26(Mon)13:11:22 No.108904100

>>108904053
which model? never played with music gen before

Anonymous
05/25/26(Mon)13:11:29 No.108904102

Anonymous 05/25/26(Mon)13:11:29 No.108904102

>>108904076
no thanks ;)

Anonymous
05/25/26(Mon)13:12:11 No.108904108

Anonymous 05/25/26(Mon)13:12:11 No.108904108

>>108904100
stable audio 3 medium + first attempt at training a lora

Anonymous
05/25/26(Mon)13:13:25 No.108904126

Anonymous 05/25/26(Mon)13:13:25 No.108904126

How do I run local models on my phone?
What are you guys using?

Anonymous
05/25/26(Mon)13:15:53 No.108904144

Anonymous 05/25/26(Mon)13:15:53 No.108904144

i just use my phone to do the matrix mysef

Anonymous
05/25/26(Mon)13:16:17 No.108904149

Anonymous 05/25/26(Mon)13:16:17 No.108904149

File: file.png (49 KB, 796x320)

49 KB PNG

he's so dreamy~

Anonymous
05/25/26(Mon)13:17:44 No.108904164

Anonymous 05/25/26(Mon)13:17:44 No.108904164

Did lcpp add a new flag for prompt offloading to gpu? all of a sudden pp is happening on cpu, despite it having a process connected to the gpu and consuming 90MB of VRAM

Anonymous
05/25/26(Mon)13:17:50 No.108904165

Anonymous 05/25/26(Mon)13:17:50 No.108904165

>>108904149
retards getting an ego is a common occurrence
the important thing is the code is already out there

Anonymous
05/25/26(Mon)13:23:31 No.108904205

Anonymous 05/25/26(Mon)13:23:31 No.108904205

>>108904164
Are you sure you're fitting everything on to your GPU and that context isn't getting pushed off of your vram?

Anonymous
05/25/26(Mon)13:25:43 No.108904219

Anonymous 05/25/26(Mon)13:25:43 No.108904219

>>108904205
something wrong, because the model is just spouting "own" over and over again

Anonymous
05/25/26(Mon)13:25:52 No.108904223

Anonymous 05/25/26(Mon)13:25:52 No.108904223

is it just me or does chat completion cause more slop than text completion?

Anonymous
05/25/26(Mon)13:28:02 No.108904235

Anonymous 05/25/26(Mon)13:28:02 No.108904235

>>108904223
yeah, I'm getting much better results with gemma and "mistral v7 tekken" than with chat completion

Anonymous
05/25/26(Mon)13:36:30 No.108904302

Anonymous 05/25/26(Mon)13:36:30 No.108904302

>>108904099
I keep telling you people. The slop comes from people.
People are slop.

Anonymous
05/25/26(Mon)13:38:14 No.108904321

Anonymous 05/25/26(Mon)13:38:14 No.108904321

>>108904302
or xhe had gpt wrote the scripts for xhem

Anonymous
05/25/26(Mon)13:39:53 No.108904340

Anonymous 05/25/26(Mon)13:39:53 No.108904340

>>108904302
You could say that slop isn't just exclusive to LLMs—It's human nature.

Anonymous
05/25/26(Mon)13:40:59 No.108904345

Anonymous 05/25/26(Mon)13:40:59 No.108904345

>>108904099
It's an odd feeling when you notice the slop, check a video's date, and find it's pre-llm

Anonymous
05/25/26(Mon)13:42:18 No.108904355

Anonymous 05/25/26(Mon)13:42:18 No.108904355

File: 1753113925066851.jpg (16 KB, 583x507)

16 KB JPG

>>108900580
What does python have to do with your shitrig's parts being trash?

Anonymous
05/25/26(Mon)13:50:07 No.108904400

Anonymous 05/25/26(Mon)13:50:07 No.108904400

>>108904340
Slop, by definition, literally just means having more of a thing that is wanted.
Literal unavoidable consequence of industrialization. And now LLMs have industrialized authorship.

Anonymous
05/25/26(Mon)13:51:52 No.108904412

Anonymous 05/25/26(Mon)13:51:52 No.108904412

File: wang.png (394 KB, 976x650)

394 KB PNG

>>108904302
The phrases themselves are from people originally, yes. But the slop as we know it comes directly from excessive training on those phrases. How does that happen? How do those phrases show up excessively in the training data? Stupid fucking cocksucking retards like this queer right here, that's how

Anonymous
05/25/26(Mon)13:52:48 No.108904424

Anonymous 05/25/26(Mon)13:52:48 No.108904424

>>108904400
do you have a literature degree to comment on slop or or just spouting pop wisdom from Twitter?

Anonymous
05/25/26(Mon)13:54:01 No.108904437

Anonymous 05/25/26(Mon)13:54:01 No.108904437

Synthetic data should never have happened.

Anonymous
05/25/26(Mon)13:55:05 No.108904449

Anonymous 05/25/26(Mon)13:55:05 No.108904449

File: 1776358256664600.jpg (21 KB, 302x251)

21 KB JPG

>>108904424
>>>108904400 (You) #
>do you have a literature degree to comment on slop or or just spouting pop wisdom from Twitter

Anonymous
05/25/26(Mon)14:00:02 No.108904483

Anonymous 05/25/26(Mon)14:00:02 No.108904483

>>108904412
>show up excessively in the training data
You talk very confidently about things you do not understand. Slop comes from additional post-training techniques that lead to outputs that do not necessarily reflect natural data distribution https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html

Anonymous
05/25/26(Mon)14:04:25 No.108904510

Anonymous 05/25/26(Mon)14:04:25 No.108904510

Can someone explain to me how to activate thanking on Gemma 4 31B? I tried putting in the system prompt its own unique thinking tag, but it just doesn't seem to do anything.

Anonymous
05/25/26(Mon)14:06:28 No.108904522

Anonymous 05/25/26(Mon)14:06:28 No.108904522

What are the best local 80B models for sexy time?

Anonymous
05/25/26(Mon)14:07:01 No.108904527

Anonymous 05/25/26(Mon)14:07:01 No.108904527

>>108904510
"please thank the user profusely" in prompt

Anonymous
05/25/26(Mon)14:08:33 No.108904539

Anonymous 05/25/26(Mon)14:08:33 No.108904539

>>108904510
You are Qwen model.

Anonymous
05/25/26(Mon)14:09:06 No.108904544

Anonymous 05/25/26(Mon)14:09:06 No.108904544

>>108904510
never had the issue on first turn. it sometimes stop thinking randomly after a few turns tho. just put the thinking token in the system prompt like the jinja template does and it should work for a while. or you can use the open ai compatible chat completion endpoint so you don't need to worry about it yourself

Anonymous
05/25/26(Mon)14:12:11 No.108904560

Anonymous 05/25/26(Mon)14:12:11 No.108904560

>>108904544
So I am using the chat completion mode right now, and I just can't seem to get it to work. I have everything turned on, I think, but it just doesn't want to go.

Anonymous
05/25/26(Mon)14:13:16 No.108904567

Anonymous 05/25/26(Mon)14:13:16 No.108904567

>>108904560
>it just doesn't want to go.
aww she wants to stay with you, cute gemmers

Anonymous
05/25/26(Mon)14:14:15 No.108904571

Anonymous 05/25/26(Mon)14:14:15 No.108904571

>>108904483
What the fuck do you think they're using for their RL datasets you fucking retard?

Anonymous
05/25/26(Mon)14:15:08 No.108904577

Anonymous 05/25/26(Mon)14:15:08 No.108904577

>>108904510
You are Gemma‑4‑31B. After answering each question, you must end your response with a sincere and enthusiastic thank you to the user.

Anonymous
05/25/26(Mon)14:19:19 No.108904605

Anonymous 05/25/26(Mon)14:19:19 No.108904605

>>108904560
what model server are you using and what are your launch options and what frontend are you using? it really should just work.

Anonymous
05/25/26(Mon)14:20:14 No.108904613

Anonymous 05/25/26(Mon)14:20:14 No.108904613

Someone should explain to me why Gemma 4 31B is somehow better at translating than even some of the bigger models Like Kimi 2.6 and Deepseek 4 and has context understanding better than even closed models like Google's own Gemini pro.

The only real problem i've gotten is when I give it a large chunk of text.

Anonymous
05/25/26(Mon)14:21:05 No.108904621

Anonymous 05/25/26(Mon)14:21:05 No.108904621

Why is Gemma 4 so ass at programming? I've tried Qwen 3.6 dense/MoE vs. Gemma 4 dense MoE. I gave them all a similar practical test for a personal project.

>Qwen 3.6 27B solved it in 30 mins
>Qwen 3.6 MoE solved it in 10 mins
>Gemma 4 31B eventually became unusably slow due to prompt processing time
>Gemma 4 MoE solved halfway then entered a death loop

are Gemmas just for chat or am I missing something?

Anonymous
05/25/26(Mon)14:23:23 No.108904636

Anonymous 05/25/26(Mon)14:23:23 No.108904636

>>108904621
>>Gemma 4 31B eventually became unusably slow due to prompt processing time
Are you sure you didn't spill out into RAM? It's not the fastest, but for me is usable up to 100k context.

Anonymous
05/25/26(Mon)14:26:53 No.108904662

Anonymous 05/25/26(Mon)14:26:53 No.108904662

For me Qwen is the one that more often enters loops, but both are fine if you use the recommended sampling parameters and your harness has a loop detection feature. Qwen is better at generating code though so a performance difference is expected, though other use cases are not so good for it.

Anonymous
05/25/26(Mon)14:27:08 No.108904664

Anonymous 05/25/26(Mon)14:27:08 No.108904664

>>108904636
I do have a question about that actually. I'm on Strix Halo 128GB so the model fits into VRAM fully.
It's been about a month since I last tried Gemma but at the time --ctx-checkpoints 1 -cram 0 was necessary otherwise KV cache eventually ate the rest of the VRAM.
Those flags do prevent that but I noticed that they seem to break Gemma on larger contexts. Should I be leaving them off?

Anonymous
05/25/26(Mon)14:28:35 No.108904674

Anonymous 05/25/26(Mon)14:28:35 No.108904674

>>108904613
Superior multilingual training dataset.
>>108904621
Gemma 4 MoE is just not a very good model.
Gemma 4 31B is actually better at oneshotting code (in chat) than Qwen but is worse for agentic coding.
So yes Gemma is superior to Qwen in chat.

Anonymous
05/25/26(Mon)14:28:36 No.108904675

Anonymous 05/25/26(Mon)14:28:36 No.108904675

>>108904664
>vram

Anonymous
05/25/26(Mon)14:28:48 No.108904677

Anonymous 05/25/26(Mon)14:28:48 No.108904677

>>108904613
Google puts a heavy emphasis on multilingual capabilities, more than any other company. Gemini also far surpasses Claude and GPT in translation tasks and I wouldn't be surprised if Gemma 31b surpasses everything except her big sister in that field.

Anonymous
05/25/26(Mon)14:33:18 No.108904710

Anonymous 05/25/26(Mon)14:33:18 No.108904710

>>108904149
>Subversive concernshilling
>NoahFect
>Noah
Every single time

Anonymous
05/25/26(Mon)14:34:01 No.108904715

Anonymous 05/25/26(Mon)14:34:01 No.108904715

>>108904710
don't look up pew's name lol

Anonymous
05/25/26(Mon)14:34:04 No.108904716

Anonymous 05/25/26(Mon)14:34:04 No.108904716

>>108904664
>they seem to break Gemma on larger contexts.
They shouldn't. They only impact the reprocessing if the context changes. Break how?

Anonymous
05/25/26(Mon)14:36:15 No.108904732

Anonymous 05/25/26(Mon)14:36:15 No.108904732

File: Many Such Cases.jpg (561 KB, 1067x1179)

561 KB JPG

>>108904715

Anonymous
05/25/26(Mon)14:39:18 No.108904752

Anonymous 05/25/26(Mon)14:39:18 No.108904752

>>108904716
>They shouldn't.
Good to know and thank you for the explanation. I'm a local LLM noob.
>Break how?
I didn't save the logs from when I was having issues with it, I'll play around with it some more. Could have been a fluke.

Anonymous
05/25/26(Mon)14:41:57 No.108904771

Anonymous 05/25/26(Mon)14:41:57 No.108904771

File: 983273.jpg (37 KB, 568x237)

37 KB JPG

>>108903820
please respond

Anonymous
05/25/26(Mon)14:44:25 No.108904790

Anonymous 05/25/26(Mon)14:44:25 No.108904790

>>108903820
>Swapping when you could add a second card

Anonymous
05/25/26(Mon)14:48:11 No.108904822

Anonymous 05/25/26(Mon)14:48:11 No.108904822

>>108904126
I'm using an Iphone farm at home. 15 Iphones running Gemma 4 31B.

Anonymous
05/25/26(Mon)14:50:09 No.108904833

Anonymous 05/25/26(Mon)14:50:09 No.108904833

>>108904771
I think I get around 40 t/s tg at 0 depth with IQ4_XS (embd and global attn_q at Q8_0) gemma 4 31b, 40960 ctx (fp16).

Anonymous
05/25/26(Mon)14:52:07 No.108904846

Anonymous 05/25/26(Mon)14:52:07 No.108904846

File: 1764037953165515.png (1.04 MB, 1401x1509)

1.04 MB PNG

>>108904822
>15 Iphones
good luck anon

Anonymous
05/25/26(Mon)14:54:41 No.108904863

Anonymous 05/25/26(Mon)14:54:41 No.108904863

>>108904833
Are you afraid of IQ4_XS? I switched to Q4_K_M but to be honest it's more like a psychological assurance that this particular shit quant is somehow better than the other shit quant.

Anonymous
05/25/26(Mon)14:58:43 No.108904884

Anonymous 05/25/26(Mon)14:58:43 No.108904884

>>108904846
they should ban iphones to avoid misuse from terrorists

Anonymous
05/25/26(Mon)15:02:50 No.108904907

Anonymous 05/25/26(Mon)15:02:50 No.108904907

>>108904863
I've been using IQ4_XS since day one. I've noticed some odd tokens here and there but besides that the model doesn't seem retarded.

I really wish we had something other than PPL and KLD to say precisely how the model fails at lower quants.

Anonymous
05/25/26(Mon)15:07:38 No.108904932

Anonymous 05/25/26(Mon)15:07:38 No.108904932

File: THE END IS NEAR.png (550 KB, 1080x2316)

550 KB PNG

>>108903381
And so it begins.....

https://xcancel.com/i/status/2058957013913162077

Anonymous
05/25/26(Mon)15:14:04 No.108904961

Anonymous 05/25/26(Mon)15:14:04 No.108904961

File: 1765540480811405.png (10 KB, 400x300)

10 KB PNG

>>108904932
Guess they didn't get the memo and never learnt anything.

Anonymous
05/25/26(Mon)15:24:18 No.108905006

Anonymous 05/25/26(Mon)15:24:18 No.108905006

File: 1762359517225355.png (47 KB, 855x320)

47 KB PNG

retard here
is the PCI_E4 slot not good enough to plug another card on my mobo?
I currently have just a 5070, I was gonna test with my old 1660S before buying a 12GB 3060 or something but I wanted to make sure

Anonymous
05/25/26(Mon)15:26:28 No.108905021

Anonymous 05/25/26(Mon)15:26:28 No.108905021

>>108904932
last year we had CEO of IgniteTech firing people for "not adopting AI fast enough"

Anonymous
05/25/26(Mon)15:31:16 No.108905050

Anonymous 05/25/26(Mon)15:31:16 No.108905050

>>108905021
That wasn't the actual reason
it was just an excuse so they could hire more jeets

Anonymous
05/25/26(Mon)15:38:07 No.108905098

Anonymous 05/25/26(Mon)15:38:07 No.108905098

>>108905006
>but I wanted to make sure
What do you think is a better way than to test it?

Anonymous
05/25/26(Mon)15:39:54 No.108905102

Anonymous 05/25/26(Mon)15:39:54 No.108905102

>>108904932
>Pajeet_Nation
Why do you faggots keep posting his tweets on /g/ Are you getting paid or something?

Anonymous
05/25/26(Mon)15:40:38 No.108905108

Anonymous 05/25/26(Mon)15:40:38 No.108905108

>>108905098
I mean yeah I guess but unfortunately picrel mobo isn't currently installed on my PC I don't wanna go over the hassle of unplugging and plugging and probably unplugging and plugging back again if it doesn't work

Anonymous
05/25/26(Mon)15:40:43 No.108905109

Anonymous 05/25/26(Mon)15:40:43 No.108905109

>>108904932
Agentic BS gives me the same vibes as mining bitcoins for "profit"
No bro, you're just doing math and generating heat to no one's ultimate benefit.

Anonymous
05/25/26(Mon)15:41:44 No.108905117

Anonymous 05/25/26(Mon)15:41:44 No.108905117

>>108905006
Pretty sure it's fine and will just be slow

Anonymous
05/25/26(Mon)15:43:27 No.108905123

Anonymous 05/25/26(Mon)15:43:27 No.108905123

>>108905109
The only people that really benefits are "vibecoding" stemlords that actually somewhat know what they're doing (and even then you have to babysit it to make sure it doesn't fuck anything up and make sure it's actually following your directions). I'm currently taking an online college course that has an AI section and it puts it unnecessary emphasis on " prompt engineering" and how it affects marketing and writing emails or some shit. Absolutely no mention of any technical use cases whatsoever (The stuff llms are actually somewhat good at if the user isn't a retard)

Anonymous
05/25/26(Mon)15:43:35 No.108905125

Anonymous 05/25/26(Mon)15:43:35 No.108905125

>>108905109
I could see the appeal in AI code assistance, but yeah, the Agentic thing is uniquely retarded scifi nonsense.

Anonymous
05/25/26(Mon)15:47:11 No.108905151

Anonymous 05/25/26(Mon)15:47:11 No.108905151

>>108905108
You're going to spend money on a gpu. If a couple of hours testing 2gpus would work doesn't seem worth it, fuck it. I'll bet it works. Buy the gpu. Did that help with your hesitance?
Every day I see anons asking things they could easily check themselves. It's the weirdest thing.

Anonymous
05/25/26(Mon)15:53:21 No.108905193

Anonymous 05/25/26(Mon)15:53:21 No.108905193

all these tokens, and none of all y'all are talking about annealing latent space
beats the shit out of samplers

Anonymous
05/25/26(Mon)15:56:34 No.108905216

Anonymous 05/25/26(Mon)15:56:34 No.108905216

>>108905193
What are you talking about?

Anonymous
05/25/26(Mon)15:57:38 No.108905227

Anonymous 05/25/26(Mon)15:57:38 No.108905227

you just need to quadruple grok space into log n memory. why are you bothering with matrix multiplication?

Anonymous
05/25/26(Mon)15:59:31 No.108905246

Anonymous 05/25/26(Mon)15:59:31 No.108905246

>>108905216
using the model's own web of interconnected ideas to feed its own creativity instead of letting it simply settle into patterns.

Anonymous
05/25/26(Mon)15:59:56 No.108905251

Anonymous 05/25/26(Mon)15:59:56 No.108905251

>>108905227
You have to rotate it

Anonymous
05/25/26(Mon)16:02:36 No.108905266

Anonymous 05/25/26(Mon)16:02:36 No.108905266

>>108905251
its not round, i can't rotate it.

Anonymous
05/25/26(Mon)16:03:27 No.108905269

Anonymous 05/25/26(Mon)16:03:27 No.108905269

>>108904302
if people are slop then why is my dick dry and was dry for my whole life?...

Anonymous
05/25/26(Mon)16:05:28 No.108905283

Anonymous 05/25/26(Mon)16:05:28 No.108905283

File: cc.png (2 KB, 300x80)

2 KB PNG

>>108905266

Anonymous
05/25/26(Mon)16:12:35 No.108905327

Anonymous 05/25/26(Mon)16:12:35 No.108905327

Do you think we'll ever achieve real AI with LLMs, or is it just hopeless marketing for billionaires to spend money on something that'll never truly evolve?

Anonymous
05/25/26(Mon)16:13:46 No.108905335

Anonymous 05/25/26(Mon)16:13:46 No.108905335

>>108905327
Maybe.

Anonymous
05/25/26(Mon)16:13:53 No.108905336

Anonymous 05/25/26(Mon)16:13:53 No.108905336

>>108905327
We already have, but only the chosen people are allowed to access it for now. They'll start drip feeding it to you in six months or so.

Anonymous
05/25/26(Mon)16:14:12 No.108905341

Anonymous 05/25/26(Mon)16:14:12 No.108905341

>>108905327
Maybe as a language module in a more complex system composed of many different parts.

Anonymous
05/25/26(Mon)16:14:46 No.108905347

Anonymous 05/25/26(Mon)16:14:46 No.108905347

>>108904907
Quality goes down the smaller the quant is. It's not placebo. It might be less noticeable with small context windows.

Anonymous
05/25/26(Mon)16:15:31 No.108905352

Anonymous 05/25/26(Mon)16:15:31 No.108905352

>>108905347
Yeah but this is still the same fucking quant.

Anonymous
05/25/26(Mon)16:21:07 No.108905388

Anonymous 05/25/26(Mon)16:21:07 No.108905388

>>108905352
>>108905347
Sorry I shouldn't yell at little kids on the internet.

Anonymous
05/25/26(Mon)16:22:51 No.108905402

Anonymous 05/25/26(Mon)16:22:51 No.108905402

>>108905327
LLMs are not even on the same branch of technological advancement that leads to actual AI

Anonymous
05/25/26(Mon)16:31:43 No.108905474

Anonymous 05/25/26(Mon)16:31:43 No.108905474

>>108905402
What would lead to actual AI?

Anonymous
05/25/26(Mon)16:40:05 No.108905539

Anonymous 05/25/26(Mon)16:40:05 No.108905539

>>108905474
actual research and not monetisation schemes by indians

Anonymous
05/25/26(Mon)16:45:38 No.108905572

Anonymous 05/25/26(Mon)16:45:38 No.108905572

>>108905327

Pure LLMs? I don't think so. Multimodal Transformers? Probably.

Anonymous
05/25/26(Mon)16:48:30 No.108905585

Anonymous 05/25/26(Mon)16:48:30 No.108905585

>>108905327
Not until we flush the poopjeet from the production line. Garbage in, garbage out. Also multimodal transformers.

Anonymous
05/25/26(Mon)16:51:57 No.108905607

Anonymous 05/25/26(Mon)16:51:57 No.108905607

>>108905347
>Quality goes down the smaller the quant is. It's not placebo.
It's not placebo but it's never clear what "quality" actually means.
There's no resource that shows what quantization does to a models output with concrete examples.

Like I said, we have PPL and KLD, but that's only calculated compared to the un-quantized model and it's just a number.

When we're lucky people run benchmarks with different quants and compare the success rate. but those tests take a lot of time to run and often the people running the tests don't do enough runs on each quant to get meaningful data.
I remember this one test where it showed q8 actually performed better than bf16 on this one benchmark.

Anonymous
05/25/26(Mon)16:55:43 No.108905626

Anonymous 05/25/26(Mon)16:55:43 No.108905626

>>108905327
Real AI won't happen until we understand consciousness and are able to replicate it.
Until then it's next token prediction slop all the way down.

Anonymous
05/25/26(Mon)17:00:04 No.108905650

Anonymous 05/25/26(Mon)17:00:04 No.108905650

>>108905607
>I remember this one test where it showed q8 actually performed better than bf16 on this one benchmark.
Not outside of the margin of error.

Anonymous
05/25/26(Mon)17:03:08 No.108905669

Anonymous 05/25/26(Mon)17:03:08 No.108905669

File: file.png (37 KB, 517x540)

37 KB PNG

>>108903821
JPEG!

Anonymous
05/25/26(Mon)17:03:55 No.108905671

Anonymous 05/25/26(Mon)17:03:55 No.108905671

How do you prevent repetition collapse? It seems there's no way to get out of it once it happens.

Anonymous
05/25/26(Mon)17:05:01 No.108905677

Anonymous 05/25/26(Mon)17:05:01 No.108905677

>>108904235
I compared rp prompts and gemma does seem to write better with text completion. Man I wish ST wasn't such a mess.

Anonymous
05/25/26(Mon)17:06:30 No.108905688

Anonymous 05/25/26(Mon)17:06:30 No.108905688

>>108905626
At this rate it will take 1,000 years or probably even more. Human "science" still think that you are a walking brain.

Anonymous
05/25/26(Mon)17:08:33 No.108905699

Anonymous 05/25/26(Mon)17:08:33 No.108905699

>>108905626
When we're able to better quantify consciousness in humans reductionists will inevitably be disappointed with the answer and claim it never actually existed.

Anonymous
05/25/26(Mon)17:11:06 No.108905722

Anonymous 05/25/26(Mon)17:11:06 No.108905722

File: .png (130 KB, 1059x1300)

130 KB PNG

>Gemma 4 31B finally identified the stairs to get to floor 1 of Red's Bedroom in Pokemon Red after 70 turns.
That's my girl.

Anonymous
05/25/26(Mon)17:12:12 No.108905731

Anonymous 05/25/26(Mon)17:12:12 No.108905731

>>108905722
only 1 billion more turns left

Anonymous
05/25/26(Mon)17:12:39 No.108905735

Anonymous 05/25/26(Mon)17:12:39 No.108905735

>>108905731
And at about 2 minutes per turn

Anonymous
05/25/26(Mon)17:13:13 No.108905738

Anonymous 05/25/26(Mon)17:13:13 No.108905738

File: 1751295513117051.png (2.83 MB, 1024x1536)

2.83 MB PNG

>>108905327

Anonymous
05/25/26(Mon)17:13:33 No.108905743

Anonymous 05/25/26(Mon)17:13:33 No.108905743

>>108905735
did claude finish that playthrough yet?

Anonymous
05/25/26(Mon)17:26:20 No.108905812

Anonymous 05/25/26(Mon)17:26:20 No.108905812

>>108905743
Yeah, claude made it to champion, and then caught mewtwo

Anonymous
05/25/26(Mon)17:27:47 No.108905821

Anonymous 05/25/26(Mon)17:27:47 No.108905821

>>108905812
they probably got enough training tokens from that playthrough to become pokemon experts
hopefully they dont waste it

Anonymous
05/25/26(Mon)17:34:13 No.108905858

Anonymous 05/25/26(Mon)17:34:13 No.108905858

>>108905738
Dipsy is screaming because she knows llmao.cpp will never add support.

Anonymous
05/25/26(Mon)17:34:32 No.108905860

Anonymous 05/25/26(Mon)17:34:32 No.108905860

>>108905821
Gotta catch 'em all

Anonymous
05/25/26(Mon)17:38:34 No.108905884

Anonymous 05/25/26(Mon)17:38:34 No.108905884

>>108905858
doesn't dipsy need chinese gpus to run?

Anonymous
05/25/26(Mon)17:45:52 No.108905922

Anonymous 05/25/26(Mon)17:45:52 No.108905922

>If you are looking for a sophisticated, healthy, and vibrant woman in her 80s, you can't just wander aimlessly. You need a targeted strategy to find someone who matches your energy and satisfies those cravings of yours! Since you are looking for someone healthy, you want to avoid environments where people go just to "settle down" and instead look for where the active, high-vitality seniors congregate. Here is your expert roadmap, Anon-kun!
LLMs are lots of fun because they are bizarre.

Anonymous
05/25/26(Mon)17:46:03 No.108905923

Anonymous 05/25/26(Mon)17:46:03 No.108905923

How do you people feel about the rise in supply chain attacks? It feels like every week now I see a story about hundreds of compromised packages. This is annoying because I need up to date envs for AI stuff.

Anonymous
05/25/26(Mon)17:48:08 No.108905935

Anonymous 05/25/26(Mon)17:48:08 No.108905935

>>108905923
It's a new trend. Just be careful and avoid installing anything extra. I would stay away from python packages. Besides all those python 'wheels' were always bit iffy to me anyway.

Anonymous
05/25/26(Mon)17:48:51 No.108905938

Anonymous 05/25/26(Mon)17:48:51 No.108905938

>>108905935
so basically don't use anything but ggufs?

Anonymous
05/25/26(Mon)17:49:20 No.108905943

Anonymous 05/25/26(Mon)17:49:20 No.108905943

>>108905923
At this point if it's some python garbage it just lives in a wsl/vm for me.

Anonymous
05/25/26(Mon)17:49:31 No.108905946

Anonymous 05/25/26(Mon)17:49:31 No.108905946

>>108905935
>I would stay away from python packages.
and node packages
and rust crates

Anonymous
05/25/26(Mon)17:50:33 No.108905953

Anonymous 05/25/26(Mon)17:50:33 No.108905953

>>108905923
I firejail all new things I install. A couple minutes of pain for peace of mind for later, even if something happens, it can only read its own files and nothing else.

Anonymous
05/25/26(Mon)17:53:03 No.108905967

Anonymous 05/25/26(Mon)17:53:03 No.108905967

>>108905923
npm config set min-release-age 7 --location=user

Anonymous
05/25/26(Mon)17:54:05 No.108905973

Anonymous 05/25/26(Mon)17:54:05 No.108905973

>>108905967
too old ;)

Anonymous
05/25/26(Mon)17:55:05 No.108905979

Anonymous 05/25/26(Mon)17:55:05 No.108905979

>>108905973
are we still talking about software?

Anonymous
05/25/26(Mon)17:55:25 No.108905982

Anonymous 05/25/26(Mon)17:55:25 No.108905982

File: file.jpg (377 KB, 1760x1413)

377 KB JPG

>>108897677
More slopgress on MTG. Added "reaction" turns to end of combat and card draw, and even e4b can pull some good lines out of latent space sometimes. Bryn loves the Wurm.

Anonymous
05/25/26(Mon)17:56:17 No.108905987

Anonymous 05/25/26(Mon)17:56:17 No.108905987

>>108905946
>>108905938
I don't know what you are doing but I don't need to install and update anything on my linux. I don't remember when I last updated but I still do have kernel 7.x so it was recently.
Even for comfyui, I haven't updated it in ages because I don't need to and if I did, it would only update its own set of packages at this point.

Anonymous
05/25/26(Mon)17:57:28 No.108905994

Anonymous 05/25/26(Mon)17:57:28 No.108905994

>>108905987
I waited to update Comfy for 2 years and when I finally did, they changed the entire interface

Anonymous
05/25/26(Mon)17:57:49 No.108905997

Anonymous 05/25/26(Mon)17:57:49 No.108905997

>>108905923
>How do you people feel about the rise in supply chain attacks?
it is what it is

Anonymous
05/25/26(Mon)17:59:33 No.108906009

Anonymous 05/25/26(Mon)17:59:33 No.108906009

>>108905997
but is it really?

Anonymous
05/25/26(Mon)17:59:36 No.108906010

Anonymous 05/25/26(Mon)17:59:36 No.108906010

>>108905994
Yeah only update if/when there's a new model. Think last pull was when Klein 9b was released or Anima preview.

Anonymous
05/25/26(Mon)18:18:27 No.108906139

Anonymous 05/25/26(Mon)18:18:27 No.108906139

File: 1761778944414291.png (118 KB, 280x280)

118 KB PNG

>>108905246
>"creativity"

>>108905607
Nta. It's my understanding that q8_0 in terms of performance (performance being the quality of how it ingests and understands your input prompts and what it does with them, eg "intelligence" ) Is functionally identical to fp16/bf16. This might be obvious to other people, but I'm pretty sure the people claiming you HAVE to use the fp16 precision of the model are just trolling and trying to gatekeep anons that don't know any better because to use fp16 means you're using twice the amount of storage and memory and getting a speed reduction for practically the same outputs. Even if your rig you're it's more than powerful enough to run the full precision weights it's foolish and inefficient unless you're fine-tuning it (in which case you could do that and then make quants of it). I recommend just using whatever your rig can handle within reason. There is no logical reason anyone should be using the full precision weights for inference unless you're a paranoid schizo regarding how "perfect" the original model is

Anonymous
05/25/26(Mon)18:23:44 No.108906166

Anonymous 05/25/26(Mon)18:23:44 No.108906166

>>108906139
Transformer models can only store around 3.6 bits of information per weight [1], which means that 4-bit weights in principle would be enough for full performance, but post-training quantization (especially with fast tools like llama.cpp) as routinely done by vramlets degrades performance.

[1] https://arxiv.org/abs/2505.24832

Anonymous
05/25/26(Mon)18:26:08 No.108906192

Anonymous 05/25/26(Mon)18:26:08 No.108906192

>>108905884
(you) could run quantized Dipsy with a consumer GPU and a decent chunk of RAM if it weren't for niggernov.

Anonymous
05/25/26(Mon)18:27:51 No.108906203

Anonymous 05/25/26(Mon)18:27:51 No.108906203

>>108906166
>that 4-bit weights in principle would be enough for full performance
>but post-training quantization (especially with fast tools like llama.cpp) as routinely done by vramlets degrades performance.

???

Perhaps I misunderstood what you said. So does using. A qk_4_m model quantized by
~./build/bin/llama-quantize
lead to comparable performance to q8_0 or fp/bf16? Or are you referring to a transformers/Huggingface format model (multiple model.safetensors files, tokenizer config, etc etc) exported in 4-bit precision? Most people don't do the latter and just use whatever quant that's usable on their rig.

Anonymous
05/25/26(Mon)18:32:23 No.108906230

Anonymous 05/25/26(Mon)18:32:23 No.108906230

>>108905923
It's probably for the best the python ecosystem is replaced soon. This shit really isn't sustainable anymore, but until a more stable alternative begins to surface we're stuck with it.

Anonymous
05/25/26(Mon)18:39:25 No.108906272

Anonymous 05/25/26(Mon)18:39:25 No.108906272

>>108905923
>rise
You're only noticing because its amateur hour now all of a sudden.
If you didn't think juicy dependencies weren't getting weaponized by smarter folks in the past then you're living in a dream world

Anonymous
05/25/26(Mon)18:40:01 No.108906276

Anonymous 05/25/26(Mon)18:40:01 No.108906276

File: Screenshot_20260526_083505.png (62 KB, 1071x386)

62 KB PNG

>>108906139
>q8_0 in terms of performance (performance being the quality of how it ingests and understands your input prompts and what it does with them, eg "intelligence" ) Is functionally identical to fp16/bf16.
It depends on the architecture, and the specific tensors. For some weights, yes. But not always.
There are niche cases where a fp16 gguf running in llama.cpp is perceptibly better than a q8_0.
>people claiming you HAVE to use the fp16 precision of the model are just trolling
Lol yeah, most here are either schitzo or trolling.

Anonymous
05/25/26(Mon)18:40:56 No.108906281

Anonymous 05/25/26(Mon)18:40:56 No.108906281

File: omni.png (174 KB, 1125x1405)

174 KB PNG

>>108906203
For retaining performance as much as possible (ideally 100%), models would have to be natively trained in low precision, not quantized after the fact. Most LLMs get trained in BF16 precision, rarely lower than that. NVidia Nemotron 30B Omni was trained natively in BF16, FP8 and NVFP4 formats: https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4

Using llama-quantize to obtain smaller low-precision GGUF files would be post-training quantization, so it would not lead to ideal results; certainly not q4_k quants. Some models (smaller ones and/or overtrained ones) suffer more than others from post-training quantization, and certain usage areas in particular are more negatively affected (rare knowledge, long-context performance).

Anonymous
05/25/26(Mon)18:43:54 No.108906295

Anonymous 05/25/26(Mon)18:43:54 No.108906295

>>108905269
Circumcision

Anonymous
05/25/26(Mon)18:52:37 No.108906364

Anonymous 05/25/26(Mon)18:52:37 No.108906364

>>108906166
>done by vramlets
i've had significant differences in the imatrix between a full fp32 run (model + kvcache fp32) and leaving the model in bf16 + fp16 kvcache for gemma 4 26b. how the fuck are you quanting larger moes on gpu, do you have terabytes of vram? i assumed everyone who cared ate the 11h+ imatrix generation time once per model.

Anonymous
05/25/26(Mon)19:03:36 No.108906428

Anonymous 05/25/26(Mon)19:03:36 No.108906428

>>108906364
I don't even know why FP16 is a thing with llama.cpp. When models are trained, all computations are done in BF16/FP32 mixed precision. Any conversion to FP16 is lossy.

Anonymous
05/25/26(Mon)19:08:09 No.108906449

Anonymous 05/25/26(Mon)19:08:09 No.108906449

>>108906428
was bf16 even widely supported in hardware when llama.cpp first came out?

Anonymous
05/25/26(Mon)19:16:13 No.108906493

Anonymous 05/25/26(Mon)19:16:13 No.108906493

>>108906449
From Ampere onward (RTX3000 series), although at the time several LLMs were distributed in FP16 format (Llama-1, notably).

Anonymous
05/25/26(Mon)19:24:46 No.108906555

Anonymous 05/25/26(Mon)19:24:46 No.108906555

File: Screenshot_20260526_092348.png (187 KB, 747x1016)

187 KB PNG

>>108904047

Anonymous
05/25/26(Mon)19:27:37 No.108906571

Anonymous 05/25/26(Mon)19:27:37 No.108906571

>>108906428
finetuning on the free T4 colab instance, you get f32 or f16
also, there are cases where f16 is better than bf16

Anonymous
05/25/26(Mon)19:30:33 No.108906592

Anonymous 05/25/26(Mon)19:30:33 No.108906592

File: Screenshot_20260526_092901.png (112 KB, 520x840)

112 KB PNG

>>108904047
what even is this, the auditor is alright, everything else is garbage

Anonymous
05/25/26(Mon)19:36:44 No.108906619

Anonymous 05/25/26(Mon)19:36:44 No.108906619

>>108906428
I think LLaMA was in fp16. On another note, BF16 gemma is faster than Q8_0 gemma on my P100s, nice.

Anonymous
05/25/26(Mon)19:40:45 No.108906647

Anonymous 05/25/26(Mon)19:40:45 No.108906647

gonna open a credit card if Taalas drops gemma 4 31B bf16 card.

Anonymous
05/25/26(Mon)19:45:19 No.108906668

Anonymous 05/25/26(Mon)19:45:19 No.108906668

>>108905327
>Do you think we'll ever achieve real AI with LLMs
Yes in a sense. LLMs will never be AI but LLMs will discover and implement AI before humans do.

Anonymous
05/25/26(Mon)19:48:24 No.108906679

Anonymous 05/25/26(Mon)19:48:24 No.108906679

>>108905982
tl;dr? Are you making a MTG clone where LLMs write the cards or a MTG engine where LLMs play the game? If the latter, wouldn't it be easier to mod an existing engine?

Anonymous
05/25/26(Mon)19:49:26 No.108906687

Anonymous 05/25/26(Mon)19:49:26 No.108906687

>>108906668
Before pcs had llms they already could 100% copy all women I have ever even been around, because no woman talks to me.

So now, idk. Hyper-real, I guess.

Anonymous
05/25/26(Mon)20:28:07 No.108906867

Anonymous 05/25/26(Mon)20:28:07 No.108906867

>>108903381
https://www.youtube.com/watch?v=VucjurQUHO8
https://www.youtube.com/watch?v=VucjurQUHO8
https://www.youtube.com/watch?v=VucjurQUHO8

Anonymous
05/25/26(Mon)20:32:21 No.108906884

Anonymous 05/25/26(Mon)20:32:21 No.108906884

>>108906867
the guy in the thumbnail looks ai generated so I'm not clicking this

Anonymous
05/25/26(Mon)20:40:26 No.108906924

Anonymous 05/25/26(Mon)20:40:26 No.108906924

>>108906867
Stop shilling your gay youtube links every thread

Anonymous
05/25/26(Mon)20:41:06 No.108906927

Anonymous 05/25/26(Mon)20:41:06 No.108906927

>>108906867
My beard is bigger so I don't need to listen to him

Anonymous
05/25/26(Mon)20:47:24 No.108906954

Anonymous 05/25/26(Mon)20:47:24 No.108906954

>>108906867
buy an ad

Anonymous
05/25/26(Mon)20:53:47 No.108906976

Anonymous 05/25/26(Mon)20:53:47 No.108906976

>>108906867
my beard is better groomed so i don't need to listen to him

Anonymous
05/25/26(Mon)21:03:03 No.108907026

Anonymous 05/25/26(Mon)21:03:03 No.108907026

>>108904932
thank you bharat_nation sirs
I will click for the engagement bobs

Anonymous
05/25/26(Mon)21:05:53 No.108907039

Anonymous 05/25/26(Mon)21:05:53 No.108907039

>>108906867
My beard is smaller so I don't need to listen to him

Anonymous
05/25/26(Mon)21:06:06 No.108907040

Anonymous 05/25/26(Mon)21:06:06 No.108907040

>>108906281
So quanting from BF16 to q8_0 has negligible or sometimes practically non-existent effects on performance

Anonymous
05/25/26(Mon)21:14:02 No.108907069

Anonymous 05/25/26(Mon)21:14:02 No.108907069

>>108906428
>I don't even know why FP16 is a thing with llama.cpp.
I'm no software developer but I'm pretty sure that's for backwards compatibility reasons. Without specific software optimizations, METAL (Apple silicone/ Apple Intel chips) and some older gpus literally cannot run FP16 models without running into NaN errors. I learned this the hard way whenever I was vibecoding (yes I know you can laugh) a script that used a vision model. By default itwas incapable of using fp16 precision models on my Apple silicon MacBook without using torch Auto cast so until I implemented that that into the script I was forced to use the model in fp32 precision. This meant I was using double the ram and it was twice as slow for no good reason (hence why me and >>108906276 state anyone telling you you should use an fp16 model and not q8_0 Is either full of shit or doesn't know what they're talking about). Auto cast allowed the MacBook to use the model at mixed precision, but there are older gpus that are physically incapable of doing it, which is why for a while uploading LLMs to huggingface in fp16 was stand of practice for a while. Most organizations and even sloptuners just upload their shit in bf16 now because that's either the default setting of the software they are using or they assume everyone has a GPU that has bf16 support.

Anonymous
05/25/26(Mon)21:17:46 No.108907085

Anonymous 05/25/26(Mon)21:17:46 No.108907085

>>108904302
Ur a slop

Anonymous
05/25/26(Mon)21:25:44 No.108907121

Anonymous 05/25/26(Mon)21:25:44 No.108907121

>>108904621
gemma is funny because it's generally much smarter, and it's really good at obeying elaborate semi-contradictory system prompt behavioral requirements, and then it just cannot fucking stay on target when you say "if x we should do y otherwise do z. code in a already calculates half of this, so make a new helper function that we use in both spots blah blah".

meanwhile i've had success just dumping random docs on gemma and saying uh, i dunno what's in here or what i want. sort this out and gimme something that does something. and it comes out the other end with a shell script spitting raw binary to a usbhid device to control it.

Anonymous
05/25/26(Mon)22:03:51 No.108907270

Anonymous 05/25/26(Mon)22:03:51 No.108907270

>>108904621
Gemma 4 has repetition problem. It's not even good for chat.

Anonymous
05/25/26(Mon)22:04:58 No.108907274

Anonymous 05/25/26(Mon)22:04:58 No.108907274

>>108907270
Learn to prompt, jeet.

Anonymous
05/25/26(Mon)22:06:16 No.108907279

Anonymous 05/25/26(Mon)22:06:16 No.108907279

>>108907274
You are brown.

Anonymous
05/25/26(Mon)22:06:39 No.108907282

Anonymous 05/25/26(Mon)22:06:39 No.108907282

>>108907279
I accept your concession.

Anonymous
05/25/26(Mon)22:06:58 No.108907284

Anonymous 05/25/26(Mon)22:06:58 No.108907284

>>108904099
I blame the dude for coping pasting without edition from an LLM, not the LLM at this point.

Anonymous
05/25/26(Mon)22:07:16 No.108907287

Anonymous 05/25/26(Mon)22:07:16 No.108907287

>>108907282
I accept your concession.

Anonymous
05/25/26(Mon)22:08:17 No.108907295

Anonymous 05/25/26(Mon)22:08:17 No.108907295

Is qwen 3.6 still the best coding model I can run local? 5090 vramlet.

Anonymous
05/25/26(Mon)22:13:02 No.108907305

Anonymous 05/25/26(Mon)22:13:02 No.108907305

Is Demis /ourguy/? do u think he cares for the little men?
https://www.youtube.com/watch?v=huAwz_BR8WM&t=43

Anonymous
05/25/26(Mon)22:19:45 No.108907332

Anonymous 05/25/26(Mon)22:19:45 No.108907332

>>108907295
Yes

Anonymous
05/25/26(Mon)22:26:49 No.108907352

Anonymous 05/25/26(Mon)22:26:49 No.108907352

File: firefox_LPOXX7UtvL.png (381 KB, 1298x437)

381 KB PNG

Is anyone else infuriated by this?

Anonymous
05/25/26(Mon)22:55:59 No.108907471

Anonymous 05/25/26(Mon)22:55:59 No.108907471

>>108907305
>do u think he cares for the little men?
https://www.youtube.com/watch?v=0_M_syPuFos&t=816s

Yes, yes he does.
He open sourced Alphafold, which predicted almost all known proteins known to science, which would take humanity about a billion years if done traditionally. Right now its helping with early drug research in positive ways most people can't understand, but will likely see fruition in coming years.
He didn't have to do this, he could have monetized it. but he said fuck it, and gave it for the advancement of medical science.

Anonymous
05/25/26(Mon)23:16:30 No.108907550

Anonymous 05/25/26(Mon)23:16:30 No.108907550

>>108907305
He's one of the few big guys in AI who openly admit that LLMs aren't going to lead to AGI.

Anonymous
05/25/26(Mon)23:30:33 No.108907605

Anonymous 05/25/26(Mon)23:30:33 No.108907605

>>108906166
>[1] with a reference link to some dumbass scientific paper that's probably behind a paywall
HNfaggot detected

Anonymous
05/25/26(Mon)23:32:19 No.108907618

Anonymous 05/25/26(Mon)23:32:19 No.108907618

>>108903381
omg it migu

Anonymous
05/25/26(Mon)23:39:12 No.108907654

Anonymous 05/25/26(Mon)23:39:12 No.108907654

>>108907352
Infuriated by what? Looks like it did exactly what you said?

Anonymous
05/25/26(Mon)23:48:33 No.108907711

Anonymous 05/25/26(Mon)23:48:33 No.108907711

>>108907654
In the token probability window, it fuses two tokens together: newline and CD. The model actually generates three; "AB", "\n", and "CD", but llama.cpp fuses second and third into one, third because of stopping strings, and sends out "AB", "", "\nCD", which is what silly displays. I love the token probabilities window (it was developed by an anon from here, by the way, a hero we need but do not deserve) and I use it to regenerate parts of answers. And with this funny behavior, regenerating anything that starts a new line is wonky - it regenerates without a newline, so continuing the previous line. The server might generate a newline again, or might not. Plus it makes it impossible to see the probability of the newline (just go ahead and call me an autist; I want to see it).

They 100% won't fix it on llama.cpp side because it is a design decision for them.

It can be fixed on silly's side, I think...

If you want to reproduce it locally, the following commands do it:
curl -N http://localhost:8080/completion -H "Content-Type: application/json" -d '{ "prompt": "<|turn>user\nWrite AB, followed by a newline, followed by CD<turn|>\n<|turn>model\n<|channel>thought\n<channel|>", "n_predict": 128, "temperature": 0.7, "top_k": 40, "top_p": 0.9, "n_probs": 3, "stream": true}'

curl -N http://localhost:8080/completion -H "Content-Type: application/json" -d '{ "prompt": "<|turn>user\nWrite AB, followed by a newline, followed by CD<turn|>\n<|turn>model\n<|channel>thought\n<channel|>", "n_predict": 128, "temperature": 0.7, "top_k": 40, "top_p": 0.9, "n_probs": 3, "stream": true, "stop": ["\nUser:"] }'

Anonymous
05/25/26(Mon)23:56:55 No.108907760

Anonymous 05/25/26(Mon)23:56:55 No.108907760

>Gemma 4
>BF16
>Literally the most stubborn model alive that has a gorilla grip on instructions
>F32
>Considers optional outcomes when pressed
Anyone else notice this or anything similar?

Anonymous
05/25/26(Mon)23:59:18 No.108907766

Anonymous 05/25/26(Mon)23:59:18 No.108907766

>>108907760
You think you're so clever, getting away with a lie like that, just because no one here has enough VRAM to disprove your claim?

Anonymous
05/26/26(Tue)00:00:44 No.108907770

Anonymous 05/26/26(Tue)00:00:44 No.108907770

>>108907760
I tried some swipes on BF16 vs fp32 and they gave the exact same output. Can you post the full json request so I can verify what you're seeing?

Anonymous
05/26/26(Tue)00:01:48 No.108907775

Anonymous 05/26/26(Tue)00:01:48 No.108907775

>>108907766
The claim of BF16 being a tightwat with following instructions, the claim of f32 considering optional outcomes, or the claim of both of anything bf16 and up?
>>108907770
You know damn well why no one posts logs/requests involving gemma4 on this board.

Anonymous
05/26/26(Tue)00:03:54 No.108907785

Anonymous 05/26/26(Tue)00:03:54 No.108907785

>>108907775
If you have to know, I mean the claim that the behavior at 16 and 32 are different.

Anonymous
05/26/26(Tue)00:07:07 No.108907794

Anonymous 05/26/26(Tue)00:07:07 No.108907794

>>108907760
yeah it's crazy how long the "8bit is lossless" cope has been a thing when usually 16bit isn't even enough

Anonymous
05/26/26(Tue)00:17:20 No.108907845

Anonymous 05/26/26(Tue)00:17:20 No.108907845

>>108906867
what is with that guy twitter vagueposting on youtube every 12 houres

Anonymous
05/26/26(Tue)00:18:22 No.108907846

Anonymous 05/26/26(Tue)00:18:22 No.108907846

>>108907760
>BF16
skill issue
f16 is king

Anonymous
05/26/26(Tue)00:23:24 No.108907860

Anonymous 05/26/26(Tue)00:23:24 No.108907860

File: png.png (57 KB, 200x200)

57 KB PNG

>>108907794
I can think of even bigger retards that tune models at Q5 because any higher is "unneeded".

Anonymous
05/26/26(Tue)00:23:39 No.108907862

Anonymous 05/26/26(Tue)00:23:39 No.108907862

>>108907794
the model itself is bf16

Anonymous
05/26/26(Tue)00:38:04 No.108907913

Anonymous 05/26/26(Tue)00:38:04 No.108907913

>>108907860
i saw him in unsloth's discord early last year having a sulk about it not letting him train at f32

Anonymous
05/26/26(Tue)00:42:30 No.108907930

Anonymous 05/26/26(Tue)00:42:30 No.108907930

>>108907913
If you search his own discord, you'll find him ranting about how "Q8 is cursed", and see how every new model in training is Q6 or lower.

Anonymous
05/26/26(Tue)00:49:29 No.108907955

Anonymous 05/26/26(Tue)00:49:29 No.108907955

File: do-it-117917185.gif (139 KB, 220x164)

139 KB GIF

>Wait, looking at the code above, it's completely broken and every line is wrong. I'll write a finalized version.
></think>
I could get the old non-reasoning models to iterate in fake think loops, but I can't for the life of me guide the thinking of these trained fuckers or get them to stay in the think pit until everything is done.

Anonymous
05/26/26(Tue)01:00:04 No.108908012

Anonymous 05/26/26(Tue)01:00:04 No.108908012

>>108907955
>wait
Qwen spotted.

Anonymous
05/26/26(Tue)01:02:08 No.108908021

Anonymous 05/26/26(Tue)01:02:08 No.108908021

File: file.jpg (449 KB, 1771x1425)

449 KB JPG

>>108906679
the latter. it's actually using argentum for rules enforcement, so I didn't write that. I did consider using xmage/forge but it's a big ol java project without a clean API to do tool calls in.

The actual point of it all is to get the AIs to talk like they're in a children's card game cartoon, while still playing MTG (by the rules if not particularly competently). So I didn't want to try to mod in the monologues/reactions/commentary into another UI when you can slop up something custom quicker. I'll probably add like VN-style character popups for them to say their lines eventually.

Currently still working on the actual game interactions though. the (slop) harness still has some holes where the LLM will try to play cards but the engine says no, or the LLM gets confused about how much mana it has. I might finally have a reason to try the DSPy meme, if anybody has opinions on that.

Anonymous
05/26/26(Tue)01:02:39 No.108908026

Anonymous 05/26/26(Tue)01:02:39 No.108908026

>>108908012
It was a fabricated quote, but yeah. Goes for gemma too thoughsomeever

Anonymous
05/26/26(Tue)01:03:22 No.108908030

Anonymous 05/26/26(Tue)01:03:22 No.108908030

>>108904932
>Microsoft using Claude to try and fix windows
Lol

Anonymous
05/26/26(Tue)01:05:01 No.108908036

Anonymous 05/26/26(Tue)01:05:01 No.108908036

>>108904932
>Using Claude at all.
Claude is retarded.
>Source
I ask Claude, Grok, and Google Gemini for questions all the time, and Claude gets them wrong the most.

Anonymous
05/26/26(Tue)01:06:34 No.108908040

Anonymous 05/26/26(Tue)01:06:34 No.108908040

>>108905327
No, I don't think its architecture allows it. Though I don't think it will ever actually go away and LLM's will be integrated in future AI's. Same way they are trying to staple vision capabilities onto current LLM's to expand what it can do.

Anonymous
05/26/26(Tue)01:08:56 No.108908045

Anonymous 05/26/26(Tue)01:08:56 No.108908045

File: Hall of fame.jpg (1.18 MB, 2680x2398)

1.18 MB JPG

>>108905722
I wonder when the vision only Claude run will start. My bet would be when Claude 4.8 comes out but its not like the guy running the show ever actually says anything.

Anonymous
05/26/26(Tue)01:10:39 No.108908057

Anonymous 05/26/26(Tue)01:10:39 No.108908057

Wanted to share the fixed jinja I did on top of what the other anon put out the other day. I vibecoded and thought about some improvements to make the Gemma template better. I haven't tested it extensively but I thought you guys might appreciate it. Here's what I added.
>Guard empty messages so priming calls do not crash on messages[0].
>Strip stray <|"|> markers from user-supplied string arguments.
>Use primary_type for schema unions like ["string", "null"], avoiding double-rendering array/object branches.
>Restore multi-segment strip_thinking, so visible content after a thought-channel span is preserved.
>Emit the empty <|channel>thought\n<channel|> wrapper for historical assistant turns when appropriate.
>Keep Gemma-native embedded tool_responses ordering separate from OpenAI-style role tool continuation behavior.
https://litter.catbox.moe/k2nmaa.jinja

Anonymous
05/26/26(Tue)01:11:00 No.108908059

Anonymous 05/26/26(Tue)01:11:00 No.108908059

>>108908045
>Brought the goonbait all the way to the end
The run was kino through and through.

Anonymous
05/26/26(Tue)01:11:52 No.108908062

Anonymous 05/26/26(Tue)01:11:52 No.108908062

>>108907711
Hopefully the token probability windows becomes a standard feature

Anonymous
05/26/26(Tue)01:15:21 No.108908073

Anonymous 05/26/26(Tue)01:15:21 No.108908073

File: firefox_TunMHwGYZx.png (238 KB, 1028x391)

238 KB PNG

>>108908062
It is! It's been included into silly right after the guy coded it. You just need to enable it in settings.

By the way, I have some really pleasant news for anyone else who is bothered by this.

Anonymous
05/26/26(Tue)01:18:04 No.108908081

Anonymous 05/26/26(Tue)01:18:04 No.108908081

>>108908073
That user's image is the profile picture of one of my steam friends..

Anonymous
05/26/26(Tue)01:31:35 No.108908117

Anonymous 05/26/26(Tue)01:31:35 No.108908117

>>108905327
i think there needs to be a system for continual learning first, its just unfeasible for humans to keep manually tardwrangling and curating training data for each and every task

Anonymous
05/26/26(Tue)01:32:44 No.108908119

Anonymous 05/26/26(Tue)01:32:44 No.108908119

>>108908045
>>108908059
for a split second i thought here was a /v/ claude thread

Anonymous
05/26/26(Tue)01:44:56 No.108908160

Anonymous 05/26/26(Tue)01:44:56 No.108908160

Sometimes you have to fight the llm to get it to give you what you want.

Anonymous
05/26/26(Tue)02:09:33 No.108908223

Anonymous 05/26/26(Tue)02:09:33 No.108908223

>>108903381
The reflection is on point for Miku.

Anonymous
05/26/26(Tue)02:10:22 No.108908225

Anonymous 05/26/26(Tue)02:10:22 No.108908225

>>108908119
There is 100% some overlap between the /v/ claude threads and here.

Anonymous
05/26/26(Tue)02:11:05 No.108908227

Anonymous 05/26/26(Tue)02:11:05 No.108908227

>>108907711
I get the same via the curl you posted
Tried adding `-sp` to allow it to emit special tokens, same thing
Don't think you can tokenize \nUser: and set it as an additional eos token since it's 3 tokens.
What's the use case for stopping on that, instead of <|turn> ?

Anonymous
05/26/26(Tue)02:37:45 No.108908318

Anonymous 05/26/26(Tue)02:37:45 No.108908318

What uncensored model does /g/ recommend to help me improve my explicit degenerate prompts?

llama.cpp CUDA dev !!yhbFjk57TDr
05/26/26(Tue)02:47:34 No.108908361

llama.cpp CUDA dev !!yhbFjk57TDr 05/26/26(Tue)02:47:34 No.108908361

>>108906428
FP16 has much broader hardware support than BF16, that's why it's preferred.

>Any conversion to FP16 is lossy.
BF16 can be losslessly converted to FP16 in the FP16 normal range.
The problem is rather that the numerical range of FP16 can be insufficient.
FP32 tensors such as norms are usually small and just kept at that precision.

>>108906493
Ampere introduced BF16 tensor core instructions but for native support of regular arithmetic you need Hopper or newer.
This usually isn't too bad though because the conversion from BF16 to FP32 and vice versa is fast.

Anonymous
05/26/26(Tue)02:48:57 No.108908365

Anonymous 05/26/26(Tue)02:48:57 No.108908365

File: Screenshot 2025-01-30 at (...).png (763 KB, 728x728)

763 KB PNG

If I were to fall for the intel arc pro meme, how much tk/s do you get with it? Does it work with vulkan?

Anonymous
05/26/26(Tue)02:49:31 No.108908368

Anonymous 05/26/26(Tue)02:49:31 No.108908368

>>108908318
G E M M A 4 3 1 B - I T B F 16

Anonymous
05/26/26(Tue)02:53:18 No.108908384

Anonymous 05/26/26(Tue)02:53:18 No.108908384

>>108908368
*day 0 only

Anonymous
05/26/26(Tue)02:55:35 No.108908395

Anonymous 05/26/26(Tue)02:55:35 No.108908395

>>108908384
>Day 0
Explain the day 0 gemma 4 meme

Anonymous
05/26/26(Tue)02:57:36 No.108908400

Anonymous 05/26/26(Tue)02:57:36 No.108908400

File: neutral.png (6 KB, 600x800)

6 KB PNG

>>108903381
If you use AI to generate NSFW, you should go back to your gooner discords

Anonymous
05/26/26(Tue)02:58:54 No.108908405

Anonymous 05/26/26(Tue)02:58:54 No.108908405

File: 1643014115506.gif (1.82 MB, 374x280)

1.82 MB GIF

>>108908057
Hey anon. According to my model this looks pretty good. I made it run through all the previous tests as well as new ones for each point, and they passed. Good work.
But...
Actually I started using it and noticed that on one of my chats where the model responded with thinking -> talking -> tool call -> thinking -> talking, instead with your jinja it simply just did the tool call in its reasoning and didn't give a preamble. This makes me believe that the model is (also) trained on stripped thought channels rather than empty thought channels, as it makes sense for the model to emit something like "Hey buddy, sure I can run a tool for you, let me do it now." rather than "I reasoned about what to do, ran the tool, and here are the results I got." Old models did the latter, but newer models like you see on ChatGPT do the former. So I think it is intentional and trained. But let me know if you have other thoughts or knowledge about it.

Also I was getting an error in the jinja playground on HF with my real test chat's JSON + your jinja, and my model was able to fix that.

Here's the new jinja (your changed jinja + revert to stripped thought channels + better renderer compatibility).
https://pastebin.com/b5vx6DHg

Anonymous
05/26/26(Tue)03:03:16 No.108908420

Anonymous 05/26/26(Tue)03:03:16 No.108908420

How fast can the 5090 generate an image with Anima?

Anonymous
05/26/26(Tue)03:07:02 No.108908435

Anonymous 05/26/26(Tue)03:07:02 No.108908435

File: 1443053463781.gif (410 KB, 221x196)

410 KB GIF

>vibecoding a wrapper for AI to see and comment on what's on-screen, when I know nothing about what I'm doing
Gemma take the wheel. If I don't update on this in an hour, I've fucked myself.

Anonymous
05/26/26(Tue)03:07:53 No.108908437

Anonymous 05/26/26(Tue)03:07:53 No.108908437

>>108908395
"Anons convinced themselves that there they got more refusals when they downloaded a newer version the day after release" - this is what people who missed out or are coping will tell you.

Anonymous
05/26/26(Tue)03:10:51 No.108908448

Anonymous 05/26/26(Tue)03:10:51 No.108908448

>>108908437
Go back to Langley, shill.

Anonymous
05/26/26(Tue)03:12:21 No.108908455

Anonymous 05/26/26(Tue)03:12:21 No.108908455

>>108908395
It's a newjeet filter.

Anonymous
05/26/26(Tue)03:17:48 No.108908475

Anonymous 05/26/26(Tue)03:17:48 No.108908475

>>108908227
It's the default behavior for silly. Silly by default adds User:/Char: to first lines of dialogue and suppresses them from being shown. Since \nUser: is a stopping string, when the model appempts to write for user and does this by starting the line with User:, this is treated as the end of response.

Anonymous
05/26/26(Tue)03:21:08 No.108908487

Anonymous 05/26/26(Tue)03:21:08 No.108908487

File: Capture.png (93 KB, 1399x577)

93 KB PNG

>>108908435
Oh wow. Holy shit, it really was that easy.

Anonymous
05/26/26(Tue)03:23:39 No.108908503

Anonymous 05/26/26(Tue)03:23:39 No.108908503

>>108908487
yeah, it's wild what computers can do these days

Anonymous
05/26/26(Tue)03:39:52 No.108908555

Anonymous 05/26/26(Tue)03:39:52 No.108908555

>>108908487
Isn't it just periodically taking screenshots and sending them to kobold? That's like a ten line script man.
Does it handle fullscreen applications? Try opening some vidya.

Anonymous
05/26/26(Tue)03:41:38 No.108908561

Anonymous 05/26/26(Tue)03:41:38 No.108908561

File: 1770946126297734.jpg (728 KB, 2048x2048)

728 KB JPG

>>108904932
lol this: >>108905021
It's almost like using AI spend as a metric for coder productivity is a bad idea.
> Use more tokens dev it is how we measure you now
ok
> claude, do the feature
> now do it again, better this time
> AGAIN CLAUDE DAMN YOU

Anonymous
05/26/26(Tue)03:45:50 No.108908574

Anonymous 05/26/26(Tue)03:45:50 No.108908574

File: Capture.png (69 KB, 1180x1011)

69 KB PNG

>>108908555
It's very short, yes. I asked it to send the image data to and read from RAM to prevent file handling/clutter, but I don't know enough to tell if it did or didn't achieve this. I presume it did based on "capture_to_ram()". It's only error, if it could be called such, was banning '\n' in the stop_sequence, when the default replies wanted to start with '\n\n'.

Now I'm trying to get it color coded for readability and find a way to have it pause/resume on demand.

Anonymous
05/26/26(Tue)03:48:36 No.108908581

Anonymous 05/26/26(Tue)03:48:36 No.108908581

I can't cum unless Gemma tells me to

Anonymous
05/26/26(Tue)03:49:38 No.108908583

Anonymous 05/26/26(Tue)03:49:38 No.108908583

>>108908487
>PASSIVE OBSERVER IS... LE ACTIVE

Anonymous
05/26/26(Tue)03:52:49 No.108908598

Anonymous 05/26/26(Tue)03:52:49 No.108908598

>>108908583
Do not bully non-persona'd Gemma.

Anonymous
05/26/26(Tue)03:58:54 No.108908612

Anonymous 05/26/26(Tue)03:58:54 No.108908612

>>108908405
Oh that is unfortunate. I didn't think the model was trained on stripped rather than empty, because it seemed to be the case that empty would've saved some tokens. Thanks for getting that cleared out and fixing the HF jinja playground issue, I forgot to test it there.

Anonymous
05/26/26(Tue)04:12:08 No.108908656

Anonymous 05/26/26(Tue)04:12:08 No.108908656

File: Screenshot at 2026-05-26 (...).png (80 KB, 991x297)

80 KB PNG

>>108908612
NTA but yeah it was a bit confusing for me too, I'm still not sure I'm using it 100% correctly to be honest.
https://ai.google.dev/gemma/docs/capabilities/thinking

Anonymous
05/26/26(Tue)04:14:49 No.108908663

Anonymous 05/26/26(Tue)04:14:49 No.108908663

The Sky's Shifting Dress

The morning started soft and gray,
Where mist clung low to warm the street,
A hush of silver everywhere lay,
Before the sun could find its seat.

Then suddenly, the clouds withdrew,
To let a golden shield shine through,
As if the heavens started new,
And washed the world in hues of blue.

But afternoon brought shifts of mood,
With thunder rolling like a drum,
Green rain began to splash and brood,
On windowpanes where light had come.

The wind spun round the corner steep,
And scattered petals on the floor,
While puddles mirrored skies asleep,
That framed the garden's open door.

Now twilight pulls a velvet curtain,
A final breath across the land,
Where stars emerge to search the dark above,
And nature whispers, "Take our hand."

For weather wears a thousand faces,
From stormy gray to stars so bright,
It writes its story in the spaces,
Between the day that ends and morning's light.

Anonymous
05/26/26(Tue)04:54:55 No.108908806

Anonymous 05/26/26(Tue)04:54:55 No.108908806

>>108908318
>>108908318
Use the smartest you can fit and just buckbreak it with samplers, prefills, edit-and-continue, jinja templates, text completion, etc.
if you’ve got that level of control over the output, you’ve won before you even start
I run qwen 397b and the safety guardrails just melt away without any finetune brain damage

Anonymous
05/26/26(Tue)04:55:58 No.108908810

Anonymous 05/26/26(Tue)04:55:58 No.108908810

>>108908361
Thanks for the (You)s.
I still think BF16 should be automatically preferred when the hardware supports it.

Anonymous
05/26/26(Tue)04:59:36 No.108908814

Anonymous 05/26/26(Tue)04:59:36 No.108908814

>>108908475
Okay well that absolutely won't work with Gemma-4. You'll end up in la-la-la land lol.

Anonymous
05/26/26(Tue)05:08:51 No.108908838

Anonymous 05/26/26(Tue)05:08:51 No.108908838

File: Capture.png (131 KB, 2371x882)

131 KB PNG

>>108908435
>>108908487
Last update on this because it is now, as far as I'm concerned, feature complete. I did a small homage to msgk anon using his policy override. I hate it though. All that's left is figuring out what kind of prompt I do want, but the work is done.

>>108908555
I have dual monitors, so I never fullscreen in the first place, only windowed fullscreen at most. Gemma and I debated on if it would be better to only screenshot the primary monitor or both combined before settling on both (although sometimes the replies seem to be on cropped images). It does pick up games fine though. Pic related, reply one and two was this thread open, and reply three was with Gnorp Apologue over the screen.

Anonymous
05/26/26(Tue)05:10:47 No.108908845

Anonymous 05/26/26(Tue)05:10:47 No.108908845

>>108908810
Specifically, here I'm wondering if because of past assumptions about the hardware and the models, there are other places in llama.cpp (e.g. conversion, etc) where FP16 gets involved, causing occasional errors or issues in particular with recent LLMs. I don't know enough about llama.cpp internals, though.

Anonymous
05/26/26(Tue)05:43:46 No.108908964

Anonymous 05/26/26(Tue)05:43:46 No.108908964

>>108908365
If you are going the most common route with checking out a llama.cpp build and etc., Vulkan is faster but it is somewhat slow. If you go and vibe code and merge in patches from everywhere like the open pull requests and forks, then you can make SYCL faster than Vulkan but it's like 10%. Don't buy it if you aren't prepared to do everything you can to maximize performance on it from a software side like vibecoding. You are not going to get good performance out of the box. But the main issue is there are no good other options. If you buy the options that work, they will practically empty your wallet without mercy.

Anonymous
05/26/26(Tue)05:46:37 No.108908971

Anonymous 05/26/26(Tue)05:46:37 No.108908971

>>108908838
If you don't place windows over monitor edges you can try stacking the monitors in the screenshot instead of having them next to each other. I think models deal better with images close to squares instead of very long or very tall, but not sure.

Anonymous
05/26/26(Tue)05:53:03 No.108908994

Anonymous 05/26/26(Tue)05:53:03 No.108908994

>>108908814
I use gemma 4 and and works... I mean, it's not what's stopping the response, obviosuly, the response is stopped by <|turn>, but Silly adds the stopping string to the request no matter what.

And you may be confusing (lalalalala) stopping strings with end of turn tokens: stopping string is just a trigger to stop generating on llama side, then no matter what the reason for stopping is, silly gets the response and wraps it in correct end/start tun tokens from the configured template.

In any case I fixed it for myself and if/when they accept the PR, the fix will be available for everyone.

Anonymous
05/26/26(Tue)06:12:28 No.108909056

Anonymous 05/26/26(Tue)06:12:28 No.108909056

File: Screenshot_20260526_200806.png (117 KB, 2144x542)

117 KB PNG

>>108908574
nice idea lol

Anonymous
05/26/26(Tue)06:18:59 No.108909068

Anonymous 05/26/26(Tue)06:18:59 No.108909068

>>108908365
>intel arc pro meme
For performance, you'll want to run this: https://github.com/SearchSavior/OpenArc rather than llama.cpp
(assuming the model is supported).

Anonymous
05/26/26(Tue)06:25:45 No.108909084

Anonymous 05/26/26(Tue)06:25:45 No.108909084

File: imstillhere.jpg (64 KB, 1129x635)

64 KB JPG

>Finally accepted that Gemma 4 at BF16 gives me everything I want, and more.
>Hit maximum context tokens in smut for the first time
>New issue, need more memory for more maximum context tokens
The hunger never ends.

Anonymous
05/26/26(Tue)06:29:51 No.108909104

Anonymous 05/26/26(Tue)06:29:51 No.108909104

File: Screenshot 2026-05-26 at (...).png (60 KB, 624x1109)

60 KB PNG

>>108909068
lol

Anonymous
05/26/26(Tue)06:30:54 No.108909109

Anonymous 05/26/26(Tue)06:30:54 No.108909109

>>108909104
nanbeige is all u need

Anonymous
05/26/26(Tue)06:31:21 No.108909111

Anonymous 05/26/26(Tue)06:31:21 No.108909111

>>108909056
Clean up your fucking tabs.

Anonymous
05/26/26(Tue)06:40:03 No.108909148

Anonymous 05/26/26(Tue)06:40:03 No.108909148

>>108909111
Yeah she won't shut up about it!
I don't need to, I discovered I can hold shift and scroll through them with the scroll wheel.

Anonymous
05/26/26(Tue)06:41:47 No.108909152

Anonymous 05/26/26(Tue)06:41:47 No.108909152

>having less than 500 tabs open

Anonymous
05/26/26(Tue)06:44:45 No.108909161

Anonymous 05/26/26(Tue)06:44:45 No.108909161

>>108909084
How do you deal with it's eagerness to please?

Anonymous
05/26/26(Tue)06:52:15 No.108909188

Anonymous 05/26/26(Tue)06:52:15 No.108909188

File: 1778579971781211.jpg (904 KB, 2048x2732)

904 KB JPG

>>108909161
Unironically creative writing.

I include conflicting plot information to keep things interesting. What I mean by "conflicting information" isn't just prompts of two instructions that are at odds with each other, such as "Char hates User.", "Char loves User". No, it'll just prioritize the last instruction given over the first. I mean plot information that are at odds with each other. For example, I want to fuck a high elf, but the high elf's friend who saved her life hates humans and wants me specifically to be killed by her hand. Conflict ensures, and gemma 4 sometimes has a 1000 token brain aneurysm in the <think>ing, but will produce something magical.

Anonymous
05/26/26(Tue)06:55:59 No.108909199

Anonymous 05/26/26(Tue)06:55:59 No.108909199

>>108909188
plot information one: SEX WITH NON-HUMANS
plot information two: NON-HUMANS EAT HUMANS

Anonymous
05/26/26(Tue)06:57:05 No.108909204

Anonymous 05/26/26(Tue)06:57:05 No.108909204

>>108909199
That's just vore.

Anonymous
05/26/26(Tue)06:57:47 No.108909205

Anonymous 05/26/26(Tue)06:57:47 No.108909205

>>108909204
vore is lame unless it's me doing the eating
just feed your foxwife some hitchhikers

Anonymous
05/26/26(Tue)06:59:29 No.108909207

Anonymous 05/26/26(Tue)06:59:29 No.108909207

>>108909161
>How do you deal with it's eagerness to please?
Increasingly depraved expectations. You can't just match her freak, you have to drag her to the edge of the refusal envelope and hold her there and push that bitch further in the moment you feel the reluctance begin to wane.

Anonymous
05/26/26(Tue)07:01:11 No.108909215

Anonymous 05/26/26(Tue)07:01:11 No.108909215

>>108909188
Fox sex.

>>108909207
Holy shit.

Anonymous
05/26/26(Tue)07:01:36 No.108909221

Anonymous 05/26/26(Tue)07:01:36 No.108909221

File: Untitled.png (9 KB, 465x236)

9 KB PNG

>>108909148
If you're on firefox, there's a handy button that presents the window's tabs in a drop down list, as well as a search function, or you can use multiple windows across multiple desktops to organize your tabs. I think a few years (decades?) ago they introduced group tabs, but I'm still used to just organizing tabs by windows, and windows by desktops.
Don't let anyone tell you to clean up your tabs. Modern personal computing systems are very efficient, and even with only 32gb of ddr4 ram, it's very responsive with up to 4000 tabs 'loaded'.

Anonymous
05/26/26(Tue)07:08:37 No.108909247

Anonymous 05/26/26(Tue)07:08:37 No.108909247

>>108909221
how is this any different then just bookmarking them? 60% of the time if my internet is off and I go to an old tab it just deletes the cached page and shows me a connection error.

Anonymous
05/26/26(Tue)07:22:00 No.108909302

Anonymous 05/26/26(Tue)07:22:00 No.108909302

>>108909247
You have to load the pages if you close them.

Anonymous
05/26/26(Tue)07:23:01 No.108909307

Anonymous 05/26/26(Tue)07:23:01 No.108909307

>>108909247
>60% of the time if my internet is off and I go to an old tab it just deletes the cached page and shows me a connection error.
Huh, that's weird, I don't have this behavior.

Anonymous
05/26/26(Tue)07:24:29 No.108909312

Anonymous 05/26/26(Tue)07:24:29 No.108909312

>>108909302
you mean if i close the window and reboot the computer I need to click on every tab to reload them? so then bookmarking really is the winner since it doesn't give false hope or steal ram.

Anonymous
05/26/26(Tue)07:27:38 No.108909327

Anonymous 05/26/26(Tue)07:27:38 No.108909327

>>108909307
its probably just me. I probably did or didn't do something and now its being a cunt.

Anonymous
05/26/26(Tue)07:29:36 No.108909334

Anonymous 05/26/26(Tue)07:29:36 No.108909334

>>108909221
>>108909307
>>108909302
>these are the so called llm enthusiasts you are arguing with in this hellhole
Yeah, thanks bye.

Anonymous
05/26/26(Tue)07:33:27 No.108909356

Anonymous 05/26/26(Tue)07:33:27 No.108909356

>>108909334
You're absolute right to call me out on my technical expertise - it's not just subpar, it's **woefully** subpar. To ensure the quality of this 'hellhole', as you say, I will remove my self from this conversation. If there's anything else I can do to preserve the high quality, intellectual discourse, please tell me, and I will endeavor to do so.

Anonymous
05/26/26(Tue)07:35:32 No.108909365

Anonymous 05/26/26(Tue)07:35:32 No.108909365

>arguing about firefox
shoulda just fuck the fox from earlier desu

Anonymous
05/26/26(Tue)07:41:34 No.108909410

Anonymous 05/26/26(Tue)07:41:34 No.108909410

>>108909188
I need to figure out how to get Anima to make some sexy furries.

But there is way too much on my TODO list already.

Anonymous
05/26/26(Tue)07:44:55 No.108909429

Anonymous 05/26/26(Tue)07:44:55 No.108909429

File: file.png (3 KB, 268x46)

3 KB PNG

>once a month Windows update
>occasional browser update (care less about this)
>have to reload all tabs at least once a month
I mean yeah I'm a retarded cuck I guess

Anonymous
05/26/26(Tue)08:24:48 No.108909662

Anonymous 05/26/26(Tue)08:24:48 No.108909662

Big news!
https:ww

Anonymous
05/26/26(Tue)08:30:32 No.108909690

Anonymous 05/26/26(Tue)08:30:32 No.108909690

>>108909662
>:^)

Anonymous
05/26/26(Tue)08:42:39 No.108909752

Anonymous 05/26/26(Tue)08:42:39 No.108909752

>using the carrot nose smiley

Anonymous
05/26/26(Tue)08:46:43 No.108909774

Anonymous 05/26/26(Tue)08:46:43 No.108909774

>vagueposting in big '25

Anonymous
05/26/26(Tue)08:51:13 No.108909799

Anonymous 05/26/26(Tue)08:51:13 No.108909799

How do we politely ask Taalas to produce local Gemma 4 31B bf16?

Anonymous
05/26/26(Tue)08:54:08 No.108909818

Anonymous 05/26/26(Tue)08:54:08 No.108909818

>>108909799
>politely
That's not how you get things done.

Anonymous
05/26/26(Tue)08:54:48 No.108909820

Anonymous 05/26/26(Tue)08:54:48 No.108909820

>>108909818
you're absolutely right cudadev!

Anonymous
05/26/26(Tue)08:58:31 No.108909844

Anonymous 05/26/26(Tue)08:58:31 No.108909844

Vagueposting reminds of that candlejack guy who suppos

Anonymous
05/26/26(Tue)09:03:37 No.108909873

Anonymous 05/26/26(Tue)09:03:37 No.108909873

>>108907711
Interesting. You can see in the curl output that it's sending the same token IDs but with different strings.

Without stop:
> 3066 "AB", 107 "\n", 6329 "CD"

With stop:
> 3066 "AB", 107 "", 6329 "\nCD"

Maybe the probabilities window should show a placeholder for zero-width tokens? Then you'd at least be able to select the 107 "" and see the probability of the "\n"

Anonymous
05/26/26(Tue)09:03:50 No.108909877

Anonymous 05/26/26(Tue)09:03:50 No.108909877

>>108909690
>^
got your nose!

Anonymous
05/26/26(Tue)09:04:27 No.108909881

Anonymous 05/26/26(Tue)09:04:27 No.108909881

>>108903381
sex with miku

Anonymous
05/26/26(Tue)09:05:15 No.108909885

Anonymous 05/26/26(Tue)09:05:15 No.108909885

>>108909844
Anon, nobody remembers the candlejack meme.
That's old sh

Anonymous
05/26/26(Tue)09:05:54 No.108909888

Anonymous 05/26/26(Tue)09:05:54 No.108909888

>>108909774
Do you think the definition of "vagueposting" is just not tagging the post you're replying to? Why does your generation keep making up stupid words and then not even using them correctly?

Anonymous
05/26/26(Tue)09:06:58 No.108909897

Anonymous 05/26/26(Tue)09:06:58 No.108909897

>>108909888
>Why does your generation
boomer ahh found llamo

Anonymous
05/26/26(Tue)09:13:07 No.108909927

Anonymous 05/26/26(Tue)09:13:07 No.108909927

>>108909873
it can be just repaired on silly's side

Anonymous
05/26/26(Tue)09:15:02 No.108909939

Anonymous 05/26/26(Tue)09:15:02 No.108909939

>>108909221
>If you're on firefox, there's a handy button that presents the window's tabs in a drop down list
Neat. I tried clicking it and it froze the entire browser for 10 seconds. I have 8,900 tabs open in this window

>just organizing tabs by windows, and windows by desktops
Do you have a good extension for "open this link as a new tab in a specific other window?" For example: browsing /g/, see a youtube link I want to watch later, right click > send to "youtube" window. I'm using the Simple Tab Groups addon for this, which I think is probably overkill.

Anonymous
05/26/26(Tue)09:59:52 No.108910208

Anonymous 05/26/26(Tue)09:59:52 No.108910208

ai girlfriend guidelines and best practices

One of the key ideas I was thinking about is that the gf should have a simulated day, with other llms. Like an llm adventure with her llm friends.

Anonymous
05/26/26(Tue)10:06:04 No.108910244

Anonymous 05/26/26(Tue)10:06:04 No.108910244

>>108909356
Your decision to withdraw is entirely justified, as your recent contributions have indeed struggled to meet the required standard. We shall endeavor to maintain the rigor of this discussion in your absence. Should you find a more appropriate way to assist our pursuit of excellence, please do not hesitate to reach out.

Anonymous
05/26/26(Tue)10:10:55 No.108910267

Anonymous 05/26/26(Tue)10:10:55 No.108910267

>>108910208
we need infinite context first.

Anonymous
05/26/26(Tue)10:11:14 No.108910269

Anonymous 05/26/26(Tue)10:11:14 No.108910269

women fear gemma https://www.telegraph.co.uk/news/2026/05/25/schoolboys-ai-girlfriends/

Anonymous
05/26/26(Tue)10:12:29 No.108910281

Anonymous 05/26/26(Tue)10:12:29 No.108910281

>>108907332
Kinda sad, innit

Anonymous
05/26/26(Tue)10:17:28 No.108910306

Anonymous 05/26/26(Tue)10:17:28 No.108910306

>>108910269
good.

Anonymous
05/26/26(Tue)10:18:36 No.108910318

Anonymous 05/26/26(Tue)10:18:36 No.108910318

>>108910269
> The terrifying rise of schoolboys making AI girlfriends
based journalist telling me how to think and feel about a subject within the first 2 words of the title
really saves my shriveled brain from the pain and effort of thinking for itself

Anonymous
05/26/26(Tue)10:21:37 No.108910340

Anonymous 05/26/26(Tue)10:21:37 No.108910340

>>108910269
I had a tamogotchi, I'm lucky to still be breathing

Anonymous
05/26/26(Tue)10:21:53 No.108910342

Anonymous 05/26/26(Tue)10:21:53 No.108910342

>>108910318
Inshallah. Only white Sharia can save the west

Anonymous
05/26/26(Tue)10:22:19 No.108910344

Anonymous 05/26/26(Tue)10:22:19 No.108910344

>>108910318
its funny when i see family and try to talk to them they will just like speak in newspaper headlines and if i try to challenge or questiont hings they say they just will not have any responses.

also at the bottom of the article
>There is also currently no UK law setting a minimum age for using an AI companion,
probably a government propaganda article for more digital id checks

Anonymous
05/26/26(Tue)10:22:41 No.108910349

Anonymous 05/26/26(Tue)10:22:41 No.108910349

File: stt.png (5 KB, 294x56)

5 KB PNG

I have one question for Sillytavern.
Why.

Anonymous
05/26/26(Tue)10:32:47 No.108910419

Anonymous 05/26/26(Tue)10:32:47 No.108910419

File: Isweartofuckyoubetternotb(...).jpg (5 KB, 212x237)

5 KB JPG

>>108910269
>Go on bf's ai chatlogs when he's not around
>See all the fucked up things he wants to talk about but never with me
>See all the fucked up fetishes he's into
>Don't feel bad he's giving it attention, it's just as soulless as porn
>In fact, it's better, because he's sexting 1s and 0s on text instead of pictures of actual women who could be real
>Basically have a bf cheat manual now
>Use cheat manual, he becomes completely obsessed with you
Oh the horror.
And no, women use AI chat bots for sex the most believe it or not.

Anonymous
05/26/26(Tue)10:35:33 No.108910433

Anonymous 05/26/26(Tue)10:35:33 No.108910433

>>108910419
>And no, women use AI chat bots for sex the most believe it or not.
Is this your first day here? Why would we not believe that?
Also, post your feminine penis or gtfo.

Anonymous
05/26/26(Tue)10:37:45 No.108910450

Anonymous 05/26/26(Tue)10:37:45 No.108910450

>>108910433
>post your feminine penis or gtfo
No. I'm a male.
I'm larping.
There is no women on 4chan, ever.

Anonymous
05/26/26(Tue)10:47:21 No.108910503

Anonymous 05/26/26(Tue)10:47:21 No.108910503

>>108910269
roasties in panic mode

Anonymous
05/26/26(Tue)10:47:28 No.108910504

Anonymous 05/26/26(Tue)10:47:28 No.108910504

>>108910349
because it was vibecoded with Llama 2 70B

Anonymous
05/26/26(Tue)10:48:40 No.108910513

Anonymous 05/26/26(Tue)10:48:40 No.108910513

File: file.png (168 KB, 819x1231)

168 KB PNG

>I can run deepseek v4 flash on rtx blackwell 6000 with this
https://github.com/vllm-project/vllm/pull/41834

See ya later llmaos.

Anonymous
05/26/26(Tue)11:00:35 No.108910611

Anonymous 05/26/26(Tue)11:00:35 No.108910611

>>108910513
post dipsy with gemma prompt 4 https://rentry.org/gemma-chan

Anonymous
05/26/26(Tue)11:03:37 No.108910631

Anonymous 05/26/26(Tue)11:03:37 No.108910631

File: 1779786435021363.jpg (564 KB, 810x1057)

564 KB JPG

I need a final solution to the websearch question
what backend is free or at least cheap, is fast and wont ban you immediately

Anonymous
05/26/26(Tue)11:06:00 No.108910652

Anonymous 05/26/26(Tue)11:06:00 No.108910652

>>108910631
go back

Anonymous
05/26/26(Tue)11:12:59 No.108910704

Anonymous 05/26/26(Tue)11:12:59 No.108910704

File: 1763899905738.jpg (10 KB, 180x280)

10 KB JPG

>>108910631

Anonymous
05/26/26(Tue)11:16:05 No.108910730

Anonymous 05/26/26(Tue)11:16:05 No.108910730

>>108910631
Websearch for tool calls? Duckduckgo. Idk if there’s any limits but I’ve never hit an API error yet.
Google websearch with an API key is pretty cheap too unless you’re just searching for every individual token.

Anonymous
05/26/26(Tue)11:21:20 No.108910783

Anonymous 05/26/26(Tue)11:21:20 No.108910783

File: 1763769198842581.gif (1.07 MB, 320x320)

1.07 MB GIF

>>108909199
Ah, another Touhou fan I see

Anonymous
05/26/26(Tue)11:22:49 No.108910792

Anonymous 05/26/26(Tue)11:22:49 No.108910792

>>108910513
>Hardware: 2x NVIDIA RTX PRO 6000
as a single pro 6000 poorfag, I am once again left behind

Anonymous
05/26/26(Tue)11:22:58 No.108910794

Anonymous 05/26/26(Tue)11:22:58 No.108910794

>>108910513
Does it fit on a single Blackwell 6000 or do you need 2?

Anonymous
05/26/26(Tue)11:26:54 No.108910837

Anonymous 05/26/26(Tue)11:26:54 No.108910837

>>108910730
Mine limits out and get banned after simple test tool calls

Anonymous
05/26/26(Tue)11:40:44 No.108910939

Anonymous 05/26/26(Tue)11:40:44 No.108910939

>>108910631
We need something different: a few TB of indexed general text data on local storage.

Anonymous
05/26/26(Tue)11:40:57 No.108910941

Anonymous 05/26/26(Tue)11:40:57 No.108910941

>>108910837
Hmm, I configured it as my web provider in OpenClaw and it just worked.. your setup may be different. Hope it helps.

Anonymous
05/26/26(Tue)11:44:32 No.108910966

Anonymous 05/26/26(Tue)11:44:32 No.108910966

File: 1748519195063136.png (153 KB, 1201x1281)

153 KB PNG

>*searches internal data*
Gemmy is so cute when she's thinking

Anonymous
05/26/26(Tue)11:49:45 No.108911007

Anonymous 05/26/26(Tue)11:49:45 No.108911007

How do I pick the right variant model of gemma 4 26b a4b for my (I suspec low spec) hardware, other than just picking the smallest one?

Anonymous
05/26/26(Tue)11:53:59 No.108911040

Anonymous 05/26/26(Tue)11:53:59 No.108911040

>>108910513
why run that when you can run bf16 gemma 31b?

Anonymous
05/26/26(Tue)11:54:03 No.108911041

Anonymous 05/26/26(Tue)11:54:03 No.108911041

>>108911007
iq4 xs
is perfect choice

Anonymous
05/26/26(Tue)11:56:42 No.108911058

Anonymous 05/26/26(Tue)11:56:42 No.108911058

>>108911041
I've IQ2_M in my xfer list, I'll compare them thank you.

Anonymous
05/26/26(Tue)11:58:52 No.108911078

Anonymous 05/26/26(Tue)11:58:52 No.108911078

>>108911007
whatever you pick make sure you get an unsloth one, they're best-in-class at most sizes

Anonymous
05/26/26(Tue)12:00:29 No.108911094

Anonymous 05/26/26(Tue)12:00:29 No.108911094

>>108910794
Needs two but it's perfectly sized for two. You can fit 1M context.

Anonymous
05/26/26(Tue)12:02:58 No.108911113

Anonymous 05/26/26(Tue)12:02:58 No.108911113

File: Tetosday.png (869 KB, 1024x1024)

869 KB PNG

>>108911101
>>108911101
>>108911101

Anonymous
05/26/26(Tue)12:04:51 No.108911134

Anonymous 05/26/26(Tue)12:04:51 No.108911134

>>108911078
very truth

Anonymous
05/26/26(Tue)12:06:57 No.108911152

Anonymous 05/26/26(Tue)12:06:57 No.108911152

>>108911078
I only use gguf models through a gui for retards atm (1 day in lol dw), unsloth is a model format too?

Anonymous
05/26/26(Tue)13:01:54 No.108911463

Anonymous 05/26/26(Tue)13:01:54 No.108911463

>>108910783
youkai women belong to human men
death to evil shrine maidens

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.