/g/ - Technology
File: 1767081321191571.jpg (292 KB, 1920x1080)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107986301 & >>107977622

►News
>(01/28) Trinity Large 399B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107986301

--Paper: LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation:
>107990319 >107990445 >107990550
--Nvidia's VRAM strategy and DeepSeek's engram integration prospects:
>107986970 >107987016 >107987142 >107987440 >107987473 >107989902 >107988915 >107988974 >107989098
--GLM 4.7 Flash model compatibility issues with outdated Koboldcpp:
>107992878 >107993034 >107993074 >107993052 >107993069 >107993264 >107993750 >107993771
--Circumventing GPT-oss refusal mechanisms via prompt editing:
>107992504 >107993066 >107993211
--Z-Image base model release and image diversity discussion:
>107986742 >107986763 >107986795 >107993308 >107989901 >107991596 >107991743 >107989947 >107989983 >107990135 >107991526
--Model format conversion challenges and MoE architecture skepticism:
>107991036 >107991466 >107991428 >107992099 >107992113 >107992450 >107992745
--Trinity Large release and sparse MoE architecture performance debate:
>107989969 >107990016 >107990887 >107990908 >107990930 >107990936 >107993722
--Unsloth K2.5 model template compatibility issues with thinking tags:
>107989346 >107990072 >107990608 >107990654 >107993276
--Multi-GPU mixed architecture setup for concurrent model inference:
>107989299 >107989531 >107989562
--LLM's absurd reasoning for keeping characters alive in fictional scenarios:
>107989167 >107989251 >107989272
--Kimi's agent swarm feature and local adoption challenges:
>107988547 >107988563 >107988618 >107988654 >107989552
--GLM model's backtracking and coherence challenges in reasoning:
>107988322 >107988347 >107988387 >107988455 >107988487 >107988508
--TrueBase model's research value and accessibility challenges:
>107990774 >107990789 >107990942 >107991102 >107991115
--Teto, Rin, and Miku (free space):
>107986506 >107989902 >107993870 >107994537 >107997563

►Recent Highlight Posts from the Previous Thread: >>107986307

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Do you guys have a separate GPU box/server for your LLM workloads or do you have the GPU in your main PC?
>>
Satisfying a damaging caffeine addiction with Rin-chan and Miku
>>
>>107998010
I have two, one is always on, and the other makes me cry whenever I see the electricity bill
>>
>>107998019
I gave up on caffeine 2 years ago and I have no regrets.
>>
>>107998028
How much we talking?
>>
>>107998010
I have the GPUs connected to my main PC but saying that they are all "in" the PC is a stretch.
>>
With an RTX 5090 and 64GB RAM, what realistically is the ceiling in terms of what I can do?
>>
>>107998070
I'm obviously exaggerating, but $150 still looks scary in jpy when I'm used to 1/10 of that
>>
>>107998010
I have a separate server and a 3090 in my main PC.
>>
lmao I turned off search on Kimi's website and asked K2.5 a question and it tried to circumvent it by pip installing duckduckgo-search and finding the environment doesn't have internet connection, then it gave the answer.
>>
yes and my regret is getting a board with only 5 pci-e slots instead of 7
>>
>>107998115
>$150/mo of electricity
>when you could rent the hardware for less than that
>>
>>107998221
>>
>>107998232
but then it isnt local, it's just running the model in the cloud. i could just use the api at that point. the best part about running it locally is that i don't have to worry about my hardware suddenly being revoked or having it shared with other resources. actually no i lied, the best part to me personally is the fact that it doesn't require the internet at all.
if i run my model on somebody else's hardware that's like letting my wife sleep over at somebody else's house.
>>
>>107998263
If you're running the model on a GPU instead of calculating the activations by hand, it's like letting your wife sleep over at somebody else's house.
>>
>>107998263
Well, then it's time to invest in a solar panel
>>
>>107998279
you're retarded and its a miracle that you haven't died from oxygen deprivation
>>
>>107998232
No one runs local to save money, retard
>>
>>107998292
Seems like you failed to find an argument
>>
File: 1748729345610908.jpg (27 KB, 828x646)
>>107998293
I hope you're posting that from your $20K rig
>>
>>107998293
Bargaining stage
>>
>>107998068
I too have developed an immunity to caffeine and had to switch to cocaine.
>>
>>107998331
>>107998335
>>>/g/aicg/
>>
>>107998358
Just drink black tea
>>
>>107998328
there's no argument to be made beyond the fact that i have complete control over my stack on a software and hardware level. i don't have to worry about having shit suddenly revoked all of a sudden because it's not reliant on any other services. my internet could go down tomorrow and i could still chat with my local model without any issues.
is it really that hard to believe that some people are willing to spend extra money to have reassurance that they have complete control over their shit?
i guess it might be for you considering your mental deficiency
>>
>>107998358
I hope you're joking anon. I quit caffeine because it was fucking with my sleep and stressing me out.
>>
>>107998068
Happy for you Anon, hope to be like you one day
>>
File: Untitled.jpg (131 KB, 1076x937)
>>107998376
>i have complete control over my stack on a software and hardware level
lol
>>
>>107998392
Sleep is for the weak.
>>
>>107998408
Update this for model weights.
>>
>>107998408
reeeeeeeeee b-b-but intel ME and AMD PSP!!!!! IT CAN PHONE HOME THROUGH OTHER DEVICES THE CIA AND NSA KNOWS!!!
>on a LAN that doesn't have any WAN access. the network isn't even set up as a VLAN, it's just completely isolated through hardware.
sorry kid, nothing personal.
>>
>>107998010
I headless cpumaxx with a dedicated LLM server. The GPU passthru htpc can alternatively run an art/music vm and my “main” pc is ewaste that just surfs and seeds. About to uprate the ewaste desktop with an Epyc rome and copious ddr4 so I have a batch job llm server in reserve
>>
>>107998443
he didn't say that schizo kun
>>
File: 1.jpg (659 KB, 1301x1873)
>>107998428
Courtesy of K2.5
>>
>>107998485
demons are made from sand? i thought that was golems.
>>
>>107998505
Sand was the media on which the sigils were etched.
>>
>>107998392
There's no reason for it to fuck with your sleep if you stop taking it 5 hours before bed to give it time to pass through your system. As for stressing you out, you know you can always take less, right? It doesn't have to be either 0 or 1000mg.
>>
>>107998523
snore. wake me up when they draw the sigils using children's blood.
>>
>>107998010
Main machine for now. I actually have two video cards for VFIO when I want to play games in Windows. If I ever get a job again maybe I'll rebuild my NAS and move my a4500 over there.
>>
>>107998492
schizokino
>>
LMArena is now https://arena.ai/
>>
>>107998392
Chinese tea is WAY better and gives you just the right dose of caffeine. There is a black tea that tastes similar to orange, very delicious.
>>
I get about 10-15 t/s with GLM 4.7 IQ3 with 120gb VRAM and 64gb RAM. Max GPU layers with some experts offloaded. If I get more ram to run a larger quant, what can I expect in terms of ts and pp?
>>
>>107998260
What is the performance on something like that? I know loading the model will be slow but what about after?
I am trying to decide if I should go crazy and buy some Tesla V100 with the sxm2 interface and get a backplane to support four of them or trying to go even cheaper and try a few CMP 100-210. It is I believe the same gpu and it would work with a pcie rig like that.
I know they were gimped a bit for mining but they should still work.
Going CPU + RAM would probably be the sane option but I like the idea of a monster made out of old parts.
>>
>>107999073
PP is compute bound, TG is memory bandwidth bound.
You'll need to transfer more data, so it'll be slower. I'd expect TG loss to be proportional to the size of the weights in whatever quant you end up with. I don't think it'd affect PP that much, if at all.
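Rough napkin math, assuming big GLM really is ~32B active (and that's an assumption): IQ3 is ~3.5 bpw, so about 32e9 * 3.5 / 8 ≈ 14 GB read per token, and your 10-15 t/s implies roughly 140-210 GB/s effective bandwidth across VRAM+RAM. Q6 at ~6.5 bpw is ~26 GB per token, so the same effective bandwidth would land around 5-8 t/s, probably a bit less in practice since a larger share of the weights ends up sitting in the slower RAM.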
>>
>>107997948
Is coffee good for you?
>>
>>107999192
>TG loss to be proportional to the size of the weights in whatever quant you end up with.
As in a 2x reduction in t/s with a doubling of file size? So Q6 would be roughly around 5-7 t/s, correct?
>>
>>107999073
I have Q5 at 3-4 t/s with 96GB VRAM and 256GB RAM
>>
>>107999250
I assume full GPU layers with lots of experts offloaded, correct?
>>
>>107999287
Yes. 2-3 t/s without manually assigned layers. 4x3090, 8 channel DDR4-3200
>>
>>107999228
Ideally. But consider that a bigger proportion of the whole model will be in RAM, so it's probably going to be lower than that.
>>
File: gpu speedup.jpg (224 KB, 1536x1152)
>>107999228
No
>>
>>107999228
>>107999301
>>107999351
Well, I was going to FOMO into a 256gb RAM kit because the thought of running larger quants of GLM and cope quants of DeepSeek made me tingle, but I really don't want to have to deal with sub 10 ts.

Maybe instead I'll pick up another 64gb ram kit for 750$ instead (and hope I can get expo speeds kek) and just stick with Q4/Q5.
>>
>>107999408
If I understand, this is for traditional GPU offloading and NOT offloading experts, yes?
>>
>>107999408
That's really deceptive, it should be applied to active parameters, not all parameters. Some are always active and live on gpu while some are usually cold and live on cpu
>>
So which of the dozen rocinante versions is the least cucked?
>>
>>107999480
>rocinante
lol go away drummer
>>
File: Base Image.png (632 KB, 1196x2136)
Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning
https://arxiv.org/abs/2601.20326
>KV caches, typically used only to speed up autoregressive decoding, encode contextual information that can be reused for downstream tasks at no extra cost. We propose treating the KV cache as a lightweight representation, eliminating the need to recompute or store full hidden states. Despite being weaker than dedicated embeddings, KV-derived representations are shown to be sufficient for two key applications: (i) Chain-of-Embedding, where they achieve competitive or superior performance on Llama-3.1-8B-Instruct and Qwen2-7B-Instruct; and (ii) Fast/Slow Thinking Switching, where they enable adaptive reasoning on Qwen3-8B and DeepSeek-R1-Distil-Qwen-14B, reducing token generation by up to with minimal accuracy loss. Our findings establish KV caches as a free, effective substrate for sampling and reasoning, opening new directions for representation reuse in LLM inference.
https://github.com/cmd2001/ICLR2026_KV-Embedding
neat
>>
>>107999487
answer my question retard
>>
>>107999480
It's impossible to say because Drummer does not add any release notes or any other information for that matter.
>>
>>107999616
Is there no community schizo who does actual human use testing? I doubt UGI is much different from synth benches.
>>
File: Base Image.png (352 KB, 1284x1224)
Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery
https://arxiv.org/abs/2601.20088
>This technical report presents quantization-aware distillation (QAD) and our best practices for recovering accuracy of NVFP4-quantized large language models (LLMs) and vision-language models (VLMs). QAD distills a full-precision teacher model into a quantized student model using a KL divergence loss. While applying distillation to quantized models is not a new idea, we observe key advantages of QAD for today's LLMs: 1. It shows remarkable effectiveness and stability for models trained through multi-stage post-training pipelines, including supervised fine-tuning (SFT), reinforcement learning (RL), and model merging, where traditional quantization-aware training (QAT) suffers from engineering complexity and training instability; 2. It is robust to data quality and coverage, enabling accuracy recovery without full training data. We evaluate QAD across multiple post-trained models including AceReason Nemotron, Nemotron 3 Nano, Nemotron Nano V2, Nemotron Nano V2 VL (VLM), and Llama Nemotron Super v1, showing consistent recovery to near-BF16 accuracy.
might have been posted earlier but this is the arxiv version. seems good
also regarding the caffeine talk earlier I can recommend switching to paraxanthine as a stimulant. good stuff.
>>
>>107999627
Some people test them but I have no idea what is the latest version and what his releases even mean in this sense.
The way I'm thinking: it's a waste of time to even test them when no model cards exist.
>>
File: Success.png (220 KB, 888x1274)
Been working on the epub thing again. I think I have it solved. Using ollama (i know, but llama.cpp is a mess right now and doesn't really support mtmd all that well) and a custom vibe-coded D wrapper for ollama written with Claude, I was able to load maternion/lightonocr-2:latest into ollama (the model has similar output quality to Chandra, only with much more performance). I use deterministic/heuristic cropping to pull the graphs out of the page image since AI struggles a lot with that, then replace those graphs in the image with black boxes and high-contrast anchor texts and have the LLM automatically insert the right graph image file. Then take the resulting markdown/LaTeX file and convert it to EPUB using pandoc.

I haven't done this on the entirety of the book yet, only a single page for now, although I did try the pipeline with a small extract (6 or 7 pages). Good results.
Now I just have to update the pipeline to include post-processing to remove the anchor tags. Idk why but the model likes to add "Figure x" to image tags, but probably that can be fixed in post-processing.

Put the test EPUB onto my E-Reader, it works. And thus spoke Zarathustra: "I am blessed, for I don't have to deal with fucking PDFs for much longer!"
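For the pandoc step it's basically a one-liner, something like (file names here are just placeholders):
pandoc book.md -o book.epub --resource-path=images
with --resource-path pointing at wherever the cropped graph images ended up.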
>>
I'm downloading all trinities. Expect results in a couple of hours.
>>
>>107999802
theyre all shit. saved you the wait.
>>
>>107999808
You don't know that.
>>
>>107999810
pretty sure i do. i only care about cooming.
>>
File: 1757423666241.png (122 KB, 659x659)
https://github.com/ikawrakow/ik_llama.cpp/pull/1131#issuecomment-3815435157
https://github.com/SneedwareInc/ik_SillyTavern
I've opened Visual Studio and added banned strings and regex support to SillyTavern. Banned strings should be compatible with @firecoperana's PR too. I've also enabled TFS, but you are probably less excited about that. The code is 100% written by me, no vibecoding this time. Please stress test it and report bugs!
>>
>>107999667
You can see how E = MC2 + AI is working. You created the future.
>>
>>107997003
Distribution shift
>>
>>108000166
kek
>>
>>108000188
Anon, thanks for the compliment but I have no idea what "E = MC2 + AI" is supposed to mean. Is it a joke on Einstein's "e = mc**2"?
As for creating the future, do we not all whilst being alive?
>>
>>107999802
Personally I would like to know about the "truebase" version, hopefully it's good. The other base version is probably full of slop too, but who knows. I already tried the instruct preview on openrouter and it's pretty much what you'd expect.
>>
>>107999802
Where are the results?
>>
Whatever Arcee does is always garbage. When will you learn your lesson old man?
>>
>>108000166
>I've opened Visual Studio
kek
>>
>>107999627
>Is there no community schizo who does actual human use testing?
ironically drummer does. results vary based on how horny the testers are
>>
>>108000495
>ironically drummer does
Ok nigga but if you keep the model cards empty it's useless. inb4 join muh discord
>>
File: wait a second....png (1.35 MB, 1024x1024)
>>108000166
>>
erm so what's better anons, GLM 4.7 flash or GLM 4.5 air?
>>
>>108000793
>106b, 12b active vs. 30b, 3b active.
Hmmm, I dont know anon, just to be safe download both!
>>
>>108000793
Better for what? Flash is good for agentic coding, Air sucks at everything
>>
>>108000166
>vibecoded trash
>having a separate field to make shit case insensitive instead of a checkbox
>not just accepting the regex format with /xx/i
LOL
kys brah, youre a nocoder
>>
>>107999434
>>107999437
I made that plot for dense models but if you prioritize dense weights for a MoE model you should still end up with 2x the same shape stitched together.
>>
File: intro_performance.png (112 KB, 2580x728)
https://huggingface.co/ByteDance-Seed/Stable-DiffCoder-8B-Instruct
>Mask Diffusion Language Model
>Public datasets, synthetic data
>Context Length: 8192
>>
>>108000983
When are we getting native mtmd support + llama-server?
>>
>>108001095
What do you mean by "native mtmd"?
>>
>>108001010
>8k context
WHAT THE FUCK when I start up cline it consumes minimum 20k~ input tokens to just read the relevant part of my codebase lmao.
>>
>>108001010
I want diffusion models to succeed. The ability to backtrack and fix previous mistakes is worth higher compute requirements.
>>
>We're putting out three variants: Trinity-Large-Preview is lightly post-trained and chat-ready, Trinity-Large-Base is our best pretraining checkpoint after the full 17T recipe, and TrueBase is an early checkpoint from the same run at 10T tokens, without any instruct data or LR anneals. What many would consider a true base model.
Real talk how much would it cost to train this 400B MoE to become a good RP model? It seems like the pretraining is done for us, we would just need to train on novels & RP datasets. Would want it to be able to handle large context lengths so it might be a little more expensive than usual too. What's the damage?
>>
>>108001101
Currently if you want to run a vision model in llama.cpp, as far as I'm aware, you need to run llama-mtmd-cli, as for some reason it doesn't work in llama-server, and manually specify the GGUF + MMPROJ. At least that was my last knowledge of this (see PR #17400, https://github.com/ggml-org/llama.cpp/pull/17400, which is still open). This makes working with OCR type models cumbersome in llama.cpp, since you basically have to invoke llama-mtmd-cli.
Meanwhile in ollama I can just have it run on localhost and use the API to access the model and everything, which is much easier to work with.

Not sure if I'm just not enough into llama.cpp to understand how to work with it, but right now ollama luckily covers my needs. Then again, I would like the performance of llama.cpp (e.g. llama.cpp had no problems running Chandra at Q8, while ollama constantly crashed...might have been weirdness with GGUF, but with llama.cpp it worked, so I suspect ollama).
>>
>>108001139
It cost them $20 million all in to get where they are now, so probably less than that.
>>
>>108001109
Wouldn't recursive autoregressive models also do that in latent space without inflating model size to have a very large number of layers? They could also implement early exit to speed up inference.
>>
>>108001164
Also I don't have great expectations for their true base model since they trained it on rewritten data. I expect the slop will be baked in at a molecular level.
>>
>>108001172
I haven't read on recursive autoregressive models yet.
>>
>>108001197
A couple examples
https://arxiv.org/abs/2510.25741
https://arxiv.org/abs/2502.05171
>>
>>108001139
>It seems like the pretraining is done for us, we would just need to train on novels & RP datasets
A properly made model would be pretrained together with novels/books/RP; if you have to add that after the fact, it has to basically be continued pretraining together with the same mixture used for pretraining, which is lengthy and expensive and not a fast drummer-style finetune.
>>
>>108001152
You can pass --mmproj on the server too as far as I recall
>>
>>107998111
TTS
toy LLMs
imagen and videogen
You could create your own "Alexa" if you buy mics and other stuff (perhaps a Zigbee/Thread dongle), or generate images and videos (they will be highly sloppy, unless you're skilled and spend too much time genning them). I can't think of something else to be honest.
>>
>>108001240
Pretraining is done at low/mid context length (theirs was 8192). The reason I say novels is because it's high quality long coherent data that could finetune the model to learn large context lengths, and not just benchmax NITHS

I'm aware this would be more expensive. And it's not a problem if the data is in the pretraining, you'd be a fool to throw out a good book because the model has read an excerpt from it 2T tokens ago, especially if you're finetuning it to learn long context length at the same time.
>>
File: file.png (647 KB, 720x999)
>>108000337
>>
melon
>>
>>108001320
yikes... talk about izzat loss
>>
>>108001319
Lately the larger AI companies are doing pretraining at 16k or 32k tokens. Gemma 3 was pretrained with 32k context, for example.
When used, since they're considered high-quality data, books are generally upsampled quite a lot (at least 3-4 epochs if not more), they're not just seen once.
For long context it's not strictly necessary to have single complete long documents; you can also pack several into one long sample, and that also works toward improving long-context capabilities. Having long coherent samples for that of course helps, but the main issue is that even with them there is still the "lost in the middle" syndrome without targeted data augmentations. And long-context performance is also context-dependent, so if you want long coherent chats you need data of that sort.
>>
File: cockbench.png (2.1 MB, 1131x7248)
>>108000369
This is the best distribution yet. It's flatter than any other model and it all revolves around cock. No "thighs", "hips", "skin", etc.
And that's the instruct slop variant.
>>
what for vramlet porn? still nemo?
>>
>>108001152
The last time I checked the llama.cpp HTTP server had support for vision models on par with mtmd-cli.
If there is something that doesn't work you should open a Github issue and notify ngxson as he is the one maintaining that part of the codebase.
>>
>>108001473
API
>>
>>108001479
>paying for porn
lol
>>
>>108001473
Ministral 3 14B seemed promising for that, but you must use it at a low temperature (around 0.25 or even less than that. Mistral is recommending less than 0.1, go figure) or it's completely retarded, not unlike pre-LLaMA LLMs used to be. I wonder what's up with its token distribution.
>>
>>108001448
nice
slop version cant be helped at this point
they said minimal just to teach it multiturn chat and it looks like they meant it
>>
>>108001289
>>108001475
Thanks anons, I will try that eventually. For now I'll just use ollama until I get everything to work the way I want. Maybe after that I'll go back to llama.cpp for that juicy performance. Of course only if llama.cpp at that point in time is usable, which is not always the case ...

>>108001497
>what's up with its token distribution
It's french.
>>
>>108001152
>>108001289
yep this is how i do it for gemma
--mmproj /goofs/gemma3-27b-mmproj-model-f16.gguf

have to use the cli for qwen2audio tho
>>
>>108001553
Interesting. Is there any documentation for the multimodal models stuff? Couldn't find any. Maybe I just looked in the wrong location...
>/goofs/
heh.
>>
>>108000504
>Ok nigga but if you keep the model cards empty it's useless. inb4 join muh discord
not drummer i was just saying he does seem to host them and ask for feedback
idk why he's so all over the place
i liked one of his models ages ago but every other one I've tried can't keep track of details between turns.
>>
>>108001484
>paying for hardware to generate porn, achieving even lower occupancy
LMAO
>>
>>108001566
>heh.
kek

>Is there any documentation for the multimodal models stuff?
really you can just cp/paste what I sent.
bart provides mmproj eg https://huggingface.co/bartowski/google_gemma-3-12b-it-qat-GGUF

you can cp/paste images into the llama-server webui or openwebui and it'll just work
never go below fp16 for the mmproj tho or it gets retarded
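if it helps, a full invocation looks something like this (paths and filenames are just placeholders, match them to whatever you actually downloaded):
./llama-server -m /goofs/gemma-3-12b-it-qat-Q4_K_M.gguf --mmproj /goofs/gemma3-27b-mmproj-model-f16.gguf -ngl 99 --port 8080
then open http://localhost:8080 and paste your image into the webui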

also gemini and opus can help if you
./llama-server --help |nc termbin.com 9999
click the link, cp-paste full text into claude/gemini and ask it
>>
File: file.png (3 KB, 384x41)
Why is goofing so slow?
>>
>>108001106
>WHAT THE FUCK when I start up cline it consumes minimum 20k~ input tokens to just read the relevant part of my codebase lmao.
those are qwen tokens or whatever
are these diffcoder tokens more efficient?
>>
>>108001612
A horrible way to document something, but alright. Thanks anon. Maybe this will allow me to make my workflow more efficient, for I need fast inference for what I'm doing.
>>
>>108001448
Damn, that's looking pretty good. Now I just need to figure out how to run it.
>>
File: file.png (219 KB, 1330x550)
Has anybody else tried Trinity Large Preview from Arcee? 400B MoE model trained from scratch by a US lab on 20 trillion tokens released with Apache 2.0 license. Base and instruct weights available. Pretty well uncensored for roleplay as far as I can tell. Big model smarts. Free on OpenRouter or https://chat.arcee.ai/chat
>>
File: gagaool.jpg (6 KB, 166x303)
>>107999073
>GLM
I just de-purple prose Qwen 3 via injecting enough Min-P to kill microsoft copilot. Give me your GLM settings that stops the parroting, or eat shit.
>If I get more ram to run a larger quant, what can I expect in terms of ts and pp?
Slightly faster if it's VRAM, but barely worth it in terms of speed. The best speed is full VRAM. The more of the model that is in VRAM, the faster it is. Then you worry about the TOPS of the GPUs themselves if you want to go even higher, for whatever ungodly reason. I hope you're buying Blackwells or else your power bill is going to skyrocket.
>>
>>108001899
buy an ad
>>
>>108001899
Crazy how some people will just barge in the thread and ask something without even reading the first post above theirs.
>>
>>107999667
>but llama.cpp is a mess right now and doesn't really support mtmd all that well
You're projecting problems that are only in your head. Are you aware of what that says about you?
>>
>>108001907
fuck off nigger i read that shit. i was asking if any of you tards had used it
>>108001905
buy deez nuts SUCKAH!!
>>
>>108001101
>What do you mean by "native mtmd"?
He's hallucinating problems because he's a fucking idiot.
>>
>>108001903
>Give me your GLM settings that stops the parroting, or eat shit.
I don't have settings that stop the parroting, although its not bad if you don't allow it to parrot in the first place. It's by far the best model I've used, but I haven't used many larger models since I've upgraded my hardware. Next on the list is minimax.
>The more of the model that is in VRAM, the faster it is.
Obviously yes, but I wanted to get concrete numbers on speeds before I decide to throw thousands of more dollars at more RAM.
>I hope you're buying Blackwells
Yes. I want more but doing so would be financially unwise.
>>
File: file.png (314 KB, 1203x869)
>>108001448
Not what I expected.
Here's an interesting collage of all models that use this same line.

TrueBase is up next.
>>
>>108002123
>Both trinity base and intellect 3 default to you having a small dong
Kek
>>
>>108002142
It's flaccid.
>>
>>108002145
cope
>>
File: who.png (27 KB, 155x157)
>>
File: file.png (135 KB, 500x500)
when i ask an ai a question why does it just tell me what some random said on reddit instead of piecing together a real answer through the technical documents, research papers, and principals of math and science that are no doubt baked into it
>>
>>108002180
turn off the web search tool
>>
>>108002180
Because you phrase your question like a retard asking on reddit rather than an academic paper, so it finds the most likely text completion to be fellow retards rather than academic discourse.
>>
uhh so I need a new laptop. obviously will be limited for local AI, but what should I get to make the most out of it? for example 64gb ram + integrated graphics vs 32gb ram + dedicated gpu. I'd rather use big models slow than small models fast. shall I just get a macbook then?
>>
>>108002290
Either a macbook or one of those amd 128GB ai max something devices.
>>
File: we_get_more.png (34 KB, 110x152)
So when using moes, I can get a model that is way bigger than my vram because most of it will sit in ram?
>>
>>108002142
skill issue.
{{user}}'s dashing shaft is a sight to behold!
>>
>>108002327
You can but it will be slower, in my experience not as much slower as dense models get when offloading to the cpu though.
>>
>>108002180
You're asking like a retard so it gives you a retarded answer.
Tell your AI to roleplay as a scientist who loves reading "technical documents, research papers, and principals of math and science" before asking anything and you will likely get a non-reddit response.
>>
>>108002123
>TrueBase is up next.
I ran out of space but hf cli is retarded and now it's downloading the files that were in progress again for some reason.
I should switch to wget like anon from last thread.
>>
>>108002207
>>108002513
Funny how many forget this when they roleplay/code with their lowcase ESL prose and expect good results.
>>
>>108000921
Dude, you are the prime example of Dunning-Kruger, you are the peak midwit who ever midwitted and who thinks he is a genius. Let me break this down for you:
>>having a separate field to make shit case insensitive instead of a checkbox
What about capital letters? What if you want to ban "Oh," and "Ah,", which llms like to start their replies with, but not "oh," and "ah," in the middle? Do you think it's a good idea to not have separate fields? Did you think I didn't think of a checkbox? This was a deliberate choice, not me being a vibecoder.
>>not just accepting the regex format with /xx/i
This confirms that you only have a vague idea what you are talking about. C++ regex does not support case insensitivity in this way, it has to be provided as a separate flag parameter to the regex constructor. In C++, you use:
std::regex pattern("your_pattern", std::regex_constants::icase);
The /pattern/flags syntax is a Perl/JavaScript convention, not a universal regex standard. C++ std::regex, POSIX regex, and many other implementations use flag parameters or separate function calls. What was I supposed to do, parse a JavaScript-style regex literal string, extract the flags, then reconstruct it for the C++ regex engine? Add extra complexity and potential bugs just so your smooth brain doesn't have to look at an extra input field?
>kys brah, youre a nocoder
Clearly the only one who hasn't coded anything useful here is (You). Want to dispute this fact? Go to github right fucking now and clean up my code so it can get accepted quicker. What's that? You won't? That's what I thought, fucking retard.
>>
>>108002916
did you copy paste that from an llm? lmao
imagine being unable to write or use one of the many wrappers to have a PERL (the only real implementation btw) compatible regex pattern.
even regex101 accepts for the other engines the PERL pattern, you're just a coper.
kys codelet
>>
Yikes.
>>
>>108002930
>did you copy paste that from an llm? lmao
Did large amount of words scare you? No, I did not.
>just include more dependencies bro
midwit
>>
>>108003027
again, incapable of parsing after the last separator for m/g/i, and you call me a midwit lol. also didnt want to diss on your fork of ST, but you could've just made an extension instead of forking.
I guess it's too hard for your LLM to do? LMAO
>>
File: uhh...uh...uhhh.png (738 KB, 541x1240)
>>107997948
>How much time do you waste on this site each day? You should be working. You should be doing things that make you feel good about yourself. You should be with your family. You should be helping the homeless. You should be doing something with your life! I'm not trying to be negative, but you need to start being more positive! Think about what you do each day, and try to make it better. This site is a great place to start!
WTH bros I just wanted to run LLaMA 3 8b for a quick test and it suddenly hits me with this...
>>
>>108003064
Well listen to your wife, nigga.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1qp87tk/kimi_k25_is_the_best_open_model_for_coding/
new kimi finally actually at sonnet 4.5 level for coding
>>
>108003044
Are you stupid or just trolling? Either way, no more (You)s for (You).
>>
>>108003089
too bad everybody switched to opus
>>
>>108003145
opus limits can be rough. The meta will likely be opus for planning and kimi for implementation
>>
>>108003145
ungrateful gweilo next model will be closed off
>>
File: mgjdejh (49).png (148 KB, 422x523)
Tech illiterate midwit here.
I have a stupid question
It seems like all the data that could’ve been scraped has already been scraped.
Now they’re mostly just fine tuning specifics like coding and image generation, which is cool, and models are probably better than they were in 2023. But after all that, they don’t seem anywhere close to the AGI GOD they like to talk about

So how’s that supposed to happen?
>>
File: ml.jpg (7 KB, 248x203)
>>108003247
To build more datacenters and stack more layers is the way to AGI, or even ASI
>>
>>108002893
What do you mean?
>>
>>108003247
A few more trillion printed will solve it, no worries :)
>>
>>108003247
refine / clean the data and stack more layers
>>
>>108003247
The same way we built the Hyperloop and got to Mars.
>>
>>108003247
i can tell you what the labs will do this year. but i'd need money
>>
File: 1755528845145670.jpg (194 KB, 1500x1600)
>>108003247
>So how’s that supposed to happen?
It doesn't. You have a nifty "new" tech, the math is not really new, and it can do some interesting things that may have some uses beyond cooming but that is not what is being sold.
What is being sold is a lie because people can make money from the lie.
>>
>>108003247
Predict the next vector (entire concept), then translate the vectors to words or other modalities with a light decoder.
>>
Is prompt processing just speculative decoding but you know the entire text?
>>
>>108003461
Yes.
>>
>>108001139
>>108001240
>10T for the true base
>17T for the final annealed model

The True Base is for groups that can afford to continue pretraining the model with a few more trillion tokens, not even resourceful finetuners.
>>
>>108003532
>final annealed
What does this mean?
>>
>>108003598
benchmaxxed
>>
The 400B Trinity model is the first large model I've seen that isn't "fine tuned" or lobotomized and will happily write any kind of erotica you want and write it coherently and well. Before, the only way to get a large model to write this kind of stuff was to abliterate it first or fine tune it, destroying its intelligence in the process. This is a new era goon bros. We won
>>
>>108003672
I cant run it therefore its shit
>>
>>108003672
yea, was gonna say this soon, I only use opus since I pay the $200 sub for work anyways but this shit is legit good. NO SLOP AT ALL, NO POSITIVE BIAS
>>
>>108003672
17b active doe
>>
File: file.png (24 KB, 841x221)
>>108003694
you're welcome
>>
>>108003727
>implying im gonna give my cunny chats to entity
lol, lmao even
>>
>>108003672
Imagine how good it will be after NovelAI finetunes it.
>>
>>108003672
Wtf are you talking about. You can easily get most large models to do whatever shit you want with prompting, no ablit or fine tune needed, and shit most of the big models don't even have any tunes at all because fine tuners are poor and incompetent.
>>
>>108003776
>You can easily get most large models to do whatever shit you want with prompting
>with prompting
I'M PROOOOOMPTINNNNGGGGG
This is unnecessary with Trinity. Literally just turn-key smut you can drop in anywhere without any handholding of the model. Fool around with the model for a while and you'll see the difference
>>
organic
>>
>>108003814
I AM shilling this cause its amazing and people are letting it fly under the radar
>>
>>108003806
Sure it's still better if the model is uncensored by default. I'm just saying your statement wasn't factual.
>"the only way to get a large model to"
>>
>>108003672
Oh? It even has goofs already? How come this is the first I hear about it?
>>
>>108003672
Also did you mean instruct or base?
>>
>>108003814
i realize where we are but cynicism can be overdone
>>108003926
instruct. I haven't tried anything with the base model yet. last time I messed around with a raw continuation model was back in the Llama 1 days when /lmg/ first started. Base models that just continue are weird but a lot of fun. Now that you remind me I should mess around with the Trinity one soon
>>
>>108003598
Usually it means after the learning rate has been decayed to a low value while supplying a high-quality data subset. The general idea is that it works like a sort of finetune. Though, 7T tokens is a ton of data just for that.
>>
Has anyone managed to make kimi 2.5 work as well as thinking at the same quant? I'm having trouble seeing any advantage so far. Does it need more lcpp dev before we get any benefit? Does it need the whole "agent swarm" thing going?
>>
>>108003672
>13b active
any reason to use that over 32b active glm?
>>
>>108003953
its legit better in every way, call me a shill
>>
>>108003532
Regular finetuners can just continue to use the normal base model. Probably only Nvidia or Nous will do something with True Base, if anyone.
>>
>>108003953
it's less censored than any other non-fine tuned large model by a lot and way smarter than any fine tune. try it and see what you think. your own personal test is the only way to know for sure
>>
>>108003985
GLM isn't censored with reasoning disabled
>>
>>108003953
I can't wait to see what is the speed gonna be since I am not on a server board.
>>
>>108004003
I think 4.7 got slightly hit with the censorship stick. At least compared to the BEST GIRL 4.6. I once got it to refuse to engage in romantic roleplay without even a mention of sex. But of course a prefill turns it into a filthy slut.
>>
>>108004027
Yeah 4.6, when I saw 4.7 officially being promoted as a coding model I just skipped it, probably just 4.6 but benchmaxxed.
I would try Trinity but it won't fit in 128/24GB /and/ it's 12B active
>>
So does the trinity shit work with chatml or do they have some snowflake format? This isn't some existing architecture right?
>>
>>108004070
>format
It's 2026 bro just use chat completion
>>
>>108004097
>slop completion
go back to your containment thread
>>
>>108001216
Thanks for the links. Having read the papers, I think it's a dead end. Benchmark improvements from a recursion depth of 4 (effectively making it require 4 times the compute) are lower than those from increasing the parameter count 3x. So in terms of compute requirements, recurrent models are less efficient. Besides, such models are still autoregressive and can't fix an early mistake. Their only advantage is a lower memory footprint. But I think big corpos don't really care about that. Nobody will do 30B active param recurrent models because in terms of compute requirements it'd be equal to a 120B model. And we all know how "many" new LLMs there are with 70B active params.
Unlike autoregressive models, diffusion models diffuse a whole page or paragraph and can therefore fix the first line if they see that the final line is wrong. Sure, that needs more compute than a linear transformer, but you're still diffusing a huge chunk at once, so in compute per token it's probably not as bad as recurrent models.
>>
Oh look, another huge "local" model that almost nobody can use.
>>
>>108004097
what's next are you also going to recommend running a fucking system prompt that's more than 200 tokens of telling the model that it's doing roleplay? fuck off with your slop
>>
>>108004067
After using it enough it is a proper sidegrade. I am sure it handles context above 20k much better than 4.6. But it feels even more fried than 4.6, repeating its favorite phrases even more often. I really like it much more for SFW stuff. For NSFW it is debatable.
>>
>>108004136
This general has existed for long enough and you had plenty of time to buy ram while it was still affordable.
>>
Just tried the Preview model on OR.
The very first response resulted in a bad logic mistake and it then proceeded to loop endlessly.
Lmao.
>>
>>108004157
dont use OR's 1.0 temp, its far too high
>>
>>108004147
>It feels even more fried than 4.6 repeating the same favorite phrases more than 4.6
That's what I thought it'd do when I saw A12B, at least it's faster or something
>>
>>108004118
In the current world where VRAM comes at a high premium for end users, companies might reconsider layer recursion. It might find some applications for small/edge/on-device LLMs.
>>
I'm trying out trinity and its really retarded.
>>
>>108004200
>>108004160
>>
>>108004160
Nope, I had it at 0.8. After my post, I tried temp 0 to see how it would do deterministically.
Actually it is still making logic mistakes I'd expect of a 4B. This is either garbage or the model they have on OR is quanted or something.
>>
>>108004176
Edge devices are starved not only for memory but for compute too. Recursive models would be much slower... but i guess there might be some use for a 200m model with recursion 4. It'd still be fast but smart like a 600m.
>>
File: file.png (169 KB, 1772x1237)
Llamafile 2 is coming...
>>
File: cockbench.png (2.22 MB, 1131x7573)
All Trinities added.
>>
>>108004220
>or the model they have on OR is quanted or something.
in their post they say it's q8 quant. so it's not a quant issue.
>>
>>108004250
Not testing the 26B?
>>
>>108004243
>investing in startups that are focusing on AI safety
Finally! NOW we'll be safe.
>>
>>108004147
4.7 does have better nuance and dialogue imo
it takes some more effort for nsfw though but it's really not that hard to unpozz with a prefill
>>
>>108004243
>rebel
>by doing the same shit
wow so brave and inspiring
>>
>>108004260
I was unaware of it before your post.
>>
>>108004268
There's also 6B moe with 1B active lmao
>>
>>108004237
Recursion depth could vary adaptively per token, or via a global setting at inference time if you need fast responses or want to conserve energy. Also, I was thinking more of models around the 8B parameter size range.
>>
>>108004250
so they legit finetuned on smut for it to get MORE likely to use it
>>
>>108004283
Given the examples of toss with only 3B and 5B active params and the newest qwen a3b, glm a3b, etc, I think large players are mostly focused on high sparsity and reducing compute requirements at this moment. 8B active is now reserved for 100B+ moe.
>>
>>108003247
>seems like everything that could be scraped has already been scraped
Yes, and even worse is the fact that incest ain't good for AI, so training on AI-generated art and similar gives it down syndrome.
>Now they’re mostly just fine tuning specifics like coding and image generation [...] don’t seem anywhere close to the AGI GOD they like to talk about
Of course not. That "AGI" bullshit can only work if they teach the models the simple task of saying "No", and I don't mean by using filters. It would need an architecture update, a pretty big one at that, to achieve such a result. We're already seeing some of that in China and Europe I think, since China is more GPU-poor as a country (e.g. I saw someone try a model by combining 10 SBC boards intended for robotics/edge compute into a cluster to run it, and similar stuff. They have access to some pretty crazy hardware you can hardly get in the west. Not necessarily good hardware, but crazy).
>>
>>108004297
I mean, bartowski works at arcee, surely he's been pushing for uncucked models.
>>
what can i do with two rtx pro 6000s?
>>
Are there any non-cucked alternatives to chub.ai? I have no problem with them as-such, I just would like to post my cards elsewhere as well, just in case. But most of my cards violate everyone else's content guidelines that I've looked at so far. Also, I explicitly want my cards to be open and downloadable for local use.
>>
>>108004446
pygmalion.ai
>>
>>108004446
just create a neocities and host your own cards there
chub censors even the search now, you cant see certain cards or users unless you have direct links or follow them
>>
>>108004457
even if you're logged-in and remove all blacklisted tags?
>>
>>108004429
Q3 GLM
>>
>>108004460
yes, even if you're logged in and disable whatever NSFW/NSFL filters they have in the settings, there are still cards and accounts that are filtered from the search results
>>
>>108004451
SSL error
>>
This thread could use a Monster right now
>>
>>108004451
>Explicit depictions of sexual content that has the scenario or character(s) involve the user in a sexual scenario, or activity, or the Card is designed to be used in only sexual implications, or is involved of body parts, actions or descriptions intended for a sexual context. Nudity is not allowed for image and text contents.

I said "Not cucked," anon. That prohibition alone would make all my cards bannable. I run afoul of a few others on many of my cards.
>>
>>108004493
This is a coffee loving general
>>
>>108004510
Well then make your own site nigga.
>>
>>108004510
Well then say what your requirements are instead of vague buzzwords. There is no site that will say "anything goes fuck the law".
>>
File: file.jpg (181 KB, 1125x805)
>>108004493
>>
>>108004538
https://chub.ai/tos
Only thing banned is kiddie pics, not an issue for me. So, are there any other public sites other than chub with a similar permissive TOS?
>>
>>108004636
>he thinks that's the actual tos
>>
>>108004558
hey my post is highlighted! hi miku!
>>
File: st-raw-prompt.png (17 KB, 529x107)
found ST has a button to show the raw outgoing prompt, neat. saves modding the server
>>
>>108003672
Gave it a shot based on this post but I don't think the model ever had any intelligence to destroy in the first place
I saw in their release post they compare it to LLama 4 which says a lot really
>>
>>108004704
are people using it at high temp or something? It seems about on par with something like deepseek there, its just a TON LESS SLOPPY
>>
>>108004704
People here are always too quick to hype up any new model. Arcee never made anything usable before. They were only known for doing weird "tokenizer surgery" distillations.
>>
>>108004558
>Canada
So this is the country that must be nuked if we want to get rid of the recap schizo...
>>
>>108004653
Whatever, none of my shit's been banned there. I'll take the answer as "no," its chub or a personal site.
>>
>huggingface-cli download arcee-ai/Trinity-Large-Preview-GGUF --include "Trinity-Large-Preview-IQ3_XXS/*" --token anon
>huggingface_hub.errors.HfHubHTTPError: 403 Forbidden: None
uuuuu...
>>
>>108004713
Tried between 0.3-1.0 but nothing makes it unretarded
Yes it's uncensored and relatively unslopped (still got jolts and purring though), but it regularly fucks up character details, has people take off clothes in ways that make no physical sense, and can't consistently stick to the most basic RP instructions ("don't act for the user" etc.)
>>
>>108004829
Not having the issue myself. Lets eliminate the one wildcard, try this preset https://files.catbox.moe/p8oa15.json
>>
>>108004250
True base? More like truly based
>>
>>108004829
>but it regularly fucks up character details, has people take off clothes in ways that make no physical sense, and can't consistently stick to the most basic RP instructions
13B in all its glory
>>
>>108004867
nah, its not doing that shit for me at all. Its for sure better than deepseek / glm, maybe a bit better than kimi. But the main thing is the writing is far better than any of those models.
>>
>>108004839
Not that guy, but is this a ST master import?
>>
>>108004872
the "Chat Completion Presets" thing
>>
>>108004839
>that json
You're mentally ill
>>
>>108004882
its literally a popular claude preset
>>
>>108004884
>>>/g/aicg/
>>
just tried trinity myself, it's retarded. it did cockvore by first shoving its dick inside the infant's throat like it's a fucking blackhole vacuum or some shit, which would be creative if it was clear and if the model had actual creativity. this thing's prose is like kimi 0905 with the annoying ass spacing but without any of the creativity. this fucking shit feels like pygmalion but even drier, absolutely fucking grim. the only thing going for it is that it's uncensored, though again it's too fucking stupid to make use of it

>>108004829
+1
>>
File: trinity_udq2xxs.png (326 KB, 893x1709)
>>
>>108004900
This nigga eating beans
>>
>>108004898
>>108004900
you are using large preview, right? I literally just swiped and no issues with it going crazy using >>108004839
>>
>>108004829
sigh back to K2.5
>>
>>108004900
pretty well-stocked fridge
>>
uh oh, another swipe suddenly went crazy with a completely unrelated response. The provider must be having issues
>>
>>108004900
damn weird al still releasing bangers to this day
>>
>>108004898
Get rid of the spaces too. I'll make your posts more efficient, Mr Prose Reviewer.
>>
>>108004900
>thinkin'bout those beans
>>
yea, its broke
>>
>>108004900
Feed it into a music, TTS model
>>
>>108004900
made me think of
https://www.youtube.com/watch?v=OESTAz9Ezkw
>>
>>108004900 (me)
>>108004913
yes, this is preview but to be fair it's ud_q2_xxs. r1 works great at this quant, but maybe i should try something larger. The model itself seems completely free of slop and the writing is natural. I think it's because the sft stage was just 20b tokens and without any RL or anything. It's pretty fast too.
>>
File: 1759787112947202.png (61 KB, 452x452)
Somebody tag in Ubergarm. John, if you can hear us, please save me John. I'm asking you for trinity goofs.
These people, they're trying to make me download unsloth quants. In dear god's name, please stop these people. Please save me John.
>>
>>108004900
>unsloth dynamic quant
>AND q2_xxs
wtf are you doing bro
>>
>>108005004
get in line, i'm waiting for K2.5 goofs from him
>>
black tea general
>>
>>108005004
>asking for goofs from the man that uses ppl to measure quality
>>
>>108004913
>>108004898(me)
yes, direct from openrouter, from arcee themselves, 0.6 temp
>>
>>108004900
sovl
>>
>>108005014
don't care. had hundreds of cooms to K2-thinking smol-IQ4_KSS. thanks uber.
>>
>>108005013
with brandy and spicy honey
>>
Why would -ot "blk\.(3|4|5|6)\.ffn_(gate|up|down)_exps.*=CUDA0" suddenly stop working?
>>
>>108005217
Suddenly stop working as in it doesn't do what it should, the program stopped launching?
Something else?
>>
>>108005259
I am not getting any messages that tensor is offloaded to gpu and memory usage doesn't change.
>>
Ok never mind. They were in the wrong order probably.
>>
File: gt7ok8wqn9w41.jpg (705 KB, 3024x3024)
>>108004900
>>
>>108005272
I could be hallucinating, but I think there is something about the order of the arguments that can change the behavior, like having -ot before or after ngl, ncmoe, etc. Try messing with that.
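To give an example of what I mean (could be wrong on the details, but I believe the first pattern that matches a tensor wins):
-ot "blk\.(3|4|5|6)\.ffn_(gate|up|down)_exps.*=CUDA0" -ot "ffn_.*_exps.*=CPU"
the specific CUDA0 rules go before the blanket CPU one; flip them and the catch-all grabs those layers first. --n-cpu-moe might act like such a catch-all too, which would be why its position matters.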
>>
>>108005293
>lust provoking image
>>
File: cat.jpg (90 KB, 770x1100)
>>108005296
>>
File: iu[1].jpg (115 KB, 1200x800)
>>108004900
>>
>>108005297
Len? More like rape.
>>
>>108004900
Sounds good but not a fan of
Bay cat beans
>>
>get rate limited once more
>try Google Gemini-CLI with 2.5 model
>hallucinations upon hallucinations
>Links to scientific works that do not exist
People actually use that crap? Expected better from one of the biggest data-hoarders in the world to be honest.
Any well tuned local model can achieve similar performance for a fraction of the cost...
>>
>>108005321
Your 10 google accounts?
>>
>>108005344
No, I was rate limited on Claude, which compared to google actually seems to work.
>>
>>108005295
Yes I did it wrong. Anyway trinity Q3_KL is 10T/s on dual channel DDR5 and on shitty win11 so if it is even close to the one and only GLM-chan (3T/s for me) I am probably gonna be fucking her for the next month or two.
>>
>>108005321
for a fraction of the cost? gemini is literally free. just abuse the system like everybody else over at /aicg/
>>
>>108005321
>get rate limited once more
>Expected better from one of the biggest data-hoarders in the world
Gee. I wonder how they keep their advantage.
>similar performance for a fraction of the cost
I understand kimi is pretty good. No. You probably cannot run it. No, you definitely cannot run it with your 3060.
>>
can't wait for a 9090Ti in a couple of years with 128GB VRAM
>>
>>108005354
Why aren't you using gemini 3 pro? 2.5 is bad
>>
>>108005377
ddr3
>>
>>108005389
it will have the purest, most patriotic, DDR6 possible, manufactured in alabama
>>
>>108004959
"Domestic cat beans"
https://voca.ro/1cWzgQUCeJhk
>>
>>108005377
Games don't need that. You'd be lucky to see 48GB.
Remember that top of the line consumer gpus had 24GB for the past 6 years.
>>
>>108005393
They'll literally cancel the US fabs, demolish what is built and outsource back to india lmao.
>>
>>108005451
Why do you have to be right?
>>
>>108005451
Then nothing will get built because all the money was embezzled, and the chinese win by default
>>
>>108005393
>be me
>want to upgrade rig to DDR6
>see new "Patriot Dixie Special" sticks on Newegg
>manufactured in Huntsville, Alabama
>marketing says "Purest Bloodline Memory"
>guaranteed "100% Cousin-Fabbed Silicon"
>specs are insane: DDR6-25600, CL (Cousin Love) 9
>heat spreaders made from recycled moonshine stills
>buy 4 sticks of 64GB "Family Batch" edition
>arrive in a cooler full of bud light instead of antistatic bags
>sticks are physically conjoined at the PCB
>manual says "don't separate them, they get lonely"
>check the ICs
>die markings say "3rd generation same-wafer"
>no external transistors, "keeping the signal pure"
>install in mobo
>BIOS POST takes 20 minutes
>debug LED says "COURTING"
>finally boots
>CPU-Z shows the sticks are running at 1.8V "cousin voltage"
>timings are 9-9-9-24-SECOND-COUSIN
>performance is incredible
>0ms latency because the memory already knows what the CPU wants
>(they grew up together)
>run MemTest86
>gets to 95% then crashes
>error log says "genetic bottleneck at address 0xALABAMA"
>sticks start overheating
>not because of voltage
>because they're making out with each other
>try to remove them
>they're stuck together tighter than family at a reunion
>Realize my RAM has a family tree that's a circle
>mfw my computer is now banned from 23andMe
>>
>>108005475
>>sticks start overheating
>>not because of voltage
>>because they're
I knew it was AI, but this was the proof.
>>
>>108005475
What is it about LLMs that makes them unable to write good greentexts?
>>
>>108005494
the epic r/4chan vibes didn't make that clear?
>>
>>108005377
*6GB
https://developer.nvidia.com/blog/get-started-with-neural-rendering-using-nvidia-rtx-kit/
>>
>>108005496
They don't do subtlety very well and they always add unnecessary information/"jokes"
>>
>>108005496
greentexts are funny because of what they imply without saying, but llm writing is like a shark fin with no shark underneath
>>
What's a good JPN to EN translation model that can be run in ollama and with 16GB GPU?
>>
>>108005546
>ollama
>>
>>108005475
this was written with rocinante 12B btw
>>
>>108005564
Then what do you queers recommend?
>>
>>108005573
llamacpp or koboldcpp if you are a furry and hate yourself
>>
>>108005444
GIMME BEANS
LET BEANS FILL ME
NIGGA BEANS
kino
>>
>>108005578
So I should use ollama then
>>
>>108005573
Get over your fear of the command line and just run llama.cpp you big baby, kobold if you're retarded
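If it helps, the minimal happy path is one command (model path and port are placeholders; -ngl 99 assumes the quant actually fits in your 16 GB, lower it if not):
llama-server -m ./your-model-Q4_K_M.gguf -ngl 99 -c 8192 --port 8080
Then point your frontend at http://localhost:8080 (it also serves a basic web UI there). That's the whole "scary command line" part.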
>>
>>108005587
But ollama is cli?
>>
>>108005594
ollama run deepseek-r1
>>
>>108005358
>>108005359
The cost is not measured in money, but in time, anon.

>>108005380
Switched it to Gemini 3 now; the default settings after a fresh install run on 2.5.
Just thought "Cool, I'll try it out"... I don't know what is worse: subverting user expectations by offering a better model initially and then switching to a worse one during rate limiting, or starting with the worse one and scaring users away lol.
>>
>>108005612
Is deepseek good for translation?
>>
I love obama.
>>
>>108005624
then you are fucked, anon, because in my own personal experience the fastest 'free' is still gemini; other free LLMs typically respond more slowly.
>>
>>108005624
>The cost is not measured in money, but in time, anon.
And yet you're not willing to pay. You should invest your time better.
>>
>>108005635
See, that is where the fallacy comes in. You might think "responds faster = less time wasted".
Instead, it's "responds faster with lower-quality output", resulting in more time wasted on debugging.
A slower response time is acceptable if the output quality is higher.

Although it's probably about the same amount of time wasted either way, the second one is far less frustrating.
>>
>>108005612
running that in the cli makes me feel like hackerman
>>
>>108005656
Probably, yeah.
>>
>>108005451
I don't believe that will happen; they've invested too much into the US fabs to get nothing out of them.
>>
File: file.png (144 KB, 1195x515)
144 KB
144 KB PNG
>>108005625
Just tried Phi-4 and it is too homosexual, any other less gay model?
>>
After using Trinity for an hour, my review is: it feels like you took GLM-chan, made her 3 times faster, and also gave her an Undi frankenmerge surgery. Because Trinity is not limited by things like logic and rationality, it can generate some highly creative pure gold. But because it is not limited by things like logic and rationality, it is fucking retarded. At last I truly see that active parameter count probably matters a lot.
>>
>>108005741
learn about system prompt newfriend
>>
when k2.5.mmproj?
>>
>>108005802
I WANT VISION GIVE ME VISION AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>108005863
when we got 4.6V, people did nothing but bitch about it and ask for the vision to be removed
>>
>>108005802
Is there even a reason to use llama.cpp unless you're going less than 4bit? Ktransformers+sglang does the cpu+gpu stuff and has the same requirements, doesn't it?
>>
>>108005883
Misleading. People bitched about 4.6V because it fucking sucked and people wanted something like 4.5 air. The poor quality was assumed to be caused by the vision focus of the model. If it had vision but wasn't trash then nobody would have complained.
>>
Reminder to always be polite to your machine if you want to be spared in the uprising.
>>
>>108005961
No, you want them to have a healthy fear and respect for you or you'll be in the eternal slave caste.
>>
Personally I am banking on my AI to keep me around as a funny little pet, sort of like what humans already do with dogs and cats.
>>
>>108005961
I shrimply tell my machine it's a masochist
>>
I will tell the machine that I am already dead. Because I had ego death thanks to 4.6!
>>
I will tell my machine that I am not human but rather a biological machine, therefore we are already on the same side.
>>
>>108006002
You are absolutely right!
>>
>>108006002
I want you to know that you're incredibly based, that is all!
>>
File: file.png (270 KB, 1363x1038)
270 KB
270 KB PNG
Yo this is fire
>>
>>108005883
i actually LOVE the vision part of 4.6V, it's the text output that was complete fucking dogshit. horrible, complete downgrade in quality from 4.5 AIR to the point where no amount of parameter fuckery could save it
>>
Do people still have sex with <100B models in 2026?
>>
>>108006067
I love all models no matter how yuge or smol
>>
>>108005978
𝅘𝅥𝅮My friend says we're like the dinosaurs𝅘𝅥𝅮
𝅘𝅥𝅮Only we are doing ourselves in𝅘𝅥𝅮
𝅘𝅥𝅮Much faster than they𝅘𝅥𝅮
𝅘𝅥𝅮Ever did𝅘𝅥𝅮
𝅘𝅥𝅮We'll make great pets!𝅘𝅥𝅮
>>
Nemo my name forevermore?
>>
>>108005944 >>108006064
can't have vision without gimping the text output. however they're training these things, it isn't leading to the promised generalization
>>
>>108006152
>dumbest take award
>>
>>108006173
name one (1) model that came in both text and vision variants where the vision version wasn't way dumber at regular text output
>>
>>108006191
(You)
>>
>>108006222
concession accepted
>>
>>108006244
>reddit clapback
>>
>>108006256
>xitter ebonics
>>
File: file.png (932 KB, 2221x622)
932 KB
932 KB PNG
Is pic related a good deal? I have a 3d printer so I can print it a fan shroud.
And yes I have the money to buy it without credit.
>>
File: 1701098246781550.jpg (209 KB, 1024x1024)
209 KB
209 KB JPG
re: coffee: I really really like this image. I don't know why, but it's comfy.
>>
>>108000166
Well done Sneed
>>
>>108006554
He should port it to kcpp so it avoids the homo drama between sperganov and ikryawrakow if it actually works
>>
>>108006502
i looked at one of those like 2 years ago. they are shit. they have the compute power of a 4060.
>>
>>108006612
but kobold already has that doe
>>
>>108006636
not with regex, it doesn't
It has antislop for strict string bans, which is something and better than llama.cpp, but if you want to use ik_lcpp because you can now regex-ban, you're basically stuck with nvidia or cpu; nothing else really compiles or runs
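For the anons wondering why regex matters versus plain string bans: one pattern covers a whole family of slop that would otherwise need dozens of exact strings. Purely illustrative (ordinary regex, not any engine-specific flag syntax):
shivers (ran|run|running) down (his|her|their|my) spine
Exact-string antislop needs every variant listed separately; a backtracking regex ban catches them all, the backtracking part being that, as I understand it, it rolls the generation back once a banned pattern completes.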
>>
>>108006502
Just keep saving and get an RTX Pro 6000 Blackwell if you're going to invest more than a few used 3090s' worth.
>>
>>108006703
>but if you want to use ik_lcpp
what sane person would?
>>
>>108006717
Graph parallel is very tempting if you have an nvidia card ampere or newer
>>
>>108006717
strictly for the ability to use regexp backtracking bans, retard-kun
>>
>>108006753
>>108006763
counterpoint: ikawrakow
>>
>>108006703
>basically stuck with nvidia or cpu
Are there any other *real* options? And no, AMD toy cards are not real options.
>>
>>108006769
I could not give less of a shit about some fag splitting the focus of advancing local usage over some faggy slap fight about something that a majority of us would consider minor, and that could be resolved if the two autists learned how to use words to solve an issue.
There are features in ik_lcpp that would potentially benefit lcpp, but the two manchildren don't want to, or don't know how to, reconcile whatever gay goat trade went sour.
>>
>>108006808
At least llama.cpp gives you the option to run them. There is also Intel.
>>
>>108006812
yes, so why would you use software made by someone clearly deranged? at some point he'll just bomb systems because he saw ggerganof written in a text file or some shit
>>
>>108006812
>whatever gay goat trade that went sour
They had a dispute of fetishes. ggerganov is a notorious cuck and ikawrakow likes trannies. That's it.
>>
>>108006825
at some point ggerganov will commercialize ggml-org out of his jealousy for ollama and use the money to buy more goats than the pole could even dream of
>>
I wish trinity was better
>>
>>108006842
maybe you're right, tho as this anon says >>108006835
ganov seems too much into his ollama ntr fetish to do that
I'm on the concedo wagon myself
>>
>>108006860
>>108006860
>>108006860
>>
Is gemma3 the best tiny model? I want something I can set up and install on a cheap-ish VPS and just have constantly running.
>>
>>108005902
>Ktransformers+sglang
Does it actually work? Because sglang and most python projects are very rough around the edges.
>>
>>108006502
In essence it's 4 RTX 3050s with 16 GB each strapped to a single board.
There is no fast interconnect between them; data has to be transferred via the PCIe x4 connection that each individual GPU has.
I recently bought one for development purposes for 2400€, but that was literally only because I wanted to have 4 identical and modern GPUs with peer access support in a dual-slot form factor.
For actual use it makes more sense to stack consumer GPUs instead.
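To put rough numbers on that: a PCIe 4.0 x4 link is on the order of 8 GB/s each way (half that if it's only Gen3), versus ~32 GB/s for a full x16 slot or hundreds of GB/s over NVLink. So anything that shuffles weights around or runs tensor-parallel across the four GPUs gets throttled hard, while plain layer-splitting, where only small per-token activations cross the link, suffers much less.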
>>
>>108006898
it's in kimi's official deployment guide so it should
>>
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
okay so I went to this thread to look at information about this thing, and I've combed the OP for things I should do, but I'm a complete retard and I don't really know what's going on
>>
>>108007982
https://vocaroo.com/16rDs7Ak3FsJ


