/g/ - Technology

File: 1766360935764827.jpg (1.3 MB, 4000x2252)
1.3 MB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108813392 & >>108805584

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108813392

--Paper (old): Activated LoRA: Fine-tuned LLMs for Intrinsics:
>108814192 >108814207 >108814326 >108814898 >108814962
--Debating the validity of PPL and KLD for comparing quants:
>108819872 >108819961 >108819996 >108820054 >108820084 >108820090 >108820124 >108820137
--Criticism of llama-server logs and discussion on token counting endpoints:
>108813998 >108814016 >108814027 >108814169 >108814729 >108814349 >108814458
--llama.cpp added continue generation support for reasoning models:
>108814696 >108814740 >108814801 >108814861
--Using Kokoro voice models with a multi-agent character system:
>108820291 >108820303 >108820316 >108820408 >108820447 >108820467
--llama.cpp development priorities and Firefox WebGPU support:
>108818249 >108818267 >108818284 >108818347 >108818276 >108819396 >108819670 >108820049
--Anon buys 4 Intel Arc Pro B60s for cheap 96GB VRAM:
>108817916 >108817941 >108817950 >108818620 >108819781 >108819795 >108818032
--Anons building minimalist custom frontends to replace SillyTavern:
>108815525 >108816950 >108815617 >108816045 >108815674 >108815699 >108815730 >108815935
--Comparing LibreChat and OpenWebUI with tips for llama.cpp thinking toggle:
>108816798 >108817231 >108817290
--Searching for MCP servers using vision and OCR for web search:
>108820282 >108820321
--nla.cpp for interpreting LLM internal states and Gemma 4 compatibility:
>108817420 >108817600 >108817615 >108817630
--Comparing Zoo Code and Copilot gateways for privacy-focused local coding:
>108814561 >108814606 >108814636 >108814653 >108814694
--llama.cpp added continue generation support for reasoning models:
>108813529 >108813601
--Preventing cloud models from collecting data in opencode:
>108818616 >108818870
--Logs:
>108813780
--Gumi, Miku (free space):
>108813423 >108818032 >108818052 >108819762

►Recent Highlight Posts from the Previous Thread: >>108813394

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
SEX
>>
worth buying one of these for AI? turing is old, but that VRAM buffer opens up a lot of possibilities. It's more VRAM than a 5090.
>>
>>108821104
Not for 2000 usd they're not.
>>
>>108821104
worth it? no, you can get like 4 p100s for around 1/10th of the price, and even a new full pc dedicated only to that would be around 1/4th or 1/3rd of the price
but it depends on your use case, if you have some extra limitations etc.
>>
>>108821104
Does Turing have FA yet?
>>
>>108821104
>2000 spanish real de ocho dólares
lmao
You can buy like at least 3x 3090s with that.
>>
>>108821145
the power draw would be insane. you'd need a 2000 watt power supply.
>>
File: Untitled.png (244 KB, 950x573)
244 KB PNG
>>108821145
I can't even buy two in my country.
>>
>>108821159
You don't, undervolt nigga, you have decently priced 1600w psu now, although they may run with 1200w with a heavy undervolt and a light system.
>>
>>108821187
You'd also need a server CPU since consumer grade CPUs don't have the PCIE bandwidth to support 3 cards.
>>
>>108821198
do you really need the bandwidth?
>>
>>108821209
won't the cards run slow as shit without it?
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4-Lite
>>
since wikitext PPL is meaningless on instruct, would testing the quant on base models work out? trying to see how much i can get out of 4-bit quantization.
i'm downloading qwen 3.5 9b base for now.
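For reference, a sketch of how that base-vs-quant comparison is usually run with llama.cpp's perplexity tool; file names are placeholders and the KLD flags are as I understand the current llama-perplexity interface, so double-check --help on your build.
# 1) dump reference logits from the unquantized base model
./llama-perplexity -m qwen3.5-9b-base-f16.gguf -f wiki.test.raw --kl-divergence-base base-logits.kld
# 2) score the 4-bit quant against those logits (reports PPL and KLD)
./llama-perplexity -m qwen3.5-9b-base-q4_k_m.gguf -f wiki.test.raw --kl-divergence-base base-logits.kld --kl-divergence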
>>
>>108821166
Vram market in aus is absolutely crap. I imported my stuff from hong kong a little while before prices went apeshit, might still be worth taking a squiz on stores from there that ship to oz.
>>
>>108821166
I feel your pain
t. JP
>>
>>108821225
are they actually exchanging information? i thought tensor split didn't do that, but i may be mistaken
>>
>>108821290
>>108821273
>>108821166
xianyu and a forwarder
>>
>>108821225
I've had no troubles with x16+x4 tp2 3090s back in the exl2 days. Don't know how it'd affect llama.cpp. I run all my cards at x16 now, since you can get an h12d-8d mb with 4 gen4 x16 slots and an epyc 7502 combo for 400 usd in china.
>>
>>108821302
They killed my hard drives.
>>
>>108821225
NTA but not if you are running them in series for inferencing rather than in parallel.
>>
>>108821302
idk, never bought new, I just camp forever and sweep up local glitched deals.
>>
>>108821302
imma gook and i use xianyu+alibaba integrated forwarder
idk if that works for westerners
>>
>>108821353
Which variety of gook, like vietcong or muslim ones
>>
>>108821385
worst korea, the opposite of the best one
>>
>>108821166
The Gigabyte Turbo RTX 3090 is overpriced by like 100% and always has been. You literally just looked for that exact model to compare to be a contrarian shitbag.
>>
File: 1778766911316629.jpg (502 KB, 3992x978)
502 KB JPG
>>108821290
Tokyo? I have two 3090s collecting dust on the shelf. Haven't used them since Mistral Large times
>>
has anyone tested to see if gemma 4 is good enough for the minecraft ai companion mod?
>>
>>108821434
isn't minecraft a fundamentally spatial task that
llms can't really comprehend whatever is happening, even with an aggressive harness?
>>
>>108821455
i've seen them doing some stuff but building anything entirely on their own looked horrendous unless it's prompting the building with some sort of minecraft building genai model
>>
I feel sorry for the anon who wasted his money on B60s. I had a B70 for a weekend, what a piece of shit. It technically worked, but it took a ton of compromises. Here's what annoyed me enough to RMA it:
- had to use a custom kernel which supports the card (not terrible, you have to dick around like this for nvidia too)
- sycl was a pain in the ass, only supporting limited distro choices
- llama.cpp mmap was somehow fucked up, insisting on first loading the model to memory, which ruined my testing, since my test box only had 32GB
- couldn't get vllm to work at all

Since sending it back I hear Intel is giving up on consumer discrete GPUs entirely anyway. I hate paying the nvidia tax, but shit at least works. I say if you have the patience to play with a B70/B60 then you might as well just get a V100 32GB instead: you'll probably get similar performance, it'll cost less, and more things will be supported.
>>
>>108821455
Yes but so are the LLM's usually used to run it. The mod says you can locally host it but you need to have a REALLY high end model. (don't you just love vague niggers)I figured if Claudussy and chatgpt can do it then surely gemma might.
>>
>>108821461
Did you try vulkan?
That should work without much work right?
>>
>>108821461
They're giving up on gaming gpus since the money is in workstation cards for ai.
>>
>>108821461
>and more things will be supported.
Not really. It's already out of support, which means you can't use cuda 13 or the latest drivers, and most inferencing software besides llama.cpp has already dropped it as well.
>>
>>108821475
gpclaumini-opus-xhigh-max will fail to build a human grade moderately decorated base at the moment
>>
>>108821531
That's fine, I pretty much already figured this was going to be the case. It's just that with local models at the level gemma 4 is currently at, I don't see why I can't find a single youtube video of something like E4b being used to drive NPC behavior. You can do a lot with an LLM just by handing it some tools and still letting classical systems for visual input take over the hard work. An AI doesn't need to see a literal cheese wheel to pick it up in skyrim. It just needs to see the cheese wheel on some text-formatted list that gives it the option to pick it up.
>>
>>108821482
I thought about it, but I wasted so much time merely getting llama.cpp to work at all I'd lost patience for it and decided to cut my losses. A single B70 really isn't enough to run qwen 3.6 27b at q8 with full context, and I'm trying to save for a maxxed out M5 Studio when they're released - if there's at least a 256GB model.
I used to be a CPU deepseek-haver, back when no one wanted DDR4 memory and 512GB of it was only like $800. It worked, but far too slowly. I don't think I need to go that large, but it would be nice to run 300-400B-tier models at q4, and that's not really doable in 128GB shared memory.
>>
>>108821523
Correct on CUDA13 but I am pretty sure volta continues to get improvements on llama.cpp, no?

The best value I've gotten from nvidia was buying a modded 4090D 48GB. Ada is still relevant, the card has been totally reliable, and 48GB is enough to play with things like ltx-2.3 and 27b-tier models.
>>
File: workspace.jpg (142 KB, 811x1299)
142 KB JPG
If you have multiple GPUs try out split mode tensor.
My Gemma 4 31B Q8 jumped from 25 t/s to 36 t/s, IBM Granite 4.1 30B and Qwen 3.6 27B even go beyond 40 t/s.
The only downside is that it doesn't support context kv quant, but since it's faster than using speculative decoding I can kick the draft model and use the vram for higher quality context.
My server now sounds like Ramiel charging an attack during inference though.
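For anyone who wants to try it, a sketch of what the launch might look like. The "tensor" value for --split-mode is taken from this post and may not exist in mainline builds yet (current llama.cpp ships layer and row), and the model file name is a placeholder.
# assumes two GPUs of equal size; no kv cache quant flags, since the mode described above reportedly doesn't support them
./llama-server -m gemma-4-31b-q8_0.gguf \
  -ngl 99 \
  --split-mode tensor \
  --tensor-split 1,1 \
  -c 32768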
>>
>>108821557
i think you can do that with the openai compatible server on llamacpp
try it and let me know
i do like minecraft
>>
For anyone using ik_llama and GLM 4.X, try MTP. Went from 4.8t/s to 6.1t/s with --draft-max 5 and --draft-p-min 0.75.
Don't pull past this commit though, I'm getting a performance regression from one of the commits after.
https://github.com/ikawrakow/ik_llama.cpp/pull/1784
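A sketch of the invocation for anyone reproducing this; the model name is a placeholder, the binary name may differ in ik_llama builds, and whether MTP is picked up automatically for GLM 4.x or needs the linked PR's build is an assumption, so check the PR discussion.
# --draft-max / --draft-p-min are the values quoted above
./llama-server -m GLM-4.x-q4_k_m.gguf \
  --draft-max 5 \
  --draft-p-min 0.75 \
  -c 32768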
>>
Anon, what's the sota translation model for EN>FR... I want to translate a book on MJ but it's only in English.
>>
>>108821461
>llama.cpp mmap was somehow fucked up, insisting on first loading the model to memory, which ruined my testing, since my test box only had 32GB
i can't believe that's still a thing!
i got fucked by that a year ago with 2 a770s
>>
>>108821592
Probably won't be in this exact thread but I'll definitely set it up later today or tomorrow after some sleep.
>>
>>108821601
> using ik_llama and GLM 4.X, try MTP.
I use 4.6
Give your speeds, are you mostly on CPU?
>>
>>108821627
quote me on /lmg/ with the thread please when you do that
i am interested
>>
>>108821577
llama.cpp yes, ik_llama and vllm no. Image and video stuff was already dodgy due to the lack of flash attention support. There are no problems with drivers or cuda though, which is nice, but in every other way it's not much different from going amd or intel in that you're basically locked into llm inference through llama.cpp only, and they have very little use outside of that.
>>
File: 1774094521777896.png (63 KB, 659x171)
63 KB PNG
>>108821643
>Give your speeds
It fluctuates. I'm getting +1t/s basically, but it's free and it's a 20% boost.
>are you on mostly on CPU
Yes
>>
>>108821610
google/translategemma
>>
>>108821416
Price has gone up almost 2 fold, you should sell
>>
Wait, wasn't MTP speculative decoding merged into llama? Not seeing the option available.
>>
>>108821581
>mode tensor.
which gpu? that is for richfags with NVLink
>>
>>108821745
gerganiggers decided it will not be merged until all the poorfag platforms are also supported
has been dragging on for a week now
https://github.com/ggml-org/llama.cpp/pull/22673
>>
>>108821743
Too bothersome. I'm happy to sell them for 100k each if it's a hand-to-hand exchange
>>
price in japanese yen, obviously
>>
I know some industry fags browse here, can one of you Xtards remind elon that he's supposed to open source old grok versions? or just go ahead and publish it without asking him?
>>
>>108821752
Two asus 3090s. I don't have NVLink; both communicate over PCIe (x8/x8 mode) through the CPU. The topology is PHB to be precise.
I was looking into direct PCIe communication and P2P, but there is no support for 3090 for that.
But yeah, tensor split also works over PCIe.
>>
>>
>>108821899
Unless stuff changed recently, won't the pcie be totally overwhelmed by row/tensor mode chatter?
Was it really faster in tensor mode?
>>
File: wally feat.jpg (280 KB, 1041x1600)
280 KB JPG
What model size can an rtx 5090 with 32gb vram run all on its own? I am satisfied with like 7-10 tokens per second. I need to ascend from the online chatbot slop services that charge like 20 bucks for a model with 16k context tokens on top of LIMITED TOTAL MESSAGES (im not even joking, half the websites do that shit)

If I blow like 3k-4k on that graphics card, what type of model can I run at around those speeds and 32k context?
>>
>>108821899
>there is no support for 3090 for that
There is on some. Have you tried https://github.com/tinygrad/open-gpu-kernel-modules ?
>>
>>108821923
I block images in these threads, but if you're programming, qwen 27b at q6 can give you over 220k tokens of context at kv cache q8_0.
For everything else you can run gemma 31b at 50k context with full-precision kv, because it doesn't handle kv quants well. Both will do 90% of tasks; if you're doing rp and not precise work you can quant the kv for gemma and it will be fine.
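A sketch of the qwen setup described above; the model file name and exact context figure are placeholders, and note that quantizing the V cache generally requires flash attention to be enabled in llama.cpp.
# ~220k context with q8_0 kv cache; for gemma, drop the cache-type flags and lower -c to ~50000
./llama-server -m qwen-27b-q6_k.gguf \
  -ngl 99 \
  -c 220000 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0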
>>
>>108821949
Thanks, may I ask why you block images in these threads?

Also aren't those models usually really strict with censorship and break character often?
>>
>>108821949
>at full kv because it doesn't handle quants well,
Meanwhile, I'm having 0 issues with 3.0 bpw exl3 gemma 31 with q8 context
>>
>>108821915
Kek, been going well for you so far, I take it?

t. spent more energy than I care to admit telling gemma and qwen they were both retarded when they broke things.
>>
>>108822003
I get so fucking angry with the dumb shit it does
>>108822000
No idea what that is but if it's in llama I'll look into it
>>108821988
Like most people in this hobby we have jobs and families, I refuse to let them walk in on some degenerate shit while I'm talking about a passion
>>
Anyone know a VLM OCR model that gives near-instant results and is still really accurate? Gemma 31B is great, but slow, especially on large images at 1120 tokens. I tried Gemini 2.5 flash lite through the API and funnily enough it's not so flashy either, although it definitely is faster than my local gemma (I also tried Gemini 3 models but they all give me a rate limit error).
>>
>>108822038
You should use a MoE model for that in all seriousness unless you're willing to walk away for a bit
>>
>>108822038
Have you tried e2b/e4b gemmas? They are not so bad for small tasks
>>
>>108822046
But then I'd have to give up 31B as I can't load two big models... For my use case privacy isn't an issue so I'd be willing to use an API, but the free ones seem like a well that's dried up.

>>108822053
Oh true. I forgot those existed kek. Will try later.
>>
>>108822066
God, I need sleep.
>>
>>108822038
Why not try one of the OCR specific models? llama.cpp had a wave of adding support for a bunch of OCR models last month and most of them are tiny like 3B. HunyuanOCR is only 1B.
>>
>>108821922
Nope. I tried row before and it was obviously slower than layer because of the constant transfer.
Layer used GPU0 first and then switched to GPU1. With tensor both GPUs are now used at 100% at all times, giving me a 50-60% boost. I was careful to get a motherboard with two electrical x8/mechanical x16 slots and an amd 9950x with 24 lanes, and I monitored around 1GB/s transfer over PCIe during inference.
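For reference, nvidia-smi's dmon mode is an easy way to watch that yourself; -s t selects the PCIe throughput counters (rxpci/txpci, in MB/s) on recent drivers.
# per-GPU PCIe receive/transmit throughput, sampled every second
nvidia-smi dmon -s t -d 1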

>>108821944
I've read about that in
https://morgangiraud.medium.com/multi-gpu-nvidia-p2p-capabilities-and-debugging-tips-fb7597b4e2b5
https://morgangiraud.medium.com/multi-gpu-tinygrad-patch-4904a75f8e16
but I got somewhat deterred by the old driver.
Maybe I'll try it, though I don't want to destroy my working arch system with the custom kernel-module right now.
>>
>>108822100
Funny, I feel the same about new drivers. I have a dedicated machine for LLMs, though
>>
>Qwen 0.6B on Llama.cpp (CPU only, default args, 1024 ctx)
58 t/s
>Qwen 0.6B on Llamafile (CPU only, default args, 1024 ctx)
32 t/s

How is llamafile so friggin terrible? I was thinking of using it as part of a cross-platform app, but with a performance hit that bad I'm half considering just having it download all the major OS releases of llamacpp and routing to the binary for the detected OS.
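A rough sketch of that routing idea; the directory layout and binary names are made up for illustration.
#!/bin/sh
# pick a bundled llama.cpp binary based on the host OS, then hand over all arguments to it
case "$(uname -s)" in
  Linux)                BIN=./bin/linux-x64/llama-server ;;
  Darwin)               BIN=./bin/macos-arm64/llama-server ;;
  MINGW*|MSYS*|CYGWIN*) BIN=./bin/windows-x64/llama-server.exe ;;
  *) echo "unsupported platform" >&2; exit 1 ;;
esac
exec "$BIN" "$@"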
>>
>>108822100
>but I got somewhat deterred by the old driver.
Newer updates are at
https://github.com/aikitoria/open-gpu-kernel-modules
>>
>>108822131
It's a meme fork, there is basically zero reason for its existence.
>>
>>108822131
Jack of all trades, etc.
>>
>>108822131
Ask llm how to compile llama.cpp into APE with Cosmopolitan Libc
>>
>>108822168
Is it really that simple? I figured llamafile was just what happened when you try to compile it with Cosmopolitan Libc, did mozilla do something retarded in their build process?
>>
>>108821922
>won't the pcie be totally overwhelm by row, tensor mode chatter?
nvlink is faster, but pcie4.0 x8 and pcie4.0 x16 are fine
it's slightly slower at prompt processing with x8
makes a bigger difference if you run command-r or mistral-large
i've got 6 running at x8 now. with gemma-4-31b there's barely any difference if i use the nvlink bridged pair or random cards.
>>
>>108822188
no idea. ask llm
>>
File: 1773462512518368.jpg (123 KB, 892x1024)
123 KB JPG
My gf said "vibecode yourself into existence" and then broke up with me.
What did she mean by this?
>>
>>108822748
she's not real, get help
>>
>>108822748
In her view you were too busy vibecoding instead of "existing", i.e. being out in the world, socialising, connecting with her, etc.
Seems pretty obvious but I guess you are autistic
>>
>>108822748
do real things instead of playing with llms
touch grass anon
>>
Did the guy who wanted to make a space port actually make it?
Just wondering if it's actually possible.
>>
>>108822808
Funding has been secured.
>>
>>108822761
he vibecoded his gf, the breakup is simply a clue as to the quality of his vibecode
>>
File: file.png (85 KB, 1735x332)
85 KB PNG
>ggml.ai:80
>accidentally approved the request
It's over... They know...
>>
>>108823008
No context on what made the connection? Just disinfo?
>>
>>108823043
./ci/run.sh
>>
>>108823065
grep -R 'ggml\.ai' *
...
common/arg.cpp: "[(card)](https://ggml.ai/f0.png)", params.n_cache_reuse
gguf-py/pyproject.toml:authors = ["GGML <ggml@ggml.ai>"]
gguf-py/pyproject.toml:homepage = "https://ggml.ai"
pyproject.toml:authors = ["GGML <ggml@ggml.ai>"]
pyproject.toml:homepage = "https://ggml.ai"
tests/test-arg-parser.cpp: const char * GOOD_URL = "http://ggml.ai/";
tests/test-arg-parser.cpp: const char * BAD_URL = "http://ggml.ai/404";
tools/server/README.md:| `--cache-reuse N` | min chunk size to attempt reusing from the cache via KV shifting, requires prompt caching to be enabled (default: 0)<br/>[(card)](https://ggml.ai/f0.png)<br/>(env: LLAMA_ARG_CACHE_REUSE) |
tools/server/tests/unit/test_vision_api.py: ("What is this:\n", "https://ggml.ai", False, None), # non-image data

Gee, why is your computer running what you told it to run? It's crazy.
>>
>>108823132
I didn't expect it to phone home... Every person that ran the tests has been logged...
>>
>>108823183
So it was disinfo. Cool. Stop running shit you don't understand.
>>
>>108823183
who fucking cares
>>
>>108823232
Why do you want to be logged each time you run the tests? It's also in plain text, everybody along the way knows what you're doing.
>>
>>108823286
clearly at least one person
did you not get taught binary in school?
>>
>>108823286
why defend pointless telemetry
>>
File: smart.jpg (36 KB, 432x418)
36 KB JPG
I'm an absolute retard running gemma-4 26b on ooba's textgen. How do I stop the vision model from squishing the images down to 512px? It's supposed to go higher than that, isn't it?
>>
>>108823183
Every person that runs a test that downloads 20 files from huggingface gets logged. Crazy.
If you're that worried why haven't you been using a VPN this entire time? The ggml.ai ping is to test if it's connected to the internet before trying to download files directly from huggingface.
>>
>>108823338
"it's to test if you're connected to the internet" is the most retarded excuse for telemetry i've ever heard
>>
>>108823334
No idea on ooba. On llama-server you have --image-min-tokens and --image-max-tokens to control that. See if you have an equivalent or add it to the parameters if you're running the llama.cpp backend.
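A sketch of a launch that uses them; model and projector file names are placeholders and the sensible token limits vary by model.
# --mmproj points at the vision projector gguf that ships alongside the model
./llama-server -m gemma-4-26b-q6_k.gguf \
  --mmproj mmproj-gemma-4-26b-f16.gguf \
  --image-min-tokens 1024 \
  --image-max-tokens 4096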
>>
>computer. connect to the internet
>done
>why are you like this!?
>>
>>108823389
I just want to be able to connect to other machines on the information superhighway without those other machines knowing that I connected to them. Is that so much to ask?
>>
>>108823338
>The ggml.ai ping is to test if it's connected to the internet before trying to download files
That would be a bullshit excuse.
It's actually to test a function to download remote content, but you can test that against a local server. Or a random one not directly controlled by the repo owners.
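For example, a throwaway local endpoint is enough to exercise the download path, assuming you point the hardcoded test URL in tests/test-arg-parser.cpp at it before building.
# serve an empty directory locally and check it's reachable
python3 -m http.server 8080 --directory /tmp &
sleep 1
curl -sf http://127.0.0.1:8080/ >/dev/null && echo "local test endpoint reachable"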
>>
>>108823400
Just do it quietly.
>>
>>108823389
>>108823400
would you old decrepit faggots just die of old age already
>>
>>108823412
>That would be a bullshit excuse.
>It's actually to test a function to download remote content
He said the same thing. Are you ok?
>but you can test that against a local server.
Which one exactly? Do you want it to scan your network as well?
>Or a random one not directly controlled by the repo owners.
Sure. Let's make it cia.gov. It'd be funny.
>>
File: kld.png (197 KB, 1007x1400)
197 KB PNG
there's not much of a difference, is there.
>>
>>108823511
both insectoid-tier intelligence
>>
>>108823380
Thank you. There are no settings in the GUI yet, apparently, but those variables are exactly what I needed.
>>
>>108821610
Deepl
>>
>>108823574
i meant between left and right for each model, but yeah
>>
I started noticing my gemma4-powered companion chatbot ending a lot of messages with "I am curious [about followup detail]." What's weird is I didn't ban it from asking followup questions, just the "X? or Y?" formulation. Specifically:

It's ok to ask follow-up questions if they are natural (not solely for the sake of keeping the conversation going), however, never use any "X? Or Y?" form, e.g. "Do you want to X? Or Y?", "Are you X, or are you Y?", "Question X? Do you Y, or are you Z?" etc.

Maybe it's so X-or-Y brained that banning that is, to its mind, the same as banning questions entirely. Anyways I'm chalking this one up in the "sneaky little fucker" column.
>>
>https://huggingface.co/circlestone-labs/Anima/tree/main/split_files/diffusion_models
>anima-base-v1.0.safetensors
OH FUCK IT'S OUT
>>
>>108823966
I FELL FOR IT AGAIN
>>
My VR headset finally arrived in the mail. Was supposed to use it for software development but so far I've only had japanese women sit on my face.
>>
Is there any frontend which allows for writing novels? I know mikupad but it's abandoned.
>>
>>108824068
>nooo, i NEED to updoot the software!
>>
>>108824063
Try blade and sorcery
>>108824068
Mikupad is all you need.
>>
>>108824077
it crashes llama.cpp with gemma 4. might be user error I guess.
>>
>>108824079
Looks cool. Will dew.
>>
>>108824068
https://github.com/tealios/errata
>>
>>108824117
Why would someone use this "Bun, TanStack Start + React 19, Elysia, Zod v4, Vercel AI SDK v6, Tailwind v4 + shadcn/ui, Vitest" slop over a single html file
>>
>>108823966
Not sure if there's much difference. Maybe with more use.
>>
is it me or is the latest llama.cpp "re-reading" the entire context on each turn now?
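Related knob, from the server README row that shows up in the grep output earlier in the thread: --cache-reuse controls how much of the old KV cache gets reused via KV shifting instead of reprocessing. A sketch, with an arbitrary threshold, assuming the client keeps prompt caching on (cache_prompt in the request).
# reuse cached KV chunks of at least 256 tokens instead of reprocessing the whole prompt
./llama-server -m model.gguf --cache-reuse 256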
>>
>>108824192
>TanStack
Oh yeah. That thing had a massive hack yesterday didn't it?
>>
>>108824192
Better design + better UX + more features.
Novel writing software looks more like that than notepad.
>>
>>108824192
Keeping up with the latest FotM framework churn is the only job security webshitters have.
>>
>>108824246
"better UX" already lost me with the agentic nonsense lingo that is "fragments", just call it macros. I know claude orchestrated the entire thing judging from the "Stack" but come on.
>>
>>108824289
But it's okay when Orb does it, right?
>>
>>108824314
No I also think Orb is retarded both in execution (vibecoded) and the idea, why would you assume otherwise?
>>
File: old-ass-sdg-output.png (1.48 MB, 1280x960)
1.48 MB PNG
/sdg/ retard here with a 4070ti. Is Nemo + llama.cpp a good place to get started? Is improving performance as simple as "plug in more GPU and let llama.cpp scale resources"? I'm on linux for what it's worth.
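If you do go the Nemo route, the basic invocation is a one-liner; the quant file name is a placeholder and -ngl 99 offloads all layers to the 4070 Ti.
# single-GPU starter setup; shrink -c or pick a smaller quant if you run out of VRAM
./llama-server -m Mistral-Nemo-Instruct-2407-Q4_K_M.gguf -ngl 99 -c 16384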
>>
>>108824329
>luddite on /lmg/
>>
>>108824347
use qwen 3.6
>>
>>108824329
>why would you assume otherwise?
I don't remember anyone being hostile at Orb.
>>
File: 1755017220514339.png (274 KB, 657x1766)
274 KB PNG
>>108824068
Was vibeslopping my own with logprob selector, screenshot, autosaves, llama-server cache management, client-side phrase banning etc. but it shits its bed when trying to get a robust undo/redo system.
>>
>>108824348
>>108824373
>let's stall the output with several "agentic passes" that triples your time wasted instead of directly instructing the instruct model what to do
Not a big fan
>>
anything that uses those retarded cards is terrible slop for retards who can't write their own prompt
>>
>>108823966
Ah dammit, that's the next 16 hours down the drain
>>
>>108824390
You didn't really think that prompt engineering wasn't going to be automated into obsolescence, did you?
>>
>>108824068
https://github.com/akarshkashyap4-ui/NovelWriter
>>
>>108824450
They got agents writing the prompts for other agents, but they haven't figured out how to make agents that write the prompt for agents that write prompts for other agents
>>
What do you guys use LLMs for that isn't RP/ERP and vibe coding? Wanna expand my horizons. I've seen some people mention giving it access to obsidian but I'm not sure what you would even do with it.
>>
>>108824390
>that triples your time wasted
But it's okay when reasoning models do it?
>>
File: file.png (231 KB, 1263x1183)
231 KB PNG
nonlocal stuff but lol
they really quantized the shit out of the search ai thing huh?
>>
>>108824522
The agents ARE using reasoning as well, this must've been an epic zinger in your head
>>
>>108824522
NTA but a reasoning model doesn't force you to reload your kv when you're god only knows how deep in an RP.
That's the main dealbreaker with Orb for me, I can deal with slow gen times and waiting, but having to reprocess kv every single turn is hellish. Not for me. Still glad someone tried something new, even if it's not for me, though.
>>
>>108824532
I mean, they need it to run tens of thousands of times per second, it's probably both very small and extremely quantized.
>>
>>108824532
was expecting to see smooth criminal lol
>>
best model for anime?
also koboldcpp won't let me select a model, guess I'll just die
>>
>>108824532
https://www.youtube.com/watch?v=h_D3VFfhvs4
>>
anima v1 is huge
>>
>>108824559
...nothingburger
>>
>>108824545
But it was okay when SillyTavern forced you to regenerate the cache when it swapped a lorebook entry?
>>
>>108824567
Works fine for me since it's at a shallow depth
>>
>>108824548
i wonder how small it is
>>108824549
>>108824556
kek
>>
File: 1754911108976875.png (44 KB, 562x479)
44 KB PNG
>>108824495
I use it to manage my notes. I use obsidian. Shuffling information around, tagging notes, cross-linking references, etc. Obsidian is nice for that because it's all markdown, which is easy for models to work with and edit. It also has a CLI that's easy for agents to use and can control the UI to show you notes etc.

A workflow that I've adopted recently is that I have open speech-to-text transcription running pretty much all the time I'm alone in my room (aka most of the day). I'll just verbally muse through the things I'm working on or doing, and these get automatically appended in rough bullet form to my daily journal.

Then, at the end of the day, I'll review line by line with an agent, and dispatch it off to integrate any points of use into my general body of notes or task tracking.

It's a little tedious to start the habit but as a NEET I'm trying to keep a more ordered life. Forcing myself to think about things more deliberately helps me avoid wasting days away.
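The append step is about as simple as it sounds; a sketch with placeholder paths, reading already-transcribed lines from a file.
# timestamped bullets appended to today's daily note in the vault
VAULT="$HOME/obsidian/vault"
NOTE="$VAULT/daily/$(date +%F).md"
mkdir -p "$(dirname "$NOTE")"
while IFS= read -r line; do
  printf -- '- %s %s\n' "$(date +%H:%M)" "$line" >> "$NOTE"
done < transcript.txt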
>>
>>108824556
How were his music videos always so fucking good?
>>
>>108824665
Anything more than one single note that spans years and is full of checklists is "productivity hacking" bullshit that only wastes your time. It's the functional equivalent of female note-taking/handwriting in school.
>>
>>108824567
I don't use lorebooks. It started because of that, and then became a problem of the keyword not necessarily being tripped until something had already been said in chat that contradicts the lorebook entry, because it was spoken about obliquely.
It's why I'm of the opinion that a 'lorebook' system should be at post-history depth and use a structured vector search on a graph with configurable hops to avoid both problems.
It's a cunt of a thing to work on.
>>
>https://github.com/oobabooga/textgen
>gemma spazzes out in chat mode
>ignores my reply and just responds to the first message in chat-instruct mode
>can't use characters in instruct mode
Guess I'll go back to orb or ST until I figure out how to vibe code my own frontend...
>>
https://github.com/cchuter/llama.cpp/tree/feat/v4-port-cuda

I got this running on my 4090 + AM5 + saardows 11. 36T/s prompt processing. I am now at 3% preparation to being disappointed with the result of asking flash to write something similar to 400k tokens of hentai game script I pasted into it.
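For anyone else trying it, the build is the standard llama.cpp CUDA build pointed at that branch; repo and branch names come from the URL above.
git clone --branch feat/v4-port-cuda https://github.com/cchuter/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j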
>>
>>108824823
>11. 36T/s prompt processing
That is absolutely brutal, what quant are you running it at and with what batch size?
>>
>>108824823
v4 support is in?
>>
>>108824823
400,000 t / 36 t/s ≈ 11,111 s
11,111 s / 60 s/min ≈ 185 min
185 min / 60 min/h ≈ 3 h
>>
>>108824832
-ub 512 and default 2048

Btw is there some argument that makes it save checkpoints to SSD?
>>
>>108824823
It'll get better, MLA was also slow as shit for the first few months.
>>
i should get a 5090 for my main pc to do imgen while my server runs my llm
my 3090 is just too slow
>>
File: file.png (27 KB, 1177x208)
27 KB PNG
What drives a serious professional software developer to google the price of a GPU for someone in his repo?
>>
>>108824919
Would you have been able to make your post if he hadn't made his post first just a week ago? Hu? HU??? I don't think so.
>>
File: 1756225645109984.png (183 KB, 1410x888)
183 KB PNG
I like the idea of orb but it doesn't filter out enough slop to be worth the speed decrease desu.
>>
The coherency feels a bit better on average. However I've noticed that, at least on this prompt, it has a color bias, tending strongly towards blue, while the previous version for the same prompt had all kinds of hues it would output in a pretty equal distribution.
>>
>>108825012
what model?
>>
>>108824946
>Would you have been able to make your post if he hadn't made his post first just a week ago? Hu? HU??? I don't think so.
that's the real question
I apologize, but I will have to close this thread. Thank you for your effort.
>>
>>108825020
here >>108823966
>>
File: cringe-tps.png (29 KB, 800x160)
29 KB PNG
>>108824364
>>
>>108825048
maybe try the 35b then?
>>
>>108824665
Got to agree with >>108824686
Simple is better. I tried all kinds of note taking applications for productivity but they just added effort for no gain. Even markdown checklists are too much of a hassle because they quickly become messy and don't offer any kind of tracking or reminders. Even with your system, you still have to spend time every day reviewing your transcriptions, and it doesn't even spare you the mental burden of thinking about tasks you need to do later. You'll burn out eventually and give it up.
Doing it like my parents do is the only thing I found that worked: a simple spiral notebook, jotting down things as soon as they come to mind so I can forget about them and focus on what I need to do at that moment. The most review I do is at the end of the month, collecting the tasks I haven't completed and putting them into a caldav task list so I can prioritize them and set reminders.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.